scBKAP: a clustering model for single-cell RNA-seq data based on bisecting K-means

Type
Article

Authors
Wang, Xiaolin
Gao, Hongli
Qi, Ren
Zheng, Ruiqing
Gao, Xin
Yu, Bin

KAUST Department
Computer Science Program
Computational Bioscience Research Center (CBRC)
Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division

KAUST Grant Number
FCC/1/1976-17
FCC/1/1976-23
FCC/1/1976-26
REI/1/0018-01-01
REI/1/4473-01-01
URF/1/3412-01
URF/1/3450-01
URF/1/4098-01-01

Online Publication Date
2022-12-19

Print Publication Date
2023-05-01

Date
2022-12-19

Abstract
Advances in single-cell RNA sequencing (scRNA-seq) technologies allow researchers to analyze genome-wide transcription profiles and to address biological problems at single-cell resolution. However, existing clustering methods for scRNA-seq data suffer from the high dropout rate and the curse of dimensionality in the data. Here, we propose a novel pipeline, scBKAP, the cornerstone of which is a single-cell bisecting K-means clustering method based on an autoencoder network and a dimensionality reduction model, MPDR. Specifically, scBKAP uses an autoencoder network to reconstruct gene expression values from scRNA-seq data to alleviate the dropout issue, and the MPDR model, composed of the M3Drop feature selection algorithm and the PHATE dimensionality reduction algorithm, to reduce the dimensions of the reconstructed data. The dimensionality-reduced data are then fed into the bisecting K-means clustering algorithm to identify cell clusters. Comprehensive experiments demonstrate scBKAP's superior performance over nine state-of-the-art single-cell clustering methods on 21 public scRNA-seq datasets and simulated datasets.
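
The final clustering stage named in the abstract, bisecting K-means, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the autoencoder, M3Drop, and PHATE stages are omitted, and the toy `embedding` below merely stands in for dimensionality-reduced data. The function names, the deterministic seeding of each 2-means split, and the rule of always splitting the cluster with the largest within-cluster sum of squared errors (SSE) are assumptions made for illustration only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, n_iter=50):
    """Split points in two with Lloyd's algorithm.

    Returns a pair of position lists, or None if no split is possible.
    Seeding is deterministic (an assumption of this sketch): the point
    farthest from the centroid, then the point farthest from that one.
    """
    mean = centroid(points)
    c0 = max(points, key=lambda p: sq_dist(p, mean))
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centers = [c0, c1]
    groups = None
    for _ in range(n_iter):
        new_groups = ([], [])
        for i, p in enumerate(points):
            side = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            new_groups[side].append(i)
        # Stop on convergence, or keep the last valid split if a side empties.
        if new_groups == groups or not new_groups[0] or not new_groups[1]:
            break
        groups = new_groups
        centers = [centroid([points[i] for i in g]) for g in groups]
    return groups


def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        target = max(clusters, key=lambda idx: sse([points[i] for i in idx]))
        groups = two_means([points[i] for i in target])
        if groups is None:  # remaining clusters cannot be split further
            break
        clusters.remove(target)
        clusters.extend([[target[j] for j in g] for g in groups])
    labels = [0] * len(points)
    for label, idx in enumerate(clusters):
        for i in idx:
            labels[i] = label
    return labels


# Toy 2-D "embedding" with three well-separated groups of four cells each.
embedding = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
    (30.0, 0.0), (30.0, 1.0), (31.0, 0.0), (31.0, 1.0),
]
labels = bisecting_kmeans(embedding, k=3)
print(labels)
```

On this toy input each group of four points receives its own label. Unlike ordinary K-means, which places all k centers at once, the bisecting variant builds the partition top-down through a sequence of two-way splits.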

Citation
Wang, X., Gao, H., Qi, R., Zheng, R., Gao, X., & Yu, B. (2022). scBKAP: a clustering model for single-cell RNA-seq data based on bisecting K-means. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1–10. https://doi.org/10.1109/tcbb.2022.3230098

Acknowledgements
We thank anonymous reviewers for valuable suggestions and comments. This work was supported by the National Natural Science Foundation of China (No. 62172248), the Natural Science Foundation of Shandong Province of China (No. ZR2021MF098), and the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. FCC/1/1976-17, FCC/1/1976-23, FCC/1/1976-26, URF/1/3450-01, URF/1/3412-01, URF/1/4098-01-01, REI/1/0018-01-01, and REI/1/4473-01-01.

Publisher
IEEE

Journal
IEEE/ACM Transactions on Computational Biology and Bioinformatics

DOI
10.1109/TCBB.2022.3230098

Additional Links
https://ieeexplore.ieee.org/document/9991252/
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9991252