Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data
Type
Dataset
Authors
Wang, Chunxiang
Gao, Xin
Liu, Juntao
KAUST Department
Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Structural and Functional Bioinformatics Group
Date
2020
Permanent link to this record
http://hdl.handle.net/10754/665917
Abstract
Background: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data.
Results: We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3.
Conclusion: The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.
Citation
Wang, C., Gao, X., & Liu, J. (2020). Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data. figshare. https://doi.org/10.6084/M9.FIGSHARE.C.5145646
Publisher
figshare
Relations
Is Supplement To: [Article]
Wang, C., Gao, X., & Liu, J. (2020). Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data. BMC Bioinformatics, 21(1). DOI: 10.1186/s12859-020-03797-8. Handle: 10754/665618
DOI
10.6084/m9.figshare.c.5145646
Related items
Showing items related by title, author, creator and subject.
-
What is the right sequencing approach? Solo VS extended family analysis in consanguineous populations
Alfares, Ahmed; Alsubaie, Lamia; Aloraini, Taghrid; Alaskar, Aljoharah; Althagafi, Azza Th.; Alahmad, Ahmed; Rashid, Mamoon; Alswaid, Abdulrahman; Alothaim, Ali; Eyaid, Wafaa; Ababneh, Faroug; Albalwi, Mohammed; Alotaibi, Raniah; Almutairi, Mashael; Altharawi, Nouf; Alsamer, Alhanouf; Abdelhakim, Marwa; Kafkas, Senay; Mineta, Katsuhiko; Cheung, Nicole; Abdallah, Abdallah; Büchmann-Møller, Stine; Fukasawa, Yoshinori; Zhao, Xiang; Rajan, Issaac; Hoehndorf, Robert; Al Mutairi, Fuad; Gojobori, Takashi; Alfadhel, Majid (figshare, 2020) [Dataset]
Background: Testing strategies are crucial for genetics clinics and testing laboratories. In this study, we tried to compare the hit rate between solo and trio and trio plus testing and between trio and sibship testing. Finally, we studied the impact of extended family analysis, mainly in complex and unsolved cases.
Methods: Three cohorts were used for this analysis: one cohort to assess the hit rate between solo, trio and trio plus testing, another cohort to examine the impact of the testing strategy of sibship genome vs trio-based analysis, and a third cohort to test the impact of an extended family analysis of up to eight family members to lower the number of candidate variants.
Results: The hit rates in solo, trio and trio plus testing were 39, 40, and 41%, respectively. The total number of candidate variants in the sibship testing strategy was 117 variants compared to 59 variants in the trio-based analysis. We noticed that the average number of coding candidate variants in trio-based analysis was 1192 variants and 26,454 noncoding variants, and this number was lowered by 50–75% after adding additional family members, with up to two coding and 66 noncoding homozygous variants only, in families with eight family members.
Conclusion: There was no difference in the hit rate between solo and extended family members. Trio-based analysis was a better approach than sibship testing, even in a consanguineous population. Finally, each additional family member helped to narrow down the number of variants by 50–75%. Our findings could help clinicians, researchers and testing laboratories select the most cost-effective and appropriate sequencing approach for their patients. Furthermore, using extended family analysis is a very useful tool for complex cases with novel genes.
-
Coral microbiome composition along the northern Red Sea suggests high plasticity of bacterial and specificity of endosymbiotic dinoflagellate communities
Osman, Eslam O.; Suggett, David J.; Voolstra, Christian R.; Pettay, D. Tye; Clark, Dave R.; Pogoreutz, Claudia; Sampayo, Eugenia M.; Warner, Mark E.; Smith, David J. (figshare, 2020) [Dataset]
Background: The capacity of reef-building corals to tolerate (or adapt to) heat stress is a key factor determining their resilience to future climate change. Changes in coral microbiome composition (particularly for microalgal endosymbionts and bacteria) are a potential mechanism that may assist corals to thrive in warm waters. The northern Red Sea experiences extreme temperature anomalies, yet corals in this area rarely bleach, suggesting possible refugia to climate change. However, the coral microbiome composition, and how it relates to the capacity to thrive in warm waters in this region, is entirely unknown.
Results: We investigated microbiomes for six coral species (Porites nodifera, Favia favus, Pocillopora damicornis, Seriatopora hystrix, Xenia umbellata, and Sarcophyton trocheliophorum) from five sites in the northern Red Sea spanning 4° of latitude and summer mean temperature ranges from 26.6 °C to 29.3 °C. A total of 19 distinct dinoflagellate endosymbionts were identified as belonging to three genera in the family Symbiodiniaceae (Symbiodinium, Cladocopium, and Durusdinium). Of these, 86% belonged to the genus Cladocopium, with notably five novel types (19%). The endosymbiont community showed a high degree of host-specificity despite the latitudinal gradient. In contrast, the diversity and composition of bacterial communities of the surface mucus layer (SML), a compartment particularly sensitive to environmental change, varied significantly between sites; however, for any given coral it was species-specific.
Conclusion: The conserved endosymbiotic community suggests high physiological plasticity to support holobiont productivity across the different latitudinal regimes. Further, the presence of five novel algal endosymbionts suggests selection of certain genotypes (or genetic adaptation) within the semi-isolated Red Sea. In contrast, the dynamic composition of bacteria associated with the SML across sites may contribute to holobiont function and broaden the ecological niche. In doing so, SML bacterial communities may aid holobiont local acclimatization (or adaptation) by readily responding to changes in the host environment. Our study provides novel insight about the selective and endemic nature of coral microbiomes along the northern Red Sea refugia.
-
Additional file 2: of Silica diatom shells tailored with Au nanoparticles enable sensitive analysis of molecules for biological, safety and environment applications
Onesto, V.; Villani, M.; Coluccio, M. L.; Majewska, R.; Alabastri, A.; Battista, E.; Schirato, A.; Calestani, D.; Coppedé, N.; Cesarelli, M.; Amato, F.; Di Fabrizio, Enzo M.; Gentile, F. (figshare, 2018) [Data File]
Notes on the diatomaceous earth used in this study. (DOCX 18 kb)