Highlighting nonlinear patterns in population genetics datasets

Handle URI:
http://hdl.handle.net/10754/344117
Title:
Highlighting nonlinear patterns in population genetics datasets
Authors:
Alanis Lobato, Gregorio (0000-0001-9339-4229); Cannistraci, Carlo Vittorio; Eriksson, Anders (0000-0003-3436-3726); Manica, Andrea; Ravasi, Timothy (0000-0002-9950-465X)
Abstract:
Detecting structure in population genetics and case-control studies is important, as it exposes phenomena such as ecoclines, admixture and stratification. Principal Component Analysis (PCA) is a linear dimension-reduction technique commonly used for this purpose, but it struggles to reveal complex, nonlinear data patterns. In this paper we introduce non-centred Minimum Curvilinear Embedding (ncMCE), a nonlinear method to overcome this problem. Our analyses show that ncMCE can separate individuals into ethnic groups in cases in which PCA fails to reveal any clear structure. This increased discrimination power arises from ncMCE's ability to better capture the phylogenetic signal in the samples, whereas PCA better reflects their geographic relation. We also demonstrate how ncMCE can discover interesting patterns, even when the data has been poorly pre-processed. The juxtaposition of PCA and ncMCE visualisations provides a new standard of analysis with utility for discovering and validating significant linear/nonlinear complementary patterns in genetic data.
KAUST Department:
Integrative Systems Biology Lab; Biological and Environmental Sciences and Engineering (BESE) Division; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computational Bioscience Research Center (CBRC)
Citation:
Alanis-Lobato, G., Cannistraci, C. V., Eriksson, A., Manica, A., & Ravasi, T. (2015). Highlighting nonlinear patterns in population genetics datasets. Sci. Rep., 5. doi: 10.1038/srep08140
Publisher:
Nature Publishing Group
Journal:
Scientific Reports
Issue Date:
30-Jan-2015
DOI:
10.1038/srep08140
Type:
Article
ISSN:
2045-2322
Additional Links:
http://www.nature.com/doifinder/10.1038/srep08140
Appears in Collections:
Articles; Computational Bioscience Research Center (CBRC); Biological and Environmental Sciences and Engineering (BESE) Division; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Alanis Lobato, Gregorio | en
dc.contributor.author | Cannistraci, Carlo Vittorio | en
dc.contributor.author | Eriksson, Anders | en
dc.contributor.author | Manica, Andrea | en
dc.contributor.author | Ravasi, Timothy | en
dc.date.accessioned | 2015-02-04T06:15:33Z | -
dc.date.available | 2015-02-04T06:15:33Z | -
dc.date.issued | 2015-01-30 | en
dc.identifier.citation | Alanis-Lobato, G., Cannistraci, C. V., Eriksson, A., Manica, A., & Ravasi, T. (2015). Highlighting nonlinear patterns in population genetics datasets. Sci. Rep., 5. doi: 10.1038/srep08140 | en
dc.identifier.issn | 2045-2322 | en
dc.identifier.doi | 10.1038/srep08140 | en
dc.identifier.uri | http://hdl.handle.net/10754/344117 | en
dc.description.abstract | Detecting structure in population genetics and case-control studies is important, as it exposes phenomena such as ecoclines, admixture and stratification. Principal Component Analysis (PCA) is a linear dimension-reduction technique commonly used for this purpose, but it struggles to reveal complex, nonlinear data patterns. In this paper we introduce non-centred Minimum Curvilinear Embedding (ncMCE), a nonlinear method to overcome this problem. Our analyses show that ncMCE can separate individuals into ethnic groups in cases in which PCA fails to reveal any clear structure. This increased discrimination power arises from ncMCE's ability to better capture the phylogenetic signal in the samples, whereas PCA better reflects their geographic relation. We also demonstrate how ncMCE can discover interesting patterns, even when the data has been poorly pre-processed. The juxtaposition of PCA and ncMCE visualisations provides a new standard of analysis with utility for discovering and validating significant linear/nonlinear complementary patterns in genetic data. | en
dc.language.iso | en | en
dc.publisher | Nature Publishing Group | en
dc.relation.url | http://www.nature.com/doifinder/10.1038/srep08140 | en
dc.rights | This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ | en
dc.subject | Machine learning | en
dc.subject | Population genetics | en
dc.title | Highlighting nonlinear patterns in population genetics datasets | en
dc.type | Article | en
dc.contributor.department | Integrative Systems Biology Lab | en
dc.contributor.department | Biological and Environmental Sciences and Engineering (BESE) Division | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.identifier.journal | Scientific Reports | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Division of Medical Genetics, Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA | en
dc.contributor.institution | Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany | en
dc.contributor.institution | Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, England | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Eriksson, Anders | en
kaust.author | Ravasi, Timothy | en
kaust.author | Alanis Lobato, Gregorio | en
This item is licensed under a Creative Commons License