Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm

Handle URI:
http://hdl.handle.net/10754/325309
Title:
Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm
Authors:
Abbas, Ahmed; Kong, Xin-Bing; Liu, Zhi; Jing, Bing-Yi; Gao, Xin (ORCID 0000-0002-7108-3574)
Abstract:
A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then deciding how many predictions to select so as to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming increasingly common, although expert knowledge remains the method of choice for deciding how many of the thousands of candidate peaks should be considered in order to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed-number approach, our approach returns significantly more true peaks. For instance, combining WaVPeak or PICKY with the proposed method reduces the missing peak rate by 20% and 26% on average, respectively, on a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, significantly outperforming each individual method and the consensus method without the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and can be applied straightforwardly to other prediction selection problems in bioinformatics. The source code, documentation and example data of the proposed method are available at http://sfb.kaust.edu.sa/pages/software.aspx.
© 2013 Abbas et al.
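A minimal sketch of the selection step described in the Abstract, written in Python. The abstract does not say how peak volumes or intensities are turned into p-values, so the sketch assumes an empirical upper-tail p-value computed against a hypothetical reference sample of noise-peak scores (`noise_scores`); the authors' conversion and their B-H-based variant may well differ. The selection itself is the standard Benjamini-Hochberg step-up procedure at false discovery rate level `alpha` (0.05 here is an illustrative choice): sort the m p-values, find the largest rank k with p_(k) <= k * alpha / m, and keep those k peaks.

```python
from bisect import bisect_left


def empirical_p_values(scores, noise_scores):
    """Empirical upper-tail p-value of each peak score against a noise sample.

    Assumption: `noise_scores` is a hypothetical reference sample of scores
    from noise-only peaks; this is not necessarily the paper's conversion.
    """
    noise = sorted(noise_scores)
    n = len(noise)
    p_values = []
    for s in scores:
        n_ge = n - bisect_left(noise, s)  # number of noise scores >= s
        p_values.append((1 + n_ge) / (1 + n))  # +1 keeps p strictly above 0
    return p_values


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices selected by the B-H step-up procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank  # remember the largest rank that meets its threshold
    return sorted(order[:k])


# Hypothetical example: six candidate peaks already sorted by decreasing intensity.
noise = [float(v) for v in range(1, 101)]
peaks = [250.0, 180.0, 120.0, 99.5, 60.0, 30.0]
selected = benjamini_hochberg(empirical_p_values(peaks, noise))
print(selected)  # → [0, 1, 2, 3]
```

Here the number of peaks (four) is chosen from the data rather than fixed in advance. The first-ranked peak alone fails its own threshold (1/101 > 0.05/6) yet is still kept, because B-H is a step-up procedure: every peak ranked at or above the largest passing rank is selected.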
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Abbas A, Kong X-B, Liu Z, Jing B-Y, Gao X (2013) Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm. PLoS ONE 8: e53112. doi:10.1371/journal.pone.0053112.
Publisher:
Public Library of Science (PLoS)
Journal:
PLoS ONE
Issue Date:
7-Jan-2013
DOI:
10.1371/journal.pone.0053112
PubMed ID:
23308147
PubMed Central ID:
PMC3538655
Type:
Article
ISSN:
1932-6203
Appears in Collections:
Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Abbas, Ahmed | en
dc.contributor.author | Kong, Xin-Bing | en
dc.contributor.author | Liu, Zhi | en
dc.contributor.author | Jing, Bing-Yi | en
dc.contributor.author | Gao, Xin | en
dc.date.accessioned | 2014-08-27T09:46:17Z | en
dc.date.available | 2014-08-27T09:46:17Z | en
dc.date.issued | 2013-01-07 | en
dc.identifier.citation | Abbas A, Kong X-B, Liu Z, Jing B-Y, Gao X (2013) Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm. PLoS ONE 8: e53112. doi:10.1371/journal.pone.0053112. | en
dc.identifier.issn | 1932-6203 | en
dc.identifier.pmid | 23308147 | en
dc.identifier.doi | 10.1371/journal.pone.0053112 | en
dc.identifier.uri | http://hdl.handle.net/10754/325309 | en
dc.description.abstract | (same as the Abstract above, including the copyright line) | en
dc.language.iso | en | en
dc.publisher | Public Library of Science (PLoS) | en
dc.rights | This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. | en
dc.rights | Archived with thanks to PLoS ONE | en
dc.subject | protein | en
dc.subject | algorithm | en
dc.subject | automatic peak selection | en
dc.subject | Benjamini Hochberg based algorithm | en
dc.subject | bioinformatics | en
dc.subject | mathematical analysis | en
dc.subject | mathematical computing | en
dc.subject | nuclear magnetic resonance | en
dc.subject | prediction | en
dc.subject | protein structure | en
dc.subject | scoring system | en
dc.subject | Algorithms | en
dc.subject | Computational Biology | en
dc.subject | Nuclear Magnetic Resonance, Biomolecular | en
dc.subject | Proteins | en
dc.subject | Software | en
dc.title | Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | PLoS ONE | en
dc.identifier.pmcid | PMC3538655 | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Department of Statistics, Fudan University, Shanghai, China | en
dc.contributor.institution | Department of Mathematics, Faculty of Science and Technology, University of Macau, Taipa, Macau | en
dc.contributor.institution | Department of Mathematics, Hong Kong University of Science and Technology, Kowloon, Hong Kong | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Abbas, Ahmed | en
kaust.author | Gao, Xin | en