EcmPred: Prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection

Handle URI:
http://hdl.handle.net/10754/562580
Title:
EcmPred: Prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection
Authors:
Kandaswamy, Krishna Kumar Umar; Ganesan, Pugalenthi; Kalies, Kai Uwe; Hartmann, Enno; Martinetz, Thomas M.
Abstract:
The extracellular matrix (ECM) is a major component of tissues of multicellular organisms. It consists of secreted macromolecules, mainly polysaccharides and glycoproteins. Malfunctions of ECM proteins lead to severe disorders such as Marfan syndrome, osteogenesis imperfecta, numerous chondrodysplasias, and skin diseases. In this work, we report a random forest approach, EcmPred, for the prediction of ECM proteins from protein sequences. EcmPred was trained on a dataset containing 300 ECM and 300 non-ECM proteins and tested on a dataset containing 145 ECM and 4187 non-ECM proteins. EcmPred achieved 83% accuracy on the training dataset and 77% on the test dataset. EcmPred predicted 15 out of 20 experimentally verified ECM proteins. By scanning the entire human proteome, we predicted novel ECM proteins, which were validated with Gene Ontology and InterPro. The dataset and a standalone version of the EcmPred software are available at http://www.inb.uni-luebeck.de/tools-demos/Extracellular_matrix_proteins/EcmPred. © 2012 Elsevier Ltd.
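The title names maximum relevance minimum redundancy (mRMR) feature selection, but this record does not describe how it was applied. As a rough illustration only, the sketch below implements the generic greedy mRMR criterion in its difference form: at each step it picks the feature with the largest relevance I(f; class) minus its mean mutual information with the features already selected. It assumes the sequence-derived features have already been discretized. This is not EcmPred's implementation; the function names and toy data are hypothetical, and the random forest classifier that would be trained on the selected features is not shown.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information I(X;Y) in nats between two equal-length discrete sequences."""
    n = len(x)
    px = Counter(x)
    py = Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) )
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy.

    features: dict mapping a feature name to a list of discrete values (one per sample).
    labels: list of class labels, one per sample (e.g. 1 = ECM, 0 = non-ECM).
    Returns up to k feature names in the order they were selected.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]  # first pick: most relevant feature
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        # sorted() makes tie-breaking deterministic (alphabetical)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 samples, binary class (1 = ECM, 0 = non-ECM).
labels = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],       # strongly informative
    "f1_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of f1 (fully redundant)
    "f2": [0, 0, 1, 1, 0, 0, 1, 1],       # weakly informative, independent of f1
}
print(mrmr_select(features, labels, 2))  # ['f1', 'f2']
```

In the toy data, `f1_copy` is exactly as relevant as `f1` but completely redundant with it, so the second greedy step prefers the weaker but independent `f2`; ranking by relevance alone would have picked the duplicate.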
KAUST Department:
Biosciences Core Lab; Core Labs
Publisher:
Elsevier
Journal:
Journal of Theoretical Biology
Issue Date:
January 2013
DOI:
10.1016/j.jtbi.2012.10.015
PubMed ID:
23123454
Type:
Article
ISSN:
0022-5193
Sponsors:
This work was supported by the Graduate School for Computing in Medicine and Life Sciences funded by Germany's Excellence Initiative [DFG GSC 235/1]. KKK acknowledges Dr. Bianca Habermann, Max Planck Institute for Biology of Ageing, Germany, for her support.
Appears in Collections:
Articles; Biosciences Core Lab

Full metadata record

DC Field | Value | Language
dc.contributor.author | Kandaswamy, Krishna Kumar Umar | en
dc.contributor.author | Ganesan, Pugalenthi | en
dc.contributor.author | Kalies, Kai Uwe | en
dc.contributor.author | Hartmann, Enno | en
dc.contributor.author | Martinetz, Thomas M. | en
dc.date.accessioned | 2015-08-03T10:43:35Z | en
dc.date.available | 2015-08-03T10:43:35Z | en
dc.date.issued | 2013-01 | en
dc.identifier.issn | 00225193 | en
dc.identifier.pmid | 23123454 | en
dc.identifier.doi | 10.1016/j.jtbi.2012.10.015 | en
dc.identifier.uri | http://hdl.handle.net/10754/562580 | en
dc.publisher | Elsevier | en
dc.subject | Extracellular proteins | en
dc.subject | Human proteome | en
dc.subject | Maximum relevance minimum redundancy (mRMR) | en
dc.subject | Random forest | en
dc.subject | Sequence properties | en
dc.title | EcmPred: Prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection | en
dc.type | Article | en
dc.contributor.department | Biosciences Core Lab | en
dc.contributor.department | Core Labs | en
dc.identifier.journal | Journal of Theoretical Biology | en
dc.contributor.institution | Institute for Neuro- and Bioinformatics, University of Luebeck, Germany | en
dc.contributor.institution | Graduate School for Computing in Medicine and Life Sciences, University of Luebeck, Germany | en
dc.contributor.institution | Max Planck Institute for Biology of Ageing, Germany | en
dc.contributor.institution | Centre for Structural and Cell Biology in Medicine, Institute of Biology, University of Luebeck, Germany | en
kaust.author | Ganesan, Pugalenthi | en


All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.