EcmPred: Prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection
Type
Article
Authors
Kandaswamy, Krishna Kumar Umar
Ganesan, Pugalenthi
Kalies, Kai Uwe
Hartmann, Enno
Martinetz, Thomas M.
KAUST Department
Bioscience Core Lab
Core Labs
Date
2013-01
Permanent link to this record
http://hdl.handle.net/10754/562580
Abstract
The extracellular matrix (ECM) is a major component of tissues of multicellular organisms. It consists of secreted macromolecules, mainly polysaccharides and glycoproteins. Malfunctions of ECM proteins lead to severe disorders such as Marfan syndrome, osteogenesis imperfecta, numerous chondrodysplasias, and skin diseases. In this work, we report a random forest approach, EcmPred, for the prediction of ECM proteins from protein sequences. EcmPred was trained on a dataset containing 300 ECM and 300 non-ECM proteins and tested on a dataset containing 145 ECM and 4187 non-ECM proteins. EcmPred achieved 83% accuracy on the training dataset and 77% on the test dataset. EcmPred predicted 15 out of 20 experimentally verified ECM proteins. By scanning the entire human proteome, we predicted novel ECM proteins validated with gene ontology and InterPro. The dataset and standalone version of the EcmPred software are available at http://www.inb.uni-luebeck.de/tools-demos/Extracellular_matrix_proteins/EcmPred. © 2012 Elsevier Ltd.
Citation
Kandaswamy, K. K., Pugalenthi, G., Kalies, K.-U., Hartmann, E., & Martinetz, T. (2013). EcmPred: Prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection. Journal of Theoretical Biology, 317, 377–383. doi:10.1016/j.jtbi.2012.10.015
Sponsors
This work was supported by the Graduate School for Computing in Medicine and Life Sciences funded by Germany's Excellence Initiative [DFG GSC 235/1]. KKK acknowledges Dr. Bianca Habermann, Max Planck Institute for Biology of Ageing, Germany for her support.
Publisher
Elsevier BV
Journal
Journal of Theoretical Biology
PubMed ID
23123454
DOI
10.1016/j.jtbi.2012.10.015
Related articles
- SPRED: A machine learning approach for the identification of classical and non-classical secretory proteins in mammalian genomes.
- Authors: Kandaswamy KK, Pugalenthi G, Hartmann E, Kalies KU, Möller S, Suganthan PN, Martinetz T
- Issue date: 2010 Jan 15
- An ensemble method with hybrid features to identify extracellular matrix proteins.
- Authors: Yang R, Zhang C, Gao R, Zhang L
- Issue date: 2015
- A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search.
- Authors: Garg A, Raghava GP
- Issue date: 2008
- Prediction of RNA-binding residues in proteins from primary sequence using an enriched random forest model with a novel hybrid feature.
- Authors: Ma X, Guo J, Wu J, Liu H, Yu J, Xie J, Sun X
- Issue date: 2011 Apr
- Unraveling the human bone microenvironment beyond the classical extracellular matrix proteins: a human bone protein library.
- Authors: Alves RD, Demmers JA, Bezstarosti K, van der Eerden BC, Verhaar JA, Eijken M, van Leeuwen JP
- Issue date: 2011 Oct 7