BLProt: Prediction of bioluminescent proteins based on support vector machine and relieff feature selection

Handle URI:
http://hdl.handle.net/10754/325467
Title:
BLProt: Prediction of bioluminescent proteins based on support vector machine and relieff feature selection
Authors:
Kandaswamy, Krishna Kumar; Pugalenthi, Ganesan; Hazrati, Mehrnaz Khodam; Kalies, Kai-Uwe; Martinetz, Thomas
Abstract:
Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is challenging because of their poor sequence similarity. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated on an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy from training (5-fold cross-validation) and 80.06% accuracy from testing. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. © 2011 Kandaswamy et al.; licensee BioMed Central Ltd.
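Illustrative note: the abstract names ReliefF as one of its filter-style feature selectors. The sketch below shows the idea behind that family of methods using the simpler basic Relief algorithm (one nearest hit and one nearest miss per instance, two classes), not ReliefF itself, which averages over several nearest neighbours. It is not the authors' implementation; the function name `relief_weights` and the toy data are assumptions made for illustration only.

```python
def relief_weights(X, y):
    """Basic Relief feature weights for a two-class dataset.

    X: list of numeric feature vectors; y: list of class labels.
    A feature gains weight when it differs on the nearest instance of the
    other class (miss) and loses weight when it differs on the nearest
    instance of the same class (hit).
    """
    n, d = len(X), len(X[0])
    # Scale per-feature differences into [0, 1] using the observed range.
    span = []
    for j in range(d):
        column = [row[j] for row in X]
        span.append((max(column) - min(column)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
w = relief_weights(X, y)
ranking = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
print(ranking)  # feature indices, most relevant first
```

In a filter approach of this kind, features are ranked by weight and only the top-ranked subset is passed to the classifier, which matches the abstract's description of evaluating progressively smaller feature subsets.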
KAUST Department:
Biosciences Core Lab; Structural and Functional Bioinformatics Group
Citation:
Kandaswamy K, Pugalenthi G, Hazrati M, Kalies K-U, Martinetz T (2011) BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection. BMC Bioinformatics 12: 345. doi:10.1186/1471-2105-12-345.
Publisher:
BioMed Central
Journal:
BMC Bioinformatics
Issue Date:
17-Aug-2011
DOI:
10.1186/1471-2105-12-345
PubMed ID:
21849049
PubMed Central ID:
PMC3176267
Type:
Article
ISSN:
14712105
Appears in Collections:
Articles; Biosciences Core Lab; Structural and Functional Bioinformatics Group

Full metadata record

DC Field | Value | Language
dc.contributor.author | Kandaswamy, Krishna Kumar | en
dc.contributor.author | Pugalenthi, Ganesan | en
dc.contributor.author | Hazrati, Mehrnaz Khodam | en
dc.contributor.author | Kalies, Kai-Uwe | en
dc.contributor.author | Martinetz, Thomas | en
dc.date.accessioned | 2014-08-27T09:52:37Z | -
dc.date.available | 2014-08-27T09:52:37Z | -
dc.date.issued | 2011-08-17 | en
dc.identifier.citation | Kandaswamy K, Pugalenthi G, Hazrati M, Kalies K-U, Martinetz T (2011) BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection. BMC Bioinformatics 12: 345. doi:10.1186/1471-2105-12-345. | en
dc.identifier.issn | 14712105 | en
dc.identifier.pmid | 21849049 | en
dc.identifier.doi | 10.1186/1471-2105-12-345 | en
dc.identifier.uri | http://hdl.handle.net/10754/325467 | en
dc.description.abstract | Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is challenging because of their poor sequence similarity. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated on an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy from training (5-fold cross-validation) and 80.06% accuracy from testing. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. © 2011 Kandaswamy et al.; licensee BioMed Central Ltd. | en
dc.language.iso | en | en
dc.publisher | BioMed Central | en
dc.rights | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. | en
dc.rights.uri | http://creativecommons.org/licenses/by/2.0 | en
dc.subject | Biotechnological applications | en
dc.subject | Hypothetical protein | en
dc.subject | Physicochemical property | en
dc.subject | Prediction accuracy | en
dc.subject | Prominent features | en
dc.subject | Sequence informations | en
dc.subject | Sequence similarity | en
dc.subject | Technological advances | en
dc.subject | Biology | en
dc.subject | Forecasting | en
dc.subject | Phosphorescence | en
dc.subject | Proteins | en
dc.subject | Support vector machines | en
dc.subject | Bioluminescence | en
dc.subject | Fungi | en
dc.subject | Hexapoda | en
dc.subject | photoprotein | en
dc.subject | chemistry | en
dc.subject | computer program | en
dc.subject | probability | en
dc.subject | support vector machine | en
dc.subject | Luminescent Proteins | en
dc.subject | Markov Chains | en
dc.subject | Software | en
dc.subject | Support Vector Machines | en
dc.title | BLProt: Prediction of bioluminescent proteins based on support vector machine and relieff feature selection | en
dc.type | Article | en
dc.contributor.department | Biosciences Core Lab | en
dc.contributor.department | Structural and Functional Bioinformatics Group | en
dc.identifier.journal | BMC Bioinformatics | en
dc.identifier.pmcid | PMC3176267 | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Institute for Neuro- and Bioinformatics, University of Lübeck, 23538 Lübeck, Germany | en
dc.contributor.institution | Graduate School for Computing in Medicine and Life Sciences, University of Lübeck, 23538 Lübeck, Germany | en
dc.contributor.institution | Institute for Signal Processing, University of Lübeck, 23538 Lübeck, Germany | en
dc.contributor.institution | Centre for Structural and Cell Biology in Medicine, Institute of Biology, University of Lübeck, Germany | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Ganesan, Pugalenthi | en
This item is licensed under a Creative Commons License