DEEPre: sequence-based enzyme EC number prediction by deep learning

Handle URI:
http://hdl.handle.net/10754/625965
Title:
DEEPre: sequence-based enzyme EC number prediction by deep learning
Authors:
Li, Yu; Wang, Sheng; Umarov, Ramzan; Xie, Bingqing; Fan, Ming; Li, Lihua; Gao, Xin (0000-0002-7108-3574)
Abstract:
Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resources required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms. The server can be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.
KAUST Department:
Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Li Y, Wang S, Umarov R, Xie B, Fan M, et al. (2017) DEEPre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics. Available: http://dx.doi.org/10.1093/bioinformatics/btx680.
Publisher:
Oxford University Press (OUP)
Journal:
Bioinformatics
KAUST Grant Number:
URF/1/1976-04; URF/1/3007-01
Issue Date:
20-Oct-2017
DOI:
10.1093/bioinformatics/btx680
Type:
Article
ISSN:
1367-4803; 1460-2059
Sponsors:
We would like to thank Prof. Kuo-Chen Chou for kindly providing the KNN dataset. This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award Nos. URF/1/1976-04 and URF/1/3007-01, and by the National Natural Science Foundation of China (61401131 and 61731008).
Additional Links:
https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx680/4562505/DEEPre-sequencebased-enzyme-EC-number-prediction
Appears in Collections:
Articles; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Li, Yu | en
dc.contributor.author | Wang, Sheng | en
dc.contributor.author | Umarov, Ramzan | en
dc.contributor.author | Xie, Bingqing | en
dc.contributor.author | Fan, Ming | en
dc.contributor.author | Li, Lihua | en
dc.contributor.author | Gao, Xin | en
dc.date.accessioned | 2017-10-30T07:55:29Z | -
dc.date.available | 2017-10-30T07:55:29Z | -
dc.date.issued | 2017-10-20 | en
dc.identifier.citation | Li Y, Wang S, Umarov R, Xie B, Fan M, et al. (2017) DEEPre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics. Available: http://dx.doi.org/10.1093/bioinformatics/btx680. | en
dc.identifier.issn | 1367-4803 | en
dc.identifier.issn | 1460-2059 | en
dc.identifier.doi | 10.1093/bioinformatics/btx680 | en
dc.identifier.uri | http://hdl.handle.net/10754/625965 | -
dc.description.abstract | Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resources required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms. The server can be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre. | en
dc.description.sponsorship | We would like to thank Prof. Kuo-Chen Chou for kindly providing the KNN dataset. This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award Nos. URF/1/1976-04 and URF/1/3007-01, and by the National Natural Science Foundation of China (61401131 and 61731008). | en
dc.publisher | Oxford University Press (OUP) | en
dc.relation.url | https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx680/4562505/DEEPre-sequencebased-enzyme-EC-number-prediction | en
dc.rights | This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com | en
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | en
dc.title | DEEPre: sequence-based enzyme EC number prediction by deep learning | en
dc.type | Article | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Bioinformatics | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Illinois Institute of Technology, Computer Science Department, 10 West 35th Street, Chicago, IL 60616, USA. | en
dc.contributor.institution | Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou, 310018, China. | en
kaust.author | Li, Yu | en
kaust.author | Wang, Sheng | en
kaust.author | Umarov, Ramzan | en
kaust.author | Gao, Xin | en
kaust.grant.number | URF/1/1976-04 | en
kaust.grant.number | URF/1/3007-01 | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.