Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

Handle URI:
http://hdl.handle.net/10754/622856
Title:
Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions
Authors:
Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z.; Gao, Xin (ORCID: 0000-0002-7108-3574)
Abstract:
Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric splines, which are more efficient than existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of the adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective on two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.
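The abstract's central idea, a smooth log-density on the (phi, psi) backbone-angle torus built from a periodic spline basis whose fitted coefficients double as a low-dimensional representation, can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: it swaps in cyclic cubic B-splines for the paper's trigonometric B-splines, uses a cyclic second-difference penalty as the roughness penalty, fits each population separately by plain gradient ascent instead of estimating the densities collectively with an adaptive basis, and runs on synthetic wrapped-normal angles placed near helix-like and strand-like regions. All function names and parameter values (n_basis, lam, n_grid, step) are assumptions chosen for illustration.

```python
import numpy as np

TWO_PI = 2 * np.pi


def wrap(a):
    """Map angles in radians onto [-pi, pi)."""
    return (np.asarray(a, dtype=float) + np.pi) % TWO_PI - np.pi


def cyclic_bspline_basis(x, n_basis):
    """Cyclic cubic B-splines on [-pi, pi) with n_basis >= 4 equally spaced knots.

    Stand-in (assumption) for the paper's trigonometric B-splines: every column
    is a periodic bump, and the columns sum to 1 at every angle.
    """
    h = TWO_PI / n_basis
    u = ((np.asarray(x, dtype=float) + np.pi) / h) % n_basis  # position in knot units
    cols = []
    for j in range(n_basis):
        # distance to knot j measured around the circle
        t = np.abs((u - j + n_basis / 2) % n_basis - n_basis / 2)
        cols.append(np.where(t < 1, (4 - 6 * t**2 + 3 * t**3) / 6,
                             np.where(t < 2, (2 - t) ** 3 / 6, 0.0)))
    return np.stack(cols, axis=-1)


def tensor_features(phi, psi, n_basis):
    """Row i holds B_j(phi_i) * B_k(psi_i) for all (j, k), flattened row-major."""
    b_phi = cyclic_bspline_basis(phi, n_basis)
    b_psi = cyclic_bspline_basis(psi, n_basis)
    return (b_phi[:, :, None] * b_psi[:, None, :]).reshape(len(b_phi), -1)


def roughness_penalty(n_basis):
    """Penalty matrix P for cyclic second differences of theta along both angles."""
    eye = np.eye(n_basis)
    d = np.roll(eye, 1, axis=1) - 2 * eye + np.roll(eye, -1, axis=1)
    dtd = d.T @ d
    return np.kron(dtd, eye) + np.kron(eye, dtd)


def fit_log_spline(phi, psi, n_basis=8, lam=1e-2, n_grid=64, n_iter=400, step=1.0):
    """Maximize mean(eta(x_i)) - log Z - lam * theta' P theta by gradient ascent.

    eta(phi, psi) = sum_jk theta_jk B_j(phi) B_k(psi); Z is approximated on an
    n_grid x n_grid grid over the torus. Plain gradient ascent is an assumption,
    not the paper's fitting procedure.
    """
    g = -np.pi + TWO_PI * np.arange(n_grid) / n_grid
    gp, gs = np.meshgrid(g, g, indexing="ij")
    f_grid = tensor_features(gp.ravel(), gs.ravel(), n_basis)
    f_data = tensor_features(phi, psi, n_basis).mean(axis=0)
    pen = roughness_penalty(n_basis)
    theta = np.zeros(n_basis * n_basis)
    for _ in range(n_iter):
        eta = f_grid @ theta
        w = np.exp(eta - eta.max())
        w /= w.sum()  # probabilities of the grid cells under the current fit
        theta += step * (f_data - w @ f_grid - 2 * lam * pen @ theta)
    return theta, g


def density_on_grid(theta, g, n_basis=8):
    """Normalized density values f(g_a, g_b) on the square grid g x g."""
    gp, gs = np.meshgrid(g, g, indexing="ij")
    eta = tensor_features(gp.ravel(), gs.ravel(), n_basis) @ theta
    f = np.exp(eta - eta.max())
    f /= f.sum() * (g[1] - g[0]) ** 2
    return f.reshape(len(g), len(g))


def coefficient_scores(thetas, n_components=2):
    """Principal-component scores of the fitted coefficient vectors."""
    x = np.asarray(thetas, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)  # a constant shift in theta leaves f unchanged
    x = x - x.mean(axis=0)                 # center across populations
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


# Usage on synthetic (hypothetical) data: three helix-like and three strand-like samples.
rng = np.random.default_rng(0)
helix, sheet = (-1.05, -0.79), (-2.09, 2.27)  # roughly (-60, -45) and (-120, 130) deg
thetas = []
for center in [helix] * 3 + [sheet] * 3:
    x = wrap(rng.normal(center, 0.3, size=(300, 2)))  # wrapped-normal (phi, psi) pairs
    theta, g = fit_log_spline(x[:, 0], x[:, 1])
    thetas.append(theta)
scores = coefficient_scores(thetas)
```

Because the basis columns sum to 1 at every angle, adding a constant to all coefficients leaves the fitted density unchanged; that is why coefficient_scores removes each vector's mean before the principal-component step, so that the scores compare density shapes rather than arbitrary offsets.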
KAUST Department:
Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Najibi SM, Maadooliat M, Zhou L, Huang JZ, Gao X (2017) Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions. Computational and Structural Biotechnology Journal. Available: http://dx.doi.org/10.1016/j.csbj.2017.01.011.
Publisher:
Elsevier BV
Journal:
Computational and Structural Biotechnology Journal
KAUST Grant Number:
URF/1/1976-04
Issue Date:
8-Feb-2017
DOI:
10.1016/j.csbj.2017.01.011
Type:
Article
ISSN:
2001-0370
Sponsors:
We are grateful to Professor Roland L. Dunbrack for providing the data set for the neighbor-dependent Ramachandran distribution application, and to Amelie Stein for help with the implementation of Rosetta. The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/1976-04.
Additional Links:
http://www.sciencedirect.com/science/article/pii/S2001037016300885
Appears in Collections:
Articles; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Najibi, Seyed Morteza | en
dc.contributor.author | Maadooliat, Mehdi | en
dc.contributor.author | Zhou, Lan | en
dc.contributor.author | Huang, Jianhua Z. | en
dc.contributor.author | Gao, Xin | en
dc.date.accessioned | 2017-02-09T12:55:03Z | -
dc.date.available | 2017-02-09T12:55:03Z | -
dc.date.issued | 2017-02-08 | en
dc.identifier.citation | Najibi SM, Maadooliat M, Zhou L, Huang JZ, Gao X (2017) Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions. Computational and Structural Biotechnology Journal. Available: http://dx.doi.org/10.1016/j.csbj.2017.01.011. | en
dc.identifier.issn | 2001-0370 | en
dc.identifier.doi | 10.1016/j.csbj.2017.01.011 | en
dc.identifier.uri | http://hdl.handle.net/10754/622856 | -
dc.description.abstract | Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric splines, which are more efficient than existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of the adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective on two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction. | en
dc.description.sponsorship | We are grateful to Professor Roland L. Dunbrack for providing the data set for the neighbor-dependent Ramachandran distribution application, and to Amelie Stein for help with the implementation of Rosetta. The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/1976-04. | en
dc.publisher | Elsevier BV | en
dc.relation.url | http://www.sciencedirect.com/science/article/pii/S2001037016300885 | en
dc.rights | NOTICE: this is the author’s version of a work that was accepted for publication in Computational and Structural Biotechnology Journal. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computational and Structural Biotechnology Journal, [, , (2017-02-08)] DOI: 10.1016/j.csbj.2017.01.011 . © 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ | en
dc.subject | Bivariate splines | en
dc.subject | Log-spline density estimation | en
dc.subject | Protein Structure | en
dc.subject | Ramachandran distribution | en
dc.subject | Roughness penalty | en
dc.subject | Trigonometric B-spline | en
dc.subject | Protein Classification | en
dc.subject | SCOP | en
dc.title | Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions | en
dc.type | Article | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Computational and Structural Biotechnology Journal | en
dc.eprint.version | Post-print | en
dc.contributor.institution | Department of Statistics, Persian Gulf University, Bushehr, 75169, Iran | en
dc.contributor.institution | Department of Mathematics, Statistics and Computer Science, Marquette University, Wisconsin, 53201-1881, USA | en
dc.contributor.institution | Department of Statistics, Texas A&M University, Texas, 77843-3143, USA | en
kaust.author | Gao, Xin | en
kaust.grant.number | URF/1/1976-04 | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.