Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling

Handle URI:
http://hdl.handle.net/10754/583052
Title:
Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling
Authors:
Maadooliat, Mehdi; Zhou, Lan; Najibi, Seyed Morteza; Gao, Xin (0000-0002-7108-3574); Huang, Jianhua Z.
Abstract:
This paper develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; and in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling 2015:00 Journal of the American Statistical Association
Publisher:
Informa UK Limited
Journal:
Journal of the American Statistical Association
Issue Date:
21-Oct-2015
DOI:
10.1080/01621459.2015.1099535
Type:
Article
ISSN:
0162-1459; 1537-274X
Additional Links:
http://www.tandfonline.com/doi/full/10.1080/01621459.2015.1099535
Appears in Collections:
Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Maadooliat, Mehdi | en
dc.contributor.author | Zhou, Lan | en
dc.contributor.author | Najibi, Seyed Morteza | en
dc.contributor.author | Gao, Xin | en
dc.contributor.author | Huang, Jianhua Z. | en
dc.date.accessioned | 2015-12-01T13:40:13Z | en
dc.date.available | 2015-12-01T13:40:13Z | en
dc.date.issued | 2015-10-21 | en
dc.identifier.citation | Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling 2015:00 Journal of the American Statistical Association | en
dc.identifier.issn | 0162-1459 | en
dc.identifier.issn | 1537-274X | en
dc.identifier.doi | 10.1080/01621459.2015.1099535 | en
dc.identifier.uri | http://hdl.handle.net/10754/583052 | en
dc.description.abstract | This paper develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; and in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well. | en
dc.language.iso | en | en
dc.publisher | Informa UK Limited | en
dc.relation.url | http://www.tandfonline.com/doi/full/10.1080/01621459.2015.1099535 | en
dc.rights | This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of the American Statistical Association on 21 Oct 2015, available online: http://www.tandfonline.com/10.1080/01621459.2015.1099535. | en
dc.subject | Bivariate splines | en
dc.subject | Log-spline density estimation | en
dc.subject | Protein structure | en
dc.subject | Ramachandran distribution | en
dc.subject | Roughness penalty | en
dc.subject | Triangulations | en
dc.title | Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Journal of the American Statistical Association | en
dc.eprint.version | Post-print | en
dc.contributor.institution | Department of Mathematics, Statistics and Computer Science, Marquette University, Wisconsin, 53201-1881, USA. | en
dc.contributor.institution | Department of Statistics, Texas A&M University, Texas, 77843-3143, USA. | en
dc.contributor.institution | Department of Statistics, Persian Gulf University, Bushehr, 75169, Iran. | en
dc.contributor.institution | Institute of Applied Mathematics and Computational Science, Texas A&M University. | en
dc.contributor.institution | Department of Mathematics, Statistics and Computer Science, Marquette University. | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Gao, Xin | en