dc.contributor.author: Dai, Hanjun
dc.contributor.author: Umarov, Ramzan
dc.contributor.author: Kuwahara, Hiroyuki
dc.contributor.author: Li, Yu
dc.contributor.author: Song, Le
dc.contributor.author: Gao, Xin
dc.date.accessioned: 2017-08-07T10:52:01Z
dc.date.available: 2017-08-07T10:52:01Z
dc.date.issued: 2017-07-26
dc.identifier.citation: Dai H, Umarov R, Kuwahara H, Li Y, Song L, et al. (2017) Sequence2Vec: A novel embedding approach for modeling transcription factor binding affinity landscape. Bioinformatics. Available: http://dx.doi.org/10.1093/bioinformatics/btx480.
dc.identifier.issn: 1367-4803
dc.identifier.issn: 1460-2059
dc.identifier.doi: 10.1093/bioinformatics/btx480
dc.identifier.uri: http://hdl.handle.net/10754/625301
dc.description.abstract: Motivation: An accurate characterization of the transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have created opportunities to build binding affinity prediction methods, accurately characterizing the TF-DNA binding affinity landscape remains a challenging problem. Results: Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model (HMM) that captures both position-specific information and long-range dependencies in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these HMMs into a common nonlinear feature space and uses the embedded features to build a predictive model. Our method combines the strengths of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA data sets measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods.
dc.description.sponsorship: The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/1976-04 and URF/1/3007-01. It was also supported in part by NSF IIS-1218749, NIH BIGDATA 1R01GM108341, NSF CAREER IIS-1350983, NSF IIS-1639792 EAGER, ONR N00014-15-1-2340, NVIDIA, Intel and Amazon AWS. This research made use of the resources of the computer clusters at KAUST.
dc.publisher: Oxford University Press (OUP)
dc.relation.url: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btx480#supplementary-data
dc.rights: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.title: Sequence2Vec: A novel embedding approach for modeling transcription factor binding affinity landscape
dc.type: Article
dc.contributor.department: Computational Bioscience Research Center (CBRC)
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.identifier.journal: Bioinformatics
dc.eprint.version: Publisher's Version/PDF
dc.contributor.institution: College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA.
kaust.person: Umarov, Ramzan
kaust.person: Kuwahara, Hiroyuki
kaust.person: Li, Yu
kaust.person: Gao, Xin
refterms.dateFOA: 2018-06-13T13:44:09Z


Files in this item

Name: btx480.pdf
Size: 2.257Mb
Format: PDF
Description: Main article

Name: btx480_Supp.pdf
Size: 8.466Mb
Format: PDF
Description: Supplemental files