DNA motif elucidation using belief propagation

Handle URI:
http://hdl.handle.net/10754/325454
Title:
DNA motif elucidation using belief propagation
Authors:
Wong, Ka-Chun; Chan, Tak-Ming; Peng, Chengbin (ORCID: 0000-0002-7445-2638); Li, Yue; Zhang, Zhaolei
Abstract:
Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure the binding signal intensities of a protein to all possible DNA k-mers (k = 8-10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe kmerHMM, a new algorithm that uses Hidden Markov Models (HMMs) and belief propagation to derive precise, multimodal motifs. kmerHMM accepts raw PBM probe data and preprocesses them into the median binding intensities of individual k-mers. The k-mers are ranked and aligned to train an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagation. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness; in particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data, such as those generated by ChIP-seq. The executables and source code are available on the authors' websites, e.g. http://www.cs.toronto.edu/~wkc/kmerHMM. © 2013 The Author(s).
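The preprocessing step summarized in the abstract (collapsing raw PBM probe intensities into one median binding intensity per k-mer, then ranking the k-mers) can be illustrated with a short sketch. This is not the kmerHMM source code: the input format of (probe sequence, intensity) pairs, the merging of each k-mer with its reverse complement, the tie-breaking rule, and the helper names `canonical`, `median_kmer_intensities`, and `rank_kmers` are all hypothetical choices made for illustration. The later stages (k-mer alignment, HMM training, and motif extraction by belief propagation) are not shown.

```python
from collections import defaultdict
from statistics import median

# Maps each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement.

    Assumption: the two strands of a double-stranded probe are treated as the
    same binding site, so a k-mer and its reverse complement share one entry.
    """
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def median_kmer_intensities(probes, k=8):
    """Map each canonical k-mer to the median intensity of the probes containing it.

    probes: iterable of (sequence, intensity) pairs (hypothetical input format).
    """
    hits = defaultdict(list)
    for seq, intensity in probes:
        seen = set()  # count each k-mer at most once per probe
        for i in range(len(seq) - k + 1):
            kmer = canonical(seq[i:i + k])
            if kmer not in seen:
                seen.add(kmer)
                hits[kmer].append(intensity)
    return {kmer: median(vals) for kmer, vals in hits.items()}


def rank_kmers(medians, top=None):
    """Sort k-mers by median intensity, strongest first (ties broken alphabetically)."""
    ranked = sorted(medians.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top is None else ranked[:top]


if __name__ == "__main__":
    # Toy probes (made-up values for illustration only), using k = 4 for brevity.
    probes = [("ACGTTT", 5.0), ("GGACGT", 3.0), ("TTTTTT", 1.0)]
    medians = median_kmer_intensities(probes, k=4)
    print(rank_kmers(medians, top=3))
    # prints [('AAAC', 5.0), ('AACG', 5.0), ('ACGT', 4.0)]
```

Counting each k-mer at most once per probe keeps a probe with a repeated k-mer from contributing several copies of its intensity, and the median is less sensitive than the mean to a few outlier probes.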
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Wong K-C, Chan T-M, Peng C, Li Y, Zhang Z (2013) DNA motif elucidation using belief propagation. Nucleic Acids Research 41: e153-e153. doi:10.1093/nar/gkt574.
Publisher:
Oxford University Press (OUP)
Journal:
Nucleic Acids Research
Issue Date:
29-Jun-2013
DOI:
10.1093/nar/gkt574
PubMed ID:
23814189
PubMed Central ID:
PMC3763557
Type:
Article
ISSN:
0305-1048
Appears in Collections:
Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Wong, Ka-Chun | en
dc.contributor.author | Chan, Tak-Ming | en
dc.contributor.author | Peng, Chengbin | en
dc.contributor.author | Li, Yue | en
dc.contributor.author | Zhang, Zhaolei | en
dc.date.accessioned | 2014-08-27T09:52:00Z | -
dc.date.available | 2014-08-27T09:52:00Z | -
dc.date.issued | 2013-06-29 | en
dc.identifier.citation | Wong K-C, Chan T-M, Peng C, Li Y, Zhang Z (2013) DNA motif elucidation using belief propagation. Nucleic Acids Research 41: e153-e153. doi:10.1093/nar/gkt574. | en
dc.identifier.issn | 0305-1048 | en
dc.identifier.pmid | 23814189 | en
dc.identifier.doi | 10.1093/nar/gkt574 | en
dc.identifier.uri | http://hdl.handle.net/10754/325454 | en
dc.language.iso | en | en
dc.publisher | Oxford University Press (OUP) | en
dc.rights | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. | en
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/ | en
dc.subject | DNA binding protein | en
dc.subject | transcription factor | en
dc.subject | DNA | en
dc.subject | algorithm | en
dc.subject | belief propagation | en
dc.subject | DNA binding motif | en
dc.subject | hidden Markov model | en
dc.subject | intermethod comparison | en
dc.subject | microarray analysis | en
dc.subject | protein binding | en
dc.subject | protein binding microarray | en
dc.subject | binding site | en
dc.subject | chemistry | en
dc.subject | DNA sequence | en
dc.subject | metabolism | en
dc.subject | methodology | en
dc.subject | mouse | en
dc.subject | nucleotide motif | en
dc.subject | probability | en
dc.subject | protein microarray | en
dc.subject | Algorithms | en
dc.subject | Binding Sites | en
dc.subject | DNA-Binding Proteins | en
dc.subject | Markov Chains | en
dc.subject | Mice | en
dc.subject | Nucleotide Motifs | en
dc.subject | Protein Array Analysis | en
dc.subject | Sequence Analysis, DNA | en
dc.subject | Transcription Factors | en
dc.title | DNA motif elucidation using belief propagation | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Nucleic Acids Research | en
dc.identifier.pmcid | PMC3763557 | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Department of Computer Science, University of Toronto, Toronto, ON, Canada | en
dc.contributor.institution | Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada | en
dc.contributor.institution | Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, United States | en
dc.contributor.institution | Banting and Best Department of Medical Research, University of Toronto, Toronto, ON, Canada | en
dc.contributor.institution | Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Peng, Chengbin | en
This item is licensed under a Creative Commons License
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.