Discovering approximate-associated sequence patterns for protein-DNA interactions

Handle URI:
http://hdl.handle.net/10754/594175
Title:
Discovering approximate-associated sequence patterns for protein-DNA interactions
Authors:
Chan, Tak Ming; Wong, Ka Chun; Lee, Kin Hong; Wong, Man Hon; Lau, Chi Kong; Tsui, Stephen Kwok Wing; Leung, Kwong Sak
Abstract:
Motivation: The binding between transcription factors (TFs) and transcription factor binding sites (TFBSs) is a fundamental protein-DNA interaction in transcriptional regulation. Extensive efforts have been made to better understand these interactions. Recent mining of exact TF-TFBS-associated sequence patterns (rules) has shown great potential and achieved very promising results. However, exact rules cannot handle the variations in real data, so they yield a limited number of informative rules. In this article, we generalize the exact rules to approximate ones for both TFs and TFBSs, which is essential for handling biological variations.
Results: A progressive approach is proposed to handle the approximation while alleviating the computational requirements. First, similar TFBSs are grouped from the available TF-TFBS data (TRANSFAC database). Second, approximate and highly conserved binding cores are discovered from the TF sequences corresponding to each TFBS group, using a customized algorithm developed for this specific objective. The approximate TF-TFBS rules are then obtained by associating the grouped TFBS consensuses with the TF cores. The discovered rules are evaluated by matching them against (verifying them with) actual protein-DNA binding pairs from Protein Data Bank (PDB) 3D structures. The approximate results exhibit many more verified rules and up to 300% better verification ratios than the exact ones. The customized algorithm achieves over 73% better verification ratios than traditional methods. Approximate rules (64-79%) are shown to be statistically significant. Detailed variation analysis and conservation verification on NCBI records demonstrate that the approximate rules accurately reveal both the flexible and the specific protein-DNA interactions. The approximate TF-TFBS rules discovered show a strong capability to generalize and to uncover more informative binding rules.
© The Author 2010. Published by Oxford University Press. All rights reserved.
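The progressive approach described in the abstract (group similar TFBSs, find approximate conserved cores in the corresponding TF sequences, then associate the two) can be illustrated with a minimal Python sketch. This is not the authors' customized algorithm, whose details are not given in this record. It assumes that "approximate" means a bounded number of substitutions (Hamming distance), uses a simple greedy grouping, and runs on invented toy sequences. The names `hamming`, `group_tfbs` and `approximate_cores` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def group_tfbs(tfbs_to_tfs, max_mismatch):
    """Greedily group equal-length TFBSs within max_mismatch of a group seed.

    Assumption: greedy seed-based grouping stands in for whatever grouping
    the paper actually uses. Each group pools the TF sequences of its sites.
    """
    groups = []
    for site in sorted(tfbs_to_tfs):
        for g in groups:
            same_len = len(site) == len(g["seed"])
            if same_len and hamming(site, g["seed"]) <= max_mismatch:
                g["sites"].append(site)
                g["tfs"].extend(tfbs_to_tfs[site])
                break
        else:
            # No existing group is close enough: this site seeds a new group.
            groups.append({"seed": site, "sites": [site],
                           "tfs": list(tfbs_to_tfs[site])})
    return groups


def approximate_cores(seqs, k, max_mismatch, min_support):
    """Return {k-mer: support} for k-mers taken from seqs that occur, with at
    most max_mismatch substitutions, in at least min_support of the seqs."""
    candidates = {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}
    cores = {}
    for cand in candidates:
        hits = sum(
            any(hamming(cand, s[i:i + k]) <= max_mismatch
                for i in range(len(s) - k + 1))
            for s in seqs
        )
        support = hits / len(seqs)
        if support >= min_support:
            cores[cand] = support
    return cores


# Toy data (invented for illustration, not taken from TRANSFAC).
toy = {
    "TGACGT": ["AANREAARKK", "GGNRESARLL"],
    "TGACGC": ["TTNREAARPP"],
    "CACGTG": ["MMERKRRDHQ"],
}

for group in group_tfbs(toy, max_mismatch=1):
    cores = approximate_cores(group["tfs"], k=5, max_mismatch=1, min_support=1.0)
    # An approximate rule pairs a group of similar TFBSs with a TF core.
    for core in sorted(cores):
        print(group["sites"], "<->", core)
```

In this toy data, the two similar sites TGACGT and TGACGC fall into one group, and a core such as NREAA is found in all three of its TF sequences once a single substitution is tolerated. With `max_mismatch=0` no 5-mer is shared by all three, which mirrors the abstract's point that exact rules miss variations present in real data.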
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Chan T-M, Wong K-C, Lee K-H, Wong M-H, Lau C-K, et al. (2010) Discovering approximate-associated sequence patterns for protein-DNA interactions. Bioinformatics 27: 471–478. Available: http://dx.doi.org/10.1093/bioinformatics/btq682.
Publisher:
Oxford University Press (OUP)
Journal:
Bioinformatics
Issue Date:
30-Dec-2010
DOI:
10.1093/bioinformatics/btq682
PubMed ID:
21193520
Type:
Article
ISSN:
1367-4803; 1460-2059
Sponsors:
The research is supported by the grant CUHK414708 from the Research Grants Council of the Hong Kong SAR, China, and Focused Investment Scheme D on Hong Kong Bioinformatics Centre (Project Number: 1904014) from The Chinese University of Hong Kong.
Appears in Collections:
Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Chan, Tak Ming | en
dc.contributor.author | Wong, Ka Chun | en
dc.contributor.author | Lee, Kin Hong | en
dc.contributor.author | Wong, Man Hon | en
dc.contributor.author | Lau, Chi Kong | en
dc.contributor.author | Tsui, Stephen Kwok Wing | en
dc.contributor.author | Leung, Kwong Sak | en
dc.date.accessioned | 2016-01-19T13:23:12Z | en
dc.date.available | 2016-01-19T13:23:12Z | en
dc.date.issued | 2010-12-30 | en
dc.identifier.citation | Chan T-M, Wong K-C, Lee K-H, Wong M-H, Lau C-K, et al. (2010) Discovering approximate-associated sequence patterns for protein-DNA interactions. Bioinformatics 27: 471–478. Available: http://dx.doi.org/10.1093/bioinformatics/btq682. | en
dc.identifier.issn | 1367-4803 | en
dc.identifier.issn | 1460-2059 | en
dc.identifier.pmid | 21193520 | en
dc.identifier.doi | 10.1093/bioinformatics/btq682 | en
dc.identifier.uri | http://hdl.handle.net/10754/594175 | en
dc.description.sponsorship | The research is supported by the grant CUHK414708 from the Research Grants Council of the Hong Kong SAR, China, and Focused Investment Scheme D on Hong Kong Bioinformatics Centre (Project Number: 1904014) from The Chinese University of Hong Kong. | en
dc.publisher | Oxford University Press (OUP) | en
dc.title | Discovering approximate-associated sequence patterns for protein-DNA interactions | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Bioinformatics | en
dc.contributor.institution | Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin N. T., Hong Kong | en
dc.contributor.institution | School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin N. T., Hong Kong | en
dc.contributor.institution | Hong Kong Bioinformatics Centre, Shatin N. T., Hong Kong | en
kaust.author | Wong, Ka Chun | en
