Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels

Handle URI:
http://hdl.handle.net/10754/575906
Title:
Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels
Authors:
Wang, Xiaolei; Kuwahara, Hiroyuki; Gao, Xin (0000-0002-7108-3574)
Abstract:
Background: A quantitative understanding of interactions between transcription factors (TFs) and their DNA binding sites is key to the rational design of gene regulatory networks. Recent advances in high-throughput technologies have enabled high-resolution measurements of protein-DNA binding affinity. Importantly, such experiments revealed the complex nature of TF-DNA interactions, whereby the effects of nucleotide changes on the binding affinity were observed to be context dependent. A systematic method to give high-quality estimates of such complex affinity landscapes is thus essential to the control of gene expression and the advance of synthetic biology.
Results: Here, we propose a two-round prediction method that is based on support vector regression (SVR) with weighted degree (WD) kernels. In the first round, a WD kernel with shifts and mismatches is used with SVR to detect the importance of subsequences with different lengths at different positions. The subsequences identified as important in the first round are then fed into a second WD kernel to fit the experimentally measured affinities. To our knowledge, this is the first attempt to increase the accuracy of the affinity prediction by applying two rounds of string kernels and by identifying a small number of crucial k-mers. The proposed method was tested by predicting the binding affinity landscape of Gcn4p in Saccharomyces cerevisiae using datasets from HiTS-FLIP. Our method explicitly identified important subsequences and showed significant performance improvements when compared with other state-of-the-art methods. Based on the identified important subsequences, we discovered two surprisingly stable 10-mers and one sensitive 10-mer that had not been reported before. Further tests on four other TFs in S. cerevisiae demonstrated the generality of our method.
Conclusion: We proposed in this paper a two-round method to quantitatively model the DNA binding affinity landscape. Since the ability to modify genetic parts to fine-tune gene expression rates is crucial to the design of biological systems, such a tool may play an important role in the success of synthetic biology going forward.
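Illustrative note: the abstract names the weighted degree (WD) kernel but does not define it. The sketch below shows the standard textbook form of the WD kernel for equal-length sequences, assuming the commonly used degree weights beta_d = 2(D - d + 1) / (D(D + 1)). It is not the authors' implementation: it omits the shifts and mismatches used in the first round, and the function names `wd_kernel` and `gram_matrix` and the parameter `max_degree` are hypothetical.

```python
def wd_kernel(x, y, max_degree=3):
    """Plain weighted degree kernel between two equal-length sequences.

    Counts substrings of length d = 1..max_degree that match at the same
    position in both sequences, weighting shorter substrings more heavily.
    No shifts or mismatches are modeled (an assumption of this sketch).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    length = len(x)
    total = 0.0
    for d in range(1, max_degree + 1):
        # Standard WD weights; they sum to 1 over d = 1..max_degree.
        beta = 2.0 * (max_degree - d + 1) / (max_degree * (max_degree + 1))
        matches = sum(
            1 for pos in range(length - d + 1) if x[pos:pos + d] == y[pos:pos + d]
        )
        total += beta * matches
    return total


def gram_matrix(seqs, max_degree=3):
    """Pairwise WD kernel values for a list of equal-length sequences."""
    return [[wd_kernel(a, b, max_degree) for b in seqs] for a in seqs]


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the paper.
    toy = ["ACGT", "ACGA", "TTTT"]
    for row in gram_matrix(toy, max_degree=2):
        print(row)
```

A Gram matrix of this kind could, in principle, be passed to any SVR implementation that accepts a precomputed kernel; how the paper's two rounds select and reweight subsequences is not specified in this record and is not reproduced here.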
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program; Computational Bioscience Research Center (CBRC); Structural and Functional Bioinformatics Group
Publisher:
Springer Nature
Journal:
BMC Systems Biology
Issue Date:
12-Dec-2014
DOI:
10.1186/1752-0509-8-S5-S5
Type:
Article
ISSN:
1752-0509
Sponsors:
We thank Polly Fordyce for valuable discussions about the MITOMI2.0 datasets. This work and its publication costs were supported by grant number FCC/1/1976-04-01 from King Abdullah University of Science and Technology (KAUST).
Appears in Collections:
Articles; Structural and Functional Bioinformatics Group; Computer Science Program; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Wang, Xiaolei | en
dc.contributor.author | Kuwahara, Hiroyuki | en
dc.contributor.author | Gao, Xin | en
dc.date.accessioned | 2015-08-25T06:19:00Z | en
dc.date.available | 2015-08-25T06:19:00Z | en
dc.date.issued | 2014-12-12 | en
dc.identifier.issn | 1752-0509 | en
dc.identifier.doi | 10.1186/1752-0509-8-S5-S5 | en
dc.identifier.uri | http://hdl.handle.net/10754/575906 | en
dc.description.sponsorship | We thank Polly Fordyce for valuable discussions about the MITOMI2.0 datasets. This work and its publication costs were supported by grant number FCC/1/1976-04-01 from King Abdullah University of Science and Technology (KAUST). | en
dc.publisher | Springer Nature | en
dc.title | Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.contributor.department | Structural and Functional Bioinformatics Group | en
dc.identifier.journal | BMC Systems Biology | en
kaust.author | Kuwahara, Hiroyuki | en
kaust.author | Gao, Xin | en
kaust.author | Wang, Xiaolei | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.