A novel method for improved accuracy of transcription factor binding site prediction
Type
Article
Authors
Khamis, Abdullah M.
Motwalli, Olaa Amin
Oliva, Romina
Jankovic, Boris R.
Medvedeva, Yulia
Ashoor, Haitham
Essack, Magbubah
Gao, Xin
Bajic, Vladimir B.
KAUST Department
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Computer Science Program
Computational Bioscience Research Center (CBRC)
Applied Mathematics and Computational Science Program
KAUST Grant Number
BAS/1/1606-01-01
Date
2018-04-02
Online Publication Date
2018-04-02
Print Publication Date
2018-07-06
Permanent link to this record
http://hdl.handle.net/10754/627486
Abstract
Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs because of the large number of TFs involved. To overcome these problems, we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs while also significantly improving prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows, on average, a 1.54-, 1.96- and 5.19-fold reduction of false positives at the same sensitivities compared to models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This observation suggests that one can efficiently replace the PWM models for TFBS prediction with a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented in a web tool and as stand-alone software freely available at http://cbrc.kaust.edu.sa/DRAF.
Citation
Khamis AM, Motwalli O, Oliva R, Jankovic BR, Medvedeva YA, et al. (2018) A novel method for improved accuracy of transcription factor binding site prediction. Nucleic Acids Research. Available: http://dx.doi.org/10.1093/nar/gky237.
Sponsors
The computational analysis for this study was performed on the Dragon and Snapdragon compute clusters of the Computational Bioscience Research Center at KAUST. This work was supported by King Abdullah University of Science and Technology (KAUST) [BAS/1/1606-01-01 to V.B.B.]. Funding for open access charge: KAUST [BAS/1/1606-01-01].
Publisher
Oxford University Press (OUP)
Journal
Nucleic Acids Research
PubMed ID
29617876
DOI
10.1093/nar/gky237
Except where otherwise noted, this item's license is described as This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Related articles
- EMQIT: a machine learning approach for energy based PWM matrix quality improvement.
- Authors: Smolinska K, Pacholczyk M
- Issue date: 2017 Aug 1
- Tree-based position weight matrix approach to model transcription factor binding site profiles.
- Authors: Bi Y, Kim H, Gupta R, Davuluri RV
- Issue date: 2011
- LASAGNA: a novel algorithm for transcription factor binding site alignment.
- Authors: Lee C, Huang CH
- Issue date: 2013 Mar 24
- A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites.
- Authors: Yang J, Ramsey SA
- Issue date: 2015 Nov 1
- HOCOMOCO: a comprehensive collection of human transcription factor binding sites models.
- Authors: Kulakovskiy IV, Medvedeva YA, Schaefer U, Kasianov AS, Vorontsov IE, Bajic VB, Makeev VJ
- Issue date: 2013 Jan