Generalizing and learning protein-DNA binding sequence representations by an evolutionary algorithm

Type
Article

Authors
Wong, Ka Chun
Peng, Chengbin
Wong, Manhon
Leung, Kwongsak

KAUST Department
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Computer Science Program

Online Publication Date
2011-02-05

Print Publication Date
2011-08

Date
2011-02-05

Abstract
Protein-DNA bindings are essential activities, and understanding them forms the basis for further deciphering biological and genetic systems. In particular, the bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play a central role in gene transcription. A recent study found a comprehensive set of TF-TFBS binding sequence pairs. However, those pairs are one-to-one mappings, which cannot fully reflect the many-to-many mappings within the bindings. An evolutionary algorithm is proposed to learn generalized representations (many-to-many mappings) from the TF-TFBS binding sequence pairs (one-to-one mappings). The generalized pairs are shown to be more meaningful than the original TF-TFBS binding sequence pairs. Some representative examples are analyzed in this study. In particular, the analysis shows that TF-TFBS binding sequence pairs are not necessarily one-to-one mappings; they can also exhibit many-to-many mappings. The proposed method can help extract such many-to-many information from the one-to-one TF-TFBS binding sequence pairs found in the previous study, providing further knowledge for understanding the bindings between TFs and TFBSs. © 2011 Springer-Verlag.

Citation
Wong, K.-C., Peng, C., Wong, M.-H., & Leung, K.-S. (2011). Generalizing and learning protein-DNA binding sequence representations by an evolutionary algorithm. Soft Computing, 15(8), 1631–1642. doi:10.1007/s00500-011-0692-5

Acknowledgements
The authors are grateful to the anonymous reviewers for their valuable comments. They would like to thank Tak-Ming Chan for his help in surveying the related work. This research is partially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Nos. 414107 and 414708).

Publisher
Springer Nature

Journal
Soft Computing

DOI
10.1007/s00500-011-0692-5
