HOCOMOCO: A comprehensive collection of human transcription factor binding sites models

Handle URI:
http://hdl.handle.net/10754/325453
Title:
HOCOMOCO: A comprehensive collection of human transcription factor binding sites models
Authors:
Kulakovskiy, Ivan V.; Medvedeva, Yulia A.; Schaefer, Ulf; Kasianov, Artem S.; Vorontsov, Ilya E.; Bajic, Vladimir B. (0000-0001-5435-4750); Makeev, Vsevolod J.
Abstract:
Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically results in the improved model quality probably due to partial correction of source specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was a clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source. © The Author(s) 2012.
KAUST Department:
Computational Bioscience Research Center (CBRC)
Citation:
Kulakovskiy IV, Medvedeva YA, Schaefer U, Kasianov AS, Vorontsov IE, et al. (2012) HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Research 41: D195-D202. doi:10.1093/nar/gks1089.
Publisher:
Oxford University Press (OUP)
Journal:
Nucleic Acids Research
Issue Date:
21-Nov-2012
DOI:
10.1093/nar/gks1089
PubMed ID:
23175603
PubMed Central ID:
PMC3531053
Type:
Article
ISSN:
03051048
Appears in Collections:
Articles; Computational Bioscience Research Center (CBRC)

Full metadata record

DC Field | Value | Language
dc.contributor.author | Kulakovskiy, Ivan V. | en
dc.contributor.author | Medvedeva, Yulia A. | en
dc.contributor.author | Schaefer, Ulf | en
dc.contributor.author | Kasianov, Artem S. | en
dc.contributor.author | Vorontsov, Ilya E. | en
dc.contributor.author | Bajic, Vladimir B. | en
dc.contributor.author | Makeev, Vsevolod J. | en
dc.date.accessioned | 2014-08-27T09:51:57Z | -
dc.date.available | 2014-08-27T09:51:57Z | -
dc.date.issued | 2012-11-21 | en
dc.identifier.citation | Kulakovskiy IV, Medvedeva YA, Schaefer U, Kasianov AS, Vorontsov IE, et al. (2012) HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Research 41: D195-D202. doi:10.1093/nar/gks1089. | en
dc.identifier.issn | 03051048 | en
dc.identifier.pmid | 23175603 | en
dc.identifier.doi | 10.1093/nar/gks1089 | en
dc.identifier.uri | http://hdl.handle.net/10754/325453 | en
dc.description.abstract | Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically results in the improved model quality probably due to partial correction of source specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was a clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source. © The Author(s) 2012. | en
dc.language.iso | en | en
dc.publisher | Oxford University Press (OUP) | en
dc.rights | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. | en
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0 | en
dc.subject | transcription factor | en
dc.subject | binding site | en
dc.subject | chromatin immunoprecipitation | en
dc.subject | computer program | en
dc.subject | data base | en
dc.subject | DNA helix | en
dc.subject | DNA sequence | en
dc.subject | genetic procedures | en
dc.subject | high throughput sequencing | en
dc.subject | Homo sapiens comprehensive model collection database | en
dc.subject | low throughput sequencing | en
dc.subject | mathematical model | en
dc.subject | position weight matrix | en
dc.subject | process development | en
dc.subject | reliability | en
dc.subject | systems biology | en
dc.subject | transcription regulation | en
dc.subject | Binding Sites | en
dc.subject | Databases, Genetic | en
dc.subject | Internet | en
dc.subject | Models, Genetic | en
dc.subject | Position-Specific Scoring Matrices | en
dc.subject | Regulatory Elements, Transcriptional | en
dc.subject | Transcription Factors | en
dc.subject | Homo sapiens | en
dc.title | HOCOMOCO: A comprehensive collection of human transcription factor binding sites models | en
dc.type | Article | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.identifier.journal | Nucleic Acids Research | en
dc.identifier.pmcid | PMC3531053 | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Street 32, Moscow 119991, GSP-1, Russian Federation | en
dc.contributor.institution | Department of Computational Systems Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina Street 3, Moscow 119991, Russian Federation | en
dc.contributor.institution | Yandex Data Analysis School, Data Analysis Department, Moscow Institute of Physics and Technology, Leo Tolstoy Street 16, Moscow 119021, Russian Federation | en
dc.contributor.institution | Department of Molecular and Biological Physics, Moscow Institute of Physics and Technology, 9 Institutskiy pereulok, Dolgoprudny, Moscow 141700, Russian Federation | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Medvedeva, Yulia | en
kaust.author | Bajic, Vladimir B. | en
kaust.author | Schaefer, Ulf | en

This item is licensed under a Creative Commons License