HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models

Handle URI:
http://hdl.handle.net/10754/613302
Title:
HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models
Authors:
Kulakovskiy, Ivan V.; Vorontsov, Ilya E.; Yevshin, Ivan S.; Soboleva, Anastasiia V.; Kasianov, Artem S.; Ashoor, Haitham (0000-0003-2527-0317); Ba Alawi, Wail (0000-0002-2747-4703); Bajic, Vladimir B. (0000-0001-5435-4750); Medvedeva, Yulia A.; Kolpakov, Fedor A.; Makeev, Vsevolod J.
Abstract:
Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), a collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM (diPWM) models, covering 86 human and 52 mouse TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with validation of the resulting models on in vivo data. To facilitate practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
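Illustrative sketch (not part of the original record): the abstract describes PWM models paired with score thresholds and used for motif finding. The Python below shows that general idea in its simplest mononucleotide form. It is not the HOCOMOCO command-line tooling. The matrix values, the threshold, and the names `TOY_PWM`, `score_window`, and `find_hits` are hypothetical and chosen only for illustration.

```python
# Toy PWM: one dict of per-base weights for each motif position.
# These numbers are invented for illustration; they are NOT a HOCOMOCO model.
TOY_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def score_window(pwm, window):
    """Sum the per-position weights for a word as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def find_hits(pwm, seq, threshold):
    """Scan both strands; return (start, strand, score) where score >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in COMPLEMENT for base in window):
            continue  # skip windows containing ambiguous bases such as N
        for strand, word in (("+", window), ("-", reverse_complement(window))):
            score = score_window(pwm, word)
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


# Hypothetical usage: the threshold here is arbitrary, not a precomputed one.
hits = find_hits(TOY_PWM, "TTACGTT", threshold=3.0)
```

In practice the threshold would be taken from the precomputed values that accompany each model rather than picked by hand. A dinucleotide PWM would score overlapping base pairs instead of single bases.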
KAUST Department:
Computational Bioscience Research Center (CBRC)
Citation:
HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Research 2016, 44 (D1):D116
Publisher:
Oxford University Press (OUP)
Journal:
Nucleic Acids Research
Issue Date:
19-Nov-2015
DOI:
10.1093/nar/gkv1249
Type:
Article
ISSN:
0305-1048; 1362-4962
Sponsors:
We thank the Evolutionary Genomics Laboratory, Faculty of Bioengineering and Bioinformatics (M.V. Lomonosov Moscow State University), and Prof. A.S. Kondrashov personally for providing computational facilities.
Additional Links:
http://nar.oxfordjournals.org/lookup/doi/10.1093/nar/gkv1249
Appears in Collections:
Articles

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kulakovskiy, Ivan V. | en |
| dc.contributor.author | Vorontsov, Ilya E. | en |
| dc.contributor.author | Yevshin, Ivan S. | en |
| dc.contributor.author | Soboleva, Anastasiia V. | en |
| dc.contributor.author | Kasianov, Artem S. | en |
| dc.contributor.author | Ashoor, Haitham | en |
| dc.contributor.author | Ba Alawi, Wail | en |
| dc.contributor.author | Bajic, Vladimir B. | en |
| dc.contributor.author | Medvedeva, Yulia A. | en |
| dc.contributor.author | Kolpakov, Fedor A. | en |
| dc.contributor.author | Makeev, Vsevolod J. | en |
| dc.date.accessioned | 2016-06-16T09:33:37Z | - |
| dc.date.available | 2016-06-16T09:33:37Z | - |
| dc.date.issued | 2015-11-19 | - |
| dc.identifier.citation | HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models 2016, 44 (D1):D116 Nucleic Acids Research | en |
| dc.identifier.issn | 0305-1048 | - |
| dc.identifier.issn | 1362-4962 | - |
| dc.identifier.doi | 10.1093/nar/gkv1249 | - |
| dc.identifier.uri | http://hdl.handle.net/10754/613302 | - |
| dc.description.abstract | Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns, recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest up to date collection of dinucleotide PWM models for 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the validation of the resulting models on in vivo data. To facilitate a practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences. | en |
| dc.description.sponsorship | We thank Evolutionary Genomics Laboratory, Faculty of Bioengineering and Bioinformatics (M.V. Lomonosov Moscow State University) and personally Prof. A.S. Kondrashov for computational facilities. | en |
| dc.language.iso | en | en |
| dc.publisher | Oxford University Press (OUP) | en |
| dc.relation.url | http://nar.oxfordjournals.org/lookup/doi/10.1093/nar/gkv1249 | en |
| dc.rights | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. http://creativecommons.org/licenses/by-nc/4.0/ | en |
| dc.title | HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models | en |
| dc.type | Article | en |
| dc.contributor.department | Computational Bioscience Research Center (CBRC) | en |
| dc.identifier.journal | Nucleic Acids Research | en |
| dc.eprint.version | Publisher's Version/PDF | en |
| dc.contributor.institution | Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia | en |
| dc.contributor.institution | Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia | en |
| dc.contributor.institution | Design Technological Institute of Digital Techniques, Siberian Branch of the Russian Academy of Sciences, 630090, Academician Rzhanov 6, Novosibirsk, Russia | en |
| dc.contributor.institution | Institute of Systems Biology Ltd, 630112, office 901, Krasina 54, Novosibirsk, Russia | en |
| dc.contributor.institution | Moscow Institute of Physics and Technology, 141700, Institutskiy per. 9, Dolgoprudny, Moscow Region, Russia | en |
| dc.contributor.institution | Center for Bioengineering, Russian Academy of Sciences, 117312, 60-letiya Oktyabrya 7/2, Moscow, Russia | en |
| dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en |
| kaust.author | Ashoor, Haitham | en |
| kaust.author | Ba Alawi, Wail | en |
| kaust.author | Bajic, Vladimir B. | en |