Computational Methods for ChIP-seq Data Analysis and Applications

Handle URI:
http://hdl.handle.net/10754/623305
Title:
Computational Methods for ChIP-seq Data Analysis and Applications
Authors:
Ashoor, Haitham ( 0000-0003-2527-0317 )
Abstract:
The development of chromatin immunoprecipitation followed by sequencing (ChIP-seq) technology has enabled the construction of genome-wide maps of protein-DNA interactions. Such maps provide information about transcriptional regulation at the epigenetic level (histone modifications and histone variants) and at the level of transcription factor (TF) activity. This dissertation presents novel computational methods for ChIP-seq data analysis and applications, addressing four main challenges. First, I address the problem of detecting histone modifications in ChIP-seq data from cancer samples. The presence of copy number variations (CNVs) in cancer samples introduces statistical biases that lead to inaccurate predictions when standard methods are used. To overcome this issue, I developed HMCan, an algorithm specifically designed to handle ChIP-seq cancer data by accounting for the presence of CNVs. On ChIP-seq data from cancer cells, HMCan produces unbiased and more accurate predictions than standard state-of-the-art methods. Second, I address the problem of identifying changes in histone modifications between two ChIP-seq samples with different genetic backgrounds (for example, cancer vs. normal). In addition to CNVs, differences in antibody efficiency between samples and the presence of sample replicates make this problem challenging. To overcome these issues, I developed the HMCan-diff algorithm as an extension of HMCan. HMCan-diff implements robust normalization methods to address the challenges listed above, and it significantly outperforms other state-of-the-art methods on data containing cancer samples. Third, I investigate and analyze the predictions of different ChIP-seq-based enhancer prediction methods. The analysis shows that predictions generated by different methods overlap poorly. To overcome this issue, I developed DENdb, a database that integrates enhancer predictions from different methods. DENdb also integrates several types of experimental data, including ChIP-seq data for TF binding sites. Finally, I present an extensive computational comparison of different ab initio motif identification methods based on TF ChIP-seq data. The comparison included 10 methods applied to 159 TF datasets. The results of this comparison indicate that simple methods outperform high-order models.
Advisors:
Bajic, Vladimir B. ( 0000-0001-5435-4750 )
Committee Member:
Fischle, Wolfgang; Gao, Xin ( 0000-0002-7108-3574 ) ; Schönbach, Christian
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
25-Apr-2017
Type:
Dissertation
Appears in Collections:
Dissertations

Full metadata record

DC Field                        Value                                 Language
dc.contributor.advisor          Bajic, Vladimir B.                    en
dc.contributor.author           Ashoor, Haitham                       en
dc.date.accessioned             2017-05-01T08:51:43Z                  -
dc.date.available               2017-05-01T08:51:43Z                  -
dc.date.issued                  2017-04-25                            -
dc.identifier.uri               http://hdl.handle.net/10754/623305    -
dc.language.iso                 en                                    en
dc.subject                      Bioinformatics                        en
dc.subject                      computer science                      en
dc.subject                      machine learning                      en
dc.subject                      Epigenomics                           en
dc.subject                      transcription regulation              en
dc.title                        Computational Methods for ChIP-seq Data Analysis and Applications  en
dc.type                         Dissertation                          en
dc.contributor.department       Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division  en
thesis.degree.grantor           King Abdullah University of Science and Technology  en_GB
dc.contributor.committeemember  Fischle, Wolfgang                     en
dc.contributor.committeemember  Gao, Xin                              en
dc.contributor.committeemember  Schönbach, Christian                  en
thesis.degree.discipline        Computer Science                      en
thesis.degree.name              Doctor of Philosophy                  en
dc.person.id                    101761                                en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.