Pipeline for the Analysis of ChIP-seq Data and New Motif Ranking Procedure
Type
Thesis
Authors
Ashoor, Haitham
Advisors
Bajic, Vladimir B.
Committee members
Moshkov, Mikhail
Zhang, Xiangliang

Program
Computer Science
Date
2011-06
Permanent link to this record
http://hdl.handle.net/10754/136689
Abstract
This thesis presents a computational methodology for ab-initio identification of transcription factor binding sites (TFBSs) based on ChIP-seq data. The method consists of three main steps: ChIP-seq data processing, motif discovery, and model selection. A novel method for ranking the models of motifs identified in this process is proposed. It combines multiple factors to rank the candidate motifs: the model's coverage of the ChIP-seq fragments containing the motifs from which the model is built, suitable background data made up of shuffled ChIP-seq fragments, and the p-value obtained from evaluating the model on actual and background data. Two ChIP-seq datasets retrieved from the ENCODE project are used to evaluate the method and demonstrate its ability to predict correct TFBSs with high precision. The first dataset relates to neuron-restrictive silencer factor, NRSF, while the second corresponds to growth-associated binding protein, GABP. The pipeline achieves high-precision prediction for both datasets: in both cases the top-ranked motif closely resembles the known motif for the respective transcription factor.
Citation
Ashoor, H. (2011). Pipeline for the Analysis of ChIP-seq Data and New Motif Ranking Procedure. KAUST Research Repository. https://doi.org/10.25781/KAUST-D30U6
10.25781/KAUST-D30U6