Distinguishing the Transcription Regulation Patterns in Promoters of Human Genes with Different Function or Evolutionary Age
Type
Thesis
Authors
Alam, Tanvir
Advisors
Bajic, Vladimir B.
Committee members
Gao, Xin
Zhang, Xiangliang

Program
Computer Science
Date
2012-07
Embargo End Date
2013-07-30
Permanent link to this record
http://hdl.handle.net/10754/244611
Access Restrictions
At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis became available to the public after the expiration of the embargo on 2013-07-30.
Abstract
Distinguishing transcription regulatory patterns of different gene groups is a common problem in various bioinformatics studies. In this work we developed a methodology, based on machine learning techniques, to deal with this problem. We applied our method to two biologically important problems: detecting differences in transcription regulation between (a) protein-coding and long non-coding RNAs (lncRNAs) in human, and (b) primate-specific and non-primate-specific lncRNAs. Our method can classify RNAs using various regulatory features of the genes that are transcribed into these RNAs, such as nucleotide frequencies, transcription factor binding sites, de novo sequence motifs, CpG islands, repetitive elements, histone modification marks, and others. Ten-fold cross-validation tests suggest that our model can distinguish protein-coding and non-coding RNAs with accuracy above 80%. Twenty-fold cross-validation tests suggest that our model can distinguish primate-specific from non-primate-specific promoters of lncRNAs with accuracy above 80%. Consequently, we can hypothesize that transcription of the groups of genes mentioned above is regulated by different mechanisms. Feature selection techniques allowed us to reduce the number of features significantly while keeping the accuracy around 80%. Consequently, we can conclude that the selected features play a significant role in transcription regulation of coding and non-coding genes, as well as of primate-specific and non-primate-specific lncRNA genes.
Citation
Alam, T. (2012). Distinguishing the Transcription Regulation Patterns in Promoters of Human Genes with Different Function or Evolutionary Age. KAUST Research Repository. https://doi.org/10.25781/KAUST-85H4D
DOI
10.25781/KAUST-85H4D
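Illustrative sketch
The abstract describes an evaluation scheme built on k-fold cross-validation followed by feature selection. The Python sketch below shows what that general workflow might look like on synthetic data. Everything in it is an assumption made for illustration: the record does not say which classifier, feature-scoring rule, or software the thesis used, so a nearest-centroid classifier and a difference-of-class-means ranking are swapped in here, and the helper names (make_toy_data, k_fold_indices, cross_val_accuracy, select_features) and the labels "coding" and "lncRNA" are hypothetical. The numbers it prints say nothing about the thesis results.

```python
import random
from statistics import mean


def make_toy_data(n_per_class=100, n_features=6, seed=1):
    """Synthetic stand-in for promoter features (NOT the thesis data).

    Features 0 and 1 are shifted for class "lncRNA"; the rest are pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label, shift in (("coding", 0.0), ("lncRNA", 2.0)):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += shift
            row[1] += shift
            X.append(row)
            y.append(label)
    return X, y


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def predict(x, centroids):
    """Return the label whose centroid is nearest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean held-out accuracy of a nearest-centroid classifier over k folds."""
    scores = []
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        centroids = {
            lab: centroid([X[i] for i in train if y[i] == lab])
            for lab in sorted(set(y))
        }
        correct = sum(predict(X[i], centroids) == y[i] for i in test)
        scores.append(correct / len(test))
    return mean(scores)


def select_features(X, y, keep):
    """Rank features by |difference of the two class means|; return top indices.

    Simplification: ranking uses the full data set, which leaks information
    into the cross-validation estimate. A careful study would rank features
    inside each training fold.
    """
    labels = sorted(set(y))
    means = [centroid([x for x, lab in zip(X, y) if lab == l]) for l in labels]
    scores = [abs(a - b) for a, b in zip(means[0], means[1])]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


X, y = make_toy_data()
selected = select_features(X, y, keep=2)
X_reduced = [[row[j] for j in selected] for row in X]
print("10-fold accuracy, all features:", round(cross_val_accuracy(X, y, k=10), 3))
print("10-fold accuracy, features", selected, ":",
      round(cross_val_accuracy(X_reduced, y, k=10), 3))
```

Passing k=20 to cross_val_accuracy gives the twenty-fold variant mentioned in the abstract. Comparing the two printed accuracies mirrors the abstract's check that a reduced feature set can retain most of the classification accuracy.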