Distinguishing the Transcription Regulation Patterns in Promoters of Human Genes with Different Function or Evolutionary Age

Handle URI:
http://hdl.handle.net/10754/244611
Title:
Distinguishing the Transcription Regulation Patterns in Promoters of Human Genes with Different Function or Evolutionary Age
Authors:
Alam, Tanvir
Abstract:
Distinguishing transcription regulatory patterns of different gene groups is a common problem in various bioinformatics studies. In this work we developed a methodology based on machine learning techniques to deal with this problem. We applied our method to two biologically important problems of detecting differences in transcription regulation between: (a) protein-coding RNAs and long non-coding RNAs (lncRNAs) in human, and (b) primate-specific and non-primate-specific lncRNAs. Our method is capable of classifying RNAs using various regulatory features of the genes that are transcribed into these RNAs, such as nucleotide frequencies, transcription factor binding sites, de novo sequence motifs, CpG islands, repetitive elements, histone modification marks, and others. Ten-fold cross-validation tests suggest that our model can distinguish protein-coding and non-coding RNAs with accuracy above 80%. Twenty-fold cross-validation tests suggest that our model can distinguish primate-specific from non-primate-specific promoters of lncRNAs with accuracy above 80%. Consequently, we can hypothesize that transcription of the groups of genes mentioned above is regulated by different mechanisms. Feature selection techniques allowed us to reduce the number of features significantly while keeping the accuracy around 80%. Consequently, we can conclude that the selected features play a significant role in transcription regulation of coding and non-coding genes, as well as of primate-specific and non-primate-specific lncRNA genes.
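As a rough illustration of two ingredients named in the abstract (sequence-derived features such as nucleotide frequencies and CpG content, and k-fold cross-validation), here is a minimal Python sketch. It is not the thesis's implementation: the function names, the choice of dinucleotide frequencies, the CpG observed/expected formula used as a stand-in for CpG island annotation, and the interleaved fold assignment are all assumptions made for illustration, and the classifier itself is left out.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequency of every DNA k-mer in seq (overlapping windows)."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Count only windows made of unambiguous bases (skip N and similar).
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


def cpg_observed_expected(seq):
    """CpG observed/expected ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def feature_vector(seq):
    """16 dinucleotide frequencies (sorted by k-mer) plus the CpG o/e ratio."""
    freqs = kmer_frequencies(seq, k=2)
    return [freqs[key] for key in sorted(freqs)] + [cpg_observed_expected(seq)]


def k_fold_indices(n_samples, n_folds=10):
    """Yield (train, test) index lists; fold i tests on samples i, i + n_folds, ..."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        test = indices[fold::n_folds]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test
```

In such a setup, each promoter sequence would be turned into a vector with `feature_vector`, and a classifier of choice would be trained on each `train` split and scored on the matching `test` split, with accuracy averaged over the folds. In practice the samples would normally be shuffled before the folds are assigned.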
Advisors:
Bajic, Vladimir B. ( 0000-0001-5435-4750 )
Committee Member:
Gao, Xin ( 0000-0002-7108-3574 ) ; Zhang, Xiangliang ( 0000-0002-3574-5665 )
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
Jul-2012
Type:
Thesis
Appears in Collections:
Theses; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Bajic, Vladimir B. | en
dc.contributor.author | Alam, Tanvir | en
dc.date.accessioned | 2012-09-18T09:09:27Z | -
dc.date.available | 2012-09-18T09:09:27Z | -
dc.date.issued | 2012-07 | en
dc.identifier.uri | http://hdl.handle.net/10754/244611 | en
dc.description.abstract | (identical to the Abstract field above) | en
dc.language.iso | en | en
dc.title | Distinguishing the Transcription Regulation Patterns in Promoters of Human Genes with Different Function or Evolutionary Age | en
dc.type | Thesis | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
thesis.degree.grantor | King Abdullah University of Science and Technology | en_GB
dc.contributor.committeemember | Gao, Xin | en
dc.contributor.committeemember | Zhang, Xiangliang | en
thesis.degree.discipline | Computer Science | en
thesis.degree.name | Master of Science | en
dc.person.id | 113348 | en