Identifying Regulatory Patterns at the 3'end Regions of Over-expressed and Under-expressed Genes

Handle URI:
http://hdl.handle.net/10754/292823
Title:
Identifying Regulatory Patterns at the 3'end Regions of Over-expressed and Under-expressed Genes
Authors:
Othoum, Ghofran K
Abstract:
Promoters, neighboring regulatory regions and those extending further upstream of the 5’end of genes, are considered one of the main components affecting the expression status of genes in a specific phenotype. More recently, research by Chen et al. (2006, 2012) and Mapendano et al. (2010) demonstrated that the 3’end regulatory regions of genes also influence gene expression. However, the association between the regulatory regions surrounding the 3’end of genes and their over- or under-expression status in a particular phenotype has not been systematically studied. The aim of this study is to ascertain whether regulatory regions surrounding the 3’end of genes contain sufficient regulatory information to correlate genes with their expression status in a particular phenotype. Over- and under-expressed ovarian cancer (OC) genes were used as a model. Exploratory analysis of the 3’end regions was performed by transforming the annotated regions using principal component analysis (PCA) and then clustering the transformed data, which achieved a clear separation of genes with different expression status. Additionally, several classification algorithms, such as Naïve Bayes, Random Forest and Support Vector Machine (SVM), were tested with different parameter settings to analyze the capacity of the 3’end regions to discriminate genes by their expression status. The best performance was achieved by the SVM classification model, which under 10-fold cross-validation yielded an accuracy of 98.4%, sensitivity of 99.5% and specificity of 92.5%. To predict gene expression status for newly available instances, based on information derived from the 3’end regions, an SVM predictive model was developed; under 10-fold cross-validation it yielded an accuracy of 67.0%, sensitivity of 73.2% and specificity of 61.0%. Moreover, an SVM model with a polynomial kernel built on the PCA-transformed data yielded an accuracy of 83.1%, sensitivity of 92.5% and specificity of 74.8% under 10-fold cross-validation. These clustering and classification analyses strongly suggest that the regions surrounding the 3’end of genes contain sufficiently rich regulatory information to discriminate between over- and under-expressed genes, at least in the case of genes implicated in OC.
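The evaluation figures quoted in the abstract (accuracy, sensitivity and specificity under 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the thesis code: the function names, the fit/predict callback interface and the choice of label 1 as the positive class are all assumptions made here for illustration, and any classifier (for example an SVM from a machine-learning library) could be plugged in.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    Assumption: `positive` marks one expression class (e.g. over-expressed);
    the record does not say which class the thesis treated as positive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")  # true negative rate
    return accuracy, sensitivity, specificity


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validated_metrics(X, y, fit, predict, k=10):
    """Pool held-out predictions over k folds, then compute the metrics.

    `fit(X_train, y_train)` returns a model and `predict(model, X_test)`
    returns a list of labels; both are hypothetical callbacks.
    """
    y_true, y_pred = [], []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred.extend(predict(model, [X[i] for i in test_idx]))
        y_true.extend(y[i] for i in test_idx)
    return binary_metrics(y_true, y_pred)
```

In this sketch the folds are formed by striding over sample indices without shuffling or stratification; a real analysis would typically shuffle and stratify by class before splitting.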
Advisors:
Bajic, Vladimir B. ( 0000-0001-5435-4750 )
Committee Member:
Essack, Magbubah; Moshkov, Mikhail ( 0000-0003-0085-9483 )
KAUST Department:
Biological and Environmental Sciences and Engineering (BESE) Division
Program:
Bioscience
Issue Date:
May-2013
Type:
Thesis
Appears in Collections:
Bioscience Program; Theses; Biological and Environmental Sciences and Engineering (BESE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Bajic, Vladimir B. | en
dc.contributor.author | Othoum, Ghofran K | en
dc.date.accessioned | 2013-05-27T05:51:09Z | -
dc.date.available | 2013-05-27T05:51:09Z | -
dc.date.issued | 2013-05 | en
dc.identifier.uri | http://hdl.handle.net/10754/292823 | en
dc.description.abstract | Promoters, neighboring regulatory regions and those extending further upstream of the 5’end of genes, are considered one of the main components affecting the expression status of genes in a specific phenotype. More recently, research by Chen et al. (2006, 2012) and Mapendano et al. (2010) demonstrated that the 3’end regulatory regions of genes also influence gene expression. However, the association between the regulatory regions surrounding the 3’end of genes and their over- or under-expression status in a particular phenotype has not been systematically studied. The aim of this study is to ascertain whether regulatory regions surrounding the 3’end of genes contain sufficient regulatory information to correlate genes with their expression status in a particular phenotype. Over- and under-expressed ovarian cancer (OC) genes were used as a model. Exploratory analysis of the 3’end regions was performed by transforming the annotated regions using principal component analysis (PCA) and then clustering the transformed data, which achieved a clear separation of genes with different expression status. Additionally, several classification algorithms, such as Naïve Bayes, Random Forest and Support Vector Machine (SVM), were tested with different parameter settings to analyze the capacity of the 3’end regions to discriminate genes by their expression status. The best performance was achieved by the SVM classification model, which under 10-fold cross-validation yielded an accuracy of 98.4%, sensitivity of 99.5% and specificity of 92.5%. To predict gene expression status for newly available instances, based on information derived from the 3’end regions, an SVM predictive model was developed; under 10-fold cross-validation it yielded an accuracy of 67.0%, sensitivity of 73.2% and specificity of 61.0%. Moreover, an SVM model with a polynomial kernel built on the PCA-transformed data yielded an accuracy of 83.1%, sensitivity of 92.5% and specificity of 74.8% under 10-fold cross-validation. These clustering and classification analyses strongly suggest that the regions surrounding the 3’end of genes contain sufficiently rich regulatory information to discriminate between over- and under-expressed genes, at least in the case of genes implicated in OC. | en
dc.language.iso | en | en
dc.subject | 3'end regions | en
dc.subject | regulatory regions | en
dc.subject | data mining | en
dc.subject | clustering analysis | en
dc.subject | classification model | en
dc.subject | ovarian cancer | en
dc.title | Identifying Regulatory Patterns at the 3'end Regions of Over-expressed and Under-expressed Genes | en
dc.type | Thesis | en
dc.contributor.department | Biological and Environmental Sciences and Engineering (BESE) Division | en
thesis.degree.grantor | King Abdullah University of Science and Technology | en_GB
dc.contributor.committeemember | Essack, Magbubah | en
dc.contributor.committeemember | Moshkov, Mikhail | en
thesis.degree.discipline | Bioscience | en
thesis.degree.name | Master of Science | en
dc.person.id | 118916 | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.