
dc.contributor.advisor Bajic, Vladimir B.
dc.contributor.author Othoum, Ghofran K.
dc.date.accessioned 2013-05-27T05:51:09Z
dc.date.available 2014-05-27T00:00:00Z
dc.date.issued 2013-05
dc.identifier.citation Othoum, G. K. (2013). Identifying Regulatory Patterns at the 3'end Regions of Over-expressed and Under-expressed Genes. KAUST Research Repository. https://doi.org/10.25781/KAUST-7G81Z
dc.identifier.doi 10.25781/KAUST-7G81Z
dc.identifier.uri http://hdl.handle.net/10754/292823
dc.description.abstract Promoters, their neighboring regulatory regions and regions extending further upstream of the 5' end of genes are considered among the main components affecting the expression status of genes in a specific phenotype. More recently, research by Chen et al. (2006, 2012) and Mapendano et al. (2010) has demonstrated that the 3' end regulatory regions of genes also influence gene expression. However, the association between the regulatory regions surrounding the 3' end of genes and their over- or under-expression status in a particular phenotype has not been systematically studied. The aim of this study is to ascertain whether the regulatory regions surrounding the 3' end of genes contain sufficient regulatory information to correlate genes with their expression status in a particular phenotype. Over- and under-expressed ovarian cancer (OC) genes were used as a model. Exploratory analysis of the 3' end regions was performed by transforming the annotated regions with principal component analysis (PCA) and then clustering the transformed data, which achieved a clear separation of genes with different expression status. Additionally, several classification algorithms, including Naïve Bayes, Random Forest and Support Vector Machine (SVM), were tested with different parameter settings to assess how well the 3' end regions discriminate between genes of different expression status. The best performance was achieved by the SVM classification model, which under 10-fold cross-validation yielded an accuracy of 98.4%, a sensitivity of 99.5% and a specificity of 92.5%. To predict the expression status of newly available instances from information derived from their 3' end regions, an SVM predictive model was developed; under 10-fold cross-validation it yielded an accuracy of 67.0%, a sensitivity of 73.2% and a specificity of 61.0%.
Moreover, an SVM model with a polynomial kernel built on the PCA-transformed data yielded an accuracy of 83.1%, a sensitivity of 92.5% and a specificity of 74.8%, again evaluated with 10-fold cross-validation. These clustering and classification analyses strongly suggest that the regions surrounding the 3' end of genes contain sufficiently rich regulatory information to discriminate between over- and under-expressed genes, at least in the case of genes implicated in OC.
dc.language.iso en
dc.subject 3'end regions
dc.subject regulatory regions
dc.subject data mining
dc.subject clustering analysis
dc.subject classification model
dc.subject ovarian cancer
dc.title Identifying Regulatory Patterns at the 3'end Regions of Over-expressed and Under-expressed Genes
dc.type Thesis
dc.contributor.department Biological and Environmental Science and Engineering (BESE) Division
dc.rights.embargodate 2014-05-27
thesis.degree.grantor King Abdullah University of Science and Technology
dc.contributor.committeemember Essack, Magbubah
dc.contributor.committeemember Moshkov, Mikhail
thesis.degree.discipline Bioscience
thesis.degree.name Master of Science
dc.rights.accessrights At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis became available to the public after the expiration of the embargo on 2014-05-27.
refterms.dateFOA 2014-05-27T00:00:00Z
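The record does not name the clustering algorithm applied after PCA or say how many principal components were retained. Purely as an illustration under those unknowns, the sketch below centres a set of hypothetical numeric feature rows (one per gene), projects them onto the first principal component found by power iteration, and splits the projections into two groups with k-means (k = 2). Keeping a single component and using k-means are simplifying assumptions, not the thesis's documented choices, and every function name here is hypothetical; in practice a library implementation of PCA and clustering would normally be used.

```python
import math


def center(rows):
    """Subtract each column's mean so PCA operates on centred data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centred rows (needs >= 2 rows)."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=500):
    """Leading eigenvector of a covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # every feature is constant: no direction of variance
            return v
        v = [x / norm for x in w]
    return v


def two_means_1d(scores, iters=100):
    """Lloyd's k-means with k=2 on 1-D scores; returns 0/1 cluster labels."""
    lo, hi = min(scores), max(scores)  # start the two centroids at the extremes
    labels = [0] * len(scores)
    for _ in range(iters):
        labels = [0 if abs(s - lo) <= abs(s - hi) else 1 for s in scores]
        left = [s for s, lab in zip(scores, labels) if lab == 0]
        right = [s for s, lab in zip(scores, labels) if lab == 1]
        new_lo = sum(left) / len(left) if left else lo
        new_hi = sum(right) / len(right) if right else hi
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return labels


def pca_then_cluster(rows):
    """Hypothetical pipeline: project rows onto PC1, then cluster into two groups.

    Assumption: only the first principal component is kept and k-means (k=2)
    is the clustering step; the thesis may have used other choices.
    """
    centred = center(rows)
    pc1 = first_component(covariance(centred))
    scores = [sum(x * w for x, w in zip(r, pc1)) for r in centred]
    return two_means_1d(scores)
```

If the two expression classes are well separated along the first component, the two cluster labels should largely coincide with over- and under-expressed status; the values 0 and 1 themselves carry no meaning.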
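The abstract reports accuracy, sensitivity and specificity under 10-fold cross-validation. The sketch below shows how those three figures are conventionally computed from a confusion matrix: accuracy = (TP + TN) / (TP + FN + TN + FP), sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). It assumes, hypothetically, that label 1 marks the positive class (the record does not say whether over- or under-expressed genes were treated as positive), that out-of-fold predictions are pooled before scoring rather than averaged per fold, and that `train_and_predict` is a placeholder for whatever classifier (for example an SVM) is being evaluated. All function names are illustrative.

```python
import random


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, TN, FP. Assumption: `positive` marks the positive class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity); NaN if a class is absent."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, train_and_predict, k=10, seed=0):
    """Pool out-of-fold predictions over k folds, then score them once.

    `train_and_predict(X_train, y_train, X_test)` is a hypothetical callable
    standing in for any classifier; it must return one prediction per test row.
    """
    y_pred = [None] * len(y)
    for test_idx in kfold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        preds = train_and_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    return metrics(y, y_pred)
```

Pooling predictions and averaging per-fold scores usually give similar but not identical numbers; which convention produced the figures in the abstract is not stated in this record.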


Files in this item

Name: ThesisGhofranOthoum.pdf
Size: 1.194 MB
Format: PDF
Description: ThesisGhofranOthoum

