dc.contributor.advisor: Bajic, Vladimir B.
dc.contributor.author: Khamis, Abdullah M.
dc.date.accessioned: 2016-04-20T09:33:00Z
dc.date.available: 2017-04-20T00:00:00Z
dc.date.issued: 2016-03-31
dc.identifier.doi: 10.25781/KAUST-14K4P
dc.identifier.uri: http://hdl.handle.net/10754/606030
dc.description.abstract: Proteins play critical roles in the cellular processes of living organisms. It is therefore important to identify and characterize the key properties associated with their functions. Correlating a protein’s structure, sequence, and the physicochemical properties of its amino acids (aa) with its functions could identify some of the critical factors governing specific functionality. Not all functions of even well-studied proteins are known. This, together with the huge increase in the number of newly discovered and predicted proteins, makes it challenging to experimentally characterize the whole spectrum of possible protein functions for all proteins of interest. Consequently, the use of computational methods has become more attractive. Here we address two questions. The first is how to use protein aa sequence and physicochemical properties to characterize a family of proteins. The second is how to use the domains of transcription factor (TF) proteins to enhance the accuracy of predicting TF DNA binding sites (TFBSs). To address the first question, we developed a novel method that uses a computational representation of proteins based on characteristics of different protein regions (N-terminal, M-region and C-terminal) and combines these with the properties of protein aa sequences. We show that this description provides important biological insight into the characterization of protein functional groups. Using feature selection techniques, we identified key properties of proteins that allow for very accurate characterization of different protein families. We demonstrated the efficiency of our method in application to a number of antimicrobial peptide families. To address the second question, we developed another novel method that combines aa properties of the DNA binding domains of TFs with their TFBS properties to develop machine learning models for predicting TFBSs. Feature selection is used to identify the most relevant characteristics of the aa for such modeling. In addition to reducing the number of required models to only 14 for several hundred TFs, the final prediction accuracy of our models appears dramatically better than that of other methods. Overall, we show how to efficiently utilize properties of proteins in deriving more accurate solutions for two important problems of computational biology and bioinformatics.
dc.language.iso: en
dc.subject: Machine Learning
dc.subject: feature selection
dc.subject: protein properties
dc.subject: Bioinformatics
dc.title: Machine Learning Identification of Protein Properties Useful for Specific Applications
dc.type: Dissertation
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.rights.embargodate: 2017-04-20
thesis.degree.grantor: King Abdullah University of Science and Technology
dc.contributor.committeemember: Gojobori, Takashi
dc.contributor.committeemember: Tegner, Jesper
dc.contributor.committeemember: Gao, Xin
thesis.degree.discipline: Computer Science
thesis.degree.name: Doctor of Philosophy
dc.rights.accessrights: At the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation became available to the public after the expiration of the embargo on 2017-04-20.
refterms.dateFOA: 2017-04-20T00:00:00Z


Files in this item

Name: PhD Dissertation_Final copy.pdf
Size: 5.777Mb
Format: PDF
