Predicting Gene Functions and Phenotypes by combining Deep Learning and Ontologies
Type
Dissertation
Authors
Kulmanov, Maxat
Advisors
Hoehndorf, Robert
Committee members
Arold, Stefan T.
Moshkov, Mikhail
Hunter, Larry
Program
Computer Science
Date
2020-04-08
Permanent link to this record
http://hdl.handle.net/10754/662467
Abstract
The number of available protein sequences is rapidly increasing, mainly as a consequence of the development and application of high-throughput sequencing technologies in the life sciences. A key question in the life sciences is to identify the functions of proteins, and furthermore to identify the phenotypes that may be associated with a loss (or gain) of function in these proteins. Protein functions are generally determined experimentally, and experimental determination of protein functions clearly will not scale to the current, and rapidly increasing, number of available protein sequences (over 300 million). Identifying phenotypes resulting from loss of function is even more challenging, as the phenotype is modified by whole-organism interactions and environmental variables. Accurate computational prediction of protein functions and loss-of-function phenotypes would therefore be of significant value both to academic research and to the biotechnology industry. We developed and expanded novel methods for representation learning and for predicting protein functions and their loss-of-function phenotypes. We use deep neural network algorithms and combine them with symbolic inference into neural-symbolic algorithms. Our work significantly improves previously developed methods for predicting protein functions through methodological advances in machine learning, incorporation of broader data types that may be predictive of functions, and improved systems for neural-symbolic integration. The methods we developed are generic and can be applied to other domains in which similar types of structured and unstructured information exist. In the future, our methods can be applied to the prediction of protein function for metagenomic samples in order to evaluate the potential for discovery of novel proteins of industrial value.
Our methods can also be applied to the prediction of loss-of-function phenotypes in human genetics, and the results can be incorporated into a variant prioritization tool for diagnosing patients with Mendelian disorders.
Citation
Kulmanov, M. (2020). Predicting Gene Functions and Phenotypes by combining Deep Learning and Ontologies. KAUST Research Repository. https://doi.org/10.25781/KAUST-13X08