Ontology based text mining of gene-phenotype associations: application to candidate gene prediction
Type
Article
Authors
Kafkas, Senay
Hoehndorf, Robert

KAUST Department
Bio-Ontology Research Group (BORG)
Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
KAUST Grant Number
URF/1/3454-01-01
FCC/1/1976-08-01
Date
2019-02-27
Online Publication Date
2019-02-27
Print Publication Date
2019-01-01
Permanent link to this record
http://hdl.handle.net/10754/631523
Abstract
Gene–phenotype associations play an important role in understanding disease mechanisms, which is a requirement for treatment development. A portion of gene–phenotype associations are observed mainly experimentally and made publicly available through several standard resources such as MGI. However, a vast number of gene–phenotype associations remain buried in the biomedical literature. Given the large amount of literature data, automated text mining tools are needed to alleviate the burden of manually curating gene–phenotype associations and to develop comprehensive resources. In this study, we present an ontology-based approach in combination with statistical methods to text mine gene–phenotype associations from the literature. Our method achieved AUC values of 0.90 and 0.75 in recovering known gene–phenotype associations from HPO and MGI, respectively. We posit that candidate genes and their relevant diseases should be described with similar phenotypes in publications. Thus, we demonstrate the utility of our approach by predicting disease candidate genes based on the semantic similarities of phenotypes associated with genes and diseases. To the best of our knowledge, this is the first study using an ontology-based approach to extract gene–phenotype associations from the literature. We evaluated our disease candidate prediction model on the gene–disease associations from MGI. Our model achieved AUC values of 0.90 and 0.87 on OMIM (human) and MGI (mouse) datasets of gene–disease associations, respectively. Our manual analysis of the text-mined data revealed that our method can accurately extract gene–phenotype associations that are not currently covered by the existing public gene–phenotype resources. Overall, the results indicate that our method can precisely extract known as well as new gene–phenotype associations from the literature. All the data and methods are available at https://github.com/bio-ontology-research-group/genepheno.
Citation
Kafkas Ş, Hoehndorf R (2019) Ontology based text mining of gene-phenotype associations: application to candidate gene prediction. Database 2019. Available: http://dx.doi.org/10.1093/database/baz019.
Sponsors
This work was supported by funding from King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/3454-01-01 and FCC/1/1976-08-01.
Publisher
Oxford University Press (OUP)
Journal
Database
Relations
Is Supplemented By:
- [Software] Title: bio-ontology-research-group/genepheno: this repository contains text mined gene-phenotype data. Publication Date: 2018-10-07. github: bio-ontology-research-group/genepheno Handle: 10754/668122
DOI
10.1093/database/baz019
Except where otherwise noted, this item's license is described as This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.