Ontology based text mining of gene-phenotype associations: application to candidate gene prediction
KAUST Department: Computational Bioscience Research Center (CBRC)
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Computer Science Program
Permanent link to this record: http://hdl.handle.net/10754/631523
Abstract: Gene–phenotype associations play an important role in understanding disease mechanisms, which is a requirement for treatment development. A portion of gene–phenotype associations are observed mainly experimentally and made publicly available through several standard resources such as MGI. However, a vast number of gene–phenotype associations remain buried in the biomedical literature. Given the large amount of literature data, we need automated text mining tools to alleviate the burden of manual curation of gene–phenotype associations and to develop comprehensive resources. In this study, we present an ontology-based approach in combination with statistical methods to text mine gene–phenotype associations from the literature. Our method achieved AUC values of 0.90 and 0.75 in recovering known gene–phenotype associations from HPO and MGI, respectively. We posit that candidate genes and their relevant diseases should be expressed with similar phenotypes in publications. Thus, we demonstrate the utility of our approach by predicting disease candidate genes based on the semantic similarities of phenotypes associated with genes and diseases. To the best of our knowledge, this is the first study using an ontology-based approach to extract gene–phenotype associations from the literature. We evaluated our disease candidate prediction model on the gene–disease associations from MGI. Our model achieved AUC values of 0.90 and 0.87 on OMIM (human) and MGI (mouse) datasets of gene–disease associations, respectively. Our manual analysis of the text-mined data revealed that our method can accurately extract gene–phenotype associations that are not currently covered by the existing public gene–phenotype resources. Overall, the results indicate that our method can precisely extract known as well as new gene–phenotype associations from the literature. All the data and methods are available at https://github.com/bio-ontology-research-group/genepheno.
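As a rough illustration of the candidate-gene idea described in the abstract, the following Python sketch ranks genes by how similar their phenotype sets are to a disease's phenotype set, then scores the ranking with AUC. It is not the authors' implementation: plain Jaccard overlap stands in for an ontology-aware semantic similarity measure, and the gene names, phenotype labels, and function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; a simple stand-in for semantic similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_genes(
    disease_phenotypes: Set[str],
    gene_phenotypes: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Score every gene against the disease and sort best-first."""
    scored = [
        (gene, jaccard(phenos, disease_phenotypes))
        for gene, phenos in gene_phenotypes.items()
    ]
    # Sort by descending score; break ties by gene name for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def auc(scores: Dict[str, float], positives: Set[str]) -> float:
    """Probability that a known gene outscores a non-associated gene.

    Ties count as half a win (the rank-based definition of ROC AUC).
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: phenotype labels are placeholders, not real ontology IDs.
disease = {"P1", "P2", "P3"}
genes = {
    "geneA": {"P1", "P2", "P3"},
    "geneB": {"P1", "P4"},
    "geneC": {"P5"},
}

ranking = rank_genes(disease, genes)
score_by_gene = dict(ranking)
```

With this toy data, `geneA` shares all three phenotypes with the disease and ranks first. Passing `score_by_gene` and a set of known disease genes to `auc` gives the kind of ranking metric reported in the abstract, though on real data the similarity measure would need to account for the ontology hierarchy.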
Citation: Kafkas Ş, Hoehndorf R (2019) Ontology based text mining of gene-phenotype associations: application to candidate gene prediction. Database 2019. Available: http://dx.doi.org/10.1093/database/baz019.
Sponsors: This work was supported by funding from King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/3454-01-01 and FCC/1/1976-08-01.
Publisher: Oxford University Press (OUP)
Except where otherwise noted, this item's license is described as: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.