Althagafi, Azza Th.; Hoehndorf, Robert(Cold Spring Harbor Laboratory, 2019-01-25)[Preprint]
Background: Interpretation of personal genomics data, for example in genetic counseling, is challenging due to the complexity of the data and the amount of background knowledge required for its interpretation. This background knowledge is distributed across several databases. Further information about genomic features can also be predicted through machine learning methods. Making this information accessible more easily has the potential to improve interpretation of variants in personal genomes. Results: We have developed VSIM, a web application for the interpretation and visualization of variants in personal genome sequences. VSIM identifies disease variants related to Mendelian, complex, and digenic disease as well as pharmacogenomic variants in personal genomes and visualizes them using a web server. VSIM can further be used to simulate populations of children based on two parent genomes, and can be applied to support premarital genetic counseling. We make VSIM available as source code as well as through a container that can be installed easily in network environments in which genomic data is specially protected. VSIM and related documentation is freely available at https://github.com/bio-ontology-research-group/VSIM. Conclusions: VSIM is a software that provides a web-based interface to variant interpretation in genetic counseling. VSIM can also be used for premarital genetic screening by simulating a population of children and analyze the disorder they might be carrying.
Kafkas, Senay; Abdelhakim, Marwa; Hashish, Yasmeen; Kulmanov, Maxat; Abdellatif, Marwa; Schofield, Paul N; Hoehndorf, Robert(Cold Spring Harbor Laboratory, 2018-12-10)[Preprint]
Understanding the relationship between the pathophysiology of infectious disease, the biology of the causative agent and the development of therapeutic and diagnostic approaches is dependent on the synthesis of a wide range of types of information. Provision of a comprehensive and integrated disease phenotype knowledgebase has the potential to provide novel and orthogonal sources of information for the understanding of infectious agent pathogenesis, and support for research on disease mechanisms. We have developed PathoPhenoDB, a database containing pathogen-to-phenotype associations. PathoPhenoDB relies on manual curation of pathogen-disease relations, on ontology-based text mining as well as manual curation to associate phenotypes with infectious disease. Using Semantic Web technologies, PathoPhenoDB also links to knowledge about drug resistance mechanisms and drugs used in the treatment of infectious diseases. PathoPhenoDB is accessible at http://patho.phenomebrowser.net/, and the data is freely available through a public SPARQL endpoint.
Recent developments in machine learning have lead to a rise of large number of methods for extracting features from structured data. The features are represented as a vectors and may encode for some semantic aspects of data. They can be used in a machine learning models for different tasks or to compute similarities between the entities of the data. SPARQL is a query language for structured data originally developed for querying Resource Description Framework (RDF) data. It has been in use for over a decade as a standardized NoSQL query language. Many different tools have been developed to enable data sharing with SPARQL. For example, SPARQL endpoints make your data interoperable and available to the world. SPARQL queries can be executed across multiple endpoints. We have developed a Vec2SPARQL, which is a general framework for integrating structured data and their vector space representations. Vec2SPARQL allows jointly querying vector functions such as computing similarities (cosine, correlations) or classifications with machine learning models within a single SPARQL query. We demonstrate applications of our approach for biomedical and clinical use cases. Our source code is freely available at https://github.com/bio-ontology-research-group/vec2sparql and we make a Vec2SPARQL endpoint available at http://sparql.bio2vec.net/.
Kafkas, Senay; Hoehndorf, Robert(Cold Spring Harbor Laboratory, 2018-10-08)[Preprint]
Background: Infectious diseases claim millions of lives especially in the developing countries each year, and resistance to drugs is an emerging threat worldwide. Identification of causative pathogens accurately and rapidly plays a key role in the success of treatment. To support infectious disease research and mechanisms of infection, there is a need for an open resource on pathogen-disease associations that can be utilized in computational studies. A large number of pathogen-disease associations is available from the literature in unstructured form and we need automated methods to extract the data. Results: We developed a text mining system designed for extracting pathogen-disease relations from literature. Our approach utilizes background knowledge from an ontology and statistical methods for extracting associations between pathogens and diseases. In total, we extracted a total of 3,420 pathogen-disease associations from literature. We integrated our literature-derived associations into a database which links pathogens to their phenotypes for supporting infectious disease research. Conclusions: To the best of our knowledge, we present the first study focusing on extracting pathogen-disease associations from publications. We believe the text mined data can be utilized as a valuable resource for infectious disease research. All the data is publicly available from https://github.com/bio-ontology-research-group/padimi and through a public SPARQL endpoint from http://patho.phenomebrowser.net/.
AlShahrani, Mona; Hoehndorf, Robert(Cold Spring Harbor Laboratory, 2018-08-06)[Preprint]
Drug repurposing is the problem of finding new uses for known drugs, and may either involve finding a new protein target or a new indication for a known mechanism. Several computational methods for drug repurposing exist, and many of these methods rely on combinations of different sources of information, extract hand-crafted features and use a computational model to predict targets or indications for a drug. One of the distinguishing features between different drug repurposing systems is the selection of features. Recently, a set of novel machine learning methods have become available that can efficiently learn features from datasets, and these methods can be applied, among others, to text and structured data in knowledge graphs. We developed a novel method that combines information in literature and structured databases, and applies feature learning to generate vector space embeddings. We apply our method to the identification of drug targets and indications for known drugs based on heterogeneous information about drugs, target proteins, and diseases. We demonstrate that our method is able to combine complementary information from both structured databases and from literature, and we show that our method can compete with well-established methods for drug repurposing. Our approach is generic and can be applied to other areas in which multi-modal information is used to build predictive models.
Export search results
The export option will allow you to export the current search results of the entered query to a file. Different
formats are available for download. To export the items, click on the button corresponding with the preferred download format.
By default, clicking on the export buttons will result in a download of the allowed maximum amount of items.
For anonymous users the allowed maximum amount is 50 search results.
To select a subset of the search results, click "Selective Export" button and make a selection of the items you want to export.
The amount of items that can be exported at once is similarly restricted as the full export.
After making a selection, click one of the export format buttons. The amount of items that will be exported is indicated in the bubble next to export format.