• Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings

      Kulmanov, Maxat; Kafkas, Senay; Karwath, Andreas; Malic, Alexander; Gkoutos, Georgios; Dumontier, Michel; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-11-08)
      Recent developments in machine learning have led to a rise in the number of methods for extracting features from structured data. The features are represented as vectors and may encode some semantic aspects of the data. They can be used in machine learning models for different tasks or to compute similarities between the entities of the data. SPARQL is a query language for structured data originally developed for querying Resource Description Framework (RDF) data. It has been in use for over a decade as a standardized NoSQL query language. Many different tools have been developed to enable data sharing with SPARQL. For example, SPARQL endpoints make your data interoperable and available to the world. SPARQL queries can be executed across multiple endpoints. We have developed Vec2SPARQL, a general framework for integrating structured data and their vector space representations. Vec2SPARQL allows jointly querying vector functions such as computing similarities (cosine, correlations) or classifications with machine learning models within a single SPARQL query. We demonstrate applications of our approach for biomedical and clinical use cases. Our source code is freely available at https://github.com/bio-ontology-research-group/vec2sparql and we make a Vec2SPARQL endpoint available at http://sparql.bio2vec.net/.
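The abstract above names cosine similarity as one of the vector functions that can be evaluated over entity embeddings. The following is a minimal, standalone sketch of that computation in plain Python. It is not the Vec2SPARQL implementation or its query syntax; the function names, the entity labels, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query_id, embeddings, k=2):
    """Rank all other entities by cosine similarity to the query entity."""
    query_vec = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query_vec, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings keyed by made-up entity identifiers.
toy_embeddings = {
    "entity:A": [1.0, 0.0, 0.0],
    "entity:B": [0.9, 0.1, 0.0],
    "entity:C": [0.0, 1.0, 0.0],
}
ranking = most_similar("entity:A", toy_embeddings)
```

In a setting like the one the abstract describes, the candidate set would come from a structured query and the similarity function would be applied to the matching entities; here both steps are collapsed into a dictionary lookup.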
    • Downlink Non-Orthogonal Multiple Access (NOMA) in Poisson Networks

      Ali, Konpal S.; Haenggi, Martin; Elsawy, Hesham; Chaaban, Anas; Alouini, Mohamed-Slim (arXiv, 2018-10-15)
      A network model is considered where Poisson distributed base stations each transmit to N power-domain non-orthogonal multiple access (NOMA) users (UEs) that employ successive interference cancellation (SIC) for decoding. We propose three models for the clustering of NOMA UEs and consider two different ordering techniques for the NOMA UEs: mean signal power-based and instantaneous signal-to-intercell-interference-and-noise-ratio-based. For each technique, we present a signal-to-interference-and-noise ratio analysis for the coverage of the typical UE. We plot the rate region for the two-user case and show that neither ordering technique is consistently superior to the other. We propose two efficient algorithms for finding a feasible resource allocation that maximizes the cell sum rate Rtot, for general N, constrained to: 1) a minimum throughput T for each UE, 2) identical throughput for all UEs. We show the existence of: 1) an optimum N that maximizes the constrained Rtot given a set of network parameters, 2) a critical SIC level necessary for NOMA to outperform orthogonal multiple access. The results highlight the importance of choosing the network parameters N, the constraints, and the ordering technique to balance the Rtot and fairness requirements. We also show that interference-aware UE clustering can significantly improve performance.
    • Assessing Radiometric Correction Approaches for Multi-Spectral UAS Imagery for Horticultural Applications

      Tu, Yu-Hsuan; Phinn, Stuart; Johansen, Kasper; Robson, Andrew (MDPI AG, 2018-10-11)
      UAS-based multi-spectral imagery is becoming increasingly popular for the improved monitoring and managing of various horticultural crops. However, for UAS data to be used as an industry standard for assessing tree structure and condition as well as production parameters, it is imperative that the appropriate data collection and pre-processing protocols are established to enable multi-temporal comparison. There are several UAS-based radiometric correction methods commonly used for precision agricultural purposes. However, their relative accuracies have not been assessed for data acquired in complex horticultural environments. This study assessed the variations in estimated surface reflectance values of different radiometric corrections applied to multi-spectral UAS imagery acquired in both avocado and banana orchards. We found that inaccurate calibration panel measurements, inaccurate signal-to-reflectance conversion, and high variation in geometry between illumination, surface, and sensor viewing produced significant radiometric variations in at-surface reflectance estimates. Potential solutions to address these limitations included appropriate panel deployment, site-specific sensor calibration, and appropriate BRDF correction. Future UAS-based horticultural crop monitoring can benefit from the proposed solutions to radiometric corrections to ensure that comparable image-based maps of multi-temporal biophysical properties are used.
    • Ontology based mining of pathogen-disease associations from literature

      Kafkas, Senay; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-10-08)
      Background: Infectious diseases claim millions of lives each year, especially in developing countries, and resistance to drugs is an emerging threat worldwide. Identifying causative pathogens accurately and rapidly plays a key role in the success of treatment. To support infectious disease research and the study of mechanisms of infection, there is a need for an open resource on pathogen-disease associations that can be utilized in computational studies. A large number of pathogen-disease associations is available from the literature in unstructured form, and we need automated methods to extract the data. Results: We developed a text mining system designed for extracting pathogen-disease relations from literature. Our approach utilizes background knowledge from an ontology and statistical methods for extracting associations between pathogens and diseases. In total, we extracted 3,420 pathogen-disease associations from literature. We integrated our literature-derived associations into a database which links pathogens to their phenotypes for supporting infectious disease research. Conclusions: To the best of our knowledge, we present the first study focusing on extracting pathogen-disease associations from publications. We believe the text mined data can be utilized as a valuable resource for infectious disease research. All the data is publicly available from https://github.com/bio-ontology-research-group/padimi and through a public SPARQL endpoint from http://patho.phenomebrowser.net/.
    • Spectral-Efficiency - Illumination Pareto Front for Energy Harvesting Enabled VLC System

      Abdelhady, Amr Mohamed Abdelaziz; Amin, Osama; Chaaban, Anas; Shihada, Basem; Alouini, Mohamed-Slim (2018-10-07)
      The continuous improvement in optical energy harvesting devices motivates visible light communication (VLC) system developers to utilize such available free energy sources. An outdoor VLC system is considered where an optical base station sends data to multiple users that are capable of harvesting the optical energy. The proposed VLC system serves multiple users using time division multiple access (TDMA) with unequal time and power allocation, which are allocated to improve the system performance. The adopted optical system provides users with illumination and data communication services. The outdoor optical design objective is to maximize the illumination, while the communication design objective is to maximize the spectral efficiency (SE). The design objectives are shown to be conflicting; therefore, a multiobjective optimization problem is formulated to obtain the Pareto front performance curve for the proposed system. To this end, the marginal optimization problems are solved first using low complexity algorithms. Then, based on the proposed algorithms, a low complexity algorithm is developed to obtain an inner bound of the Pareto front for the illumination-SE tradeoff. The inner bound for the Pareto front is shown to be close to the optimal Pareto front via several simulation scenarios for different system parameters.
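The abstract above frames the illumination versus spectral-efficiency tradeoff as a Pareto front. As a generic illustration of that concept only, the sketch below filters a finite set of candidate operating points down to the non-dominated ones when both objectives are to be maximized. It is not the paper's algorithm; the function name and the sample points are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated points when both coordinates are maximized.

    A point p is dominated if some other point q is at least as good in both
    objectives and strictly better in at least one of them.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    # Remove duplicates and sort by the first objective for readability.
    return sorted(set(front))


# Hypothetical (illumination, spectral efficiency) pairs in arbitrary units.
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
front = pareto_front(candidates)
```

Each point on the returned front can only improve one objective by giving up some of the other, which is the sense in which the two design objectives conflict.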
    • A multidrug resistant clinical P. aeruginosa isolate in the MLST550 clonal complex: uncoupled quorum sensing modulates the interplay of virulence and resistance

      Cao, Huiluo; Xia, Tingying; Li, Yanran; Xu, Zeling; Bougouffa, Salim; Lo, Yat Kei; Bajic, Vladimir B.; Luo, Haiwei; Woo, Patrick C. Y.; Yan, Aixin (Cold Spring Harbor Laboratory, 2018-09-12)
      Pseudomonas aeruginosa is a prevalent and pernicious pathogen equipped with extraordinary capabilities both to infect the host and to develop antimicrobial resistance (AMR). Monitoring the emergence of AMR high risk clones and understanding the interplay of their pathogenicity and antibiotic resistance is of paramount importance to avoid resistance dissemination and to control P. aeruginosa infections. In this study, we report the identification of a multidrug resistant (MDR) P. aeruginosa strain PA154197 isolated from a blood stream infection in Hong Kong. PA154197 belongs to a distinctive MLST550 clonal complex shared by two international P. aeruginosa isolates VW0289 and AUS544. Comparative genome and transcriptome analysis with the reference strain PAO1 led to the identification of a variety of genetic variations in antibiotic resistance genes and the hyper-expression of three multidrug efflux pumps MexAB-OprM, MexEF-OprN, and MexGHI-OpmD in PA154197. Unlike many resistant isolates displaying an attenuated virulence, PA154197 produces a significantly high level of the P. aeruginosa major virulence factor pyocyanin (PYO) and displays an uncompromised virulence compared to PAO1. Further analysis revealed that the secondary quorum sensing system Pqs, which primarily controls the PYO production, is hyper-active in PA154197 independent of the master QS systems Las and Rhl. Together, these investigations disclose a unique, uncoupled QS mediated pathoadaptation mechanism in clinical P. aeruginosa which may account for the high pathogenic potential and antibiotic resistance in the MDR isolate PA154197.
    • GZMA and RASGRP1 are novel tumor suppressors that counter dissemination of Theileria annulata-transformed macrophages

      Rchiad, Zineb; Haidar, Malak; Ansari, Hifzur Rahman; Tajeri, Shahin; Ben Rached, Fathia; Langsley, Gordon; Pain, Arnab (Cold Spring Harbor Laboratory, 2018-06-05)
      Theileria annulata is a tick-transmitted apicomplexan parasite that infects and transforms bovine leukocytes into disseminating tumors that cause a disease called tropical theileriosis. Using RNA sequencing we identified bovine genes whose transcription is perturbed during Theileria-induced transformation to define the transcriptional atlas of transformed virulent versus attenuated (dampened dissemination) macrophages and transformed B cells. Dataset comparisons highlighted a small set of novel genes associated with Theileria-transformed leukocyte dissemination, and the roles of Granzyme A (GZMA) and RAS guanyl-releasing protein 1 (RASGRP1) were confirmed by CRISPR/Cas9-mediated down-regulation of their expression. Knockdown of both GZMA and RASGRP1 in attenuated macrophages led to a regain in their dissemination in Rag2/γC mice, confirming in vivo both GZMA and RASGRP1 as novel dissemination suppressors.
    • Physical and transcriptional organisation of the bread wheat intracellular immune receptor repertoire

      Steuernagel, Burkhard; Witek, Kamil; Krattinger, Simon G.; Ramirez-Gonzalez, Ricardo H.; Schoonbeek, Henk-jan; Yu, Guotai; Baggs, Erin; Witek, Agnieszka; Yadav, Inderjit; Krasileva, Ksenia V.; Jones, Jonathan D. G.; Uauy, Cristobal; Keller, Beat; Ridout, Christopher J.; Wulff, Brande; The International Wheat Genome Sequencing Consortium (Cold Spring Harbor Laboratory, 2018-06-05)
      Disease resistance genes encoding intracellular immune receptors of the nucleotide-binding and leucine-rich repeat (NLR) class of proteins detect pathogens by the presence of pathogen effectors. Plant genomes typically contain hundreds of NLR encoding genes. The availability of the hexaploid wheat cultivar Chinese Spring reference genome now allows a detailed study of its NLR complement. However, low NLR expression as well as high intra-family sequence homology hinders their accurate gene annotation. Here we developed NLR-Annotator for in silico NLR identification independent of transcript support. Although developed for wheat, we demonstrate the universal applicability of NLR-Annotator across diverse plant taxa. Applying our tool to wheat and combining it with a transcript-validated subset of genes from the reference gene annotation, we characterized the structure, phylogeny and expression profile of the NLR gene family. We detected 3,400 full-length NLR loci of which 1,540 were confirmed as complete genes. NLRs with integrated domains mostly group in specific sub-clades. Members of another subclade predominantly locate in close physical proximity to NLRs carrying integrated domains suggesting a paired helper-function. Most NLRs (88%) display low basal expression (in the lower 10 percentile of transcripts), which may be tissue-specific and/or induced by biotic stress. As a case study for applying our tool to the positional cloning of resistance genes, we estimated the number of NLR genes within the intervals of mapped rust resistance genes. Our study will support the identification of functional resistance genes in wheat to accelerate the breeding and engineering of disease resistant varieties.
    • Drug repurposing through joint learning on knowledge graphs and literature

      AlShahrani, Mona; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-08-06)
      Drug repurposing is the problem of finding new uses for known drugs, and may either involve finding a new protein target or a new indication for a known mechanism. Several computational methods for drug repurposing exist, and many of these methods rely on combinations of different sources of information, extract hand-crafted features and use a computational model to predict targets or indications for a drug. One of the distinguishing features between different drug repurposing systems is the selection of features. Recently, a set of novel machine learning methods have become available that can efficiently learn features from datasets, and these methods can be applied, among others, to text and structured data in knowledge graphs. We developed a novel method that combines information in literature and structured databases, and applies feature learning to generate vector space embeddings. We apply our method to the identification of drug targets and indications for known drugs based on heterogeneous information about drugs, target proteins, and diseases. We demonstrate that our method is able to combine complementary information from both structured databases and from literature, and we show that our method can compete with well-established methods for drug repurposing. Our approach is generic and can be applied to other areas in which multi-modal information is used to build predictive models.
    • A fast and cost-effective microsampling protocol incorporating reduced animal usage for time-series transcriptomics in rodent malaria parasites

      Ramaprasad, Abhinay; Subudhi, Amit Kumar; Culleton, Richard; Pain, Arnab (Cold Spring Harbor Laboratory, 2018-06-21)
      The transcriptional regulation occurring in malaria parasites during the clinically important life stages within host erythrocytes can be studied in vivo with rodent malaria parasites propagated in mice. Time-series transcriptome profiling commonly involves the euthanasia of groups of mice at specific time points followed by the extraction of parasite RNA from whole blood samples. Current methodologies for parasite RNA extraction involve several steps and when multiple time points are profiled, these protocols are laborious, time consuming, and require the euthanisation of large cohorts of mice. We designed a simplified protocol for parasite RNA extraction from blood volumes as low as 20 microliters (microsamples), serially bled from mice via tail snips and directly lysed with TRIzol reagent. Gene expression data derived from microsampling using RNA-seq were closely matched to those derived from larger volumes of leucocyte-depleted and saponin-treated blood obtained from euthanized mice and also tightly correlated between biological replicates. Transcriptome profiling of microsamples taken at different time points during the intra-erythrocytic developmental cycle of the rodent malaria parasite Plasmodium vinckei revealed the transcriptional cascade commonly observed in malaria parasites. Microsampling is a quick, robust and cost-efficient approach to sample collection for in vivo time-series transcriptomic studies in rodent malaria parasites.
    • Communicating Using Spatial Mode Multiplexing: Potentials, Challenges and Perspectives

      Trichili, Abderrahmen; Park, Ki-Hong; Zghal, Mouard; Ooi, Boon S.; Alouini, Mohamed-Slim (2018-08)
      Time, polarization, and wavelength multiplexing schemes have been used to satisfy the growing need for transmission capacity. Using space as a new dimension for communication systems has been recently suggested as a versatile technique to address future bandwidth issues. We review the potential of harnessing space as an additional degree of freedom for communication applications including free space optics, optical fiber installation, underwater wireless optical links, on-chip interconnects, data center indoor connections, radio frequency and acoustic communications. We focus on the orbital angular momentum (OAM) modes and equally identify the challenges related to each of the applications of spatial modes and the particular OAM modes in communication. Finally, we discuss the perspectives of this emerging technology.
    • Modeling of Viral Aerosol Transmission and Detection

      Khalid, Maryam; Amin, Osama; Ahmed, Sajid; Alouini, Mohamed-Slim (2018)
      The objective of this work is to investigate the spread mechanism of diseases in the atmosphere as an engineering problem. Among the viral transmission mechanisms that do not include physical contact, aerosol transmission is the most significant mode of transmission, where virus-laden droplets are carried over long distances by wind. In this work, we focus on aerosol transmission of viruses and introduce the idea of viewing virus transmission through aerosols and their transport as a molecular communication problem, where one has no control over the transmission source but a robust receiver can be designed using nano-biosensors. To investigate this idea, a complete system is presented and an end-to-end mathematical model for the aerosol transmission channel is derived under certain constraints and boundary conditions. In addition to the transmitter and channel, a receiver architecture composed of an air sampler and a silicon nanowire field effect transistor is also discussed. Furthermore, a detection problem is formulated, for which the maximum likelihood decision rule and the corresponding missed detection probability are discussed. Finally, simulation results are presented to investigate the parameters that affect the performance and to justify the feasibility of the proposed setup in related applications.
    • Spatial Poisson Processes for Fatigue Crack Initiation

      Babuska, Ivo; Sawlan, Zaid A; Scavino, Marco; Szabó, Barna; Tempone, Raul (2018-05-09)
    • OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants

      Boudellioua, Imene; Kulmanov, Maxat; Schofield, Paul N; Gkoutos, Georgios V; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-05-02)
      Purpose: An increasing number of Mendelian disorders have been identified for which two or more variants in one or more genes are required to cause the disease, or significantly modify its severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of variants underlying oligogenic diseases in individual whole exome or whole genome sequences. Methods: Information that links patient phenotypes to databases of gene-phenotype associations observed in clinical research can provide useful information and improve variant prioritization for Mendelian diseases. Additionally, background knowledge about interactions between genes can be utilized to guide and restrict the selection of candidate disease modules. Results: We developed OligoPVP, an algorithm that can be used to identify variants in oligogenic diseases and their interactions, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state of the art pathogenicity detection methods. Conclusions: Our results show that OligoPVP can efficiently detect oligogenic interactions using a phenotype-driven approach and identify etiologically important variants in whole genomes.
    • DeepPVP: phenotype-based prioritization of causative variants using deep learning

      Boudellioua, Imene; Kulmanov, Maxat; Schofield, Paul N; Gkoutos, Georgios V; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-05-02)
      Background: Prioritization of variants in personal genomic data is a major challenge. Recently, computational methods that rely on comparing phenotype similarity have been shown to be useful for identifying causative variants. In these methods, pathogenicity prediction is combined with a semantic similarity measure to prioritize not only variants that are likely to be dysfunctional but also those that are likely involved in the pathogenesis of a patient's phenotype. Results: We have developed DeepPVP, a variant prioritization method that combines automated inference with deep neural networks to identify the likely causative variants in whole exome or whole genome sequence data. We demonstrate that DeepPVP performs significantly better than existing methods, including phenotype-based methods that use similar features. DeepPVP is freely available at https://github.com/bio-ontology-research-group/phenomenet-vp. Conclusions: DeepPVP further improves on existing variant prioritization methods in terms of both speed and accuracy.
    • Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes

      AlShahrani, Mona; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-04-30)
      In the past years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease's (or patient's) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse. Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprising multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.
    • A Matrix Splitting Method for Composite Function Minimization

      Yuan, Ganzhao; Zheng, Wei-Shi; Ghanem, Bernard (arXiv, 2016-12-07)
      Composite function minimization captures a wide spectrum of applications in both computer vision and machine learning. It includes bound constrained optimization and cardinality regularized optimization as special cases. This paper proposes and analyzes a new Matrix Splitting Method (MSM) for minimizing composite functions. It can be viewed as a generalization of the classical Gauss-Seidel method and the Successive Over-Relaxation method for solving linear systems in the literature. Incorporating a new Gaussian elimination procedure, the matrix splitting method achieves state-of-the-art performance. For convex problems, we establish the global convergence, convergence rate, and iteration complexity of MSM, while for non-convex problems, we prove its global convergence. Finally, we validate the performance of our matrix splitting method on two particular applications: nonnegative matrix factorization and cardinality regularized sparse coding. Extensive experiments show that our method outperforms existing composite function minimization techniques in terms of both efficiency and efficacy.
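The abstract above describes MSM as a generalization of the classical Gauss-Seidel and Successive Over-Relaxation (SOR) methods for linear systems. For orientation, here is a minimal sketch of those classical iterations only, applied to Ax = b. It is not the Matrix Splitting Method itself; the function name, the defaults, and the small test system are hypothetical choices made for illustration.

```python
def sor_solve(A, b, omega=1.0, max_iters=500, tol=1e-12):
    """Solve A x = b by Successive Over-Relaxation.

    omega = 1.0 reduces to the Gauss-Seidel method. Convergence is
    guaranteed, for example, when A is symmetric positive definite
    and 0 < omega < 2.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iters):
        max_change = 0.0
        for i in range(n):
            # Use the newest available values of x (already updated for j < i).
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel_value = (b[i] - sigma) / A[i][i]
            new_value = (1.0 - omega) * x[i] + omega * gauss_seidel_value
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            break
    return x


# Small symmetric positive definite system with exact solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_gs = sor_solve(A, b)              # Gauss-Seidel
x_sor = sor_solve(A, b, omega=1.2)  # over-relaxed variant
```

Both calls sweep through the unknowns one at a time and reuse freshly updated entries within the same sweep, which is the feature that distinguishes these methods from a Jacobi iteration.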
    • Modeling soil organic carbon with Quantile Regression: Dissecting predictors' effects on carbon stocks

      Lombardo, Luigi; Saia, Sergio; Schillaci, Calogero; Mai, Paul Martin; Huser, Raphaël (arXiv, 2017-08-13)
      Soil Organic Carbon (SOC) estimation is crucial to manage both natural and anthropic ecosystems and has recently been put under the magnifying glass after the Paris agreement 2016 due to its relationship with greenhouse gas. Statistical applications have dominated the SOC stock mapping at regional scale so far. However, the community has hardly ever attempted to implement Quantile Regression (QR) to spatially predict the SOC distribution. In this contribution, we test QR to estimate SOC stock (0-30 cm depth) in the agricultural areas of a highly variable semi-arid region (Sicily, Italy, around 25,000 km²) by using topographic and remotely sensed predictors. We also compare the results with those from available SOC stock measurements. The QR models produced robust performances and allowed us to recognize dominant effects among the predictors with respect to the considered quantile. This information, currently lacking, suggests that QR can discern predictor influences on SOC stock at specific sub-domains of each predictor. In this work, the predictive map generated at the median shows lower errors than those of the Joint Research Centre and International Soil Reference and Information Centre benchmarks. The results suggest the use of QR as a comprehensive and effective method to map SOC using legacy data in agro-ecosystems. The R code scripted in this study for QR is included.
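The abstract above relies on Quantile Regression, which fits a chosen quantile by minimizing the asymmetric "pinball" (check) loss rather than the squared error. The sketch below illustrates only that loss, on made-up numbers, by showing that the constant minimizing it is the corresponding empirical quantile. It is written in Python for illustration and is not the R code the authors mention; the function names and the data are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (check) loss for quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        residual = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1.0) * residual
    return total / len(y_true)


def best_constant_quantile(y, tau):
    """Pick the observed value that minimizes the pinball loss as a constant fit."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Hypothetical skewed sample: one large value pulls the mean but not the median.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant_quantile(sample, 0.5)
upper_fit = best_constant_quantile(sample, 0.9)
```

Replacing the constant with a function of the predictors and minimizing the same loss gives a quantile regression model, and repeating the fit for several values of tau is what lets predictor effects be compared across quantiles.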
    • Parameters and Fractional Differentiation Orders Estimation for Linear Continuous-Time Non-Commensurate Fractional Order Systems

      Belkhatir, Zehor; Laleg-Kirati, Taous-Meriem (Submitted to Elsevier, 2017-05-31)
      This paper proposes a two-stage estimation algorithm to solve the problem of joint estimation of the parameters and the fractional differentiation orders of a linear continuous-time fractional system with non-commensurate orders. The proposed algorithm combines the modulating functions and the first-order Newton methods. Sufficient conditions ensuring the convergence of the method are provided. An error analysis in the discrete case is performed. Moreover, the method is extended to the joint estimation of smooth unknown input and fractional differentiation orders. The performance of the proposed approach is illustrated with different numerical examples. Furthermore, a potential application of the algorithm is proposed which consists in the estimation of the differentiation orders of a fractional neurovascular model along with the neural activity considered as input for this model.
    • Error Probability Analysis of Hardware Impaired Systems with Asymmetric Transmission

      Javed, Sidrah; Amin, Osama; Ikki, Salama S.; Alouini, Mohamed-Slim (2018-04-26)
      The error probability study of hardware impaired (HWI) systems highly depends on the adopted model. Recent models have proved that the aggregate noise is equivalent to improper Gaussian signals. Therefore, considering the distinct noise nature and self-interfering (SI) signals, an optimal maximum likelihood (ML) receiver is derived. This renders the conventional minimum Euclidean distance (MED) receiver sub-optimal because it is based on the assumptions of ideal hardware transceivers and proper Gaussian noise in communication systems. Next, the average error probability performance of the proposed optimal ML receiver is analyzed, and tight bounds and approximations are derived for various adopted systems including transmitter and receiver I/Q imbalanced systems with or without transmitter distortions as well as transmitter or receiver only impaired systems. Motivated by recent studies that shed light on the benefit of improper Gaussian signaling in mitigating the HWIs, asymmetric quadrature amplitude modulation or phase shift keying is optimized and adapted for transmission. Finally, different numerical and simulation results are presented to support the superiority of the proposed ML receiver over the MED receiver, the tightness of the derived bounds, and the effectiveness of asymmetric transmission in dampening HWIs and improving overall system performance.