Recent Submissions

  • Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings

    Kulmanov, Maxat; Kafkas, Senay; Karwath, Andreas; Malic, Alexander; Gkoutos, Georgios; Dumontier, Michel; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-11-08)
    Recent developments in machine learning have led to the rise of a large number of methods for extracting features from structured data. The features are represented as vectors and may encode some semantic aspects of the data. They can be used in machine learning models for different tasks or to compute similarities between the entities of the data. SPARQL is a query language for structured data originally developed for querying Resource Description Framework (RDF) data. It has been in use for over a decade as a standardized NoSQL query language. Many different tools have been developed to enable data sharing with SPARQL. For example, SPARQL endpoints make your data interoperable and available to the world. SPARQL queries can be executed across multiple endpoints. We have developed Vec2SPARQL, a general framework for integrating structured data and their vector space representations. Vec2SPARQL allows jointly querying vector functions such as computing similarities (cosine, correlations) or classifications with machine learning models within a single SPARQL query. We demonstrate applications of our approach for biomedical and clinical use cases. Our source code is freely available, and we make a Vec2SPARQL endpoint available.
  • Downlink Non-Orthogonal Multiple Access (NOMA) in Poisson Networks

    Ali, Konpal S.; Haenggi, Martin; Elsawy, Hesham; Chaaban, Anas; Alouini, Mohamed-Slim (arXiv, 2018-10-15)
    A network model is considered where Poisson distributed base stations transmit to N power-domain non-orthogonal multiple access (NOMA) users (UEs) each, which employ successive interference cancellation (SIC) for decoding. We propose three models for the clustering of NOMA UEs and consider two different ordering techniques for the NOMA UEs: mean signal power-based and instantaneous signal-to-intercell-interference-and-noise-ratio-based. For each technique, we present a signal-to-interference-and-noise ratio analysis for the coverage of the typical UE. We plot the rate region for the two-user case and show that neither ordering technique is consistently superior to the other. We propose two efficient algorithms for finding a feasible resource allocation that maximizes the cell sum rate Rtot, for general N, constrained to: 1) a minimum throughput T for each UE, 2) identical throughput for all UEs. We show the existence of: 1) an optimum N that maximizes the constrained Rtot given a set of network parameters, 2) a critical SIC level necessary for NOMA to outperform orthogonal multiple access. The results highlight the importance of choosing the network parameters N, the constraints, and the ordering technique to balance the Rtot and fairness requirements. We also show that interference-aware UE clustering can significantly improve performance.
  • Assessing Radiometric Correction Approaches for Multi-Spectral UAS Imagery for Horticultural Applications

    Tu, Yu-Hsuan; Phinn, Stuart; Johansen, Kasper; Robson, Andrew (MDPI AG, 2018-10-11)
    UAS-based multi-spectral imagery is becoming increasingly popular for the improved monitoring and managing of various horticultural crops. However, for UAS data to be used as an industry standard for assessing tree structure and condition as well as production parameters, it is imperative that the appropriate data collection and pre-processing protocols are established to enable multi-temporal comparison. There are several UAS-based radiometric correction methods commonly used for precision agricultural purposes. However, their relative accuracies have not been assessed for data acquired in complex horticultural environments. This study assessed the variations in estimated surface reflectance values of different radiometric corrections applied to multi-spectral UAS imagery acquired in both avocado and banana orchards. We found that inaccurate calibration panel measurements, inaccurate signal-to-reflectance conversion, and high variation in geometry between illumination, surface, and sensor viewing produced significant radiometric variations in at-surface reflectance estimates. Potential solutions to address these limitations included appropriate panel deployment, site-specific sensor calibration, and appropriate BRDF correction. Future UAS-based horticultural crop monitoring can benefit from the proposed solutions to radiometric corrections to ensure that comparable image-based maps of multi-temporal biophysical properties are used.
  • Ontology based mining of pathogen-disease associations from literature

    Kafkas, Senay; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-10-08)
    Background: Infectious diseases claim millions of lives each year, especially in developing countries, and resistance to drugs is an emerging threat worldwide. Accurate and rapid identification of causative pathogens plays a key role in the success of treatment. To support research on infectious diseases and mechanisms of infection, there is a need for an open resource on pathogen-disease associations that can be utilized in computational studies. A large number of pathogen-disease associations are available from the literature in unstructured form, and we need automated methods to extract the data. Results: We developed a text mining system designed for extracting pathogen-disease relations from literature. Our approach utilizes background knowledge from an ontology and statistical methods for extracting associations between pathogens and diseases. In total, we extracted 3,420 pathogen-disease associations from literature. We integrated our literature-derived associations into a database which links pathogens to their phenotypes for supporting infectious disease research. Conclusions: To the best of our knowledge, we present the first study focusing on extracting pathogen-disease associations from publications. We believe the text mined data can be utilized as a valuable resource for infectious disease research. All the data is publicly available and accessible through a public SPARQL endpoint.
  • Spectral-Efficiency - Illumination Pareto Front for Energy Harvesting Enabled VLC System

    Abdelhady, Amr Mohamed Abdelaziz; Amin, Osama; Chaaban, Anas; Shihada, Basem; Alouini, Mohamed-Slim (2018-10-07)
    The continuous improvement in optical energy harvesting devices motivates visible light communication (VLC) system developers to utilize such available free energy sources. An outdoor VLC system is considered where an optical base station sends data to multiple users that are capable of harvesting the optical energy. The proposed VLC system serves multiple users using time division multiple access (TDMA) with unequal time and power allocation, which are allocated to improve the system performance. The adopted optical system provides users with illumination and data communication services. The outdoor optical design objective is to maximize the illumination, while the communication design objective is to maximize the spectral efficiency (SE). The design objectives are shown to be conflicting; therefore, a multiobjective optimization problem is formulated to obtain the Pareto front performance curve for the proposed system. To this end, the marginal optimization problems are solved first using low complexity algorithms. Then, based on the proposed algorithms, a low complexity algorithm is developed to obtain an inner bound of the Pareto front for the illumination-SE tradeoff. The inner bound for the Pareto front is shown to be close to the optimal Pareto front via several simulation scenarios for different system parameters.
  • A multidrug resistant clinical P. aeruginosa isolate in the MLST550 clonal complex: uncoupled quorum sensing modulates the interplay of virulence and resistance

    Cao, Huiluo; Xia, Tingying; Li, Yanran; Xu, Zeling; Bougouffa, Salim; Lo, Yat Kei; Bajic, Vladimir B.; Luo, Haiwei; Woo, Patrick C. Y.; Yan, Aixin (Cold Spring Harbor Laboratory, 2018-09-12)
    Pseudomonas aeruginosa is a prevalent and pernicious pathogen equipped with extraordinary capabilities both to infect the host and to develop antimicrobial resistance (AMR). Monitoring the emergence of AMR high risk clones and understanding the interplay of their pathogenicity and antibiotic resistance is of paramount importance to avoid resistance dissemination and to control P. aeruginosa infections. In this study, we report the identification of a multidrug resistant (MDR) P. aeruginosa strain PA154197 isolated from a blood stream infection in Hong Kong. PA154197 belongs to a distinctive MLST550 clonal complex shared by two international P. aeruginosa isolates VW0289 and AUS544. Comparative genome and transcriptome analysis with the reference strain PAO1 led to the identification of a variety of genetic variations in antibiotic resistance genes and the hyper-expression of three multidrug efflux pumps MexAB-OprM, MexEF-OprN, and MexGHI-OpmD in PA154197. Unlike many resistant isolates displaying an attenuated virulence, PA154197 produces a significantly high level of the P. aeruginosa major virulence factor pyocyanin (PYO) and displays an uncompromised virulence compared to PAO1. Further analysis revealed that the secondary quorum sensing system Pqs, which primarily controls PYO production, is hyper-active in PA154197 independent of the master QS systems Las and Rhl. Together, these investigations disclose a unique, uncoupled QS mediated pathoadaptation mechanism in clinical P. aeruginosa which may account for the high pathogenic potential and antibiotic resistance in the MDR isolate PA154197.
  • Drug repurposing through joint learning on knowledge graphs and literature

    AlShahrani, Mona; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-08-06)
    Drug repurposing is the problem of finding new uses for known drugs, and may either involve finding a new protein target or a new indication for a known mechanism. Several computational methods for drug repurposing exist, and many of these methods rely on combinations of different sources of information, extract hand-crafted features and use a computational model to predict targets or indications for a drug. One of the distinguishing features between different drug repurposing systems is the selection of features. Recently, a set of novel machine learning methods have become available that can efficiently learn features from datasets, and these methods can be applied, among others, to text and structured data in knowledge graphs. We developed a novel method that combines information in literature and structured databases, and applies feature learning to generate vector space embeddings. We apply our method to the identification of drug targets and indications for known drugs based on heterogeneous information about drugs, target proteins, and diseases. We demonstrate that our method is able to combine complementary information from both structured databases and from literature, and we show that our method can compete with well-established methods for drug repurposing. Our approach is generic and can be applied to other areas in which multi-modal information is used to build predictive models.
  • Communicating Using Spatial Mode Multiplexing: Potentials, Challenges and Perspectives

    Trichili, Abderrahmen; Park, Ki-Hong; Zghal, Mouard; Ooi, Boon S.; Alouini, Mohamed-Slim (2018-08)
    Time, polarization, and wavelength multiplexing schemes have been used to satisfy the growing need for transmission capacity. Using space as a new dimension for communication systems has been recently suggested as a versatile technique to address future bandwidth issues. We review the potential of harnessing space as an additional degree of freedom for communication applications including free space optics, optical fiber installation, underwater wireless optical links, on-chip interconnects, data center indoor connections, radio frequency and acoustic communications. We focus on the orbital angular momentum (OAM) modes and equally identify the challenges related to each of the applications of spatial modes and the particular OAM modes in communication. Finally, we discuss the perspectives of this emerging technology.
  • A fast and cost-effective microsampling protocol incorporating reduced animal usage for time-series transcriptomics in rodent malaria parasites

    Ramaprasad, Abhinay; Subudhi, Amit Kumar; Culleton, Richard; Pain, Arnab (Cold Spring Harbor Laboratory, 2018-06-21)
    The transcriptional regulation occurring in malaria parasites during the clinically important life stages within host erythrocytes can be studied in vivo with rodent malaria parasites propagated in mice. Time-series transcriptome profiling commonly involves the euthanasia of groups of mice at specific time points followed by the extraction of parasite RNA from whole blood samples. Current methodologies for parasite RNA extraction involve several steps and when multiple time points are profiled, these protocols are laborious, time consuming, and require the euthanisation of large cohorts of mice. We designed a simplified protocol for parasite RNA extraction from blood volumes as low as 20 microliters (microsamples), serially bled from mice via tail snips and directly lysed with TRIzol reagent. Gene expression data derived from microsampling using RNA-seq were closely matched to those derived from larger volumes of leucocyte-depleted and saponin-treated blood obtained from euthanized mice and also tightly correlated between biological replicates. Transcriptome profiling of microsamples taken at different time points during the intra-erythrocytic developmental cycle of the rodent malaria parasite Plasmodium vinckei revealed the transcriptional cascade commonly observed in malaria parasites. Microsampling is a quick, robust and cost-efficient approach to sample collection for in vivo time-series transcriptomic studies in rodent malaria parasites.
  • GZMA and RASGRP1 are novel tumor suppressors that counter dissemination of Theileria annulata-transformed macrophages

    Rchiad, Zineb; Haidar, Malak; Ansari, Hifzur Rahman; Tajeri, Shahin; Ben Rached, Fathia; Langsley, Gordon; Pain, Arnab (Cold Spring Harbor Laboratory, 2018-06-05)
    Theileria annulata is a tick-transmitted apicomplexan parasite that infects and transforms bovine leukocytes into disseminating tumors that cause a disease called tropical theileriosis. Using RNA sequencing we identified bovine genes whose transcription is perturbed during Theileria-induced transformation to define the transcriptional atlas of transformed virulent versus attenuated (dampened dissemination) macrophages and transformed B cells. Dataset comparisons highlighted a small set of novel genes associated with Theileria-transformed leukocyte dissemination, and the roles of Granzyme A (GZMA) and RAS guanyl-releasing protein 1 (RASGRP1) were confirmed by CRISPR/Cas9-mediated down-regulation of their expression. Knockdown of both GZMA and RASGRP1 in attenuated macrophages led to a regain in their dissemination in Rag2/γC mice, confirming in vivo both GZMA and RASGRP1 as novel dissemination suppressors.
  • Physical and transcriptional organisation of the bread wheat intracellular immune receptor repertoire

    Steuernagel, Burkhard; Witek, Kamil; Krattinger, Simon G.; Ramirez-Gonzalez, Ricardo H.; Schoonbeek, Henk-jan; Yu, Guotai; Baggs, Erin; Witek, Agnieszka; Yadav, Inderjit; Krasileva, Ksenia V.; Jones, Jonathan D. G.; Uauy, Cristobal; Keller, Beat; Ridout, Christopher J.; Wulff, Brande; The International Wheat Genome Sequencing Consortium (Cold Spring Harbor Laboratory, 2018-06-05)
    Disease resistance genes encoding intracellular immune receptors of the nucleotide-binding and leucine-rich repeat (NLR) class of proteins detect pathogens by the presence of pathogen effectors. Plant genomes typically contain hundreds of NLR encoding genes. The availability of the hexaploid wheat cultivar Chinese Spring reference genome now allows a detailed study of its NLR complement. However, low NLR expression as well as high intra-family sequence homology hinders their accurate gene annotation. Here we developed NLR-Annotator for in silico NLR identification independent of transcript support. Although developed for wheat, we demonstrate the universal applicability of NLR-Annotator across diverse plant taxa. Applying our tool to wheat and combining it with a transcript-validated subset of genes from the reference gene annotation, we characterized the structure, phylogeny and expression profile of the NLR gene family. We detected 3,400 full-length NLR loci of which 1,540 were confirmed as complete genes. NLRs with integrated domains mostly group in specific sub-clades. Members of another subclade predominantly locate in close physical proximity to NLRs carrying integrated domains suggesting a paired helper-function. Most NLRs (88%) display low basal expression (in the lower 10 percentile of transcripts), which may be tissue-specific and/or induced by biotic stress. As a case study for applying our tool to the positional cloning of resistance genes, we estimated the number of NLR genes within the intervals of mapped rust resistance genes. Our study will support the identification of functional resistance genes in wheat to accelerate the breeding and engineering of disease resistant varieties.
  • Spatial Poisson Processes for Fatigue Crack Initiation

    Babuska, Ivo; Sawlan, Zaid A; Scavino, Marco; Szabó, Barna; Tempone, Raul (2018-05-09)
  • DeepPVP: phenotype-based prioritization of causative variants using deep learning

    Boudellioua, Imene; Kulmanov, Maxat; Schofield, Paul N; Gkoutos, Georgios V; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-05-02)
    Background: Prioritization of variants in personal genomic data is a major challenge. Recently, computational methods that rely on comparing phenotype similarity have been shown to be useful for identifying causative variants. In these methods, pathogenicity prediction is combined with a semantic similarity measure to prioritize not only variants that are likely to be dysfunctional but also those that are likely involved in the pathogenesis of a patient's phenotype. Results: We have developed DeepPVP, a variant prioritization method that combines automated inference with deep neural networks to identify the likely causative variants in whole exome or whole genome sequence data. We demonstrate that DeepPVP performs significantly better than existing methods, including phenotype-based methods that use similar features. DeepPVP is freely available. Conclusions: DeepPVP further improves on existing variant prioritization methods in terms of both speed and accuracy.
  • OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants

    Boudellioua, Imene; Kulmanov, Maxat; Schofield, Paul N; Gkoutos, Georgios V; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-05-02)
    Purpose: An increasing number of Mendelian disorders have been identified for which two or more variants in one or more genes are required to cause the disease, or significantly modify its severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of variants underlying oligogenic diseases in individual whole exome or whole genome sequences. Methods: Information that links patient phenotypes to databases of gene-phenotype associations observed in clinical research can provide useful information and improve variant prioritization for Mendelian diseases. Additionally, background knowledge about interactions between genes can be utilized to guide and restrict the selection of candidate disease modules. Results: We developed OligoPVP, an algorithm that can be used to identify variants in oligogenic diseases and their interactions, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state of the art pathogenicity detection methods. Conclusions: Our results show that OligoPVP can efficiently detect oligogenic interactions using a phenotype-driven approach and identify etiologically important variants in whole genomes.
  • Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes

    AlShahrani, Mona; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-04-30)
    In the past years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease's (or patient's) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse. Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprising multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.
  • Error Probability Analysis of Hardware Impaired Systems with Asymmetric Transmission

    Javed, Sidrah; Amin, Osama; Ikki, Salama S.; Alouini, Mohamed-Slim (2018-04-26)
    The error probability study of hardware impaired (HWI) systems highly depends on the adopted model. Recent models have proved that the aggregate noise is equivalent to improper Gaussian signals. Therefore, considering the distinct noise nature and self-interfering (SI) signals, an optimal maximum likelihood (ML) receiver is derived. This renders the conventional minimum Euclidean distance (MED) receiver sub-optimal because it is based on the assumptions of ideal hardware transceivers and proper Gaussian noise in communication systems. Next, the average error probability performance of the proposed optimal ML receiver is analyzed, and tight bounds and approximations are derived for various adopted systems including transmitter and receiver I/Q imbalanced systems with or without transmitter distortions as well as transmitter or receiver only impaired systems. Motivated by recent studies that shed light on the benefit of improper Gaussian signaling in mitigating the HWIs, asymmetric quadrature amplitude modulation or phase shift keying is optimized and adapted for transmission. Finally, different numerical and simulation results are presented to support the superiority of the proposed ML receiver over the MED receiver, the tightness of the derived bounds, and the effectiveness of asymmetric transmission in dampening HWIs and improving overall system performance.
  • Existence of weak solutions to first-order stationary mean-field games with Dirichlet conditions

    Ferreira, Rita; Gomes, Diogo A.; Tada, Teruo (arXiv, 2018-04-19)
    In this paper, we study first-order stationary monotone mean-field games (MFGs) with Dirichlet boundary conditions. While for Hamilton-Jacobi equations Dirichlet conditions may not be satisfied, here we establish the existence of solutions of MFGs that satisfy those conditions. To construct these solutions, we introduce a monotone regularized problem. Applying Schaefer's fixed-point theorem and using the monotonicity of the MFG, we verify that there exists a unique weak solution to the regularized problem. Finally, we take the limit of the solutions of the regularized problem and, using Minty's method, show the existence of weak solutions to the original MFG.
  • Current Controlled Magnetization Switching in Cylindrical Nanowires for High-Density 3D Memory Applications

    Mohammed, Hanan; Corte-León, Hector; Ivanov, Yurii P.; Lopatin, Sergei; Moreno, Julian A.; Chuvilin, Andrey; Salimath, Akshaykumar; Manchon, Aurelien; Kazakova, Olga; Kosel, Jürgen (arXiv, 2018-04-18)
    A next-generation memory device utilizing a three-dimensional nanowire system requires the reliable control of domain wall motion. In this letter, domain walls are studied in cylindrical nanowires consisting of alternating segments of cobalt and nickel. The material interfaces, acting as domain wall pinning sites, are utilized in combination with current pulses to control the position of the domain wall, which is monitored using magnetoresistance measurements. Magnetic force microscopy results further confirm the occurrence of current assisted domain wall depinning. Data bits are therefore shifted along the nanowire by sequentially pinning and depinning a domain wall between successive interfaces, a requirement for race-track type memory devices. We demonstrate that the direction, amplitude and duration of the applied current pulses determine the propagation of the domain wall across pinning sites. These results demonstrate a multi-bit cylindrical nanowire device, utilizing current assisted data manipulation. The prospect of sequential pinning and depinning in these nanowires allows the bit density to increase by several Tbs, depending on the number of segments within these nanowires.
  • Using Multi-Spectral UAV Imagery to Extract Tree Crop Structural Properties and Assess Pruning Effects

    Johansen, Kasper; Raharjo, Tri; McCabe, Matthew (MDPI AG, 2018-04-18)
    Unmanned aerial vehicles (UAVs) provide an unprecedented capacity to monitor the development and dynamics of tree growth and structure through time. It is generally thought that the pruning of tree crops encourages new growth, has a positive effect on fruiting, makes fruit-picking easier, and may increase yield, as it increases light interception and tree crown surface area. To establish the response to pruning in an orchard of lychee trees, an assessment of changes in tree structure, i.e. tree crown perimeter, width, height, area and Plant Projective Cover (PPC), was undertaken using multi-spectral UAV imagery collected before and after a pruning event. While tree crown perimeter, width and area could be derived directly from the delineated tree crowns, height was estimated from a produced canopy height model and PPC was most accurately predicted based on the NIR band. Pre- and post-pruning results showed significant differences in all measured tree structural parameters, including an average decrease in tree crown perimeter of 1.94 m, tree crown width of 0.57 m, tree crown height of 0.62 m, tree crown area of 3.5 m², and PPC of 14.8%. In order to provide guidance on data collection protocols for orchard management, the impact of flying height variations was also examined, offering some insight into the influence of scale and the scalability of this UAV-based approach for larger orchards. The different flying heights (i.e. 30, 50 and 70 m) produced similar measurements of tree crown width and PPC, while tree crown perimeter, area and height measurements decreased with increasing flying height. Overall, these results illustrate that routine collection of multi-spectral UAV imagery can provide a means of assessing pruning effects on changes in tree structure in commercial orchards, and highlight the importance of collecting imagery with consistent flight configurations, as varying flying heights may cause changes to tree structural measurements.
  • Numerical approximation of a binary fluid-surfactant phase field model of two-phase incompressible flow

    Zhu, Guangpu; Kou, Jisheng; Sun, Shuyu; Yao, Jun; Li, Aifen (arXiv, 2018-04-17)
    In this paper, we consider the numerical approximation of a binary fluid-surfactant phase field model of two-phase incompressible flow. The nonlinearly coupled model consists of two Cahn-Hilliard type equations and incompressible Navier-Stokes equations. Using the Invariant Energy Quadratization (IEQ) approach, the governing system is transformed into an equivalent form, which allows the nonlinear potentials to be treated efficiently and semi-explicitly. We construct first- and second-order time-marching schemes, which are extremely efficient and easy to implement, for the transformed governing system. At each time step, the schemes involve solving a sequence of linear elliptic equations, and computations of phase variables, velocity and pressure are totally decoupled. We further establish a rigorous proof of unconditional energy stability for the semi-implicit schemes. Numerical results in both two and three dimensions are obtained, which demonstrate that the proposed schemes are accurate, efficient and unconditionally energy stable. Using our schemes, we investigate the effect of surfactants on droplet deformation and collision under a shear flow. The increase of surfactant concentration can enhance droplet deformation and inhibit droplet coalescence.
