Recent Submissions

• Drug repurposing through joint learning on knowledge graphs and literature

(Cold Spring Harbor Laboratory, 2018-08-06)
Drug repurposing is the problem of finding new uses for known drugs, and may either involve finding a new protein target or a new indication for a known mechanism. Several computational methods for drug repurposing exist, and many of these methods rely on combinations of different sources of information, extract hand-crafted features and use a computational model to predict targets or indications for a drug. One of the distinguishing features between different drug repurposing systems is the selection of features. Recently, a set of novel machine learning methods have become available that can efficiently learn features from datasets, and these methods can be applied, among others, to text and structured data in knowledge graphs. We developed a novel method that combines information in literature and structured databases, and applies feature learning to generate vector space embeddings. We apply our method to the identification of drug targets and indications for known drugs based on heterogeneous information about drugs, target proteins, and diseases. We demonstrate that our method is able to combine complementary information from both structured databases and from literature, and we show that our method can compete with well-established methods for drug repurposing. Our approach is generic and can be applied to other areas in which multi-modal information is used to build predictive models.
• Communicating Using Spatial Mode Multiplexing: Potentials, Challenges and Perspectives

(2018-08)
Time, polarization, and wavelength multiplexing schemes have been used to satisfy the growing need for transmission capacity. Using space as a new dimension for communication systems has recently been suggested as a versatile technique to address future bandwidth issues. We review the potentials of harnessing space as an additional degree of freedom for communication applications including free space optics, optical fiber installation, underwater wireless optical links, on-chip interconnects, data center indoor connections, radio frequency and acoustic communications. We focus on the orbital angular momentum (OAM) modes and identify the challenges related to each of the applications of spatial modes, and of OAM modes in particular, in communication. Finally, we discuss the perspectives of this emerging technology.
• A fast and cost-effective microsampling protocol incorporating reduced animal usage for time-series transcriptomics in rodent malaria parasites

(Cold Spring Harbor Laboratory, 2018-06-21)
The transcriptional regulation occurring in malaria parasites during the clinically important life stages within host erythrocytes can be studied in vivo with rodent malaria parasites propagated in mice. Time-series transcriptome profiling commonly involves the euthanasia of groups of mice at specific time points followed by the extraction of parasite RNA from whole blood samples. Current methodologies for parasite RNA extraction involve several steps and when multiple time points are profiled, these protocols are laborious, time consuming, and require the euthanisation of large cohorts of mice. We designed a simplified protocol for parasite RNA extraction from blood volumes as low as 20 microliters (microsamples), serially bled from mice via tail snips and directly lysed with TRIzol reagent. Gene expression data derived from microsampling using RNA-seq were closely matched to those derived from larger volumes of leucocyte-depleted and saponin-treated blood obtained from euthanized mice and also tightly correlated between biological replicates. Transcriptome profiling of microsamples taken at different time points during the intra-erythrocytic developmental cycle of the rodent malaria parasite Plasmodium vinckei revealed the transcriptional cascade commonly observed in malaria parasites. Microsampling is a quick, robust and cost-efficient approach to sample collection for in vivo time-series transcriptomic studies in rodent malaria parasites.
• GZMA and RASGRP1 are novel tumor suppressors that counter dissemination of Theileria annulata-transformed macrophages

(Cold Spring Harbor Laboratory, 2018-06-05)
Theileria annulata is a tick-transmitted apicomplexan parasite that infects and transforms bovine leukocytes into disseminating tumors that cause a disease called tropical theileriosis. Using RNA sequencing, we identified bovine genes whose transcription is perturbed during Theileria-induced transformation to define the transcriptional atlas of transformed virulent versus attenuated (dampened dissemination) macrophages and transformed B cells. Dataset comparisons highlighted a small set of novel genes associated with Theileria-transformed leukocyte dissemination, and the roles of Granzyme A (GZMA) and RAS guanyl-releasing protein 1 (RASGRP1) were confirmed by CRISPR/Cas9-mediated down-regulation of their expression. Knockdown of both GZMA and RASGRP1 in attenuated macrophages led to a regain in their dissemination in Rag2/γC mice, confirming in vivo both GZMA and RASGRP1 as novel dissemination suppressors.
• Physical and transcriptional organisation of the bread wheat intracellular immune receptor repertoire

(Cold Spring Harbor Laboratory, 2018-06-05)
Disease resistance genes encoding intracellular immune receptors of the nucleotide-binding and leucine-rich repeat (NLR) class of proteins detect pathogens by the presence of pathogen effectors. Plant genomes typically contain hundreds of NLR-encoding genes. The availability of the hexaploid wheat cultivar Chinese Spring reference genome now allows a detailed study of its NLR complement. However, low NLR expression and high intra-family sequence homology hinder their accurate gene annotation. Here we developed NLR-Annotator for in silico NLR identification independent of transcript support. Although developed for wheat, we demonstrate the universal applicability of NLR-Annotator across diverse plant taxa. Applying our tool to wheat and combining it with a transcript-validated subset of genes from the reference gene annotation, we characterized the structure, phylogeny and expression profile of the NLR gene family. We detected 3,400 full-length NLR loci of which 1,540 were confirmed as complete genes. NLRs with integrated domains mostly group in specific sub-clades. Members of another sub-clade are predominantly located in close physical proximity to NLRs carrying integrated domains, suggesting a paired helper-function. Most NLRs (88%) display low basal expression (in the lowest 10 percent of transcripts), which may be tissue-specific and/or induced by biotic stress. As a case study for applying our tool to the positional cloning of resistance genes, we estimated the number of NLR genes within the intervals of mapped rust resistance genes. Our study will support the identification of functional resistance genes in wheat to accelerate the breeding and engineering of disease resistant varieties.

(2018-05-09)
• DeepPVP: phenotype-based prioritization of causative variants using deep learning

(Cold Spring Harbor Laboratory, 2018-05-02)
Background: Prioritization of variants in personal genomic data is a major challenge. Recently, computational methods that rely on comparing phenotype similarity have been shown to be useful for identifying causative variants. In these methods, pathogenicity prediction is combined with a semantic similarity measure to prioritize not only variants that are likely to be dysfunctional but also those that are likely involved in the pathogenesis of a patient's phenotype. Results: We have developed DeepPVP, a variant prioritization method that combines automated inference with deep neural networks to identify the likely causative variants in whole exome or whole genome sequence data. We demonstrate that DeepPVP performs significantly better than existing methods, including phenotype-based methods that use similar features. DeepPVP is freely available at https://github.com/bio-ontology-research-group/phenomenet-vp Conclusions: DeepPVP further improves on existing variant prioritization methods in terms of both speed and accuracy.
• OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants

(Cold Spring Harbor Laboratory, 2018-05-02)
Purpose: An increasing number of Mendelian disorders have been identified for which two or more variants in one or more genes are required to cause the disease, or significantly modify its severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of variants underlying oligogenic diseases in individual whole exome or whole genome sequences. Methods: Information that links patient phenotypes to databases of gene-phenotype associations observed in clinical research can provide useful information and improve variant prioritization for Mendelian diseases. Additionally, background knowledge about interactions between genes can be utilized to guide and restrict the selection of candidate disease modules. Results: We developed OligoPVP, an algorithm that can be used to identify variants in oligogenic diseases and their interactions, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state of the art pathogenicity detection methods. Conclusions: Our results show that OligoPVP can efficiently detect oligogenic interactions using a phenotype-driven approach and identify etiologically important variants in whole genomes.
• Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes

(Cold Spring Harbor Laboratory, 2018-04-30)
In recent years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease's (or patient's) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse. Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprising multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.

(2018-04-26)
• Existence of weak solutions to first-order stationary mean-field games with Dirichlet conditions

(arXiv, 2018-04-19)
In this paper, we study first-order stationary monotone mean-field games (MFGs) with Dirichlet boundary conditions. While for Hamilton-Jacobi equations Dirichlet conditions may not be satisfied, here we establish the existence of solutions of MFGs that satisfy those conditions. To construct these solutions, we introduce a monotone regularized problem. Applying Schaefer's fixed-point theorem and using the monotonicity of the MFG, we verify that there exists a unique weak solution to the regularized problem. Finally, we take the limit of the solutions of the regularized problem and, using Minty's method, show the existence of weak solutions to the original MFG.
• Current Controlled Magnetization Switching in Cylindrical Nanowires for High-Density 3D Memory Applications

(arXiv, 2018-04-18)
A next-generation memory device utilizing a three-dimensional nanowire system requires the reliable control of domain wall motion. In this letter, domain walls are studied in cylindrical nanowires consisting of alternating segments of cobalt and nickel. The material interfaces, acting as domain wall pinning sites, are utilized in combination with current pulses to control the position of the domain wall, which is monitored using magnetoresistance measurements. Magnetic force microscopy results further confirm the occurrence of current-assisted domain wall depinning. Data bits are therefore shifted along the nanowire by sequentially pinning and depinning a domain wall between successive interfaces, a requirement for race-track type memory devices. We demonstrate that the direction, amplitude and duration of the applied current pulses determine the propagation of the domain wall across pinning sites. These results demonstrate a multi-bit cylindrical nanowire device utilizing current-assisted data manipulation. The prospect of sequential pinning and depinning in these nanowires allows the bit density to increase by several Tbs, depending on the number of segments within these nanowires.
• Using Multi-Spectral UAV Imagery to Extract Tree Crop Structural Properties and Assess Pruning Effects

(MDPI AG, 2018-04-18)
Unmanned aerial vehicles (UAVs) provide an unprecedented capacity to monitor the development and dynamics of tree growth and structure through time. It is generally thought that the pruning of tree crops encourages new growth, has a positive effect on fruiting, makes fruit-picking easier, and may increase yield, as it increases light interception and tree crown surface area. To establish the response to pruning in an orchard of lychee trees, an assessment of changes in tree structure, i.e. tree crown perimeter, width, height, area and Plant Projective Cover (PPC), was undertaken using multi-spectral UAV imagery collected before and after a pruning event. While tree crown perimeter, width and area could be derived directly from the delineated tree crowns, height was estimated from a produced canopy height model and PPC was most accurately predicted based on the NIR band. Pre- and post-pruning results showed significant differences in all measured tree structural parameters, including an average decrease in tree crown perimeter of 1.94 m, tree crown width of 0.57 m, tree crown height of 0.62 m, tree crown area of 3.5 m², and PPC of 14.8%. In order to provide guidance on data collection protocols for orchard management, the impact of flying height variations was also examined, offering some insight into the influence of scale and the scalability of this UAV based approach for larger orchards. The different flying heights (i.e. 30, 50 and 70 m) produced similar measurements of tree crown width and PPC, while tree crown perimeter, area and height measurements decreased with increasing flying height. Overall, these results illustrate that routine collection of multi-spectral UAV imagery can provide a means of assessing pruning effects on changes in tree structure in commercial orchards, and highlight the importance of collecting imagery with consistent flight configurations, as varying flying heights may cause changes to tree structural measurements.
• Numerical approximation of a binary fluid-surfactant phase field model of two-phase incompressible flow

(arXiv, 2018-04-17)
In this paper, we consider the numerical approximation of a binary fluid-surfactant phase field model of two-phase incompressible flow. The nonlinearly coupled model consists of two Cahn-Hilliard type equations and incompressible Navier-Stokes equations. Using the Invariant Energy Quadratization (IEQ) approach, the governing system is transformed into an equivalent form, which allows the nonlinear potentials to be treated efficiently and semi-explicitly. We construct first- and second-order time-marching schemes for the transformed governing system, which are extremely efficient and easy to implement. At each time step, the schemes involve solving a sequence of linear elliptic equations, and computations of phase variables, velocity and pressure are totally decoupled. We further establish a rigorous proof of unconditional energy stability for the semi-implicit schemes. Numerical results in both two and three dimensions are obtained, which demonstrate that the proposed schemes are accurate, efficient and unconditionally energy stable. Using our schemes, we investigate the effect of surfactants on droplet deformation and collision under a shear flow. The increase of surfactant concentration can enhance droplet deformation and inhibit droplet coalescence.
• Weighted Low-Rank Approximation of Matrices and Background Modeling

(arXiv, 2018-04-15)
We primarily study a special weighted low-rank approximation of matrices and then apply it to solve the background modeling problem. We propose two algorithms for this purpose: one operates in batch mode on the entire data, and the other operates in batch-incremental mode on the data, naturally captures more background variations, and is computationally more effective. Moreover, we propose a robust technique that learns the background frame indices from the data and does not require any training frames. We demonstrate through extensive experiments that, by inserting a simple weight in the Frobenius norm, it can be made robust to outliers, similar to the $\ell_1$ norm. Our methods match or outperform several state-of-the-art online and batch background modeling methods in virtually all quantitative and qualitative measures.
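To illustrate what a weight inside the Frobenius norm can do, the following minimal sketch fits a rank-1 model by alternating weighted least squares, minimizing the sum over entries of W[i][j] * (A[i][j] - u[i] * v[j])^2. It is not the batch or batch-incremental algorithm of the paper; the function name `weighted_rank1`, the initial guess, and the toy data are assumptions made purely for illustration.

```python
def weighted_rank1(A, W, iters=50):
    """Alternating weighted least squares for a rank-1 fit A ~ u v^T.

    Minimizes sum_ij W[i][j] * (A[i][j] - u[i] * v[j]) ** 2.
    Illustrative only; not the algorithm proposed in the paper.
    """
    m, n = len(A), len(A[0])
    u = [0.0] * m
    v = [1.0] * n  # hypothetical initial guess
    for _ in range(iters):
        # Fix v and solve a one-dimensional weighted least squares for each u[i].
        for i in range(m):
            num = sum(W[i][j] * A[i][j] * v[j] for j in range(n))
            den = sum(W[i][j] * v[j] ** 2 for j in range(n))
            u[i] = num / den
        # Fix u and solve for each v[j] in the same way.
        for j in range(n):
            num = sum(W[i][j] * A[i][j] * u[i] for i in range(m))
            den = sum(W[i][j] * u[i] ** 2 for i in range(m))
            v[j] = num / den
    return u, v


# Toy data: an exactly rank-1 "background" with one corrupted entry.
a, b = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0]
A = [[ai * bj for bj in b] for ai in a]
A[0][0] = 100.0                      # outlier
W = [[1.0] * len(b) for _ in a]
W[0][0] = 0.0                        # zero weight masks the outlier
u, v = weighted_rank1(A, W)
recovered = u[0] * v[0]              # estimate of the uncorrupted entry
```

With the outlier given zero weight, the fit is driven only by the clean entries, so `recovered` approaches the uncorrupted value of 1 rather than 100.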
• Herding Complex Networks

(arXiv, 2018-04-12)
The problem of controlling complex networks is of interest to disciplines ranging from biology to swarm robotics. However, controllability can be too strict a condition, failing to capture a range of desirable behaviors. Herdability, which describes the ability to drive a system to a specific set in the state space, was recently introduced as an alternative network control notion. This paper considers the application of herdability to the study of complex networks. The herdability of a class of networked systems is investigated and two problems related to ensuring system herdability are explored. The first is the input addition problem, which investigates which nodes in a network should receive inputs to ensure that the system is herdable. The second is a related problem of selecting the best single node from which to herd the network, in the case that a single node is guaranteed to make the system herdable. In order to select the best herding node, a novel control energy based herdability centrality measure is introduced.
• SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

(arXiv, 2018-04-12)
In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances $\delta$ ranging from 5 to 60 seconds.
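The sparsity figure quoted above can be checked directly from the dataset totals in the abstract. The short calculation below uses only the stated numbers (764 hours, 6,637 annotations, 500 games); the variable names are illustrative.

```python
total_hours = 764      # total video duration stated in the abstract
annotations = 6637     # total temporal annotations
games = 500            # number of complete games

# Average spacing between annotated events, in minutes.
minutes_per_event = total_hours * 60 / annotations

# Average number of annotated events per game.
events_per_game = annotations / games
```

Rounded to one decimal place, `minutes_per_event` agrees with the 6.9 minutes reported in the abstract.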
• Herdable Systems Over Signed, Directed Graphs

(arXiv, 2018-04-11)
This paper considers the notion of herdability, a set-based reachability condition, which asks whether the state of a system can be controlled to be element-wise larger than a non-negative threshold. The basic theory of herdable systems is presented, including a necessary and sufficient condition for herdability. This paper then considers the impact of the underlying graph structure of a linear system on the herdability of the system, for the case where the graph is represented as signed and directed. By classifying nodes based on the length and sign of walks from an input, we find a class of completely herdable systems as well as provide a complete characterization of nodes that can be herded in systems with an underlying graph that is a directed out-branching rooted at a single input.
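As a small illustration of the quantities used in that classification, the sketch below computes, for every node of a signed directed out-branching, the length and sign of its walk from the root input. The sign of a walk is taken here to be the product of its edge signs, which is an assumption of this sketch. How these labels translate into herdability is the subject of the paper and is not reproduced here; the graph and the function name `walk_length_and_sign` are hypothetical.

```python
from collections import deque


def walk_length_and_sign(edges, root):
    """Return {node: (length, sign)} for walks from root in a signed out-branching.

    edges maps each node to a list of (child, sign) pairs, with sign in {+1, -1}.
    In an out-branching every node has exactly one walk from the root, so a
    breadth-first traversal visits each node once.
    """
    info = {root: (0, 1)}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        length, sign = info[node]
        for child, edge_sign in edges.get(node, []):
            # Walk sign is the product of the edge signs along the walk.
            info[child] = (length + 1, sign * edge_sign)
            queue.append(child)
    return info


# Hypothetical five-node out-branching rooted at the input node "in".
edges = {
    "in": [("a", +1), ("b", -1)],
    "a": [("c", -1)],
    "b": [("d", -1)],
}
info = walk_length_and_sign(edges, "in")
```

Here node "d" is reached through two negative edges, so its walk has length 2 and positive sign, while "c" has length 2 and negative sign.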
• Model-based Quantile Regression for Discrete Data

(arXiv, 2018-04-10)
Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework, quantile regression has typically been carried out using the Asymmetric Laplace Distribution as a working likelihood. Although this leads to a proper posterior for the regression coefficients, the resulting posterior variance is affected by an unidentifiable parameter, hence any inferential procedure besides point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for a proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We also extend it to the case of discrete responses, where there is no 1-to-1 relationship between quantiles and the distribution's parameter, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.
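The lack of a 1-to-1 relationship in the discrete case can be seen in a small example: because a Poisson quantile is an integer, different rate parameters can share the same quantile. The sketch below is only an illustration of that point, not the continuous generalisation proposed in the paper; the function name `poisson_quantile` and the chosen rates are assumptions.

```python
import math


def poisson_quantile(lam, p):
    """Smallest integer k with P(X <= k) >= p for X ~ Poisson(lam)."""
    k = 0
    term = math.exp(-lam)   # P(X = 0)
    cdf = term
    while cdf < p:
        k += 1
        term *= lam / k     # P(X = k) obtained from P(X = k - 1)
        cdf += term
    return k


# Two different rates with the same median.
median_a = poisson_quantile(2.0, 0.5)
median_b = poisson_quantile(2.5, 0.5)
```

Both rates give a median of 2, so the median alone cannot identify the rate, which is why a continuous generalisation is useful for fitting.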
• Path to Stochastic Stability: Comparative Analysis of Stochastic Learning Dynamics in Games

(arXiv, 2018-04-08)
Stochastic stability is a popular solution concept for stochastic learning dynamics in games. However, a critical limitation of this solution concept is its inability to distinguish between different learning rules that lead to the same steady-state behavior. We address this limitation for the first time and develop a framework for the comparative analysis of stochastic learning dynamics with different update rules but the same steady-state behavior. We present the framework in the context of two learning dynamics: Log-Linear Learning (LLL) and Metropolis Learning (ML). Although both of these dynamics have the same stochastically stable states, LLL and ML correspond to different behavioral models for decision making. Moreover, we demonstrate through an example setup of a sensor coverage game that, for each of these dynamics, the paths to stochastically stable states exhibit distinctive behaviors. Therefore, we propose multiple criteria to analyze and quantify the differences in the short and medium run behavior of stochastic learning dynamics. We derive and compare upper bounds on the expected hitting time to the set of Nash equilibria for both LLL and ML. For the medium to long-run behavior, we identify a set of tools from the theory of perturbed Markov chains that result in a hierarchical decomposition of the state space into collections of states called cycles. We compare LLL and ML based on the proposed criteria and develop invaluable insights into the comparative behavior of the two dynamics.
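To make the contrast between the two update rules concrete, here is a minimal sketch of one common formulation of each: log-linear learning samples an action with Boltzmann (softmax) probabilities over all utilities, while Metropolis learning proposes a random action and accepts it with a Metropolis probability. The exact rules analyzed in the paper may differ; the function names, the temperature `tau`, and the utilities are assumptions for illustration.

```python
import math
import random


def log_linear_probs(utilities, tau):
    """Boltzmann (softmax) choice probabilities over all actions."""
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp((x - top) / tau) for x in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def metropolis_accept_prob(u_current, u_proposed, tau):
    """Probability of switching to a proposed action."""
    return min(1.0, math.exp((u_proposed - u_current) / tau))


def metropolis_step(current, utilities, tau, rng):
    """Propose an action uniformly at random and accept or reject it."""
    proposal = rng.randrange(len(utilities))
    accept = metropolis_accept_prob(utilities[current], utilities[proposal], tau)
    return proposal if rng.random() < accept else current


utilities = [1.0, 2.0, 0.5]   # hypothetical payoffs for three actions
tau = 0.5                     # hypothetical noise (temperature) parameter
probs = log_linear_probs(utilities, tau)
rng = random.Random(0)
next_action = metropolis_step(0, utilities, tau, rng)
```

In this formulation, the log-linear rule evaluates every action at each update, whereas the Metropolis rule compares only the current action with a single proposal, which is one way the two can follow different paths despite sharing long-run behavior.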