• Parameter-free Network Sparsification and Data Reduction by Minimal Algorithmic Information Loss

      Zenil, Hector; Kiani, Narsis A.; Tegner, Jesper (arXiv, 2018-02-16)
      The study of large and complex datasets, or big data, organized as networks has emerged as one of the central challenges in most areas of science and technology. Cellular and molecular networks in biology are among the prime examples. Hence, a number of techniques for data dimensionality reduction, especially in the context of networks, have been developed. Yet, current techniques require a predefined metric upon which to minimize the data size. Here we introduce a family of parameter-free algorithms based on (algorithmic) information theory that are designed to minimize the loss of any (enumerable computable) property contributing to the object's algorithmic content and thus important to preserve in a process of data dimension reduction when forcing the algorithm to delete first the least important features. Being independent of any particular criterion, they are universal in a fundamental mathematical sense. Using suboptimal approximations of efficient (polynomial) estimations we demonstrate how to preserve network properties, outperforming other (leading) algorithms for network dimension reduction. Our method preserves all graph-theoretic indices measured, including degree distribution, clustering coefficient, edge betweenness, and degree and eigenvector centralities. We conclude and demonstrate numerically that our parameter-free, Minimal Information Loss Sparsification (MILS) method is robust, has the potential to maximize the preservation of all recursively enumerable features in data and networks, and achieves equal to significantly better results than other data reduction and network sparsification methods.
    • Parameters and Fractional Differentiation Orders Estimation for Linear Continuous-Time Non-Commensurate Fractional Order Systems

      Belkhatir, Zehor; Laleg-Kirati, Taous-Meriem (Submitted to Elsevier, 2017-05-31)
      This paper proposes a two-stage estimation algorithm to solve the problem of joint estimation of the parameters and the fractional differentiation orders of a linear continuous-time fractional system with non-commensurate orders. The proposed algorithm combines the modulating functions and the first-order Newton methods. Sufficient conditions ensuring the convergence of the method are provided. An error analysis in the discrete case is performed. Moreover, the method is extended to the joint estimation of smooth unknown input and fractional differentiation orders. The performance of the proposed approach is illustrated with different numerical examples. Furthermore, a potential application of the algorithm is proposed which consists in the estimation of the differentiation orders of a fractional neurovascular model along with the neural activity considered as input for this model.
    • Particle Simulation of Fractional Diffusion Equations

      Allouch, Samer; Lucchesi, Marco; Maître, O. P. Le; Mustapha, K. A.; Knio, Omar (arXiv, 2017-07-12)
      This work explores different particle-based approaches to the simulation of one-dimensional fractional subdiffusion equations in unbounded domains. We rely on smooth particle approximations, and consider four methods for estimating the fractional diffusion term. The first method is based on direct differentiation of the particle representation; it follows the Riesz definition of the fractional derivative and results in a non-conservative scheme. The other three methods follow the particle strength exchange (PSE) methodology and are by construction conservative, in the sense that the total particle strength is time invariant. The first PSE algorithm is based on using direct differentiation to estimate the fractional diffusion flux, and exploiting the resulting estimates in an integral representation of the divergence operator. Meanwhile, the second one relies on the regularized Riesz representation of the fractional diffusion term to derive a suitable interaction formula acting directly on the particle representation of the diffusing field. A third PSE construction is considered that exploits the Green's function of the fractional diffusion equation. The performance of all four approaches is assessed for the case of a one-dimensional diffusion equation with constant diffusivity. This enables us to take advantage of known analytical solutions, and consequently conduct a detailed analysis of the performance of the methods. This includes a quantitative study of the various sources of error, namely filtering, quadrature, domain truncation, and time integration, as well as a space and time self-convergence analysis. These analyses are conducted for different values of the order of the fractional derivatives, and computational experiments are used to gain insight that can be used for generalization of the present constructions.
    • Passivity and Evolutionary Game Dynamics

      Park, Shinkyu; Shamma, Jeff S.; Martins, Nuno C. (arXiv, 2018-03-21)
      This paper investigates an energy conservation and dissipation -- passivity -- aspect of dynamic models in evolutionary game theory. We define a notion of passivity using the state-space representation of the models, and we devise systematic methods to examine passivity and to identify properties of passive dynamic models. Based on the methods, we describe how passivity is connected to stability in population games and illustrate stability of passive dynamic models using numerical simulations.
    • Path to Stochastic Stability: Comparative Analysis of Stochastic Learning Dynamics in Games

      Jaleel, Hassan; Shamma, Jeff S. (arXiv, 2018-04-08)
      Stochastic stability is a popular solution concept for stochastic learning dynamics in games. However, a critical limitation of this solution concept is its inability to distinguish between different learning rules that lead to the same steady-state behavior. We address this limitation for the first time and develop a framework for the comparative analysis of stochastic learning dynamics with different update rules but the same steady-state behavior. We present the framework in the context of two learning dynamics: Log-Linear Learning (LLL) and Metropolis Learning (ML). Although both of these dynamics have the same stochastically stable states, LLL and ML correspond to different behavioral models for decision making. Moreover, we demonstrate through an example setup of a sensor coverage game that for each of these dynamics, the paths to stochastically stable states exhibit distinctive behaviors. Therefore, we propose multiple criteria to analyze and quantify the differences in the short and medium-run behavior of stochastic learning dynamics. We derive and compare upper bounds on the expected hitting time to the set of Nash equilibria for both LLL and ML. For the medium to long-run behavior, we identify a set of tools from the theory of perturbed Markov chains that result in a hierarchical decomposition of the state space into collections of states called cycles. We compare LLL and ML based on the proposed criteria and develop invaluable insights into the comparative behavior of the two dynamics.
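To make the contrast between the two update rules concrete, the following is a minimal sketch of their textbook forms for a single updating player. It assumes, without confirmation from the abstract, that LLL samples an action from a Boltzmann (softmax) distribution over utilities and that ML accepts a randomly proposed action with the Metropolis probability. The function names and the `temperature` parameter are hypothetical and chosen only for illustration.

```python
import math

def lll_probabilities(utilities, temperature):
    """Boltzmann (softmax) distribution over a player's actions.

    Assumed textbook form of log-linear learning: the updating player
    picks action a with probability proportional to exp(u(a) / T).
    """
    m = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp((u - m) / temperature) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def metropolis_accept_probability(u_current, u_proposed, temperature):
    """Assumed Metropolis rule: improvements are always accepted,
    a worse proposal is accepted with probability exp(delta / T)."""
    delta = u_proposed - u_current
    return 1.0 if delta >= 0 else math.exp(delta / temperature)

# Hypothetical utilities of three actions for the updating player.
probs = lll_probabilities([1.0, 2.0, 3.0], temperature=0.5)
accept_worse = metropolis_accept_probability(2.0, 1.0, temperature=1.0)
```

Both rules concentrate on better actions as the temperature decreases, but LLL evaluates every action at each update while the Metropolis rule compares only the current action with one proposal, which is one way the short-run paths can differ.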
    • PathoPhenoDB: linking human pathogens to their disease phenotypes in support of infectious disease research

      Kafkas, Senay; Abdelhakim, Marwa; Hashish, Yasmeen; Kulmanov, Maxat; Abdellatif, Marwa; Schofield, Paul N; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-12-10)
      Understanding the relationship between the pathophysiology of infectious disease, the biology of the causative agent and the development of therapeutic and diagnostic approaches is dependent on the synthesis of a wide range of types of information. Provision of a comprehensive and integrated disease phenotype knowledgebase has the potential to provide novel and orthogonal sources of information for the understanding of infectious agent pathogenesis, and support for research on disease mechanisms. We have developed PathoPhenoDB, a database containing pathogen-to-phenotype associations. PathoPhenoDB relies on manual curation of pathogen-disease relations, on ontology-based text mining as well as manual curation to associate phenotypes with infectious disease. Using Semantic Web technologies, PathoPhenoDB also links to knowledge about drug resistance mechanisms and drugs used in the treatment of infectious diseases. PathoPhenoDB is accessible at http://patho.phenomebrowser.net/, and the data is freely available through a public SPARQL endpoint.
    • Penultimate modeling of spatial extremes: statistical inference for max-infinitely divisible processes

      Huser, Raphaël; Opitz, Thomas; Thibaud, Emeric (arXiv, 2018-01-09)
      Extreme-value theory for stochastic processes has motivated the statistical use of max-stable models for spatial extremes. However, fitting such asymptotic models to maxima observed over finite blocks is problematic when the asymptotic stability of the dependence does not prevail in finite samples. This issue is particularly serious when data are asymptotically independent, such that the dependence strength weakens and eventually vanishes as events become more extreme. We here aim to provide flexible sub-asymptotic models for spatially indexed block maxima, which more realistically account for discrepancies between data and asymptotic theory. We develop models pertaining to the wider class of max-infinitely divisible processes, extending the class of max-stable processes while retaining dependence properties that are natural for maxima: max-id models are positively associated, and they yield a self-consistent family of models for block maxima defined over any time unit. We propose two parametric construction principles for max-id models, emphasizing a point process-based generalized spectral representation, that allows for asymptotic independence while keeping the max-stable extremal-$t$ model as a special case. Parameter estimation is efficiently performed by pairwise likelihood, and we illustrate our new modeling framework with an application to Dutch wind gust maxima calculated over different time units.
    • Perturbation-Based Regularization for Signal Estimation in Linear Discrete Ill-posed Problems

      Suliman, Mohamed Abdalla Elhag; Ballal, Tarig; Al-Naffouri, Tareq Y. (arXiv, 2016-11-29)
      Estimating the values of unknown parameters from corrupted measured data faces a lot of challenges in ill-posed problems. In such problems, many fundamental estimation methods fail to provide a meaningful stabilized solution. In this work, we propose a new regularization approach and a new regularization parameter selection approach for linear least-squares discrete ill-posed problems. The proposed approach is based on enhancing the singular-value structure of the ill-posed model matrix to acquire a better solution. Unlike many other regularization algorithms that seek to minimize the estimated data error, the proposed approach is developed to minimize the mean-squared error of the estimator which is the objective in many typical estimation scenarios. The performance of the proposed approach is demonstrated by applying it to a large set of real-world discrete ill-posed problems. Simulation results demonstrate that the proposed approach outperforms a set of benchmark regularization methods in most cases. In addition, the approach also enjoys the lowest runtime and offers the highest level of robustness amongst all the tested benchmark regularization methods.
    • Phenotypic, functional and taxonomic features predict host-pathogen interactions: Table S1; Figure S1

      Liu-Wei, Wang; Kafkas, Senay; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2018-12-31)
      Identification of host-pathogen interactions (HPIs) can reveal mechanistic insights into infectious diseases for potential treatments and drug discoveries. Current computational methods for the prediction of HPIs often rely on our knowledge of the sequences and functions of pathogen proteins, which is limited for many species, especially for species of emerging pathogens. Matching the phenotypes elicited by pathogens with phenotypes associated with host proteins might improve the prediction of HPIs. We developed an ontology-based method that prioritizes potential interaction protein partners for pathogens using machine learning models. Our method exploits the underlying disease mechanisms by associating phenotypic and functional features of pathogens and human proteins, corroborated by multiple ontologies as background knowledge. Additionally, by embedding the phenotypic information of the pathogens within a formally represented taxonomy, we demonstrate that our model can also accurately predict interaction partners for pathogens without known phenotypes, using a combination of their taxonomic relationships with other pathogens and information from ontologies as background knowledge. Our results show that the integration of phenotypic, functional and taxonomic knowledge not only improves the prediction of HPIs, but also enables us to investigate novel pathogens in emerging infectious diseases.
    • Physical and transcriptional organisation of the bread wheat intracellular immune receptor repertoire

      Steuernagel, Burkhard; Witek, Kamil; Krattinger, Simon G.; Ramirez-Gonzalez, Ricardo H.; Schoonbeek, Henk-jan; Yu, Guotai; Baggs, Erin; Witek, Agnieszka; Yadav, Inderjit; Krasileva, Ksenia V.; Jones, Jonathan D. G.; Uauy, Cristobal; Keller, Beat; Ridout, Christopher J.; Wulff, Brande; The International Wheat Genome Sequencing Consortium (Cold Spring Harbor Laboratory, 2018-06-05)
      Disease resistance genes encoding intracellular immune receptors of the nucleotide-binding and leucine-rich repeat (NLR) class of proteins detect pathogens by the presence of pathogen effectors. Plant genomes typically contain hundreds of NLR encoding genes. The availability of the hexaploid wheat cultivar Chinese Spring reference genome now allows a detailed study of its NLR complement. However, low NLR expression as well as high intra-family sequence homology hinders their accurate gene annotation. Here we developed NLR-Annotator for in silico NLR identification independent of transcript support. Although developed for wheat, we demonstrate the universal applicability of NLR-Annotator across diverse plant taxa. Applying our tool to wheat and combining it with a transcript-validated subset of genes from the reference gene annotation, we characterized the structure, phylogeny and expression profile of the NLR gene family. We detected 3,400 full-length NLR loci of which 1,540 were confirmed as complete genes. NLRs with integrated domains mostly group in specific sub-clades. Members of another subclade predominantly locate in close physical proximity to NLRs carrying integrated domains suggesting a paired helper-function. Most NLRs (88%) display low basal expression (in the lower 10 percentile of transcripts), which may be tissue-specific and/or induced by biotic stress. As a case study for applying our tool to the positional cloning of resistance genes, we estimated the number of NLR genes within the intervals of mapped rust resistance genes. Our study will support the identification of functional resistance genes in wheat to accelerate the breeding and engineering of disease resistant varieties.
    • Possible evidence for spin-transfer torque induced by spin-triplet supercurrent

      Li, Lailai; Zhao, Yuelei; Zhang, Xixiang; Sun, Young (arXiv, 2017-10-04)
      Cooper pairs in superconductors are normally spin singlet. Nevertheless, recent studies suggest that spin-triplet Cooper pairs can be created at carefully engineered superconductor-ferromagnet interfaces. If Cooper pairs are spin-polarized they would transport not only charge but also a net spin component, but without dissipation, and therefore minimize the heating effects associated with spintronic devices. Although it is now established that triplet supercurrents exist, their most interesting property - spin - is only inferred indirectly from transport measurements. In conventional spintronics, it is well known that spin currents generate spin-transfer torques that alter magnetization dynamics and switch magnetic moments. The observation of similar effects due to spin-triplet supercurrents would not only confirm the net spin of triplet pairs but also pave the way for applications of superconducting spintronics. Here, we present possible evidence for spin-transfer torques induced by triplet supercurrents in superconductor/ferromagnet/superconductor (S/F/S) Josephson junctions. Below the superconducting transition temperature T_c, the ferromagnetic resonance (FMR) field at X-band (~ 9.0 GHz) shifts rapidly to a lower field with decreasing temperature due to the spin-transfer torques induced by triplet supercurrents. In contrast, this phenomenon is absent in ferromagnet/superconductor (F/S) bilayers and superconductor/insulator/ferromagnet/superconductor (S/I/F/S) multilayers where no supercurrents pass through the ferromagnetic layer. These experimental observations are discussed with theoretical predictions for ferromagnetic Josephson junctions with precessing magnetization.
    • Precision phenotyping reveals novel loci for quantitative resistance to septoria tritici blotch in European winter wheat

      Yates, Steven; Mikaberidze, Alexey; Krattinger, Simon G.; Abrouk, Michael; Hund, Andreas; Yu, Kang; Studer, Bruno; Fouche, Simone; Meile, Lukas; Pereira, Danilo; Karisto, Petteri; McDonald, Bruce (Cold Spring Harbor Laboratory, 2018-12-21)
      Accurate, high-throughput phenotyping for quantitative traits is the limiting factor for progress in plant breeding. We developed automated image analysis to measure quantitative resistance to septoria tritici blotch (STB), a globally important wheat disease, enabling identification of small chromosome intervals containing plausible candidate genes for STB resistance. 335 winter wheat cultivars were included in a replicated field experiment that experienced natural epidemic development by a highly diverse but fungicide-resistant pathogen population. More than 5.4 million automatically generated phenotypes were associated with 13,648 SNP markers to perform a GWAS. We identified 26 chromosome intervals explaining 1.9-10.6% of the variance associated with four resistance traits. Seventeen of the intervals were less than 5 Mbp in size and encoded only 173 genes, including many genes associated with disease resistance. Five intervals contained four or fewer genes, providing high priority targets for functional validation. Ten chromosome intervals were not previously associated with STB resistance. Our experiment illustrates how high-throughput automated phenotyping can accelerate breeding for quantitative disease resistance. The SNP markers associated with these chromosome intervals can be used to recombine different forms of quantitative STB resistance that are likely to be more durable than pyramids of major resistance genes.
    • Predictive Systems Toxicology

      Kiani, Narsis A.; Shang, Ming-Mei; Zenil, Hector; Tegner, Jesper (arXiv, 2018-01-15)
      In this review we address to what extent computational techniques can augment our ability to predict toxicity. The first section provides a brief history of empirical observations on toxicity dating back to the dawn of Sumerian civilization. Interestingly, the concept of dose emerged very early on, leading up to the modern emphasis on kinetic properties, which in turn encodes the insight that toxicity is not solely a property of a compound but instead depends on the interaction with the host organism. The next logical step is the current conception of evaluating drugs from a personalized medicine point-of-view. We review recent work on integrating what could be referred to as classical pharmacokinetic analysis with emerging systems biology approaches incorporating multiple omics data. These systems approaches employ advanced statistical analytical data processing complemented with machine learning techniques and use both pharmacokinetic and omics data. We find that such integrated approaches not only provide improved predictions of toxicity but also enable mechanistic interpretations of the molecular mechanisms underpinning toxicity and drug resistance. We conclude the chapter by discussing some of the main challenges, such as how to balance the inherent tension between the predictive capacity of models, which in practice amounts to constraining the number of features in the models versus allowing for rich mechanistic interpretability, i.e. equipping models with numerous molecular features. This challenge also requires patient-specific predictions on toxicity, which in turn requires proper stratification of patients as regards how they respond, with or without adverse toxic effects. In summary, the transformation of the ancient concept of dose is currently successfully operationalized using rich integrative data encoded in patient-specific models.
    • Pricing American Options by Exercise Rate Optimization

      Bayer, Christian; Tempone, Raul; Wolfers, Sören (arXiv, 2018-09-20)
      We present a novel method for the numerical pricing of American options based on Monte Carlo simulation and optimization of exercise strategies. Previous solutions to this problem either explicitly or implicitly determine so-called optimal exercise regions, which consist of points in time and space at which the option is exercised. In contrast, our method determines exercise rates of randomized exercise strategies. We show that the supremum of the corresponding stochastic optimization problem provides the correct option price. By integrating analytically over the random exercise decision, we obtain an objective function that is differentiable with respect to perturbations of the exercise rate even for finitely many sample paths. Starting in a neutral strategy with constant exercise rate then allows us to globally optimize this function in a gradual manner. Numerical experiments on vanilla put options in the multivariate Black-Scholes model and preliminary theoretical analysis underline the efficiency of our method both with respect to the number of time-discretization steps and the required number of degrees of freedom in the parametrization of exercise rates. Finally, the flexibility of our method is demonstrated by numerical experiments on max call options in the Black-Scholes model and vanilla put options in the Heston model and the non-Markovian rough Bergomi model.
    • Privacy preserving randomized gossip algorithms

      Hanzely, Filip; Konečný, Jakub; Loizou, Nicolas; Richtarik, Peter; Grishchenko, Dmitry (arXiv, 2017-06-23)
      In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for all methods, and perform extensive numerical experiments.
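For context, the average consensus problem can be illustrated with the standard (non-private) randomized pairwise gossip baseline. The sketch below is not one of the three privacy-preserving algorithms from the paper; the ring topology, the initial values and the function name `randomized_gossip` are assumptions made purely for illustration.

```python
import random

def randomized_gossip(values, edges, iterations, seed=0):
    """Standard randomized pairwise gossip (baseline, no privacy protection).

    At each step one edge (i, j) is drawn uniformly at random and both
    endpoints replace their values with the average of the pair. The sum
    of all values is unchanged by every step.
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(iterations):
        i, j = rng.choice(edges)
        mean = 0.5 * (x[i] + x[j])
        x[i] = x[j] = mean
    return x

# Hypothetical example: a ring of 5 nodes holding private initial values.
initial = [1.0, 4.0, 2.0, 8.0, 5.0]
ring = [(k, (k + 1) % 5) for k in range(5)]
final = randomized_gossip(initial, ring, iterations=500)
```

Every node ends up close to the network average of the initial values. In this baseline the first exchange exposes each node's raw initial value to its neighbor, which is the kind of leakage a privacy-preserving variant has to address.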
    • Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs

      Alam, Tanvir; Alazmi, Meshari; Naser, Rayan Mohammad Mahmoud; Huser, Franceline; Momin, Afaque Ahmad Imtiyaz; Walkiewicz, Katarzyna Wiktoria; Canlas, Christian; Huser, Raphaël; Ali, Amal J.; Merzaban, Jasmeen; Bajic, Vladimir B.; Gao, Xin; Arold, Stefan T. (Cold Spring Harbor Laboratory, 2018-03-11)
      Short Linear Motifs (SLiMs) contribute to almost every cellular function by connecting appropriate protein partners. Accurate prediction of SLiMs is difficult due to their shortness and sequence degeneracy. Leucine-aspartic acid (LD) motifs are SLiMs that link paxillin family proteins to factors controlling (cancer) cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. To enable a proteome-wide assessment of these motifs, we developed an active-learning based framework that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome identified a dozen proteins that contain LD motifs, all being involved in cell adhesion and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter-species comparison revealed a conserved LD signalling core and the emergence of species-specific adaptive connections, while maintaining a strong functional focus of the LD motif interactome. Collectively, our data elucidate the mechanisms underlying the origin and adaptation of an ancestral SLiM.
    • Quantitative Seq-LGS: Genome-Wide Identification of Genetic Drivers of Multiple Phenotypes in Malaria Parasites

      Abkallo, Hussein M.; Martinelli, Axel; Inoue, Megumi; Ramaprasad, Abhinay; Xangsayarath, Phonepadith; Gitaka, Jesse; Tang, Jianxia; Yahata, Kazuhide; Zoungrana, Augustin; Mitaka, Hayato; Hunt, Paul; Carter, Richard; Kaneko, Osamu; Mustonen, Ville; Illingworth, Christopher J.R.; Pain, Arnab; Culleton, Richard (Cold Spring Harbor Laboratory Press, 2016-10-01)
      Identifying the genetic determinants of phenotypes that impact disease severity is of fundamental importance for the design of new interventions against malaria. Traditionally, such discovery has relied on labor-intensive approaches that require significant investments of time and resources. By combining Linkage Group Selection (LGS), quantitative whole genome population sequencing and a novel mathematical modeling approach (qSeq-LGS), we simultaneously identified multiple genes underlying two distinct phenotypes, identifying novel alleles for growth rate and strain specific immunity (SSI), while removing the need for traditionally required steps such as cloning, individual progeny phenotyping and marker generation. The detection of novel variants, verified by experimental phenotyping methods, demonstrates the remarkable potential of this approach for the identification of genes controlling selectable phenotypes in malaria and other apicomplexan parasites that are amenable to experimental genetic crosses.
    • Randomized Block Cubic Newton Method

      Doikov, Nikita; Richtarik, Peter (arXiv, 2018-02-12)
      We study the problem of minimizing the sum of three convex functions: a differentiable, a twice-differentiable and a non-smooth term in a high dimensional setting. To this effect we propose and analyze a randomized block cubic Newton (RBCN) method, which in each iteration builds a model of the objective function formed as the sum of the natural models of its three components: a linear model with a quadratic regularizer for the differentiable term, a quadratic model with a cubic regularizer for the twice differentiable term, and a perfect (proximal) model for the nonsmooth term. Our method in each iteration minimizes the model over a random subset of blocks of the search variable. RBCN is the first algorithm with these properties, generalizing several existing methods, matching the best known bounds in all special cases. We establish ${\cal O}(1/\epsilon)$, ${\cal O}(1/\sqrt{\epsilon})$ and ${\cal O}(\log (1/\epsilon))$ rates under different assumptions on the component functions. Lastly, we show numerically that our method outperforms the state-of-the-art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression.
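The "quadratic model with a cubic regularizer" can be illustrated in one dimension with the classical cubic Newton step, which minimizes g*h + (H/2)*h^2 + (M/6)*|h|^3 over the step h. The sketch below is not RBCN itself (there are no blocks, no randomization and no proximal term); the test function f(x) = x^2 + cos(x), the constant M = 1 and the function names are assumptions chosen only so that the example has a known minimizer at x = 0.

```python
import math

def cubic_newton_step(g, H, M):
    """Minimizer h of the 1-D model g*h + 0.5*H*h**2 + (M/6)*abs(h)**3.

    Assumes a convex setting (H >= 0) and M > 0. The step length r solves
    abs(g) = H*r + 0.5*M*r**2 and the step points against the gradient.
    """
    if g == 0.0:
        return 0.0
    r = 2.0 * abs(g) / (H + math.sqrt(H * H + 2.0 * M * abs(g)))
    return -math.copysign(r, g)

def minimize_1d(grad, hess, x0, M, iterations=20):
    """Repeated cubic Newton steps from the starting point x0."""
    x = x0
    for _ in range(iterations):
        x += cubic_newton_step(grad(x), hess(x), M)
    return x

# Hypothetical test problem: f(x) = x**2 + cos(x), minimized at x = 0.
# Its third derivative is sin(x), so M = 1 bounds the Hessian's Lipschitz constant.
x_star = minimize_1d(lambda x: 2.0 * x - math.sin(x),
                     lambda x: 2.0 - math.cos(x),
                     x0=3.0, M=1.0)
```

When M is at least the Lipschitz constant of the second derivative, the cubic model upper-bounds the function, so each step decreases f without a line search.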
    • A Randomized Exchange Algorithm for Computing Optimal Approximate Designs of Experiments

      Harman, Radoslav; Filová, Lenka; Richtarik, Peter (arXiv, 2018-01-17)
      We propose a class of subspace ascent methods for computing optimal approximate designs that covers both existing as well as new and more efficient algorithms. Within this class of methods, we construct a simple, randomized exchange algorithm (REX). Numerical comparisons suggest that the performance of REX is comparable or superior to the performance of state-of-the-art methods across a broad range of problem structures and sizes. We focus on the most commonly used criterion of D-optimality that also has applications beyond experimental design, such as the construction of the minimum volume ellipsoid containing a given set of data-points. For D-optimality, we prove that the proposed algorithm converges to the optimum. We also provide formulas for the optimal exchange of weights in the case of the criterion of A-optimality. These formulas enable one to use REX for computing A-optimal and I-optimal designs.
    • Robust Beamforming in Cache-Enabled Cloud Radio Access Networks

      Dhifallah, Oussama Najeeb; Dahrouj, Hayssam; Al-Naffouri, Tareq Y.; Alouini, Mohamed-Slim (arXiv, 2016-09-06)
      Popular content caching is expected to play a major role in efficiently reducing backhaul congestion and achieving user satisfaction in next generation mobile radio systems. Consider the downlink of a cache-enabled cloud radio access network (CRAN), where each cache-enabled base station (BS) is equipped with limited-size local cache storage. The central computing unit (cloud) is connected to the BSs via a limited capacity backhaul link and serves a set of single-antenna mobile users (MUs). This paper assumes that only imperfect channel state information (CSI) is available at the cloud. It focuses on the problem of minimizing the total network power and backhaul cost so as to determine the beamforming vector of each user across the network, the quantization noise covariance matrix, and the BS clustering subject to imperfect channel state information and fixed cache placement assumptions. The paper suggests solving such a difficult, non-convex optimization problem using the semidefinite relaxation (SDR). The paper then uses the ℓ0-norm approximation to provide a feasible, sub-optimal solution using the majorization-minimization (MM) approach. Simulation results particularly show how the cache-enabled network significantly improves the backhaul cost especially at high signal-to-interference-plus-noise ratio (SINR) values as compared to conventional cache-less CRANs.