Browsing Preprints by Title
Now showing items 128-147 of 195

Parameter-free Network Sparsification and Data Reduction by Minimal Algorithmic Information Loss (arXiv, 2018-02-16) The study of large and complex datasets, or big data, organized as networks has emerged as one of the central challenges in most areas of science and technology. Cellular and molecular networks in biology are among the prime examples. Hence, a number of techniques for data dimensionality reduction, especially in the context of networks, have been developed. Yet, current techniques require a predefined metric upon which to minimize the data size. Here we introduce a family of parameter-free algorithms based on (algorithmic) information theory that are designed to minimize the loss of any (enumerable computable) property contributing to the object's algorithmic content and thus important to preserve in a process of data dimension reduction when forcing the algorithm to delete first the least important features. Being independent of any particular criterion, they are universal in a fundamental mathematical sense. Using suboptimal approximations of efficient (polynomial) estimations we demonstrate how to preserve network properties outperforming other (leading) algorithms for network dimension reduction. Our method preserves all graph-theoretic indices measured, including degree distribution, clustering coefficient, edge betweenness, and degree and eigenvector centralities. We conclude and demonstrate numerically that our parameter-free, Minimal Information Loss Sparsification (MILS) method is robust, has the potential to maximize the preservation of all recursively enumerable features in data and networks, and achieves equal to significantly better results than other data reduction and network sparsification methods.

Parameters and Fractional Differentiation Orders Estimation for Linear Continuous-Time Non-Commensurate Fractional Order Systems (Submitted to Elsevier, 2017-05-31) This paper proposes a two-stage estimation algorithm to solve the problem of joint estimation of the parameters and the fractional differentiation orders of a linear continuous-time fractional system with non-commensurate orders. The proposed algorithm combines the modulating functions and the first-order Newton methods. Sufficient conditions ensuring the convergence of the method are provided. An error analysis in the discrete case is performed. Moreover, the method is extended to the joint estimation of smooth unknown input and fractional differentiation orders. The performance of the proposed approach is illustrated with different numerical examples. Furthermore, a potential application of the algorithm is proposed which consists in the estimation of the differentiation orders of a fractional neurovascular model along with the neural activity considered as input for this model.

Particle Simulation of Fractional Diffusion Equations (arXiv, 2017-07-12) This work explores different particle-based approaches to the simulation of one-dimensional fractional subdiffusion equations in unbounded domains. We rely on smooth particle approximations, and consider four methods for estimating the fractional diffusion term. The first method is based on direct differentiation of the particle representation; it follows the Riesz definition of the fractional derivative and results in a non-conservative scheme. The other three methods follow the particle strength exchange (PSE) methodology and are by construction conservative, in the sense that the total particle strength is time invariant. The first PSE algorithm is based on using direct differentiation to estimate the fractional diffusion flux, and exploiting the resulting estimates in an integral representation of the divergence operator. Meanwhile, the second one relies on the regularized Riesz representation of the fractional diffusion term to derive a suitable interaction formula acting directly on the particle representation of the diffusing field. A third PSE construction is considered that exploits the Green's function of the fractional diffusion equation. The performance of all four approaches is assessed for the case of a one-dimensional diffusion equation with constant diffusivity. This enables us to take advantage of known analytical solutions, and consequently conduct a detailed analysis of the performance of the methods. This includes a quantitative study of the various sources of error, namely filtering, quadrature, domain truncation, and time integration, as well as a space and time self-convergence analysis. These analyses are conducted for different values of the order of the fractional derivatives, and computational experiences are used to gain insight that can be used for generalization of the present constructions.

Passivity and Evolutionary Game Dynamics (arXiv, 2018-03-21) This paper investigates an energy conservation and dissipation (passivity) aspect of dynamic models in evolutionary game theory. We define a notion of passivity using the state-space representation of the models, and we devise systematic methods to examine passivity and to identify properties of passive dynamic models. Based on the methods, we describe how passivity is connected to stability in population games and illustrate stability of passive dynamic models using numerical simulations.

Path to Stochastic Stability: Comparative Analysis of Stochastic Learning Dynamics in Games (arXiv, 2018-04-08) Stochastic stability is a popular solution concept for stochastic learning dynamics in games. However, a critical limitation of this solution concept is its inability to distinguish between different learning rules that lead to the same steady-state behavior. We address this limitation for the first time and develop a framework for the comparative analysis of stochastic learning dynamics with different update rules but the same steady-state behavior. We present the framework in the context of two learning dynamics: Log-Linear Learning (LLL) and Metropolis Learning (ML). Although both of these dynamics have the same stochastically stable states, LLL and ML correspond to different behavioral models for decision making. Moreover, we demonstrate through an example setup of a sensor coverage game that for each of these dynamics, the paths to stochastically stable states exhibit distinctive behaviors. Therefore, we propose multiple criteria to analyze and quantify the differences in the short- and medium-run behavior of stochastic learning dynamics. We derive and compare upper bounds on the expected hitting time to the set of Nash equilibria for both LLL and ML. For the medium- to long-run behavior, we identify a set of tools from the theory of perturbed Markov chains that result in a hierarchical decomposition of the state space into collections of states called cycles. We compare LLL and ML based on the proposed criteria and develop invaluable insights into the comparative behavior of the two dynamics.

PathoPhenoDB: linking human pathogens to their disease phenotypes in support of infectious disease research (Cold Spring Harbor Laboratory, 2018-12-10) Understanding the relationship between the pathophysiology of infectious disease, the biology of the causative agent and the development of therapeutic and diagnostic approaches is dependent on the synthesis of a wide range of types of information. Provision of a comprehensive and integrated disease phenotype knowledgebase has the potential to provide novel and orthogonal sources of information for the understanding of infectious agent pathogenesis, and support for research on disease mechanisms. We have developed PathoPhenoDB, a database containing pathogen-to-phenotype associations. PathoPhenoDB relies on manual curation of pathogen-disease relations, on ontology-based text mining as well as manual curation to associate phenotypes with infectious disease. Using Semantic Web technologies, PathoPhenoDB also links to knowledge about drug resistance mechanisms and drugs used in the treatment of infectious diseases. PathoPhenoDB is accessible at http://patho.phenomebrowser.net/, and the data is freely available through a public SPARQL endpoint.

Penultimate modeling of spatial extremes: statistical inference for max-infinitely divisible processes (arXiv, 2018-01-09) Extreme-value theory for stochastic processes has motivated the statistical use of max-stable models for spatial extremes. However, fitting such asymptotic models to maxima observed over finite blocks is problematic when the asymptotic stability of the dependence does not prevail in finite samples. This issue is particularly serious when data are asymptotically independent, such that the dependence strength weakens and eventually vanishes as events become more extreme. We here aim to provide flexible sub-asymptotic models for spatially indexed block maxima, which more realistically account for discrepancies between data and asymptotic theory. We develop models pertaining to the wider class of max-infinitely divisible processes, extending the class of max-stable processes while retaining dependence properties that are natural for maxima: max-id models are positively associated, and they yield a self-consistent family of models for block maxima defined over any time unit. We propose two parametric construction principles for max-id models, emphasizing a point process-based generalized spectral representation, that allows for asymptotic independence while keeping the max-stable extremal-$t$ model as a special case. Parameter estimation is efficiently performed by pairwise likelihood, and we illustrate our new modeling framework with an application to Dutch wind gust maxima calculated over different time units.

Perturbation-Based Regularization for Signal Estimation in Linear Discrete Ill-posed Problems (arXiv, 2016-11-29) Estimating the values of unknown parameters from corrupted measured data faces a lot of challenges in ill-posed problems. In such problems, many fundamental estimation methods fail to provide a meaningful stabilized solution. In this work, we propose a new regularization approach and a new regularization parameter selection approach for linear least-squares discrete ill-posed problems. The proposed approach is based on enhancing the singular-value structure of the ill-posed model matrix to acquire a better solution. Unlike many other regularization algorithms that seek to minimize the estimated data error, the proposed approach is developed to minimize the mean-squared error of the estimator which is the objective in many typical estimation scenarios. The performance of the proposed approach is demonstrated by applying it to a large set of real-world discrete ill-posed problems. Simulation results demonstrate that the proposed approach outperforms a set of benchmark regularization methods in most cases. In addition, the approach also enjoys the lowest runtime and offers the highest level of robustness amongst all the tested benchmark regularization methods.

Phenotypic, functional and taxonomic features predict host-pathogen interactions: Table S1; Figure S1 (Cold Spring Harbor Laboratory, 2018-12-31) Identification of host-pathogen interactions (HPIs) can reveal mechanistic insights of infectious diseases for potential treatments and drug discoveries. Current computational methods for the prediction of HPIs often rely on our knowledge on the sequences and functions of pathogen proteins, which is limited for many species, especially for species of emerging pathogens. Matching the phenotypes elicited by pathogens with phenotypes associated with host proteins might improve the prediction of HPIs. We developed an ontology-based method that prioritizes potential interaction protein partners for pathogens using machine learning models. Our method exploits the underlying disease mechanisms by associating phenotypic and functional features of pathogens and human proteins, corroborated by multiple ontologies as background knowledge. Additionally, by embedding the phenotypic information of the pathogens within a formally represented taxonomy, we demonstrate that our model can also accurately predict interaction partners for pathogens without known phenotypes, using a combination of their taxonomic relationships with other pathogens and information from ontologies as background knowledge. Our results show that the integration of phenotypic, functional and taxonomic knowledge not only improves the prediction of HPIs, but also enables us to investigate novel pathogens in emerging infectious diseases.

Physical and transcriptional organisation of the bread wheat intracellular immune receptor repertoire (Cold Spring Harbor Laboratory, 2018-06-05) Disease resistance genes encoding intracellular immune receptors of the nucleotide-binding and leucine-rich repeat (NLR) class of proteins detect pathogens by the presence of pathogen effectors. Plant genomes typically contain hundreds of NLR encoding genes. The availability of the hexaploid wheat cultivar Chinese Spring reference genome now allows a detailed study of its NLR complement. However, low NLR expression as well as high intra-family sequence homology hinders their accurate gene annotation. Here we developed NLR-Annotator for in silico NLR identification independent of transcript support. Although developed for wheat, we demonstrate the universal applicability of NLR-Annotator across diverse plant taxa. Applying our tool to wheat and combining it with a transcript-validated subset of genes from the reference gene annotation, we characterized the structure, phylogeny and expression profile of the NLR gene family. We detected 3,400 full-length NLR loci of which 1,540 were confirmed as complete genes. NLRs with integrated domains mostly group in specific subclades. Members of another subclade predominantly locate in close physical proximity to NLRs carrying integrated domains suggesting a paired helper-function. Most NLRs (88%) display low basal expression (in the lower 10 percentile of transcripts), which may be tissue-specific and/or induced by biotic stress. As a case study for applying our tool to the positional cloning of resistance genes, we estimated the number of NLR genes within the intervals of mapped rust resistance genes. Our study will support the identification of functional resistance genes in wheat to accelerate the breeding and engineering of disease resistant varieties.

Possible evidence for spin-transfer torque induced by spin-triplet supercurrent (arXiv, 2017-10-04) Cooper pairs in superconductors are normally spin singlet. Nevertheless, recent studies suggest that spin-triplet Cooper pairs can be created at carefully engineered superconductor-ferromagnet interfaces. If Cooper pairs are spin-polarized they would transport not only charge but also a net spin component, but without dissipation, and therefore minimize the heating effects associated with spintronic devices. Although it is now established that triplet supercurrents exist, their most interesting property, spin, is only inferred indirectly from transport measurements. In conventional spintronics, it is well known that spin currents generate spin-transfer torques that alter magnetization dynamics and switch magnetic moments. The observation of similar effects due to spin-triplet supercurrents would not only confirm the net spin of triplet pairs but also pave the way for applications of superconducting spintronics. Here, we present possible evidence for spin-transfer torques induced by triplet supercurrents in superconductor/ferromagnet/superconductor (S/F/S) Josephson junctions. Below the superconducting transition temperature T_c, the ferromagnetic resonance (FMR) field at X-band (~ 9.0 GHz) shifts rapidly to a lower field with decreasing temperature due to the spin-transfer torques induced by triplet supercurrents. In contrast, this phenomenon is absent in ferromagnet/superconductor (F/S) bilayers and superconductor/insulator/ferromagnet/superconductor (S/I/F/S) multilayers where no supercurrents pass through the ferromagnetic layer. These experimental observations are discussed with theoretical predictions for ferromagnetic Josephson junctions with precessing magnetization.

Precision phenotyping reveals novel loci for quantitative resistance to septoria tritici blotch in European winter wheat (Cold Spring Harbor Laboratory, 2018-12-21) Accurate, high-throughput phenotyping for quantitative traits is the limiting factor for progress in plant breeding. We developed automated image analysis to measure quantitative resistance to septoria tritici blotch (STB), a globally important wheat disease, enabling identification of small chromosome intervals containing plausible candidate genes for STB resistance. 335 winter wheat cultivars were included in a replicated field experiment that experienced natural epidemic development by a highly diverse but fungicide-resistant pathogen population. More than 5.4 million automatically generated phenotypes were associated with 13,648 SNP markers to perform a GWAS. We identified 26 chromosome intervals explaining 1.9-10.6% of the variance associated with four resistance traits. Seventeen of the intervals were less than 5 Mbp in size and encoded only 173 genes, including many genes associated with disease resistance. Five intervals contained four or fewer genes, providing high priority targets for functional validation. Ten chromosome intervals were not previously associated with STB resistance. Our experiment illustrates how high-throughput automated phenotyping can accelerate breeding for quantitative disease resistance. The SNP markers associated with these chromosome intervals can be used to recombine different forms of quantitative STB resistance that are likely to be more durable than pyramids of major resistance genes.

Predictive Systems Toxicology (arXiv, 2018-01-15) In this review we address to what extent computational techniques can augment our ability to predict toxicity. The first section provides a brief history of empirical observations on toxicity dating back to the dawn of Sumerian civilization. Interestingly, the concept of dose emerged very early on, leading up to the modern emphasis on kinetic properties, which in turn encodes the insight that toxicity is not solely a property of a compound but instead depends on the interaction with the host organism. The next logical step is the current conception of evaluating drugs from a personalized medicine point-of-view. We review recent work on integrating what could be referred to as classical pharmacokinetic analysis with emerging systems biology approaches incorporating multiple omics data. These systems approaches employ advanced statistical analytical data processing complemented with machine learning techniques and use both pharmacokinetic and omics data. We find that such integrated approaches not only provide improved predictions of toxicity but also enable mechanistic interpretations of the molecular mechanisms underpinning toxicity and drug resistance. We conclude the chapter by discussing some of the main challenges, such as how to balance the inherent tension between the predictive capacity of models, which in practice amounts to constraining the number of features in the models versus allowing for rich mechanistic interpretability, i.e. equipping models with numerous molecular features. This challenge also requires patient-specific predictions on toxicity, which in turn requires proper stratification of patients as regards how they respond, with or without adverse toxic effects. In summary, the transformation of the ancient concept of dose is currently successfully operationalized using rich integrative data encoded in patient-specific models.

Pricing American Options by Exercise Rate Optimization (arXiv, 2018-09-20) We present a novel method for the numerical pricing of American options based on Monte Carlo simulation and optimization of exercise strategies. Previous solutions to this problem either explicitly or implicitly determine so-called optimal exercise regions, which consist of points in time and space at which the option is exercised. In contrast, our method determines exercise rates of randomized exercise strategies. We show that the supremum of the corresponding stochastic optimization problem provides the correct option price. By integrating analytically over the random exercise decision, we obtain an objective function that is differentiable with respect to perturbations of the exercise rate even for finitely many sample paths. Starting in a neutral strategy with constant exercise rate then allows us to globally optimize this function in a gradual manner. Numerical experiments on vanilla put options in the multivariate Black-Scholes model and preliminary theoretical analysis underline the efficiency of our method both with respect to the number of time-discretization steps and the required number of degrees of freedom in the parametrization of exercise rates. Finally, the flexibility of our method is demonstrated by numerical experiments on max call options in the Black-Scholes model and vanilla put options in the Heston model and the non-Markovian rough Bergomi model.

Privacy preserving randomized gossip algorithms (arXiv, 2017-06-23) In this work we present three different randomized gossip algorithms for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for all methods, and perform extensive numerical experiments.
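
For readers unfamiliar with the average consensus problem mentioned in this abstract, the following is a minimal sketch of the standard (non-private) randomized pairwise gossip baseline, not any of the three privacy-preserving algorithms from the preprint. The function name `randomized_gossip`, the ring topology, and the initial values are illustrative assumptions.

```python
import random


def randomized_gossip(values, edges, num_iters, seed=0):
    """Baseline randomized pairwise gossip (illustrative, NOT privacy preserving).

    At each iteration one edge (i, j) is sampled uniformly at random and both
    endpoints replace their values with the pairwise average. The sum of all
    values is preserved, so on a connected graph every node's value approaches
    the global average of the initial values.
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(num_iters):
        i, j = rng.choice(edges)
        avg = (x[i] + x[j]) / 2.0
        x[i] = avg
        x[j] = avg
    return x


# Hypothetical example: four nodes on a ring, initial private values below.
values = [1.0, 5.0, 3.0, 7.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
result = randomized_gossip(values, edges, num_iters=2000)
target = sum(values) / len(values)  # global average the nodes should agree on
```

In this baseline every node reveals its current value to the neighbour it averages with, which is the kind of information leakage the abstract says the preprint's methods aim to limit.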

Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs (Cold Spring Harbor Laboratory, 2018-03-11) Short Linear Motifs (SLiMs) contribute to almost every cellular function by connecting appropriate protein partners. Accurate prediction of SLiMs is difficult due to their shortness and sequence degeneracy. Leucine-aspartic acid (LD) motifs are SLiMs that link paxillin family proteins to factors controlling (cancer) cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family is poorly understood. To enable a proteome-wide assessment of these motifs, we developed an active-learning based framework that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome identified a dozen proteins that contain LD motifs, all being involved in cell adhesion and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter-species comparison revealed a conserved LD signalling core and the emergence of species-specific adaptive connections, while maintaining a strong functional focus of the LD motif interactome. Collectively, our data elucidate the mechanisms underlying the origin and adaptation of an ancestral SLiM.

Quantitative Seq-LGS: Genome-Wide Identification of Genetic Drivers of Multiple Phenotypes in Malaria Parasites (Cold Spring Harbor Laboratory Press, 2016-10-01) Identifying the genetic determinants of phenotypes that impact on disease severity is of fundamental importance for the design of new interventions against malaria. Traditionally, such discovery has relied on labor-intensive approaches that require significant investments of time and resources. By combining Linkage Group Selection (LGS), quantitative whole genome population sequencing and a novel mathematical modeling approach (qSeq-LGS), we simultaneously identified multiple genes underlying two distinct phenotypes, identifying novel alleles for growth rate and strain specific immunity (SSI), while removing the need for traditionally required steps such as cloning, individual progeny phenotyping and marker generation. The detection of novel variants, verified by experimental phenotyping methods, demonstrates the remarkable potential of this approach for the identification of genes controlling selectable phenotypes in malaria and other apicomplexan parasites for which experimental genetic crosses are amenable.

Randomized Block Cubic Newton Method (arXiv, 2018-02-12) We study the problem of minimizing the sum of three convex functions: a differentiable, a twice-differentiable and a nonsmooth term in a high dimensional setting. To this effect we propose and analyze a randomized block cubic Newton (RBCN) method, which in each iteration builds a model of the objective function formed as the sum of the natural models of its three components: a linear model with a quadratic regularizer for the differentiable term, a quadratic model with a cubic regularizer for the twice-differentiable term, and a perfect (proximal) model for the nonsmooth term. Our method in each iteration minimizes the model over a random subset of blocks of the search variable. RBCN is the first algorithm with these properties, generalizing several existing methods, matching the best known bounds in all special cases. We establish ${\cal O}(1/\epsilon)$, ${\cal O}(1/\sqrt{\epsilon})$ and ${\cal O}(\log (1/\epsilon))$ rates under different assumptions on the component functions. Lastly, we show numerically that our method outperforms the state-of-the-art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression.

A Randomized Exchange Algorithm for Computing Optimal Approximate Designs of Experiments (arXiv, 2018-01-17) We propose a class of subspace ascent methods for computing optimal approximate designs that covers both existing as well as new and more efficient algorithms. Within this class of methods, we construct a simple, randomized exchange algorithm (REX). Numerical comparisons suggest that the performance of REX is comparable or superior to the performance of state-of-the-art methods across a broad range of problem structures and sizes. We focus on the most commonly used criterion of D-optimality that also has applications beyond experimental design, such as the construction of the minimum volume ellipsoid containing a given set of data-points. For D-optimality, we prove that the proposed algorithm converges to the optimum. We also provide formulas for the optimal exchange of weights in the case of the criterion of A-optimality. These formulas enable one to use REX for computing A-optimal and I-optimal designs.

Robust Beamforming in Cache-Enabled Cloud Radio Access Networks (arXiv, 2016-09-06) Popular content caching is expected to play a major role in efficiently reducing backhaul congestion and achieving user satisfaction in next generation mobile radio systems. Consider the downlink of a cache-enabled cloud radio access network (CRAN), where each cache-enabled base station (BS) is equipped with limited-size local cache storage. The central computing unit (cloud) is connected to the BSs via a limited capacity backhaul link and serves a set of single-antenna mobile users (MUs). This paper assumes that only imperfect channel state information (CSI) is available at the cloud. It focuses on the problem of minimizing the total network power and backhaul cost so as to determine the beamforming vector of each user across the network, the quantization noise covariance matrix, and the BS clustering subject to imperfect channel state information and fixed cache placement assumptions. The paper suggests solving such a difficult, non-convex optimization problem using the semidefinite relaxation (SDR). The paper then uses the ℓ0-norm approximation to provide a feasible, sub-optimal solution using the majorization-minimization (MM) approach. Simulation results particularly show how the cache-enabled network significantly improves the backhaul cost especially at high signal-to-interference-plus-noise ratio (SINR) values as compared to conventional cache-less CRANs.