
• #### Robust Beamforming in Cache-Enabled Cloud Radio Access Networks

(arXiv, 2016-09-06)
Popular content caching is expected to play a major role in efficiently reducing backhaul congestion and achieving user satisfaction in next generation mobile radio systems. Consider the downlink of a cache-enabled cloud radio access network (CRAN), where each cache-enabled base station (BS) is equipped with limited-size local cache storage. The central computing unit (cloud) is connected to the BSs via limited-capacity backhaul links and serves a set of single-antenna mobile users (MUs). This paper assumes that only imperfect channel state information (CSI) is available at the cloud. It focuses on the problem of minimizing the total network power and backhaul cost by jointly determining the beamforming vector of each user across the network, the quantization noise covariance matrix, and the BS clustering, subject to imperfect CSI and a fixed cache placement. The paper suggests tackling this difficult, non-convex optimization problem using semidefinite relaxation (SDR). It then uses an ℓ0-norm approximation to provide a feasible, sub-optimal solution via the majorization-minimization (MM) approach. Simulation results show that the cache-enabled network significantly reduces the backhaul cost, especially at high signal-to-interference-plus-noise ratio (SINR) values, compared to conventional cache-less CRANs.
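The ℓ0-norm-to-weighted-ℓ1 majorization at the heart of the MM approach can be illustrated on a toy sparse-recovery problem (a minimal sketch, not the paper's network optimization; the problem sizes, `lam`, and the smoothing constant `eps` are illustrative assumptions):

```python
import numpy as np

def mm_sparse_recovery(A, y, lam=0.1, eps=0.1, outer=6, inner=300):
    """Majorization-minimization for an l0-like penalty: at each outer
    step the concave sparsity surrogate is majorized by a weighted l1
    norm, which is minimized by proximal gradient (ISTA) steps."""
    n = A.shape[1]
    x = np.zeros(n)
    w = np.ones(n)                             # start from a plain l1 penalty
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L for the smooth part
    for _ in range(outer):
        for _ in range(inner):
            z = x - step * (A.T @ (A @ x - y))                     # gradient step
            x = np.sign(z) * np.maximum(np.abs(z) - step * lam * w, 0.0)
        w = 1.0 / (np.abs(x) + eps)            # MM reweighting toward l0
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 30))
x_true = np.zeros(30)
x_true[[3, 12]] = [1.5, -2.0]
y = A @ x_true                                 # noiseless measurements
x_hat = mm_sparse_recovery(A, y)
print(np.flatnonzero(np.abs(x_hat) > 0.5))     # recovered support
```

The reweighting step is what pushes small coefficients to exactly zero, mimicking the ℓ0 penalty that induces sparse BS clustering in the paper.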
• #### Measurement Selection: A Random Matrix Theory Approach

(Institute of Electrical and Electronics Engineers (IEEE), 2018-05-15)
This paper considers the problem of selecting a set of $k$ measurements from $n$ available sensor observations. The selected measurements should minimize a certain error function assessing the error in estimating an $m$-dimensional parameter vector. An exhaustive search inspecting each of the $n\choose k$ possible choices would require a very high computational complexity and as such is not practical for large $n$ and $k$. Alternative methods with low complexity have recently been investigated, but their main drawbacks are that 1) they require perfect knowledge of the measurement matrix and 2) they need to be applied at the pace of change of the measurement matrix. To overcome these issues, we consider the asymptotic regime in which $k$, $n$ and $m$ grow large at the same pace. Tools from random matrix theory are then used to approximate in closed form the most commonly used error measures. The asymptotic approximations are then leveraged to properly select $k$ measurements exhibiting low values of the asymptotic error measures. Two heuristic algorithms are proposed: the first applies convex optimization to the asymptotic error measure; the second is a low-complexity greedy algorithm that seeks a sufficiently good solution to the original minimization problem. The greedy algorithm can be applied to both the exact and the asymptotic error measures and can thus be implemented in blind and channel-aware fashions. We present two potential applications where the proposed algorithms can be used, namely antenna selection for uplink transmissions in large-scale multi-user systems and sensor selection for wireless sensor networks. Numerical results are also presented and confirm the ability of the proposed blind methods to approach the performance of channel-aware algorithms.
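A channel-aware greedy selection of the kind the second algorithm performs can be sketched as follows, using the A-optimality criterion trace((H_S^T H_S + reg I)^{-1}) as a stand-in error measure (the criterion, sizes, and the small regularizer `reg` are illustrative assumptions, not the paper's exact measures):

```python
import numpy as np

def greedy_select(H, k, reg=1e-6):
    """Greedily pick k of the n rows of H (n x m) so as to minimize the
    A-optimality criterion trace((H_S^T H_S + reg*I)^{-1}), a standard
    proxy for the parameter-estimation MSE."""
    n, m = H.shape
    chosen = []
    M = reg * np.eye(m)
    for _ in range(k):
        best_i, best_val = -1, np.inf
        for i in range(n):
            if i in chosen:
                continue
            # error measure if row i were added to the current selection
            val = np.trace(np.linalg.inv(M + np.outer(H[i], H[i])))
            if val < best_val:
                best_i, best_val = i, val
        chosen.append(best_i)
        M += np.outer(H[best_i], H[best_i])
    return chosen

rng = np.random.default_rng(1)
H = rng.standard_normal((30, 4))   # n=30 candidate sensors, m=4 parameters
sel = greedy_select(H, 8)
print(sel)
```

The blind variant described in the abstract would replace the exact trace criterion with its closed-form random-matrix approximation, so the same loop runs without knowing `H`.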
• #### Pricing American Options by Exercise Rate Optimization

(arXiv, 2018-09-20)
We present a novel method for the numerical pricing of American options based on Monte Carlo simulation and optimization of exercise strategies. Previous solutions to this problem either explicitly or implicitly determine so-called optimal \emph{exercise regions}, which consist of points in time and space at which the option is exercised. In contrast, our method determines \emph{exercise rates} of randomized exercise strategies. We show that the supremum of the corresponding stochastic optimization problem provides the correct option price. By integrating analytically over the random exercise decision, we obtain an objective function that is differentiable with respect to perturbations of the exercise rate even for finitely many sample paths. Starting from a neutral strategy with a constant exercise rate then allows us to globally optimize this function in a gradual manner. Numerical experiments on vanilla put options in the multivariate Black--Scholes model and preliminary theoretical analysis underline the efficiency of our method both with respect to the number of time-discretization steps and the required number of degrees of freedom in the parametrization of exercise rates. Finally, the flexibility of our method is demonstrated by numerical experiments on max call options in the Black--Scholes model and vanilla put options in the Heston model and the non-Markovian rough Bergomi model.
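The randomized-exercise idea can be sketched for a constant exercise rate: integrating analytically over the exercise decision gives per-step first-exercise probabilities, so the value is a smooth function of the rate (a minimal sketch in the one-dimensional Black-Scholes model; the constant-rate parametrization, the grid search over `lmbda`, and all numerical parameters are illustrative assumptions):

```python
import numpy as np

def rate_value(lmbda, paths, strike, r, dt):
    """Expected discounted payoff of a randomized exercise strategy with
    constant rate `lmbda`, active only while the put is in the money.
    The random exercise decision is integrated out analytically per path."""
    n_paths, n_steps = paths.shape
    itm = paths < strike
    q = 1.0 - np.exp(-lmbda * dt * itm)        # per-step exercise probability
    q[:, -1] = 1.0                             # forced exercise at maturity
    alive = np.cumprod(1.0 - q, axis=1)
    alive = np.hstack([np.ones((n_paths, 1)), alive[:, :-1]])  # P(not yet exercised)
    payoff = np.maximum(strike - paths, 0.0)
    disc = np.exp(-r * dt * np.arange(1, n_steps + 1))
    return float(np.mean(np.sum(alive * q * payoff * disc, axis=1)))

rng = np.random.default_rng(2)
S0, strike, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_steps, n_paths = 50, 20000
dt = T / n_steps
dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
logS = np.log(S0) + np.cumsum((r - 0.5 * sigma**2) * dt + sigma * dW, axis=1)
paths = np.exp(logS)
values = [rate_value(l, paths, strike, r, dt) for l in [0.0, 0.5, 1.0, 2.0, 5.0]]
print(max(values))
```

With `lmbda = 0` the strategy never exercises early and the estimator reduces to the European put price; maximizing over the rate recovers part of the early-exercise premium.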
• #### Hierarchical adaptive sparse grids for option pricing under the rough Bergomi model

(2018-12-21)
The rough Bergomi (rBergomi) model, introduced recently in [4], is a promising rough volatility model in quantitative finance. This new model is consistent with the empirical observation that implied volatility surfaces are essentially time-invariant, and it can also capture the term structure of skew observed in equity markets. In the absence of analytical European option pricing methods for the model, and due to the non-Markovian nature of the fractional driver, the prevalent option is to use Monte Carlo (MC) simulation for pricing. Despite recent advances in the MC method in this context, pricing under the rBergomi model remains a time-consuming task. To overcome this issue, we design a novel, alternative, hierarchical approach based on adaptive sparse-grid quadrature, specifically using the same construction as multi-index stochastic collocation (MISC) [21], coupled with a Brownian bridge construction and Richardson extrapolation. By exploiting the available regularity, our hierarchical method demonstrates substantial computational gains with respect to the standard MC method when reaching a sufficiently small error tolerance in the price estimates across different parameter constellations, even for very small values of the Hurst parameter. Our work opens a new research direction in this field, namely investigating the performance of methods other than Monte Carlo for pricing and calibrating under the rBergomi model.
• #### IGA-based Multi-Index Stochastic Collocation for random PDEs on arbitrary domains

(arXiv, 2018-10-26)
This paper proposes an extension of the Multi-Index Stochastic Collocation (MISC) method for forward uncertainty quantification (UQ) problems in computational domains of shape other than a square or cube, by exploiting isogeometric analysis (IGA) techniques. Introducing IGA solvers to the MISC algorithm is very natural since they are tensor-based PDE solvers, which are precisely what is required by the MISC machinery. Moreover, the combination-technique formulation of MISC allows the straightforward reuse of existing implementations of IGA solvers. We present numerical results to showcase the effectiveness of the proposed approach.
• #### Multilevel ensemble Kalman filtering for spatio-temporal processes

(arXiv, 2018-02-02)
This work concerns state-space models, in which the state-space is an infinite-dimensional spatial field, and the evolution is in continuous time, hence requiring approximation in space and time. The multilevel Monte Carlo (MLMC) sampling strategy is leveraged in the Monte Carlo step of the ensemble Kalman filter (EnKF), thereby yielding a multilevel ensemble Kalman filter (MLEnKF) for spatio-temporal models, which has provably superior asymptotic error/cost ratio. A practically relevant stochastic partial differential equation (SPDE) example is presented, and numerical experiments with this example support our theoretical findings.
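The MLMC telescoping estimator underlying MLEnKF's Monte Carlo step can be sketched on a scalar SDE (geometric Brownian motion with an Euler scheme, not the paper's SPDE setting; the coupling via shared Brownian increments is the standard MLMC construction, and the sample-size schedule is an illustrative assumption):

```python
import numpy as np

def euler_gbm(S0, mu, sigma, dW, dt):
    """Euler scheme for dS = mu*S dt + sigma*S dW, vectorized over paths."""
    S = np.full(dW.shape[0], S0)
    for k in range(dW.shape[1]):
        S = S * (1.0 + mu * dt + sigma * dW[:, k])
    return S

def mlmc_mean(L, M0, rng, S0=1.0, mu=0.05, sigma=0.2, T=1.0):
    """MLMC estimate of E[S_T]: telescoping sum
    E[Y_L] = E[Y_0] + sum_l E[Y_l - Y_{l-1}], with n_l = 2^l time steps
    and fine/coarse paths coupled through shared Brownian increments."""
    est = 0.0
    for l in range(L + 1):
        n_f = 2 ** l
        M = max(M0 // 2 ** l, 200)           # fewer samples on finer levels
        dt_f = T / n_f
        dW = rng.standard_normal((M, n_f)) * np.sqrt(dt_f)
        Yf = euler_gbm(S0, mu, sigma, dW, dt_f)
        if l == 0:
            est += Yf.mean()
        else:
            dWc = dW[:, 0::2] + dW[:, 1::2]  # coarse increments from fine ones
            Yc = euler_gbm(S0, mu, sigma, dWc, 2 * dt_f)
            est += (Yf - Yc).mean()
    return est

rng = np.random.default_rng(3)
est = mlmc_mean(5, 40000, rng)
print(est)   # exact value is E[S_T] = exp(mu*T) ≈ 1.0513
```

Because the coupled difference Y_l - Y_{l-1} has small variance, most samples are spent on the cheap coarse level, which is the source of the superior error/cost ratio claimed for MLEnKF.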
• #### Nesterov-aided Stochastic Gradient Methods using Laplace Approximation for Bayesian Design Optimization

(arXiv, 2018-07-02)
Finding the best set-up for the design of experiments is the main concern of Optimal Experimental Design (OED). We focus on the Bayesian problem of finding the set-up that maximizes Shannon’s expected information gain. We propose using stochastic gradient descent and its accelerated counterpart, which employs Nesterov’s method, to solve the optimization problem in OED. We couple these optimization methods with three estimators of the objective function: a double loop Monte Carlo (DLMC) estimator, a Laplace approximation of the posterior distribution, and Laplace-based importance sampling. The use of stochastic gradient methods and Laplace-based estimators allows us to afford expensive and complex models, for example, those that require solving a partial differential equation (PDE). From a theoretical viewpoint, we derive an explicit formula for the stochastic gradient of the Laplace-based Monte Carlo estimator. Finally, from a computational standpoint, we study four examples: three based on analytical functions and one on the finite element solution of a PDE. The latter is an electrical impedance tomography experiment based on the complete electrode model. The accelerated stochastic gradient with Laplace approximation converges to local maxima using up to five orders of magnitude fewer model evaluations than gradient descent with DLMC.
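Nesterov's accelerated stochastic gradient evaluates the gradient at a look-ahead point rather than at the current iterate (a minimal sketch on a toy concave surrogate with Monte Carlo-like gradient noise; the objective, step size, and momentum value are illustrative assumptions, not the paper's EIG estimator):

```python
import numpy as np

def nesterov_sga(grad_est, x0, steps=500, lr=0.05, momentum=0.9):
    """Nesterov-accelerated stochastic gradient ascent: the (noisy)
    gradient is evaluated at the look-ahead point x + momentum * v."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad_est(x + momentum * v)   # look-ahead gradient evaluation
        v = momentum * v + lr * g
        x = x + v
    return x

# Toy concave surrogate for an expected-information-gain surface, with
# additive noise on the gradient mimicking a Monte Carlo estimator.
rng = np.random.default_rng(4)
opt = np.array([1.0, -2.0])              # hypothetical optimal design
grad_est = lambda x: -(x - opt) + 0.1 * rng.standard_normal(2)
x_star = nesterov_sga(grad_est, np.zeros(2))
print(x_star)
```

In the OED setting of the abstract, `grad_est` would be the stochastic gradient of the Laplace-based Monte Carlo estimator of the expected information gain.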
• #### Multilevel Monte Carlo Acceleration of Seismic Wave Propagation under Uncertainty

(arXiv, 2018-11-28)
We interpret uncertainty in a model for seismic wave propagation by treating the model parameters as random variables, and apply the Multilevel Monte Carlo (MLMC) method to reduce the cost of approximating expected values of selected, physically relevant, quantities of interest (QoI) with respect to the random variables. Targeting source inversion problems, where the source of an earthquake is inferred from ground motion recordings on the Earth's surface, we consider two QoI that measure the discrepancies between computed seismic signals and given reference signals: one QoI, QoI_E, is defined in terms of the L^2-misfit, which is directly related to maximum likelihood estimates of the source parameters; the other, QoI_W, is based on the quadratic Wasserstein distance between probability distributions, and represents one possible choice in a class of such misfit functions that have become increasingly popular for solving seismic inversion in recent years. We simulate seismic wave propagation, including seismic attenuation, using a publicly available code in widespread use, based on the spectral element method. Using random coefficients and deterministic initial and boundary data, we present benchmark numerical experiments with synthetic data in a two-dimensional physical domain and a one-dimensional velocity model where the assumed parameter uncertainty is motivated by realistic Earth models. Here, the computational cost of the standard Monte Carlo method was reduced by up to 97% for QoI_E, and up to 78% for QoI_W, using a relevant range of tolerances. Shifting to three-dimensional domains is straightforward and will further increase the relative computational work reduction.
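For one-dimensional signals, the quadratic Wasserstein distance entering QoI_W reduces to an integral of squared quantile-function differences (a minimal sketch assuming the signals have already been converted to probability distributions, as such misfit functions require; the Gaussian example is illustrative):

```python
import numpy as np

def w2_squared(x, y):
    """Squared quadratic Wasserstein distance between two equal-size 1-D
    samples: W2^2 = int_0^1 (F^{-1}(u) - G^{-1}(u))^2 du, which for
    sorted samples reduces to a mean of squared differences."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.mean((xs - ys) ** 2))

# For N(m1, s^2) vs N(m2, s^2) the exact value is W2^2 = (m1 - m2)^2.
rng = np.random.default_rng(5)
a = rng.normal(0.0, 1.0, 100000)
b = rng.normal(1.0, 1.0, 100000)
d2 = w2_squared(a, b)
print(d2)   # close to 1.0
```

Unlike the L^2-misfit of QoI_E, this distance compares signals through their cumulative distributions, which is what makes Wasserstein misfits less sensitive to cycle-skipping in seismic inversion.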
• #### Multilevel Double Loop Monte Carlo and Stochastic Collocation Methods with Importance Sampling for Bayesian Optimal Experimental Design

(arXiv, 2018-11-28)
An optimal experimental set-up maximizes the value of data for statistical inference and prediction, which is particularly important for experiments that are time consuming or expensive to perform. In the context of partial differential equations (PDEs), multilevel methods have been proven in many cases to dramatically reduce the computational complexity of their single-level counterparts. Here, two multilevel methods are proposed to efficiently compute the expected information gain, measured by a Kullback-Leibler divergence, in simulation-based Bayesian optimal experimental design. The first method is a multilevel double loop Monte Carlo (MLDLMC) with importance sampling, which greatly reduces the computational work of the inner loop. The second proposed method is a multilevel double loop stochastic collocation (MLDLSC) with importance sampling, which performs the high-dimensional integration by deterministic quadrature on sparse grids. In both methods, the Laplace approximation is used as an effective means of importance sampling, and the optimal values for method parameters are determined by minimizing the average computational work subject to a desired error tolerance. The computational efficiencies of the methods are demonstrated for computing the expected information gain for Bayesian inversion to infer the fiber orientation in composite laminate materials by an electrical impedance tomography experiment, given a particular set-up of the electrode configuration. MLDLSC performs better than MLDLMC by exploiting the regularity of the underlying computational model with respect to the additive noise and the unknown parameters to be statistically inferred.
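The inner/outer structure of a double loop Monte Carlo estimator of the expected information gain can be sketched on a toy linear-Gaussian model for which the EIG is known in closed form (a minimal single-level sketch without the multilevel hierarchy or Laplace-based importance sampling; the model and sample sizes are illustrative assumptions):

```python
import numpy as np

def dlmc_eig(n_outer, n_inner, sig_th, sig_eps, rng):
    """Double loop Monte Carlo estimate of the expected information gain
    for the toy model y = theta + eps, theta ~ N(0, sig_th^2),
    eps ~ N(0, sig_eps^2): EIG = E[log p(y|theta) - log p(y)],
    with the evidence p(y) estimated by the inner loop."""
    norm = sig_eps * np.sqrt(2.0 * np.pi)
    th = rng.normal(0.0, sig_th, n_outer)                 # outer prior draws
    y = th + rng.normal(0.0, sig_eps, n_outer)            # synthetic data
    log_lik = -0.5 * ((y - th) / sig_eps) ** 2 - np.log(norm)
    th_in = rng.normal(0.0, sig_th, (n_outer, n_inner))   # inner prior draws
    inner = np.exp(-0.5 * ((y[:, None] - th_in) / sig_eps) ** 2) / norm
    log_ev = np.log(inner.mean(axis=1))                   # MC evidence estimate
    return float(np.mean(log_lik - log_ev))

rng = np.random.default_rng(6)
eig = dlmc_eig(2000, 2000, 1.0, 0.5, rng)
print(eig)   # analytic EIG = 0.5*log(1 + sig_th^2/sig_eps^2) ≈ 0.8047
```

The cost is outer × inner model evaluations; the multilevel and importance-sampling machinery in the abstract exists precisely to shrink the inner-loop work that dominates this estimator.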
• #### Phenotypic, functional and taxonomic features predict host-pathogen interactions: Table S1; Figure S1

(Cold Spring Harbor Laboratory, 2018-12-31)
Identification of host-pathogen interactions (HPIs) can reveal mechanistic insights into infectious diseases, informing potential treatments and drug discovery. Current computational methods for the prediction of HPIs often rely on our knowledge of the sequences and functions of pathogen proteins, which is limited for many species, especially emerging pathogens. Matching the phenotypes elicited by pathogens with phenotypes associated with host proteins might improve the prediction of HPIs. We developed an ontology-based method that prioritizes potential interaction protein partners for pathogens using machine learning models. Our method exploits the underlying disease mechanisms by associating phenotypic and functional features of pathogens and human proteins, corroborated by multiple ontologies as background knowledge. Additionally, by embedding the phenotypic information of the pathogens within a formally represented taxonomy, we demonstrate that our model can also accurately predict interaction partners for pathogens without known phenotypes, using a combination of their taxonomic relationships with other pathogens and information from ontologies as background knowledge. Our results show that the integration of phenotypic, functional and taxonomic knowledge not only improves the prediction of HPIs, but also enables us to investigate novel pathogens in emerging infectious diseases.
• #### Role of MPK4 in pathogen-associated molecular pattern-triggered alternative splicing in Arabidopsis

(Cold Spring Harbor Laboratory, 2019-01-04)
Alternative splicing (AS) of pre-mRNAs in plants is an important mechanism of gene regulation in environmental stress tolerance, but the plant signals involved are essentially unknown. Pathogen-associated molecular pattern (PAMP)-triggered immunity (PTI) is mediated by mitogen-activated protein kinases, and the majority of PTI defense genes are regulated by MPK3, MPK4 and MPK6. These responses have mainly been analyzed at the transcriptional level; however, many splicing factors are direct targets of MAPKs. Here, we studied alternative splicing induced by the PAMP flagellin in Arabidopsis. We identified 506 PAMP-induced differentially alternatively spliced (DAS) genes. Although many DAS genes are targets of nonsense-mediated degradation (NMD), only 19% are potential NMD targets. Importantly, of the 506 PAMP-induced DAS genes, only 89 overlap with the set of 1849 PAMP-induced differentially expressed genes (DEG), indicating that transcriptome analysis does not identify most DAS events. Global DAS analysis of mpk3, mpk4, and mpk6 mutants revealed that MPK4 is a key regulator of PAMP-induced differential splicing, regulating AS of a number of splicing factors and immunity-related protein kinases, such as the calcium-dependent protein kinase CPK28, the cysteine-rich receptor-like kinases CRK13 and CRK29, or the FLS2 co-receptor SERK4/BKK1. These data suggest that MAP kinase regulation of splicing factors is a key mechanism in PAMP-induced AS regulation of PTI.
• #### Measuring Canopy Structure and Condition Using Multi-Spectral UAS Imagery in a Horticultural Environment

(MDPI AG, 2018-12-29)
Tree condition, pruning and orchard management practices within intensive horticultural tree crop systems can be determined via measurements of tree structure. Multi-spectral imagery acquired from an unmanned aerial system (UAS) has been demonstrated to be an accurate and efficient means of measuring various tree structural attributes, but research in complex horticultural environments has been limited. This research established a methodology for accurately estimating tree crown height, extent, plant projective cover (PPC) and condition of avocado tree crops from a UAS platform. Individual tree crowns were delineated using object-based image analysis. In comparison to field-measured canopy heights, an image-derived canopy height model provided a coefficient of determination (R2) of 0.65 and a relative root mean squared error of 6%. Tree crown length perpendicular to the hedgerow was accurately mapped. PPC was measured using spectral and textural image information and produced an R2 value of 0.62 against field data. A random forest classifier was applied to assign tree condition into four categories in accordance with industry standards, producing out-of-bag accuracies >96%. Our results demonstrate the potential of UAS-based mapping for the provision of information to support the horticulture industry and facilitate orchard-based assessment and management.
• #### Precision phenotyping reveals novel loci for quantitative resistance to septoria tritici blotch in European winter wheat

(Cold Spring Harbor Laboratory, 2018-12-21)
Accurate, high-throughput phenotyping for quantitative traits is the limiting factor for progress in plant breeding. We developed automated image analysis to measure quantitative resistance to septoria tritici blotch (STB), a globally important wheat disease, enabling identification of small chromosome intervals containing plausible candidate genes for STB resistance. A replicated field experiment including 335 winter wheat cultivars experienced natural epidemic development driven by a highly diverse, fungicide-resistant pathogen population. More than 5.4 million automatically generated phenotypes were associated with 13,648 SNP markers to perform a GWAS. We identified 26 chromosome intervals explaining 1.9-10.6% of the variance associated with four resistance traits. Seventeen of the intervals were less than 5 Mbp in size and encoded only 173 genes, including many genes associated with disease resistance. Five intervals contained four or fewer genes, providing high-priority targets for functional validation. Ten chromosome intervals were not previously associated with STB resistance. Our experiment illustrates how high-throughput automated phenotyping can accelerate breeding for quantitative disease resistance. The SNP markers associated with these chromosome intervals can be used to recombine different forms of quantitative STB resistance that are likely to be more durable than pyramids of major resistance genes.
• #### Searching and mapping genomic subsequences in nanopore raw signals through novel dynamic time warping algorithms

(Cold Spring Harbor Laboratory, 2018-12-10)
Nanopore sequencing is a promising technology to generate ultra-long reads based on the direct measurement of electrical current signals when a DNA molecule passes through a nanopore. These ultra-long reads are critical for detecting large structural variations in the genome. However, it is challenging to use nanopore sequencing to identify single nucleotide polymorphisms (SNPs) or other modifications such as methylations, especially at a low sequencing coverage, due to the high error rate in the base-called reads. It is possible to correct the base-calling error through subsequence search, by mapping a SNP-containing genomic region to the long nanopore raw signal sequences that contain this region and taking a consensus of these signals. Nevertheless, the ultra-long raw signals and an order of magnitude difference in the sampling speed between the two sequences make traditional algorithms infeasible for this problem. Here we propose two novel algorithms, the direct subsequence dynamic time warping for nanopore raw signal search (DSDTWnano) and the continuous wavelet subsequence dynamic time warping for nanopore raw signal search (cwSDTWnano), to enable direct subsequence searching and exact mapping in nanopore raw signals. The proposed algorithms are based on the idea of subsequence-extended dynamic time warping and directly operate on the raw signals, without any loss of information. DSDTWnano ensures highly accurate query results, while cwSDTWnano is an accelerated version of DSDTWnano that uses seeding and multi-scale coarsening of the signals based on the continuous wavelet transform. Furthermore, a novel error function is proposed to specify the mapping accuracy between a genomic sequence and an electrical current signal sequence, which may serve as a standard criterion for further genome-to-signal mapping studies.
Comprehensive experiments on three real-world nanopore datasets (human and lambda phage) demonstrate the efficiency and effectiveness of the proposed algorithms. Finally, we show the power of our algorithms in SNP detection at low coverage (20x) on E. coli, with a >95% detection rate. Our program is available at https://github.com/icthrm/cwSDTWnano.git.
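The free-start/free-end dynamic programming that subsequence DTW performs can be sketched as follows (a quadratic-time reference version on short toy signals; the actual DSDTWnano/cwSDTWnano algorithms add seeding and multi-scale coarsening to cope with ultra-long raw signals):

```python
import numpy as np

def subsequence_dtw(query, signal):
    """Subsequence dynamic time warping: align `query` against the best
    matching stretch of `signal`, with free start and end in `signal`.
    Returns the minimal alignment cost and the (1-based) end position."""
    n, m = len(query), len(signal)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = 0.0                       # free start: match may begin anywhere
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = (query[i - 1] - signal[j - 1]) ** 2
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    end = int(np.argmin(D[n, 1:])) + 1  # free end: best column in the last row
    return float(D[n, end]), end

rng = np.random.default_rng(7)
signal = rng.standard_normal(500)
query = signal[200:240] + 0.01 * rng.standard_normal(40)  # noisy embedded copy
cost, end = subsequence_dtw(query, signal)
print(cost, end)   # small cost, end near position 240
```

The initialization `D[0, :] = 0` is what distinguishes subsequence DTW from the global variant: the query may begin at any position in the long signal without penalty.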
• #### PathoPhenoDB: linking human pathogens to their disease phenotypes in support of infectious disease research

(Cold Spring Harbor Laboratory, 2018-12-10)
Understanding the relationship between the pathophysiology of infectious disease, the biology of the causative agent and the development of therapeutic and diagnostic approaches depends on the synthesis of a wide range of types of information. A comprehensive and integrated disease phenotype knowledgebase has the potential to provide novel and orthogonal sources of information for understanding infectious agent pathogenesis, and to support research on disease mechanisms. We have developed PathoPhenoDB, a database containing pathogen-to-phenotype associations. PathoPhenoDB relies on manual curation of pathogen-disease relations, and on ontology-based text mining combined with manual curation to associate phenotypes with infectious diseases. Using Semantic Web technologies, PathoPhenoDB also links to knowledge about drug resistance mechanisms and drugs used in the treatment of infectious diseases. PathoPhenoDB is accessible at http://patho.phenomebrowser.net/, and the data is freely available through a public SPARQL endpoint.
• #### Teaching UAVs to Race: End-to-End Regression of Agile Controls in Simulation

(arXiv, 2018-11-22)
Automating the navigation of unmanned aerial vehicles (UAVs) in diverse scenarios has gained much attention in recent years. However, teaching UAVs to fly in challenging environments remains an unsolved problem, mainly due to the lack of training data. In this paper, we train a deep neural network to predict UAV controls from raw image data for the task of autonomous UAV racing in a photo-realistic simulation. Training is done through imitation learning with data augmentation to allow for the correction of navigation mistakes. Extensive experiments demonstrate that our trained network (when sufficient data augmentation is used) outperforms state-of-the-art methods and flies more consistently than many human pilots. Additionally, we show that our optimized network architecture can run in real-time on embedded hardware, allowing for efficient onboard processing critical for real-world deployment.
• #### TGF-b2, catalase activity, H2O2 output and metastatic potential of diverse types of tumour

(Cold Spring Harbor Laboratory, 2018-11-14)
Theileria annulata is a protozoan parasite that infects and transforms bovine macrophages, causing a myeloid-leukaemia-like disease called tropical theileriosis. TGF-b2 is highly expressed in many cancer cells and is significantly increased in Theileria-transformed macrophages, as are levels of reactive oxygen species (ROS), notably H2O2. Here, we describe the interplay between TGF-b2 and ROS in cellular transformation. We show that TGF-b2 drives expression of catalase to reduce the amount of H2O2 produced by T. annulata-transformed bovine macrophages, as well as by human lung (A549) and colon cancer (HT-29) cell lines. Theileria-transformed macrophages attenuated for dissemination express less catalase and produce more H2O2, but regain both virulent migratory and matrigel-traversal phenotypes when stimulated with TGF-b2 or with catalase, both of which reduce H2O2 output. Increased H2O2 output therefore underpins the aggressive dissemination phenotype of diverse tumour cell types but, in contrast, too much H2O2 can dampen dissemination.
• #### Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings

(Cold Spring Harbor Laboratory, 2018-11-08)
Recent developments in machine learning have led to a rise in the number of methods for extracting features from structured data. The features are represented as vectors and may encode semantic aspects of the data. They can be used in machine learning models for different tasks or to compute similarities between the entities of the data. SPARQL is a query language for structured data originally developed for querying Resource Description Framework (RDF) data. It has been in use for over a decade as a standardized NoSQL query language. Many different tools have been developed to enable data sharing with SPARQL. For example, SPARQL endpoints make data interoperable and available to the world, and SPARQL queries can be executed across multiple endpoints. We have developed Vec2SPARQL, a general framework for integrating structured data and their vector space representations. Vec2SPARQL allows jointly querying vector functions such as computing similarities (cosine, correlations) or classifications with machine learning models within a single SPARQL query. We demonstrate applications of our approach for biomedical and clinical use cases. Our source code is freely available at https://github.com/bio-ontology-research-group/vec2sparql and we make a Vec2SPARQL endpoint available at http://sparql.bio2vec.net/.
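The kind of vector function Vec2SPARQL exposes inside a query can be sketched in isolation, e.g. cosine similarity over an embedding table (the entity IRIs and embedding values below are hypothetical; in Vec2SPARQL the lookup and ranking would happen inside the SPARQL engine as a custom function call):

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity between two embedding vectors, the kind of
    vector function exposed to SPARQL queries by frameworks like
    Vec2SPARQL (e.g. as a similarity(?x, ?y) call)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embedding table keyed by entity IRI (hypothetical IRIs and values).
emb = {
    "ex:geneA": np.array([1.0, 0.0, 1.0]),
    "ex:geneB": np.array([0.9, 0.1, 1.1]),
    "ex:geneC": np.array([-1.0, 1.0, 0.0]),
}
query = emb["ex:geneA"]
ranked = sorted(emb, key=lambda k: -cosine_sim(query, emb[k]))
print(ranked)   # ex:geneA first, then ex:geneB, then ex:geneC
```

Embedding such a function in the query engine is what lets a single SPARQL query mix graph-pattern matching with vector-space ranking.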
• #### Ontology based mining of pathogen-disease associations from literature

(Cold Spring Harbor Laboratory, 2018-10-08)
Background: Infectious diseases claim millions of lives each year, especially in developing countries, and resistance to drugs is an emerging threat worldwide. Accurate and rapid identification of causative pathogens plays a key role in the success of treatment. To support infectious disease research and the study of mechanisms of infection, there is a need for an open resource on pathogen-disease associations that can be utilized in computational studies. A large number of pathogen-disease associations are available in the literature in unstructured form, and automated methods are needed to extract the data. Results: We developed a text mining system designed for extracting pathogen-disease relations from literature. Our approach utilizes background knowledge from an ontology and statistical methods for extracting associations between pathogens and diseases. In total, we extracted 3,420 pathogen-disease associations from the literature. We integrated our literature-derived associations into a database which links pathogens to their phenotypes in support of infectious disease research. Conclusions: To the best of our knowledge, this is the first study focusing on extracting pathogen-disease associations from publications. We believe the text-mined data can serve as a valuable resource for infectious disease research. All the data is publicly available from https://github.com/bio-ontology-research-group/padimi and through a public SPARQL endpoint from http://patho.phenomebrowser.net/.
• #### Spectral-Efficiency - Illumination Pareto Front for Energy Harvesting Enabled VLC System

(2018-10-07)
The continuous improvement in optical energy harvesting devices motivates visible light communication (VLC) system developers to exploit such freely available energy sources. An outdoor VLC system is considered in which an optical base station sends data to multiple users that are capable of harvesting the optical energy. The proposed VLC system serves multiple users using time division multiple access (TDMA) with unequal time and power allocations, chosen to improve the system performance. The adopted optical system provides users with both illumination and data communication services. The outdoor optical design objective is to maximize the illumination, while the communication design objective is to maximize the spectral efficiency (SE). The design objectives are shown to be conflicting; therefore, a multiobjective optimization problem is formulated to obtain the Pareto front performance curve for the proposed system. To this end, the marginal optimization problems are solved first using low-complexity algorithms. Then, based on these algorithms, a low-complexity algorithm is developed to obtain an inner bound on the Pareto front for the illumination-SE tradeoff. This inner bound is shown to be close to the optimal Pareto frontier via several simulation scenarios with different system parameters.
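The weighted-sum scalarization that traces an inner bound of a Pareto front can be sketched on two toy conflicting objectives over a TDMA time-sharing fraction (the illumination and SE utilities below are hypothetical stand-ins, not the paper's system model):

```python
import numpy as np

# Toy conflicting objectives over a time-sharing fraction t in [0, 1]:
# f1 (illumination proxy) grows with t, f2 (spectral-efficiency proxy)
# shrinks with t. Scanning the weight in a weighted-sum scalarization
# traces an inner bound of the Pareto front for the trade-off.
t = np.linspace(0.0, 1.0, 201)
f1 = np.sqrt(t)              # hypothetical illumination utility
f2 = np.log1p(5 * (1 - t))   # hypothetical spectral-efficiency utility

front = []
for w in np.linspace(0.0, 1.0, 21):
    i = int(np.argmax(w * f1 + (1 - w) * f2))   # maximizer for this weight
    front.append((float(f1[i]), float(f2[i])))
front = sorted(set(front))   # de-duplicated, sorted by f1

print(front[:3])
```

Each weight yields one non-dominated operating point; sweeping the weight from 0 to 1 moves along the trade-off curve from the SE-optimal to the illumination-optimal extreme.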