For more information visit: https://stat.kaust.edu.sa/

Recent Submissions

  • Semiparametric estimation of cross-covariance functions for multivariate random fields

    Qadir, Ghulam A.; Sun, Ying (Biometrics, Wiley, 2020-07-06) [Article]
    The prevalence of spatially referenced multivariate data has impelled researchers to develop procedures for joint modeling of multiple spatial processes. This ordinarily involves modeling marginal and cross-process dependence for any arbitrary pair of locations using a multivariate spatial covariance function. However, building a flexible multivariate spatial covariance function that is nonnegative definite is challenging. Here, we propose a semiparametric approach for multivariate spatial covariance function estimation with approximate Matérn marginals and highly flexible cross-covariance functions via their spectral representations. The flexibility in our cross-covariance function arises due to B-spline based specification of the underlying coherence functions, which in turn allows us to capture non-trivial cross-spectral features. We then develop a likelihood-based estimation procedure and perform multiple simulation studies to demonstrate the performance of our method, especially on the coherence function estimation. Finally, we analyze particulate matter concentrations (PM2.5) and wind speed data over the West-North-Central climatic region of the United States, where we illustrate that our proposed method outperforms the commonly used full bivariate Matérn model and the linear model of coregionalization for spatial prediction.
  • Break Point Detection for Functional Covariance

    Jiao, Shuhao; Frostig, Ron D.; Ombao, Hernando (arXiv, 2020-06-24) [Preprint]
    Many experiments record sequential trajectories that oscillate around zero. Such trajectories can be viewed as zero-mean functional data. When there are structural breaks (on the sequence of curves) in higher order moments, it is often difficult to spot these by mere visual inspection. Thus, we propose a detection and testing procedure to find the change-points in functional covariance. The method is fully functional in the sense that no dimension reduction is needed. We establish the asymptotic properties of the estimated change-point. The effectiveness of the proposed method is numerically validated in the simulation studies and an application to study structural changes in rat brain signals in a stroke experiment.
  • Conditional Normal Extreme-Value Copulas

    Krupskii, Pavel; Genton, Marc G. (arXiv, 2020-06-21) [Preprint]
    We propose a new class of extreme-value copulas which are extreme-value limits of conditional normal models. Conditional normal models are generalizations of conditional independence models, where the dependence among observed variables is modeled using one unobserved factor. Conditional on this factor, the distribution of these variables is given by the Gaussian copula. This structure allows one to build flexible and parsimonious models for data with complex dependence structures, such as data with spatial or temporal dependence. We study the extreme-value limits of these models and show some interesting special cases of the proposed class of copulas. We develop estimation methods for the proposed models and conduct a simulation study to assess the performance of these algorithms. Finally, we apply these copula models to analyze data on monthly wind maxima and stock return minima.
  • A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning

    Horvath, Samuel; Richtarik, Peter (arXiv, 2020-06-19) [Preprint]
    Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed compute systems. A key bottleneck of such systems is the communication overhead for exchanging information across the workers, such as stochastic gradients. Among the many techniques proposed to remedy this issue, one of the most successful is the framework of compressed communication with error feedback (EF). EF remains the only known technique that can deal with the error induced by contractive compressors which are not unbiased, such as Top-$K$. In this paper, we propose a new and theoretically and practically better alternative to EF for dealing with contractive compressors. In particular, we propose a construction which can transform any contractive compressor into an induced unbiased compressor. Following this transformation, existing methods able to work with unbiased compressors can be applied. We show that our approach leads to vast improvements over EF, including reduced memory requirements, better communication complexity guarantees and fewer assumptions. We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits thereof. We perform several numerical experiments which validate our theoretical findings.
  • High-resolution Bayesian mapping of landslide hazard with unobserved trigger event

    Opitz, Thomas; Bakka, Haakon; Huser, Raphaël; Lombardo, Luigi (arXiv, 2020-06-14) [Preprint]
    Statistical models for landslide hazard enable mapping of risk factors and landslide occurrence intensity by using geomorphological covariates available at high spatial resolution. However, the spatial distribution of the triggering event (e.g., precipitation or earthquakes) is often not directly observed. In this paper, we develop Bayesian spatial hierarchical models for point patterns of landslide occurrences using different types of log-Gaussian Cox processes. Starting from a competitive baseline model that captures the unobserved precipitation trigger through a spatial random effect at slope unit resolution, we explore novel complex model structures that take clusters of events arising at small spatial scales into account, as well as nonlinear or spatially-varying covariate effects. For a 2009 event of around 4000 precipitation-triggered landslides in Sicily, Italy, we show how to fit our proposed models efficiently using the integrated nested Laplace approximation (INLA), and rigorously compare the performance of our models both from a statistical and applied perspective. In this context, we argue that model comparison should not be based on a single criterion, and that different models of various complexity may provide insights into complementary aspects of the same applied problem. In our application, our models are found to have mostly the same spatial predictive performance, implying that key to successful prediction is the inclusion of a slope-unit resolved random effect capturing the precipitation trigger. Interestingly, a parsimonious formulation of space-varying slope effects reflects a physical interpretation of the precipitation trigger: in subareas with weak trigger, the slope steepness is shown to be mostly irrelevant.
  • DDOS-attacks detection using an efficient measurement-based statistical mechanism

    Bouyeddou, Benamar; Kadri, Benamar; Harrou, Fouzi; Sun, Ying (Engineering Science and Technology, an International Journal, Elsevier BV, 2020-06-09) [Article]
    A monitoring mechanism is vital for detecting malicious attacks against cyber systems. Detecting denial of service (DOS) and distributed DOS (DDOS) attacks is one of the most important security challenges facing network technologies. This paper introduces a reliable detection mechanism based on the continuous ranked probability score (CRPS) statistical metric and an exponential smoothing (ES) scheme for enabling efficient detection of DOS and DDOS attacks. In this regard, the CRPS is used to quantify the dissimilarity between a new observation and the distribution of normal traffic. The ES scheme, which is sensitive to small changes, is applied to CRPS measurements for anomaly detection. Moreover, in the CRPS-ES approach, a nonparametric decision threshold computed via kernel density estimation is used to detect anomalies. Tests on three publicly available datasets demonstrate the efficiency of the proposed mechanism in detecting cyber-attacks.
  • The diffusion-based extension of the Matérn field to space-time

    Bakka, Haakon; Krainski, Elias; Bolin, David; Rue, Haavard; Lindgren, Finn (arXiv, 2020-06-08) [Preprint]
    The Matérn field is the most well known family of covariance functions used for Gaussian processes in spatial models. We build upon the original research of Whittle (1953, 1964) and develop the diffusion-based extension of the Matérn field to space-time (DEMF). We argue that this diffusion-based extension is the natural extension of these processes, due to the strong physical interpretation. The corresponding non-separable spatio-temporal Gaussian process is a spatio-temporal analogue of the Matérn field, with range parameters in space and time, smoothness parameters in space and time, and a separability parameter. We provide a sparse representation based on finite element methods that is well suited for statistical inference.
  • Assessing Non-Stationary Heatwave Hazard with Magnitude-Dependent Spatial Extremal Dependence

    Zhong, Peng; Huser, Raphaël; Opitz, Thomas (arXiv, 2020-06-02) [Preprint]
    The modeling of spatio-temporal trends in temperature extremes can help better understand the structure and frequency of heatwaves in a changing climate. Here, we study annual temperature maxima over Southern Europe using a century-spanning dataset observed at 44 monitoring stations. Extending the spectral representation of max-stable processes, our modeling framework relies on a novel construction of max-infinitely divisible processes, which include covariates to capture spatio-temporal non-stationarities. Our new model keeps a popular max-stable process on the boundary of the parameter space, while flexibly capturing weakening extremal dependence at increasing quantile levels and asymptotic independence. This is achieved by linking the overall magnitude of a spatial event to its spatial correlation range, in such a way that more extreme events become less spatially dependent, thus more localized. Our model reveals salient features of the spatio-temporal variability of European temperature extremes, and it clearly outperforms natural alternative models. Results show that the spatial extent of heatwaves is smaller for more severe events at higher altitudes, and that recent heatwaves are moderately wider. Our probabilistic assessment of the 2019 annual maxima confirms the severity of the 2019 heatwaves both spatially and at individual sites, especially when compared to climatic conditions prevailing in 1950-1975.
  • Robust Functional Multivariate Analysis of Variance with Environmental Applications

    Qu, Zhuo; Dai, Wenlin; Genton, Marc G. (Environmetrics, Wiley, 2020-05-31) [Article]
    We propose median polish for functional multivariate analysis of variance (FMANOVA) with the implementation of depth for multivariate functional data. As an alternative to classical mean estimation, functional median polish estimates the functional grand effect and factor effects based on functional medians in one-way and two-way additive FMANOVA models. Median polish estimates in FMANOVA are visually unbiased, independently of the choice of multivariate functional depth. The corresponding mean-based and rank-based tests are generalized to evaluate whether the functional medians in various levels of the factors are the same. Simulation studies illustrate the robustness of our functional median polish in various scenarios, compared with the results from classical FMANOVA fitted by means. The results are evaluated both marginally and jointly. Three environmental datasets are considered to illustrate that our median polish is robust against outliers in practical implementations. Functional boxplots and heatmaps are two ways of visualizing the functional factors, depending on whether the functional data are curves or images, respectively.
  • Multiscale modelling of replicated nonstationary time series

    Embleton, Jonathan; Knight, Marina I.; Ombao, Hernando (arXiv, 2020-05-19) [Preprint]
    Within the neurosciences, to observe variability across time in the dynamics of an underlying brain process is neither new nor unexpected. Wavelets are essential in analyzing brain signals because, even within a single trial, brain signals exhibit nonstationary behaviour. However, neurological signals generated within an experiment may also potentially exhibit evolution across trials (replicates). As neurologists consider localised spectra of brain signals to be most informative, here we develop a novel wavelet-based tool capable of formally representing process nonstationarities across both time and replicate dimensions. Specifically, we propose the Replicate Locally Stationary Wavelet (RLSW) process, which captures the potential nonstationary behaviour within and across trials. Estimation using wavelets gives a natural desired time- and replicate-localisation of the process dynamics. We develop the associated spectral estimation framework and establish its asymptotic properties. By means of thorough simulation studies, we demonstrate that the theoretical estimator properties hold in practice. A real data investigation into the evolutionary dynamics of the hippocampus and nucleus accumbens during an associative learning experiment demonstrates the applicability of our proposed methodology, as well as the new insights it provides.
  • On robust spectrum sensing using M-estimators of covariance matrix

    Liu, Zhedong; Kammoun, Abla; Alouini, Mohamed-Slim (Science China Information Sciences, Springer Science and Business Media LLC, 2020-05-18) [Article]
    Most of the spectrum sensing techniques are designed for Gaussian noise. These techniques do not consider the environment with the non-Gaussian (impulsive or heavy-tailed) noise. In a wireless communication system, impulsive noise frequently occurs and originates from numerous sources, for instance, switching transients in power lines, vehicle ignition, microwave ovens and devices with electromechanical switches. Under those circumstances, sensing techniques designed for Gaussian noise may be highly susceptible to severe degradation of performance.
  • Closing the gap between wind energy targets and implementation for emerging countries

    Giani, Paolo; Tagle, Felipe; Genton, Marc G.; Castruccio, Stefano; Crippa, Paola (Applied Energy, Elsevier BV, 2020-05-18) [Article]
    Policymakers worldwide have set challenging sustainable energy targets to decarbonize their economy. Despite the ambitious pledges, several emerging countries still lack actual progress towards the envisioned goals, often due to the scarcity of accurate data. Here, we propose a practical methodology for bridging the gap between the wind energy targets and their implementation. We illustrate our new methodology by focusing on Saudi Arabia, which endeavors to play a leading role in the renewable energy sector and pledges to install 16 GW of wind capacity by 2030. We propose a blueprint for the optimal wind farm buildout, combining novel high-resolution model simulations, a unique set of observations, land-use restrictions and a thorough cost assessment. The most suitable technological option is selected among multiple turbine models for each potential site. Our findings suggest that Saudi Arabia is well positioned to become a role model for wind energy development within the Middle East, with 26% of the electricity demand that could be met by wind power. The average levelized cost of energy of the proposed buildout is 39 USD/MWh, which confirms the competitiveness of wind resources in Saudi Arabia. We identify the area close to the Gulf of Aqaba as the most cost-effective region for wind harvesting, with turbines characterized by moderate specific rating (350 W/m²) at relatively low hub height (75 m). The modelling framework proposed in this work can be adopted by other countries aiming to start or strengthen their wind energy portfolio.
  • Bivariate Functional Quantile Envelopes with Application to Radiosonde Wind Data

    Agarwal, Gaurav; Sun, Ying (Technometrics, Informa UK Limited, 2020-05-18) [Article]
    The global radiosonde archives contain valuable weather data, such as temperature, humidity, wind speed, wind direction, and atmospheric pressure. Being the only direct measurement of these variables in the upper air, they are prone to errors. Therefore, a robust analysis and outlier detection of radiosonde data is essential. Among all the variables, the radiosonde winds, which consist of wind speed and direction, are particularly challenging to analyze. In this paper, we treat the wind profiles as bivariate functional data across several pressure levels. Since the bivariate distribution of the components of radiosonde winds at a given pressure level is not Gaussian but instead skewed and heavy-tailed, we propose a set of robust quantile methods to characterize the distribution as well as an outlier detection procedure to identify both magnitude and shape outliers. The proposed methods provide an informative visualization tool for bivariate functional data. We also introduce two methods of predicting this bivariate distribution at unobserved pressure levels. In our simulation study, we show that our methods are robust against different types of outliers and skewed data. Finally, we apply our methods to radiosonde wind data in order to illustrate our proposed quantile analysis methods for visualization, outlier detection, and prediction.
  • Collective spectral density estimation and clustering for spatially-correlated data

    Chen, Tianbo; Sun, Ying; Maadooliat, Mehdi (Spatial Statistics, Elsevier BV, 2020-05-16) [Article]
    In this paper, we develop a method for estimating and clustering two-dimensional spectral density functions (2D-SDFs) for spatial data from multiple subregions. We use a common set of adaptive basis functions to explain the similarities among the 2D-SDFs in a low-dimensional space and estimate the basis coefficients by maximizing the Whittle likelihood with two penalties. We apply these penalties to impose the smoothness of the estimated 2D-SDFs and the spatial dependence of the spatially-correlated subregions. The proposed technique provides a score matrix, which comprises the estimated coefficients associated with the common set of basis functions representing the 2D-SDFs. Instead of clustering the estimated SDFs directly, we propose to employ the score matrix for clustering purposes, taking advantage of its low-dimensional property. In a simulation study, we demonstrate that our proposed method outperforms other competing estimation procedures used for clustering. Finally, to validate the described clustering method, we apply the procedure to soil moisture data from the Mississippi basin to produce homogeneous spatial clusters. We produce animations to dynamically show the estimation procedure, including the estimated 2D-SDFs and the score matrix, which provide an intuitive illustration of the proposed method.
  • Estimating and forecasting COVID-19 attack rates and mortality

    Ketcheson, David I.; Ombao, Hernando; Moraga, Paula; Ballal, Tarig; Duarte, Carlos M. (Cold Spring Harbor Laboratory, 2020-05-15) [Preprint]
    We describe a model for estimating past and current infections as well as future deaths due to the ongoing COVID-19 pandemic. The model does not use confirmed case numbers and is based instead on recorded numbers of deaths and on the age specific population distribution. A regularized deconvolution technique is used to infer past infections from recorded deaths. Forecasting is based on a compartmental SIR-type model, combined with a probability distribution for the time from infection to death. The effect of non-pharmaceutical interventions (NPIs) is modelled empirically, based on recent trends in the death rate. The model can also be used to study counterfactual scenarios based on hypothetical NPI policies.
  • Recent developments in complex and spatially correlated functional data

    Martinez Hernandez, Israel; Genton, Marc G. (Brazilian Journal of Probability and Statistics, Institute of Mathematical Statistics, 2020-05-04) [Article]
    As high-dimensional and high-frequency data are being collected on a large scale, the development of new statistical models is being pushed forward. Functional data analysis provides the required statistical methods to deal with large-scale and complex data by assuming that data are continuous functions, for example, realizations of a continuous process (curves) or continuous random field (surfaces), and that each curve or surface is considered as a single observation. Here, we provide an overview of functional data analysis when data are complex and spatially correlated. We provide definitions and estimators of the first and second moments of the corresponding functional random variable. We present two main approaches: The first assumes that data are realizations of a functional random field, that is, each observation is a curve with a spatial component. We call them spatial functional data. The second approach assumes that data are continuous deterministic fields observed over time. In this case, one observation is a surface or manifold, and we call them surface time series. For these two approaches, we describe software available for the statistical analysis. We also present a data illustration, using a high-resolution wind speed simulated dataset, as an example of the two approaches. The functional data approach offers a new paradigm of data analysis, where the continuous processes or random fields are considered as a single entity. We consider this approach to be very valuable in the context of big data.
  • A high-resolution bilevel skew-t stochastic generator for assessing Saudi Arabia's wind energy resources

    Tagle, Felipe; Genton, Marc G.; Yip, Andrew; Mostamandi, Suleiman; Stenchikov, Georgiy L.; Castruccio, Stefano (Environmetrics, Wiley, 2020-05-04) [Article]
    Saudi Arabia has recently established its renewable energy targets as part of its “Vision 2030” proposal, which represents a roadmap for reducing the country's dependence on oil over the next decade. This study provides a foundational assessment of the wind resource in Saudi Arabia that serves as a guide for the development of the outlined wind energy component. The assessment is based on a new high-resolution weather simulation of the region generated with the Weather Research and Forecasting (WRF) model. Furthermore, we propose a spatiotemporal stochastic generator of daily wind speeds that assists in characterizing the uncertainty of the energy estimates. The stochastic generator considers a vector autoregressive structure in time, with innovations from a novel biresolution model based on a skew-t distribution with a low-dimensional latent structure. Estimation of the spatial model parameters is performed using a Monte Carlo expectation-maximization (EM) algorithm, which achieves inference over approximately 184 million points and makes it possible to capture the spatial patterns of the higher order moments that typically characterize high-resolution wind fields. Our results identify regions along the western mountain ranges and central escarpments that are suitable for the deployment of wind energy infrastructure. According to the assessment, between 30 and 70% of the national electricity demand could be met by wind energy.
  • Semiparametric time series models driven by latent factor

    Maia, Gisele O.; Barreto-Souza, Wagner; Bastos, Fernando S.; Ombao, Hernando (arXiv, 2020-04-23) [Preprint]
    We introduce a class of semiparametric time series models by assuming a quasi-likelihood approach driven by a latent factor process. More specifically, given the latent process, we only specify the conditional mean and variance of the time series and use a quasi-likelihood function for estimating parameters related to the mean. This proposed methodology has three remarkable features: (i) no parametric form is assumed for the conditional distribution of the time series given the latent process; (ii) it can model non-negative, count, bounded/binary and real-valued time series; (iii) the dispersion parameter is not assumed to be known. Further, we obtain explicit expressions for the marginal moments and for the autocorrelation function of the time series process so that a method of moments can be employed for estimating the dispersion parameter and also parameters related to the latent process. Simulated results aiming to check the proposed estimation procedure are presented. Real data analyses of unemployment rate and precipitation time series illustrate the practical potential of our methodology.
  • Detecting Dynamic Community Structure in Functional Brain Networks Across Individuals: A Multilayer Approach

    Ting, Chee-Ming; Samdin, S. Balqis; Tang, Meini; Ombao, Hernando (arXiv, 2020-04-09) [Preprint]
    We present a unified statistical framework for characterizing community structure of brain functional networks that captures variation across individuals and evolution over time. Existing methods for community detection focus only on single-subject analysis of dynamic networks, while recent extensions to multiple-subject analysis are limited to static networks. To overcome these limitations, we propose a multi-subject, Markov-switching stochastic block model (MSS-SBM) to identify state-related changes in brain community organization over a group of individuals. We first formulate a multilayer extension of SBM to describe the time-dependent, multi-subject brain networks. We develop a novel procedure for fitting the multilayer SBM that builds on multislice modularity maximization, which can uncover a common community partition of all layers (subjects) simultaneously. By augmenting with a dynamic Markov switching process, our proposed method is able to capture a set of distinct, recurring temporal states with respect to inter-community interactions over subjects and the change points between them. Simulation shows accurate community recovery and tracking of dynamic community regimes over multilayer networks by the MSS-SBM. Application to task fMRI reveals meaningful non-assortative brain community motifs, e.g., core-periphery structure at the group level, that are associated with language comprehension and motor functions, suggesting their putative role in complex information integration. Our approach detected dynamic reconfiguration of modular connectivity elicited by varying task demands and identified unique profiles of intra- and inter-community connectivity across different task conditions. The proposed multilayer network representation provides a principled way of detecting synchronous, dynamic modularity in brain networks across subjects.
  • Probabilistic Projection of the Sex Ratio at Birth and Missing Female Births by State and Union Territory in India

    Chao, Fengqing; Guilmoto, Christophe Z.; C., Samir K.; Ombao, Hernando (arXiv, 2020-04-05) [Preprint]
    The sex ratio at birth (SRB) in India has been reported as imbalanced since the 1970s. Previous studies have shown a great variation in the SRB across geographic locations in India until 2016. Because India is one of the most populous countries and has great regional heterogeneity, it is crucial to produce probabilistic projections for the SRB in India at state level for the purpose of population projection and policy planning. In this paper, we implement a Bayesian hierarchical time series model to project SRB in India by state. We generate SRB probabilistic projections from 2017 to 2030 for 29 States and Union Territories (UTs) in India, and present results in 21 States/UTs with data from the Sample Registration System. Our analysis takes into account two state-specific factors that contribute to sex-selective abortion and resulting sex imbalances at birth: intensity of son preference and fertility squeeze. We project that the largest contribution to female birth deficits is in Uttar Pradesh, with the cumulative number of missing female births projected to be 2.0 (95% credible interval [1.9; 2.2]) million from 2017 to 2030. The total female birth deficit during 2017-2030 for the whole of India is projected to be 6.8 [6.6; 7.0] million.
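
Illustrative sketch for the entry "DDOS-attacks detection using an efficient measurement-based statistical mechanism" above. That abstract outlines three steps: score each new observation against the distribution of normal traffic with the CRPS, smooth the scores exponentially, and flag values above a threshold obtained by kernel density estimation. The Python below is a minimal sketch of those steps, not the authors' implementation. The function names, the empirical CRPS estimator, the smoothing weight `lam`, the normal-reference bandwidth rule, the level `alpha`, and the synthetic Gaussian "traffic" are all assumptions made for illustration.

```python
import math
import random
import statistics


def crps_scorer(reference):
    """Return a function scoring an observation against the empirical
    distribution of `reference`, using CRPS = E|X - y| - 0.5 * E|X - X'|."""
    n = len(reference)
    # The second term does not depend on y, so it is computed once.
    spread = sum(abs(a - b) for a in reference for b in reference) / (2 * n * n)

    def score(y):
        return sum(abs(x - y) for x in reference) / n - spread

    return score


def exponential_smoothing(values, lam=0.3):
    """Simple exponential smoothing: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    smoothed, level = [], values[0]
    for v in values:
        level = lam * v + (1 - lam) * level
        smoothed.append(level)
    return smoothed


def kde_threshold(sample, alpha=0.01):
    """Upper (1 - alpha) quantile of a Gaussian-kernel density estimate."""
    n = len(sample)
    # Normal-reference rule-of-thumb bandwidth (an assumption of this sketch).
    h = 1.06 * statistics.stdev(sample) * n ** (-0.2)

    def cdf(x):
        return sum(0.5 * (1 + math.erf((x - s) / (h * math.sqrt(2))))
                   for s in sample) / n

    # Bisection on the monotone KDE distribution function.
    lo, hi = min(sample) - 10 * h, max(sample) + 10 * h
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return hi


def detect(normal_traffic, stream, lam=0.3, alpha=0.01):
    """Flag stream observations whose smoothed CRPS exceeds the threshold."""
    half = len(normal_traffic) // 2
    reference, calibration = normal_traffic[:half], normal_traffic[half:]
    score = crps_scorer(reference)
    calib = exponential_smoothing([score(y) for y in calibration], lam)
    threshold = kde_threshold(calib, alpha)
    smoothed = exponential_smoothing([score(y) for y in stream], lam)
    return [s > threshold for s in smoothed], threshold


# Synthetic example: 50 normal observations followed by 50 shifted ones.
rng = random.Random(0)
normal_traffic = [rng.gauss(100, 10) for _ in range(400)]
stream = ([rng.gauss(100, 10) for _ in range(50)]
          + [rng.gauss(200, 10) for _ in range(50)])
flags, threshold = detect(normal_traffic, stream)
print("flagged in normal part:", sum(flags[:50]))
print("flagged in shifted part:", sum(flags[50:]))
```

Splitting the normal traffic into a reference half and a calibration half is one simple way to obtain threshold-calibration scores that were not used to build the reference distribution. Other splits or resampling schemes would serve the same purpose.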
