For more information visit: https://stat.kaust.edu.sa/

Recent Submissions

  • Integrated Nested Laplace Approximations for Large-Scale Spatial-Temporal Bayesian Modeling

    Gaedke-Merzhäuser, Lisa; Krainski, Elias Teixeira; Janalik, Radim; Rue, Haavard; Schenk, Olaf (arXiv, 2023-03-27) [Preprint]
    Bayesian inference tasks continue to pose a computational challenge. This especially holds for spatial-temporal modeling where high-dimensional latent parameter spaces are ubiquitous. The methodology of integrated nested Laplace approximations (INLA) provides a framework for performing Bayesian inference applicable to a large subclass of additive Bayesian hierarchical models. In combination with the stochastic partial differential equations (SPDE) approach it gives rise to an efficient method for spatial-temporal modeling. In this work we build on the INLA-SPDE approach, by putting forward a performant distributed memory variant, INLA-DIST, for large-scale applications. To perform the arising computational kernel operations, consisting of Cholesky factorizations, solving linear systems, and selected matrix inversions, we present two numerical solver options, a sparse CPU-based library and a novel blocked GPU-accelerated approach which we propose. We leverage the recurring nonzero block structure in the arising precision (inverse covariance) matrices, which allows us to employ dense subroutines within a sparse setting. Both versions of INLA-DIST are highly scalable, capable of performing inference on models with millions of latent parameters. We demonstrate their accuracy and performance on synthetic as well as real-world climate dataset applications.
  • Towards black-box parameter estimation

    Lenzi, Amanda; Rue, Haavard (arXiv, 2023-03-27) [Preprint]
    Deep learning algorithms have recently been shown to be a successful tool in estimating parameters of statistical models for which simulation is easy, but likelihood computation is challenging. However, the success of these approaches depends on simulating parameters that sufficiently reproduce the observed data, and, at present, there is a lack of efficient methods to produce these simulations. We develop new black-box procedures to estimate parameters of statistical models based only on weak parameter structure assumptions. For well-structured likelihoods with frequent occurrences, such as in time series, this is achieved by pre-training a deep neural network on an extensive simulated database that covers a wide range of data sizes. For other types of complex dependencies, an iterative algorithm guides simulations to the correct parameter region in multiple rounds. These approaches can successfully estimate and quantify the uncertainty of parameters from non-Gaussian models with complex spatial and temporal dependencies. The success of our methods is a first step towards a fully flexible automatic black-box estimation framework.
  • The Changing Trend of Paediatric Emergency Department Visits in Malaysia Following the COVID-19 Pandemic

    Masrani, Afiqah Syamimi; Nik Husain, Nik Rosmawati; Musa, Kamarul Imran; Moraga, Paula; Ismail, Mohd Tahir (Cureus, Springer Science and Business Media LLC, 2023-03-22) [Article]
    Background: The coronavirus disease 2019 (COVID-19) pandemic has impacted the emergency department (ED) due to the surge in medical demand and changes in the characteristics of paediatric visits. Additionally, the trend for paediatric ED visits has decreased globally, secondary to implementing lockdowns to stop the spread of COVID-19. We aim to study the trend and characteristics of paediatric ED visits following Malaysia’s primary timeline of the COVID-19 pandemic. Methods and materials: A five-year time series observational study of paediatric ED patients from two tertiary hospitals in Malaysia was conducted from March 17, 2017 (week 11 2017) to March 17, 2022 (week 12 2022). Aggregated weekly data were analysed using R statistical software version 4.2.2 (R Foundation for Statistical Computing, Vienna, Austria) against significant events during the COVID-19 pandemic to detect influential changepoints in the trend. The data collected were the number of ED visits, triage severity, visit outcomes and ED discharge diagnosis. Results: Overall, 175,737 paediatric ED visits were recorded with a median age of three years and predominantly males (56.8%). A 57.57% (p<0.00) reduction in the average weekly ED visits was observed during the Movement Control Order (MCO) period. Despite the increase in the proportion of urgent (odds ratio (OR): 1.23, p<0.00) and emergent or life-threatening (OR: 1.79, p<0.00) cases, the proportion of admissions decreased. Whilst the changepoints during the MCO indicated a rise in respiratory, fever or other infectious diseases, or gastrointestinal conditions, diagnosis of complications originating from the perinatal period declined from July 19, 2021 (week 29 2021). Conclusion: The incongruent change in disease severity and hospital admission reflects the potential effects of the healthcare system reform and socioeconomic impact as the pandemic evolves. 
    Future studies on parental motivation to seek emergency medical attention may provide insight into the timing and choice of healthcare service utilisation.
  • A Deep Recurrent Neural Network Framework for Swarm Motion Speed Prediction

    Khaldi, Belkacem; Harrou, Fouzi; Dairi, Abdelkader; Sun, Ying (Journal of Electrical Engineering and Technology, Springer Science and Business Media LLC, 2023-03-18) [Article]
    Controlling and maintaining swarm robotic systems executing daily collective actions and accomplishing tasks more successfully in groups requires a timely and accurate forecast of swarm motion speed, which becomes a challenging task owing to swarm motion’s high dynamic feature. In this work, six potent forecasting recurrent deep neural networks, including RNN, LSTM, GRU, ConvLSTM, Bidirectional LSTM (BiLSTM), and BiGRU, are explored and compared in forecasting the motion speed of miniature swarm mobile robots engaged in a simple aggregation formation task. Essentially, the introduced forecasting models take advantage of the viscoelastic control model in flexibly controlling swarm robots and the capabilities of DL models to capture patterns in time series data. To this end, sensor measurements from a simulated swarm of foot bots conducting a circle formation task through the viscoelastic controller are recorded every 0.1 s and used as input vectors for forecasting purposes. The results show the promising performance of DL for swarm motion forecasting. Moreover, obtained results report that BiGRU reaches the highest swarm motion speed forecasting performance with the no/with obstacles scenarios considered in this study.
  • Test and Visualization of Covariance Properties for Multivariate Spatio-Temporal Random Fields

    Huang, Huang; Sun, Ying; Genton, Marc G. (Journal of Computational and Graphical Statistics, Informa UK Limited, 2023-03-16) [Article]
    The prevalence of multivariate space-time data collected from monitoring networks and satellites, or generated from numerical models, has brought much attention to multivariate spatio-temporal statistical models, where the covariance function plays a key role in modeling, inference, and prediction. For multivariate space-time data, understanding the spatio-temporal variability, within and across variables, is essential in employing a realistic covariance model. Meanwhile, the complexity of generic covariances often makes model fitting very challenging, and simplified covariance structures, including symmetry and separability, can reduce the model complexity and facilitate the inference procedure. However, a careful examination of these properties is needed in real applications. In the work presented here, we formally define these properties for multivariate spatio-temporal random fields and use functional data analysis techniques to visualize them, hence providing intuitive interpretations. We then propose a rigorous rank-based testing procedure to conclude whether the simplified properties of covariance are suitable for the underlying multivariate space-time data. The good performance of our method is illustrated through synthetic data, for which we know the true structure. We also investigate the covariance of bivariate wind speed, a key variable in renewable energy, over a coastal and an inland area in Saudi Arabia. The Supplementary Material is available online, including the R code for our developed methods.
  • DDOS attacks detection based on attention-deep learning and local outlier factor

    Dairi, Abdelkader; Khaldi, Belkacem; Harrou, Fouzi; Sun, Ying (IEEE, 2023-03-14) [Conference Paper]
    One of the most significant security concerns confronting network technology is the detection of distributed denial of service (DDOS). This paper introduces a semi-supervised data-driven approach to the detection of DDOS attacks. The proposed method employs normal events data without labeling to train the detection model. Specifically, this approach introduces an improved autoencoder (AE) model by incorporating a Gated Recurrent Unit (GRU) based on the attention mechanism (AM) at the encoder and decoder sides of the AE model. GRU enhances the AE's ability to learn temporal dependencies, and the AM enables the selection of relevant features. For DDOS attack detection, the local outlier factor (LOF) anomaly detection algorithm is applied to features extracted from the improved AE model. The performance of the proposed approach has been verified using publicly available DDOS datasets.
  • Joint modeling and prediction of massive spatio-temporal wildfire count and burnt area data with the INLA-SPDE approach

    Zhang, Zhongwei; Krainski, Elias Teixeira; Zhong, Peng; Rue, Haavard; Huser, Raphaël (Extremes, Springer, 2023-03-14) [Article]
    This paper describes the methodology used by the team RedSea in the data competition organized for the EVA 2021 conference. We develop a novel two-part model to jointly describe the wildfire count data and burnt area data provided by the competition organizers with covariates. Our proposed methodology relies on the integrated nested Laplace approximation combined with the stochastic partial differential equation (INLA-SPDE) approach. In the first part, a binary non-stationary spatio-temporal model is used to describe the underlying process that determines whether or not there is wildfire at a specific time and location. In the second part, we consider a non-stationary hurdle log-Gaussian Cox process (hurdle-LGCP) for the positive wildfire count data, i.e., an LGCP is used to model the shifted positive count data, and a non-stationary log-Gaussian model for positive burnt area data. Dependence between the positive count data and positive burnt area data is captured by a shared spatio-temporal random effect. Our two-part modeling approach performs well in terms of the prediction score criterion chosen by the data competition organizers. Moreover, our model results show that surface pressure is the most influential driver for the occurrence of a wildfire, whilst surface net solar radiation and surface pressure are the key drivers for large numbers of wildfires, and temperature and evaporation are the key drivers of large burnt areas.
  • Measuring Information Transfer Between Nodes in a Brain Network through Spectral Transfer Entropy

    Redondo, Paolo Victor; Huser, Raphaël; Ombao, Hernando (arXiv, 2023-03-11) [Preprint]
    Brain connectivity reflects how different regions of the brain interact during performance of a cognitive task. In studying brain signals such as electroencephalograms (EEG), this may be explored via Granger causality (GC) which tests if knowledge of the past values of a channel improves predictions on future values of another channel. However, the common approach to investigating GC is the vector autoregressive (VAR) model which is limited only to linear lead-lag relations. An alternative information-theoretic causal measure, transfer entropy (TE), becomes more appropriate since it does not impose any distributional assumption on the variables and covers any form of relationship (beyond linear) between them. To improve utility of TE in brain signal analysis, we propose a novel methodology to capture cross-channel information transfer in the frequency domain. Specifically, we introduce a new measure, the spectral transfer entropy (STE), to quantify the magnitude and direction of information flow from a certain frequency-band oscillation of a channel to an oscillation of another channel. In contrast with previous works on TE in the frequency domain, we differentiate our work by considering the magnitude of filtered series (frequency band-specific), instead of using the spectral representation (frequency-specific) of a series. The main advantage of our proposed approach is that it allows adjustments for multiple comparisons to control family-wise error rate (FWER). One novel contribution is a simple yet efficient estimation method based on vine copula theory that enables estimates to capture zero (boundary point) without the need for bias adjustments. We showcase the advantage of our proposed measure through some numerical experiments and provide interesting and novel findings on the analysis of EEG recordings linked to a visual task.
  • Goodness-of-fit tests for multivariate skewed distributions based on the characteristic function

    Karling, Maicon; Genton, Marc G.; Meintanis, Simos G. (arXiv, 2023-03-08) [Preprint]
    We employ a general Monte Carlo method to test composite hypotheses of goodness-of-fit for several popular multivariate models that can accommodate both asymmetry and heavy tails. Specifically, we consider weighted L2-type tests based on a discrepancy measure involving the distance between empirical characteristic functions and thus avoid the need for employing corresponding population quantities which may be unknown or complicated to work with. The only requirements of our tests are that we should be able to draw samples from the distribution under test and possess a reasonable method of estimation of the unknown distributional parameters. Monte Carlo studies are conducted to investigate the performance of the test criteria in finite samples for several families of skewed distributions. Real-data examples are also included to illustrate our method.
  • Non-linear INAR(1) processes under an alternative geometric thinning operator

    Barreto-Souza, Wagner; Ndreca, Sokol; Silva, Rodrigo B.; Silva, Roger W.C. (Test, Springer Science and Business Media LLC, 2023-02-25) [Article]
    We propose a novel class of first-order integer-valued AutoRegressive (INAR(1)) models based on a new operator, the so-called geometric thinning operator, which induces a certain non-linearity to the models. We show that this non-linearity can produce better results in terms of prediction when compared to the linear case commonly considered in the literature. The new models are named non-linear INAR(1) (in short NonLINAR(1)) processes. We explore both stationary and non-stationary versions of the NonLINAR processes. Inference on the model parameters is addressed and the finite-sample behavior of the estimators is investigated through Monte Carlo simulations. Two real data sets are analyzed to illustrate the stationary and non-stationary cases and the gain of the non-linearity induced by our method over the existing linear methods. A generalization of the geometric thinning operator and an associated NonLINAR process are also proposed and motivated for dealing with zero-inflated or zero-deflated count time series data.
  • Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications

    Cao, Qinglei; Abdulah, Sameh; Alomairy, Rabab M.; Pei, Yu; Nag, Pratik; Bosilca, George; Dongarra, Jack; Genton, Marc G.; Keyes, David E.; Ltaief, Hatem; Sun, Ying (IEEE, 2023-02-23) [Conference Paper]
    We extend the capability of space-time geostatistical modeling using algebraic approximations, illustrating application-expected accuracy worthy of double precision from majority low-precision computations and low-rank matrix approximations. We exploit the mathematical structure of the dense covariance matrix whose inverse action and determinant are repeatedly required in Gaussian log-likelihood optimization. Geostatistics augments first-principles modeling approaches for the prediction of environmental phenomena given the availability of measurements at a large number of locations; however, traditional Cholesky-based approaches grow cubically in complexity, gating practical extension to continental and global datasets now available. We combine the linear algebraic contributions of mixed-precision and low-rank computations within a tile based Cholesky solver with on-demand casting of precisions and dynamic runtime support from PaRSEC to orchestrate tasks and data movement. Our adaptive approach scales on various systems and leverages the Fujitsu A64FX nodes of Fugaku to achieve up to 12X performance speedup against the highly optimized dense Cholesky implementation.
  • Insights into the drivers and spatio-temporal trends of extreme wildfires with statistical deep-learning

    Richards, Jordan; Huser, Raphaël (Copernicus GmbH, 2023-02-22) [Presentation]
    Extreme wildfires continue to be a significant cause of human death and biodiversity destruction across the globe, with recent worrying trends in their activity (i.e., occurrence and spread) suggesting that wildfires are likely to be highly impacted by climate change. In order to facilitate appropriate risk mitigation for extreme wildfires, it is imperative to identify their main drivers and assess their spatio-temporal trends, with a view to understanding the impacts of global warming on fire activity. To this end, we analyse monthly burnt area due to wildfires using a hybrid statistical deep-learning framework that exploits extreme value theory and quantile regression. Three study regions are considered: the contiguous U.S., Mediterranean Europe and Australia.
  • Space-time modelling of co-seismic and post-seismic landslide hazard via Ensemble Neural Networks.

    Dahal, Ashok; Tanyas, Hakan; van Westen, C.J.; Van der Meijde, Mark; Mai, Paul Martin; Huser, Raphaël; Lombardo, Luigi (Copernicus GmbH, 2023-02-22) [Presentation]
    Until now, a full numerical description of the spatio-temporal dynamics of a landslide could be achieved only via physics-based models. The part of the geoscientific community developing data-driven models has instead focused on predicting where landslides may occur via susceptibility models. Moreover, they have estimated when landslides may occur via models that belong to the early-warning-system or to the rainfall-threshold themes. In this context, few published studies have explored a joint spatio-temporal model structure. Furthermore, the third element completing the hazard definition, i.e., the landslide size (areas or volumes), has hardly ever been modeled over space and time. However, technological advancements in data-driven models have reached a level of maturity that allows modeling all three components (Where, When and Size). This work takes this direction and proposes for the first time a solution to the assessment of landslide hazard in a given area by jointly modeling landslide occurrences and their associated areal density per mapping unit, in space and time. To achieve this, we used a spatio-temporal landslide database generated for the Nepalese region affected by the Gorkha earthquake. The model relies on a deep-learning architecture trained using an Ensemble Neural Network, where the landslide occurrences and densities are aggregated over a squared mapping unit of 1x1 km and classified/regressed against a nested 30 m lattice. At the nested level, we have expressed predisposing and triggering factors. As for the temporal units, we have used an approximately 6-month resolution. The results are promising as our model performs satisfactorily both in the susceptibility (AUC = 0.93) and density prediction (Pearson r = 0.93) tasks. This model takes a significant distance from the common susceptibility literature, proposing an integrated framework for hazard modeling in a data-driven context. To promote reproducibility and repeatability of the analyses in this work, we share data and codes in a GitHub repository accessible from this link: https://github.com/ashokdahal/LandslideHazard.
  • A combined statistical and machine learning approach for spatial prediction of extreme wildfire frequencies and sizes

    Cisneros, Daniela; Gong, Yan; Yadav, Rishikesh; Hazra, Arnab; Huser, Raphaël (Extremes, Springer Science and Business Media LLC, 2023-02-21) [Article]
    Motivated by the Extreme Value Analysis 2021 (EVA 2021) data challenge, we propose a method based on statistics and machine learning for the spatial prediction of extreme wildfire frequencies and sizes. This method is tailored to handle large datasets, including missing observations. Our approach relies on a four-stage, bivariate, sparse spatial model for high-dimensional zero-inflated data that we develop using stochastic partial differential equations (SPDE), allowing sparse precision matrices for the latent processes. In Stage 1, the observations are separated in zero/nonzero categories and modeled using a two-layered hierarchical Bayesian sparse spatial model to estimate the probabilities of these two categories. In Stage 2, we first obtain empirical estimates of the spatially-varying mean and variance profiles across the spatial locations for the positive observations and smooth those estimates using fixed rank kriging. This approximate Bayesian inference method is employed to avoid the high computational burden of large spatial data modeling using spatially-varying coefficients. In Stage 3, we further model the standardized log-transformed positive observations from the second stage using a sparse bivariate spatial Gaussian process. The Gaussian distribution assumption for wildfire counts developed in the third stage is computationally effective but erroneous. Thus, in Stage 4, the predicted exceedance probabilities are post-processed using Random Forests. We draw posterior inference for Stages 1 and 3 using Markov chain Monte Carlo (MCMC) sampling. We then create a cross-validation scheme for the artificially generated gaps and compare the EVA 2021 prediction scores of the proposed model to those obtained using some competitors.
  • An Improved Unbiased Particle Filter

    Jasra, Ajay; Maama, Mohamed; Ombao, Hernando (arXiv, 2023-02-20) [Preprint]
    In this paper we consider the filtering of partially observed multi-dimensional diffusion processes that are observed regularly at discrete times. We assume that, for numerical reasons, one has to time-discretize the diffusion process which typically leads to filtering that is subject to discretization bias. The approach in [16] establishes that when only having access to the time-discretized diffusion it is possible to remove the discretization bias with an estimator of finite variance. We improve on the method in [16] by introducing a modified estimator based on the recent work of [17]. We show that this new estimator is unbiased and has finite variance. Moreover, we conjecture and verify in numerical simulations that substantial gains are obtained. That is, for a given mean square error (MSE) and a particular class of multi-dimensional diffusion, the cost to achieve the said MSE falls.
  • Equivalence of measures and asymptotically optimal linear prediction for Gaussian random fields with fractional-order covariance operators

    Bolin, David; Kirchner, Kristin (Bernoulli, Bernoulli Society for Mathematical Statistics and Probability, 2023-02-19) [Article]
    We consider two Gaussian measures $\mu$, $\tilde{\mu}$ on a separable Hilbert space, with fractional-order covariance operators $A^{-2\beta}$ and $\tilde{A}^{-2\tilde{\beta}}$, respectively, and derive necessary and sufficient conditions on $A$, $\tilde{A}$ and $\beta, \tilde{\beta} > 0$ for I. equivalence of the measures $\mu$ and $\tilde{\mu}$, and II. uniform asymptotic optimality of linear predictions for $\mu$ based on the misspecified measure $\tilde{\mu}$. These results hold, e.g., for Gaussian processes on compact metric spaces. As an important special case, we consider the class of generalized Whittle–Matérn Gaussian random fields, where $A$ and $\tilde{A}$ are elliptic second-order differential operators, formulated on a bounded Euclidean domain $D \subset \mathbb{R}^d$ and augmented with homogeneous Dirichlet boundary conditions. Our outcomes explain why the predictive performances of stationary and non-stationary models in spatial statistics often are comparable, and provide a crucial first step in deriving consistency results for parameter estimation of generalized Whittle–Matérn fields.
  • Extended Excess Hazard Models for Spatially Dependent Survival Data

    Amaral, André Victor Ribeiro; Rubio, Francisco Javier; Quaresma, Manuela; Rodríguez-Cortés, Francisco J.; Moraga, Paula (arXiv, 2023-02-18) [Preprint]
    Relative survival represents the preferred framework for the analysis of population cancer survival data. The aim is to model the survival probability associated with cancer in the absence of information about the cause of death. Recent data linkage developments have allowed for incorporating the place of residence or the place where patients receive treatment into the population cancer databases; however, modeling this spatial information has received little attention in the relative survival setting. We propose a flexible parametric class of spatial excess hazard models (along with inference tools), named "Relative Survival Spatial General Hazard" (RS-SGH), that allows for the inclusion of fixed and spatial effects in both time-level and hazard-level components. We illustrate the performance of the proposed model using an extensive simulation study, and provide guidelines about the interplay of sample size, censoring, and model misspecification. We present two case studies, using real data from colon cancer patients in England, aiming at answering epidemiological questions that require the use of a spatial model. These case studies illustrate how a spatial model can be used to identify geographical areas with low cancer survival, as well as how to summarize such a model through marginal survival quantities and spatial effects.
  • A multivariate modified skew-normal distribution

    Mondal, Sagnik; Arellano-Valle, Reinaldo B.; Genton, Marc G. (Statistical Papers, Springer Science and Business Media LLC, 2023-02-13) [Article]
    We introduce a multivariate version of the modified skew-normal distribution, which contains the multivariate normal distribution as a special case. Unlike the Azzalini multivariate skew-normal distribution, this new distribution has a nonsingular Fisher information matrix when the skewness parameters are all zero, and its profile log-likelihood of the skewness parameters is always a non-monotonic function. We study some basic properties of the proposed family of distributions and present an expectation-maximization (EM) algorithm for parameter estimation that we validate through simulation studies. Finally, we apply the proposed model to univariate frontier data and to trivariate wind speed data, and compare its performance with the Azzalini skew-normal model.
  • Regularity and numerical approximation of fractional elliptic differential equations on compact metric graphs

    Bolin, David; Kovács, Mihály; Kumar, Vivek; Simas, Alexandre B. (arXiv, 2023-02-08) [Preprint]
    The fractional differential equation $L^{\beta} u = f$ posed on a compact metric graph is considered, where $\beta > \frac{1}{4}$ and $L = \kappa - \frac{d}{dx}\left(H \frac{d}{dx}\right)$ is a second-order elliptic operator equipped with certain vertex conditions and sufficiently smooth and positive coefficients $\kappa, H$. We demonstrate the existence of a unique solution for a general class of vertex conditions and derive the regularity of the solution in the specific case of Kirchhoff vertex conditions. These results are extended to the stochastic setting when $f$ is replaced by Gaussian white noise. For the deterministic and stochastic settings under generalized Kirchhoff vertex conditions, we propose a numerical solution based on a finite element approximation combined with a rational approximation of the fractional power $L^{-\beta}$. For the resulting approximation, the strong error is analyzed in the deterministic case, and the strong mean squared error as well as the $L_2(\Gamma \times \Gamma)$-error of the covariance function of the solution are analyzed in the stochastic setting. Explicit rates of convergence are derived for all cases. Numerical experiments for the example $L = \kappa^2 - \Delta$, $\kappa > 0$ are performed to illustrate the theoretical results.
  • Editorial for the special issue on Time Series Analysis

    Fokianos, Konstantinos; Kirch, Claudia; Ombao, Hernando (Computational Statistics & Data Analysis, Elsevier BV, 2023-02-07) [Article]
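
Two of the entries above (the INLA-DIST preprint and the extreme-scale geostatistics paper) name Cholesky factorization as a core kernel, and the latter notes that the inverse action and determinant of a dense covariance matrix are repeatedly required in Gaussian log-likelihood optimization. The following is a minimal pure-Python sketch of how one factorization can supply both quantities for a small dense matrix. It is an illustration only, not code from either work: the function names (`cholesky`, `forward_solve`, `gaussian_loglik`, `exp_cov`) and the exponential covariance are assumptions made for this example, and none of the sparse, mixed-precision, low-rank, or GPU techniques described in those abstracts are reproduced.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low z = b by forward substitution (low is lower triangular)."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def gaussian_loglik(y, cov):
    """Log-density of y under a zero-mean Gaussian with covariance cov."""
    n = len(y)
    low = cholesky(cov)
    z = forward_solve(low, y)          # z = L^{-1} y, so z.z = y^T cov^{-1} y
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)


def exp_cov(points, variance, length_scale):
    """Exponential covariance for 1-D locations (hypothetical example kernel)."""
    return [[variance * math.exp(-abs(p - q) / length_scale) for q in points]
            for p in points]
```

For instance, `gaussian_loglik([0.3, -0.1, 0.2], exp_cov([0.0, 1.0, 2.5], 1.0, 2.0))` evaluates the zero-mean log-likelihood at three locations. The factorization cost grows cubically with the number of locations, which is the bottleneck the geostatistics abstract mentions for traditional Cholesky-based approaches.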
