Statistics Program
Recent Submissions
Article Non-stationary Bayesian Spatial Model for Disease Mapping based on Sub-regions
(SAGE Publications, 2024) Abdul Fattah, Esmail; Krainski, Elias Teixeira; Niekerk, Janet Van; Rue, Haavard; Statistics Program; Extreme Computing Research Center; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
This paper aims to extend the Besag model, a widely used Bayesian spatial model in disease mapping, to a non-stationary spatial model for irregular lattice-type data. The goal is to improve the model's ability to capture complex spatial dependence patterns and increase interpretability. The proposed model uses multiple precision parameters, accounting for different intensities of spatial dependence in different sub-regions. We derive a joint penalized complexity prior for the flexible local precision parameters to prevent overfitting and ensure contraction to the stationary model at a user-defined rate. The proposed methodology can be used as a basis for the development of various other non-stationary effects over other domains such as time. An accompanying R package, fbesag, equips the reader with the necessary tools for immediate use and application. We illustrate the novelty of the proposal by modeling the risk of dengue in Brazil, where the stationary spatial assumption fails and interesting risk profiles are estimated when accounting for spatial non-stationarity. Additionally, we model different causes of death in Brazil, where we use the new model to investigate the spatial stationarity of these causes.
Preprint Spatial Latent Gaussian Modelling with Change of Support
(arXiv, 2024-03-13) Chacon Montalvan, Erick; Atkinson, Peter M.; Nemeth, Christopher; Taylor, Benjamin M.; Moraga, Paula; Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Statistics Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Lancaster Environment Centre, Lancaster University, United Kingdom; Department of Mathematics and Statistics, Lancaster University, United Kingdom; School of Mathematical Sciences, University College Cork, Ireland
Spatial data are often derived from multiple sources (e.g. satellites, in-situ sensors, survey samples) with different supports, but associated with the same properties of a spatial phenomenon of interest. It is common for predictors to also be measured on different spatial supports than the response variables. Although there is no standard way to work with spatial data with different supports, a prevalent approach among practitioners has been to use downscaling or interpolation to project all the variables of analysis onto a common support, and then to use standard spatial models. The main disadvantage of this approach is that simple interpolation can introduce biases and, more importantly, the uncertainty associated with the change of support is not taken into account in parameter estimation. In this article, we propose a Bayesian spatial latent Gaussian model that can handle data with different rectilinear supports in both the response variable and predictors. Our approach handles changes of support more naturally, according to the properties of the spatial stochastic process being used, and takes into account the uncertainty from the change of support in parameter estimation and prediction. We use spatial stochastic processes as linear combinations of basis functions where Gaussian Markov random fields define the weights. Our hierarchical modelling approach can be described by the following steps: (i) define a latent model where response variables and predictors are considered as latent stochastic processes with continuous support, (ii) link the continuous-index set stochastic processes with their projections onto the support of the observed data, (iii) link the projected processes with the observed data. We show the applicability of our approach by simulation studies and modelling land suitability for improved grassland in Rhondda Cynon Taf, a county borough in Wales.
Preprint GPU-Accelerated Vecchia Approximations of Gaussian Processes for Geospatial Data using Batched Matrix Computations
(arXiv, 2024-03-12) Pan, Qilong; Abdulah, Sameh; Genton, Marc G.; Keyes, David E.; Ltaief, Hatem; Sun, Ying; Division of Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE), Extreme Computing Research Center Technology, Thuwal, Jeddah 23955, Saudi Arabia; Extreme Computing Research Center; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Statistics Program; Applied Mathematics and Computational Science Program; Office of the President; Division of Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE), Statistics Program
Gaussian processes (GPs) are commonly used for geospatial analysis, but they suffer from high computational complexity when dealing with massive data. For instance, the log-likelihood function required in estimating the statistical model parameters for geospatial data is a computationally intensive procedure that involves computing the inverse of a covariance matrix with size n × n, where n represents the number of geographical locations. As a result, in the literature, studies have shifted towards approximation methods to handle larger values of n effectively while maintaining high accuracy. These methods encompass a range of techniques, including low-rank and sparse approximations. Vecchia approximation is one of the most promising methods to speed up evaluating the log-likelihood function. This study presents a parallel implementation of the Vecchia approximation, utilizing batched matrix computations on contemporary GPUs. The proposed implementation relies on batched linear algebra routines to efficiently execute individual conditional distributions in the Vecchia algorithm. We rely on the KBLAS linear algebra library to perform batched linear algebra operations, reducing the time to solution compared to the state-of-the-art parallel implementation of the likelihood estimation operation in the ExaGeoStat software by up to 700X, 833X, 1380X on 32GB GV100, 80GB A100, and 80GB H100 GPUs, respectively. We also successfully manage larger problem sizes on a single NVIDIA GPU, accommodating up to 1M locations with 80GB A100 and H100 GPUs while maintaining the necessary application accuracy. We further assess the accuracy performance of the implemented algorithm, identifying the optimal settings for the Vecchia approximation algorithm to preserve accuracy on two real geospatial datasets: soil moisture data in the Mississippi Basin area and wind speed data in the Middle East.
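As a rough illustration of the idea in the abstract above, the Vecchia approximation replaces the joint Gaussian log-likelihood with a sum of small conditional log-densities, each conditioning on at most m earlier locations. The following is a minimal serial NumPy sketch, not the paper's batched GPU implementation: the exponential covariance, its parameters, the index ordering, the nearest-neighbour conditioning sets, and all function names are assumptions made for illustration only.

```python
import numpy as np

def exp_cov(a, b, variance=1.0, length_scale=0.3):
    """Exponential covariance between two sets of 2-D locations (assumed kernel)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return variance * np.exp(-d / length_scale)

def exact_loglik(y, locs):
    """Exact zero-mean Gaussian log-likelihood via a Cholesky factorization, O(n^3)."""
    L = np.linalg.cholesky(exp_cov(locs, locs))
    alpha = np.linalg.solve(L, y)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

def vecchia_loglik(y, locs, m):
    """Vecchia log-likelihood: sum of univariate conditionals, each given
    the (at most) m nearest locations that come earlier in the ordering."""
    total = 0.0
    for i in range(len(y)):
        var = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0]
        mean = 0.0
        if i > 0:
            dist = np.linalg.norm(locs[:i] - locs[i], axis=1)
            nb = np.argsort(dist)[:m]                 # conditioning set
            K_nn = exp_cov(locs[nb], locs[nb])        # small (<= m x m) system
            k_in = exp_cov(locs[i:i + 1], locs[nb])[0]
            w = np.linalg.solve(K_nn, k_in)
            mean = w @ y[nb]                          # conditional mean
            var = var - w @ k_in                      # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

# Synthetic demonstration data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
locs = rng.random((30, 2))
y = np.linalg.cholesky(exp_cov(locs, locs)) @ rng.standard_normal(30)
print(exact_loglik(y, locs), vecchia_loglik(y, locs, m=5))
```

Each loop iteration solves an independent system of size at most m × m, which is the kind of workload that batched linear algebra routines can process together. When m is at least n − 1, every conditioning set contains all earlier points and the sum reproduces the exact log-likelihood.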
Article Space–time landslide hazard modeling via Ensemble Neural Networks
(Copernicus GmbH, 2024-03-08) Dahal, Ashok; Tanyas, Hakan; van Westen, Cees; van der Meijde, Mark; Mai, Paul Martin; Huser, Raphaël; Lombardo, Luigi; Earth Science and Engineering Program; Physical Science and Engineering (PSE) Division; Statistics Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, P.O. Box 217, Enschede, AE 7500, the Netherlands
Until now, a full numerical description of the spatio-temporal dynamics of a landslide could be achieved only via physically based models. The part of the geoscientific community developing data-driven models has instead focused on predicting where landslides may occur via susceptibility models. Moreover, they have estimated when landslides may occur via models that belong to the early-warning system or to the rainfall-threshold classes. In this context, few published research works have explored a joint spatio-temporal model structure. Furthermore, the third element completing the hazard definition, i.e., the landslide size (areas or volumes), has hardly ever been modeled over space and time. However, technological advancements in data-driven models have reached a level of maturity that allows all three components to be modeled (Location, Frequency, and Size). This work takes this direction and proposes for the first time a solution to the assessment of landslide hazard in a given area by jointly modeling landslide occurrences and their associated areal density per mapping unit, in space and time. To achieve this, we used a spatio-temporal landslide database generated for the Nepalese region affected by the Gorkha earthquake. The model relies on a deep-learning architecture trained using an Ensemble Neural Network, where the landslide occurrences and densities are aggregated over a square mapping unit of 1 km × 1 km and classified or regressed against a nested 30 m lattice. At the nested level, we have expressed predisposing and triggering factors. As for the temporal units, we have used an approximately 6-month resolution. The results are promising as our model performs satisfactorily both in the susceptibility (AUC = 0.93) and density prediction (Pearson r = 0.93) tasks over the entire spatio-temporal domain. This model departs significantly from the common landslide susceptibility modeling literature, proposing an integrated framework for hazard modeling in a data-driven context.
Article Improved lithium-ion battery health prediction with data-based approach
(Elsevier BV, 2024-03) Merrouche, Walid; Harrou, Fouzi; Taghezouit, Bilal; Sun, Ying; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Statistics Program; Center for Renewable Energies Development (CDER), Photovoltaic Solar Energy Division, Bouzareah, Algiers, Algeria
The accurate modeling and forecasting of Battery State of Health (SOH) is crucial for ensuring reliable performance and longevity of lithium-ion batteries. This article introduces a data-driven approach for SOH prediction using Gaussian Process Regression (GPR), selected for its ability to model complex data relationships and capture prediction uncertainty without relying on future load information. Recognizing the effect of reversible capacity recoveries on prediction accuracy, this work employs the ISEA-RWTH 48 Li-ion Cells dataset, deliberately devoid of such recoveries prior to training the GPR model. The GPR model was evaluated and compared with Support Vector Regression (SVR) using the publicly available dataset. First and second End of Life (EOL) scenarios were considered, relevant to primary and secondary battery applications. The results demonstrated the GPR model's superior performance. Particularly, mid-life and late-life predictions displayed better accuracy with GPR, showcasing higher R² values and lower MAPE values (e.g., mid-life prediction: GPR's average R² = 0.99, SVR's = 0.9789; GPR's average MAPE = 0.1916, SVR's = 1.3028). Moreover, GPR exhibited the ability to quantify uncertainty in capacity degradation and forecast first and second EOL instances effectively (e.g., mid-life predictions had 1.7 cycle error at 1st EOL and 8.9 cycle error at 2nd EOL). The research also offers valuable insights into the application of machine learning methods for predicting the health degradation of lithium-ion batteries.
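To make concrete how GPR yields both a forecast and an uncertainty estimate, as the abstract above describes, here is a minimal NumPy sketch of Gaussian process regression on a capacity-versus-cycle curve. It is not the authors' model: the squared-exponential kernel, its hyperparameters, the noise level, the function names, and the synthetic capacity data are all assumptions chosen for illustration.

```python
import numpy as np

def rbf(a, b, variance=1.0, length_scale=50.0):
    """Squared-exponential kernel on cycle numbers (assumed kernel choice)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gpr_predict(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and variance of a GP with a constant (sample) mean."""
    mu = y_train.mean()
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - mu))
    mean = mu + Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = rbf(x_test, x_test).diagonal() - (v ** 2).sum(axis=0)
    return mean, np.maximum(var, 0.0)   # clamp tiny negative round-off

# Hypothetical, synthetic capacity fade: normalized capacity vs. cycle count.
cycles = np.arange(0.0, 301.0, 10.0)
capacity = 1.0 - 0.0005 * cycles

# Predict inside the observed range and beyond it.
query = np.array([150.0, 320.0, 400.0])
mean, var = gpr_predict(cycles, capacity, query)
for q, m_, s_ in zip(query, mean, np.sqrt(var)):
    print(f"cycle {q:.0f}: predicted capacity {m_:.3f} +/- {s_:.3f}")
```

The predictive variance grows as the query moves away from the observed cycles, which is the behaviour that lets a GPR-based approach attach an uncertainty band to an end-of-life forecast.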