Statistics Program

For more information visit: https://stat.kaust.edu.sa/

Recent Submissions

Now showing 1 - 5 of 845
  • Article

    Enhancing Predictive Capabilities: Machine Learning Approaches for Predicting Mechanical Behavior in Friction Stir Welded Aluminum Alloys

    (Springer Science and Business Media LLC, 2024-04-08) Dorbane, Abdelhakim; Harrou, Fouzi; Dursun, Bekir; Sun, Ying; Computer, Electrical and Mathematical Sciences and Engineering; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Statistics; Statistics Program; Smart Structures Laboratory (SSL), Department of Mechanical Engineering, University of Ain Temouchent, PO BOX 284, 46000, Ain Temouchent, Algeria; Technical Sciences Vocational School, Trakya University, Edirne, Turkey

    Accurate prediction of friction stir welding (FSW) joint behavior is crucial for optimizing welding processes and ensuring structural integrity. This study exploits machine learning to predict the mechanical behavior of aluminum alloy FSW joints under varying temperatures. It involves a comparison of predictive performance across 18 models, including support vector regression (SVR), Gaussian process regression (GPR), ensemble models, and five distinct types of neural networks (NN). The assessment used Al6061-T6 aluminum alloy with the FSW joining method at temperatures of 25, 100, 200, and 300 °C. To ensure robustness, the machine learning models were developed using a fivefold cross-validation approach, with Bayesian optimization applied for fine-tuning during training. Results revealed the ability of machine learning to precisely predict the mechanical behavior of FSW joints. Specifically, GPR and the triple NN model outperformed other models, achieving average R² values of 0.9879 and 0.9703, respectively.
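
The fivefold cross-validation and average R² reported above can be illustrated with a minimal sketch. It is not the study's pipeline: a one-variable least-squares line stands in for GPR and the neural networks, Bayesian optimization is omitted, and the data are synthetic values generated here purely for illustration. The function names (`r2_score`, `fit_line`, `kfold_r2`) are hypothetical.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def kfold_r2(xs, ys, k=5, seed=0):
    """Average held-out R^2 over k folds (k=5 gives fivefold CV)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in held_out]
        scores.append(r2_score([ys[i] for i in held_out], preds))
    return sum(scores) / k


# Synthetic, made-up data: a noisy linear response over 25-300 (not from the study).
rng = random.Random(1)
temps = [25 + 275 * i / 39 for i in range(40)]
response = [300 - 0.5 * t + rng.gauss(0, 5) for t in temps]
mean_r2 = kfold_r2(temps, response)
```

Each observation is held out exactly once, so the averaged score reflects predictions on data the model did not see during fitting.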

  • Article

    Integrated Nested Laplace Approximations for Large-Scale Spatio-Temporal Bayesian Modeling

    (SIAM, 2024) Gaedke-Merzhäuser, Lisa; Krainski, Elias Teixeira; Janalik, Radim; Rue, Haavard; Schenk, Olaf; Computer, Electrical and Mathematical Sciences and Engineering; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Statistics; Statistics Program; Faculty of Informatics, Università della Svizzera italiana, Lugano, Switzerland

    Bayesian inference tasks continue to pose a computational challenge. This especially holds for spatial-temporal modeling where high-dimensional latent parameter spaces are ubiquitous. The methodology of integrated nested Laplace approximations (INLA) provides a framework for performing Bayesian inference applicable to a large subclass of additive Bayesian hierarchical models. In combination with the stochastic partial differential equations (SPDE) approach it gives rise to an efficient method for spatial-temporal modeling. In this work we build on the INLA-SPDE approach, by putting forward a performant distributed memory variant, INLA-DIST, for large-scale applications. To perform the arising computational kernel operations, consisting of Cholesky factorizations, solving linear systems, and selected matrix inversions, we present two numerical solver options, a sparse CPU-based library and a novel blocked GPU-accelerated approach which we propose. We leverage the recurring nonzero block structure in the arising precision (inverse covariance) matrices, which allows us to employ dense subroutines within a sparse setting. Both versions of INLA-DIST are highly scalable, capable of performing inference on models with millions of latent parameters. We demonstrate their accuracy and performance on synthetic as well as real-world climate dataset applications.
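
Two of the kernel operations named above, Cholesky factorization and solving linear systems with a precision matrix, can be sketched on a toy problem. This is a dense, pure-Python illustration only: it does not exploit sparsity or block structure, uses no GPU, and is unrelated to the INLA-DIST code. The tridiagonal matrix `Q` and the function names are assumptions made for the example.

```python
import math


def cholesky(Q):
    """Return lower-triangular L with Q = L L^T for a symmetric positive definite Q."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(Q[i][i] - s)
            else:
                L[i][j] = (Q[i][j] - s) / L[j][j]
    return L


def solve_with_cholesky(L, b):
    """Solve Q x = b given Q = L L^T: forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):  # forward: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def log_det(L):
    """log det(Q) = 2 * sum(log L_ii), as needed in Gaussian log-densities."""
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))


# Toy tridiagonal precision matrix (illustrative only).
n = 5
Q = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
L = cholesky(Q)
x = solve_with_cholesky(L, [1.0] * n)
```

The same factor `L` serves both the linear solve and the log-determinant, which is why the factorization is the dominant cost in this kind of computation.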

  • Article

    Spatial analysis of modern contraceptive use among women who need it in Ethiopia: Using geo-referenced data from performance monitoring for action

    (Public Library of Science (PLoS), 2024-04-04) Ejigu, Bedilu Alamirie; Shiferaw, Solomon; Moraga, Paula; Seme, Assefa; Yihdego, Mahari; Zebene, Addisalem; Amogne, Ayanaw; Zimmerman, Linnea; Computer, Electrical and Mathematical Sciences and Engineering; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Statistics; Statistics Program; Department of Statistics, Addis Ababa University, Addis Ababa, Ethiopia; School of Public Health, Addis Ababa University, Addis Ababa, Ethiopia; PMA Ethiopia, Addis Ababa, Ethiopia; Department of Population Family and Reproductive Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States of America

    Introduction: The challenge of achieving maternal and neonatal health-related goals in developing countries is significantly impacted by high fertility rates, which are partly attributed to limited access to family planning and to healthcare systems. The most widely used indicator to monitor family planning coverage is the proportion of women of reproductive age using contraception (CPR). However, this metric does not accurately reflect the true family planning coverage, as it fails to account for the diverse needs of women of reproductive age. Not all women in this category require contraception; those who are pregnant, wish to become pregnant, are sexually inactive, or are infertile do not. To effectively address the contraceptive needs of those who require it, this study aims to estimate family planning coverage among this specific group. Further, we aimed to explore the geographical variation in, and factors influencing, contraceptive use among those who need it. Method: We used data from the Performance Monitoring for Action Ethiopia (PMA Ethiopia) survey of women of reproductive age and the service delivery point (SDP) survey conducted in 2019. A total of 4,390 women who need contraception were considered as the analytical sample. To account for the study design, sampling weights were used to compute the coverage of modern contraceptive use disaggregated by socio-demographic factors. Bayesian geostatistical modeling was employed to identify potential factors associated with the uptake of modern contraception and to produce spatial predictions at unsampled locations. Result: The overall weighted prevalence of modern contraception use among women who need it was 44.2% (95% CI: 42.4%-45.9%). Across regions of Ethiopia, contraceptive use coverage varies from nearly 0% in Somali region to 52.3% in Addis Ababa. The average distance from a woman's home to the nearest SDP was high in the Afar and Somali regions. The spatial mapping shows that contraceptive coverage was lower in the eastern part of the country. At the zonal administrative level, relatively high (above 55%) modern contraceptive coverage was observed in Adama Liyu Zone, Ilu Ababor, Misrak Shewa, and Kefa zone, and coverage was null in the majority of zones in the Afar and Somali regions. Among modern contraceptive users, use of the injectable dominated the method mix. The modeling result reveals that living closer to an SDP, having discussions about family planning with the partner, following a Christian religion, having no pregnancy intention, having ever been pregnant, and being young increase the likelihood of using modern contraceptive methods. Conclusion: Areas with low contraceptive coverage and lower access to contraception because of distance should be prioritized by the government and other supporting agencies. Women who discussed family planning with their partner were more likely to use modern contraceptives than those without such discussion. Thus, to improve the coverage of contraceptive use, it is very important to encourage women to have discussions with their partner and to establish movable health systems for the nomadic community.
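
A design-weighted prevalence of the kind mentioned above is, in its simplest form, a weighted mean of a binary indicator. The sketch below shows only that point estimate; it does not reproduce the survey's variance estimation or confidence interval, and the outcomes and weights are made-up values. The function name `weighted_prevalence` is hypothetical.

```python
def weighted_prevalence(outcomes, weights):
    """Weighted proportion: sum(w_i * y_i) / sum(w_i), with y_i in {0, 1}."""
    total_weight = sum(weights)
    return sum(w * y for y, w in zip(outcomes, weights)) / total_weight


# Hypothetical respondents: 1 = uses a modern method, 0 = does not.
outcomes = [1, 0, 1, 0]
weights = [2.0, 1.0, 1.0, 4.0]  # made-up sampling weights
weighted = weighted_prevalence(outcomes, weights)
unweighted = sum(outcomes) / len(outcomes)
```

In this toy case the unweighted proportion is 0.5 while the weighted one is 0.375, because the heavily weighted respondent is a non-user.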

  • Article

    Locally tail-scale invariant scoring rules for evaluation of extreme value forecasts

    (Elsevier BV, 2024-04) Olafsdottir, Helga Kristin; Rootzén, Holger; Bolin, David; Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; Statistics; Statistics Program; Computer, Electrical and Mathematical Sciences and Engineering; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden

    Statistical analysis of extremes can be used to predict the probability of future extreme events, such as large rainfalls or devastating windstorms. The quality of these forecasts can be measured through scoring rules. Locally scale invariant scoring rules give equal importance to the forecasts at different locations regardless of differences in the prediction uncertainty. This is a useful feature when computing average scores but can be an unnecessarily strict requirement when one is mostly concerned with extremes. We propose the concept of local weight-scale invariance, describing scoring rules fulfilling local scale invariance in a certain region of interest, and as a special case, local tail-scale invariance for large events. Moreover, a new version of the weighted continuous ranked probability score (wCRPS) called the scaled wCRPS (swCRPS) that possesses this property is developed and studied. The score is a suitable alternative for scoring extreme value models over areas with a varying scale of extreme events, and we derive explicit formulas of the score for the generalised extreme value distribution. The scoring rules are compared through simulations, and their usage is illustrated by modelling extreme water levels and annual maximum rainfall, and in an application to non-extreme forecasts for the prediction of air pollution.
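
For readers unfamiliar with the scores discussed above, the following sketch computes the standard ensemble estimate of the CRPS and one common weighted variant, the threshold-weighted CRPS with an indicator weight on values above a threshold. It does not implement the scaled wCRPS (swCRPS) proposed in the article. The function names and the toy ensemble are assumptions made for the example.

```python
def crps_ensemble(members, obs):
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    term1 = sum(abs(x - obs) for x in members) / m
    term2 = sum(abs(a - b) for a in members for b in members) / (m * m)
    return term1 - 0.5 * term2


def tw_crps_ensemble(members, obs, threshold):
    """Threshold-weighted CRPS with weight w(z) = 1{z >= threshold}.

    For this weight the score equals the CRPS computed after mapping
    forecast and observation through v(z) = max(z, threshold).
    """
    def clamp(z):
        return max(z, threshold)

    return crps_ensemble([clamp(x) for x in members], clamp(obs))


# Toy ensemble forecast and observation (illustrative values only).
members = [1.0, 2.0, 3.0]
score = crps_ensemble(members, 2.0)
tail_score = tw_crps_ensemble(members, 2.0, threshold=10.0)
```

With the threshold far above all values, the weighted score is zero: the forecast and the observation agree that no event exceeded the threshold, so nothing below it is penalized.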

  • Preprint

    Spatial confounding under infill asymptotics

    (arXiv, 2024-03-27) Bolin, David; Wallin, Jonas; Statistics Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Statistics; Statistics Program; Computer, Electrical and Mathematical Sciences and Engineering; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Department of Statistics, Lund University, Lund, Sweden

    The estimation of regression parameters in spatially referenced data plays a crucial role across various scientific domains. A common approach involves employing an additive regression model to capture the relationship between observations and covariates, accounting for spatial variability not explained by the covariates through a Gaussian random field. While theoretical analyses of such models have predominantly focused on prediction and covariance parameter inference, recent attention has shifted towards understanding the theoretical properties of regression coefficient estimates, particularly in the context of spatial confounding. This article studies the effect of misspecified covariates, in particular when the misspecification changes the smoothness. We analyze the theoretical properties of the generalized least-squares estimator under infill asymptotics, and show that the estimator can have counter-intuitive properties. In particular, the estimated regression coefficients can converge to zero as the number of observations increases, despite high correlations between observations and covariates. Perhaps even more surprising, the estimates can diverge to infinity under certain conditions. Through an application to temperature and precipitation data, we show that both behaviors can be observed for real data. Finally, we propose a simple fix to the problem by adding a smoothing step in the regression.
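
As a reminder of the estimator discussed above, the textbook form of generalized least squares for an additive model with a Gaussian random field can be written as follows. The notation ($X$, $\beta$, $u$, $\varepsilon$, $\Sigma$) is assumed here for illustration and may differ from the preprint's.

```latex
% Additive regression model: covariates, spatial random field, noise
y = X\beta + u + \varepsilon, \qquad
u \sim \mathcal{N}(0, \Sigma_u), \quad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I), \quad
\Sigma = \Sigma_u + \sigma^2 I

% Generalized least-squares estimator of the regression coefficients
\hat{\beta}_{\mathrm{GLS}} = \left( X^\top \Sigma^{-1} X \right)^{-1} X^\top \Sigma^{-1} y
```

Under infill asymptotics the observation locations become dense in a fixed domain, so the behavior of this estimator depends on how the smoothness of the covariates compares with that of the random field.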