    • An Effective Wind Power Prediction using Latent Regression Models

      Bouyeddou, Benamar; Harrou, Fouzi; Saidi, Ahmed; Sun, Ying (IEEE, 2021-08-02) [Conference Paper]
      Wind power is considered one of the most promising renewable energies. Efficient prediction of wind power supports its integration into the power grid. However, the major challenge in wind power is its high fluctuation and intermittency, which make it difficult to predict. This paper investigated and compared the performance of two commonly used latent variable regression methods, namely principal component regression (PCR) and partial least squares regression (PLSR), for predicting wind power. Measurements recorded every 10 minutes from an actual wind turbine are used to demonstrate the prediction precision of the investigated techniques. The results showed that the prediction performances of PCR and PLSR are broadly comparable. The investigated models can serve as a helpful tool for model-based anomaly detection in wind turbines.
    • A temporal model for vertical extrapolation of wind speed and wind energy assessment

      Crippa, Paola; Alifa, Mariana; Bolster, Diogo; Genton, Marc G.; Castruccio, Stefano (Applied Energy, Elsevier BV, 2021-07-28) [Article]
      Accurate wind speed estimates at turbine hub height are critical for wind farm operational purposes, such as forecasting and grid operation, but also for wind energy assessments at regional scales. Power law models have been widely used for vertical wind speed profiles due to their simplicity and suitability for many applications over diverse geographic regions. The power law requires estimation of a wind shear coefficient, α, linking the surface wind speed to winds at higher altitudes. Prior studies have mostly adopted simplified models for α, ranging from a single constant to a site-specific value that is constant in time. In this work we (i) develop a new model for α which is able to capture hourly variability across a range of geographic/topographic features; (ii) quantify its improved skill compared to prior studies; and (iii) demonstrate implications for wind energy estimates over a large geographical area. To achieve this we use long-term high-resolution simulations by the Weather Research and Forecasting model, as well as met-mast and radiosonde observations of vertical profiles of wind speed and other atmospheric properties. The study focuses on Saudi Arabia, an emerging country with ambitious renewable energy plans, and is part of a larger effort supported by the Saudi Arabian government to characterize wind energy resources over the country. Results from this study indicate that the proposed model outperforms prior formulations of α, with a domain average reduction of the wind speed RMSE of 23–33%. Further, we show how these improved estimates impact assessments of wind energy potential and associated wind farm siting.
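As a minimal illustration of the power law described above, the sketch below assumes the standard form u(z) = u_ref · (z / z_ref)^α with a single constant α; the paper's hourly-varying model for α is not reproduced. The function names and example heights are hypothetical, and α = 1/7 is only the classic "one-seventh power law" default.

```python
import math


def extrapolate_wind_speed(u_ref, z_ref, z, alpha):
    """Power-law profile: u(z) = u_ref * (z / z_ref) ** alpha."""
    if u_ref < 0 or z_ref <= 0 or z <= 0:
        raise ValueError("speed must be non-negative and heights positive")
    return u_ref * (z / z_ref) ** alpha


def shear_coefficient(u1, z1, u2, z2):
    """Solve the power law for alpha, given speeds u1, u2 at heights z1, z2."""
    return math.log(u2 / u1) / math.log(z2 / z1)


# Hypothetical example: 5 m/s measured at 10 m, extrapolated to a 100 m hub.
u_hub = extrapolate_wind_speed(u_ref=5.0, z_ref=10.0, z=100.0, alpha=1 / 7)
alpha_hat = shear_coefficient(5.0, 10.0, u_hub, 100.0)
```

Inverting the law from two measurement heights, as `shear_coefficient` does, is one simple way to obtain a site-specific α; a time-varying model would replace that single value with one estimate per hour.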
    • Graph Autoencoders for Embedding Learning in Brain Networks and Major Depressive Disorder Identification

      Noman, Fuad; Ting, Chee-Ming; Kang, Hakmook; Phan, Raphael C. -W.; Boyd, Brian D.; Taylor, Warren D.; Ombao, Hernando (arXiv, 2021-07-27) [Preprint]
      Brain functional connectivity (FC) reveals biomarkers for the identification of various neuropsychiatric disorders. Recent applications of deep neural networks (DNNs) to connectome-based classification mostly rely on traditional convolutional neural networks using input connectivity matrices on a regular Euclidean grid. We propose a graph deep learning framework that incorporates non-Euclidean information about graph structure for classifying functional magnetic resonance imaging (fMRI)-derived brain networks in major depressive disorder (MDD). We design a novel graph autoencoder (GAE) architecture based on graph convolutional networks (GCNs) to embed the topological structure and node content of large fMRI networks into low-dimensional latent representations. In network construction, we employ the Ledoit-Wolf (LDW) shrinkage method to efficiently estimate high-dimensional FC metrics from fMRI data. We consider both supervised and unsupervised approaches for graph embedding learning. The learned embeddings are then used as feature inputs for a deep fully-connected neural network (FCNN) to discriminate MDD from healthy controls (HC). Evaluated on a resting-state fMRI MDD dataset with 43 subjects, the proposed GAE-FCNN model significantly outperforms several state-of-the-art DNN methods for brain connectome classification, achieving an accuracy of 72.50% using the LDW-FC metrics as node features. The graph embeddings of fMRI FC networks learned by the GAE also reveal apparent group differences between MDD and HC. Our new framework demonstrates the feasibility of learning graph embeddings on brain networks to provide discriminative information for the diagnosis of brain disorders.
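A rough sketch of the Ledoit-Wolf shrinkage mentioned above: the sample covariance S is pulled toward a scaled identity μI, with the shrinkage intensity estimated from the data. This follows the classic Ledoit and Wolf (2004) estimator in plain Python; it is not the authors' pipeline, and the function name and toy data are hypothetical. In practice, scikit-learn's `sklearn.covariance.LedoitWolf` offers a tested implementation.

```python
import random


def ledoit_wolf(X):
    """Ledoit-Wolf shrinkage of the sample covariance toward mu * I.

    X is a list of n observations, each a list of p variables.
    Returns (shrunk covariance, shrinkage intensity in [0, 1]).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Sample covariance with 1/n normalisation.
    S = [[sum(r[i] * r[j] for r in Xc) / n for j in range(p)] for i in range(p)]
    mu = sum(S[i][i] for i in range(p)) / p
    target = [[mu if i == j else 0.0 for j in range(p)] for i in range(p)]
    # delta2: squared Frobenius distance between S and the target.
    delta2 = sum((S[i][j] - target[i][j]) ** 2 for i in range(p) for j in range(p))
    # beta2: estimated sampling error of S, from per-observation outer products.
    beta2 = sum(
        sum((r[i] * r[j] - S[i][j]) ** 2 for i in range(p) for j in range(p))
        for r in Xc
    ) / n ** 2
    beta2 = min(beta2, delta2)
    shrink = beta2 / delta2 if delta2 > 0 else 1.0
    # Convex combination: the trace of S is preserved because trace(mu * I) = trace(S).
    sigma = [[shrink * target[i][j] + (1 - shrink) * S[i][j] for j in range(p)]
             for i in range(p)]
    return sigma, shrink


# Hypothetical toy data: 30 observations of 4 independent standard normals.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(30)]
sigma, shrink = ledoit_wolf(X)
```

Shrinking toward μI keeps the estimate well conditioned when the number of regions is large relative to the number of time points, which is the high-dimensional setting the abstract refers to.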
    • Landslide size matters: A new data-driven, spatial prototype

      Lombardo, Luigi; Tanyas, Hakan; Huser, Raphaël; Guzzetti, Fausto; Castro-Camilo, Daniela (Engineering Geology, Elsevier BV, 2021-07-24) [Article]
      The standard definition of landslide hazard requires the estimation of where, when (or how frequently) and how large a given landslide event may be. The geoscientific community involved in statistical models has addressed the component pertaining to how large a landslide event may be by introducing the concept of landslide-event magnitude scale. This scale, which depends on the planimetric area of the given population of landslides, in analogy to the earthquake magnitude, has been expressed with a single value per landslide event. As a result, the geographic or spatially-distributed estimation of how large a population of landslides may be at the slope scale has been disregarded in statistically-based landslide hazard studies. Conversely, the estimation of the landslide extent has commonly been part of physically-based applications, though their implementation is often limited to very small regions. In this work, we initially present a review of methods developed for landslide hazard assessment since its first conception decades ago. Subsequently, we introduce for the first time a statistically-based model able to estimate the planimetric area of landslides aggregated per slope unit. More specifically, we implemented a Bayesian version of a Generalized Additive Model where the maximum landslide size per slope unit and the sum of all landslide sizes per slope unit are predicted via a Log-Gaussian model. These “max” and “sum” models capture the spatial distribution of (aggregated) landslide sizes. We tested these models on a global dataset expressing the distribution of co-seismic landslides due to 24 earthquakes across the globe. Both models are evaluated on a suite of performance diagnostics, which suggest that they suitably predict the aggregated landslide extent per slope unit.
In addition to a complex procedure involving variable selection and a spatial uncertainty estimation, we built our model over slopes where landslides were triggered in response to seismic shaking, and simulated the expected failing surface over slopes where landslides did not occur in the past. What we achieved is the first statistically-based model in the literature able to provide information about the extent of the failed surface across a given landscape. This information is vital in landslide hazard studies and should be combined with the estimation of landslide occurrence locations. This could ensure that governmental and territorial agencies have a complete probabilistic overview of how a population of landslides could behave in response to a specific trigger. The predictive models we present are currently valid only for the 25 cases we tested. Statistically estimating landslide extents is still in its infancy. Many more applications should be successfully validated before considering such models in an operational way. For instance, the validity of our models should still be verified at the regional or catchment scale, and they need to be tested for different landslide types and triggers. However, we envision that this new spatial predictive paradigm could be a breakthrough in the literature and, in time, could even become part of official landslide risk assessment protocols.
    • Fault Detection in Solar PV Systems Using Hypothesis Testing

      Harrou, Fouzi; Taghezouit, Bilal; Bouyeddou, Benamar; Sun, Ying; Arab, Amar Hadj (IEEE, 2021-07-21) [Conference Paper]
      The demand for solar energy has rapidly increased throughout the world in recent years. However, anomalies in photovoltaic (PV) plants can reduce performance and result in serious consequences. Developing reliable statistical approaches able to detect anomalies in PV plants is vital to improving the management of these plants. Here, we present a statistical approach for detecting anomalies, including partial shading, in the DC part of PV plants. First, we model the monitored PV plant. Then, we employ a generalized likelihood ratio test, which is a powerful anomaly detection tool, to check the residuals from the model and reveal anomalies in the monitored PV array. The proposed strategy is illustrated via actual measurements from a 9.54 PV plant.
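To make the residual-checking step above concrete, here is a minimal sliding-window generalized likelihood ratio (GLR) test for a mean shift. It assumes the model residuals are Gaussian with a known standard deviation under normal operation; the window length, the threshold (3.84, roughly the 95% quantile of a chi-square distribution with one degree of freedom) and the toy fault are hypothetical choices, not the paper's settings.

```python
def glr_mean_shift(residuals, sigma, window=20, threshold=3.84):
    """Sliding-window GLR test for a mean shift in Gaussian residuals.

    Under H0 the residuals are N(0, sigma**2); under H1 their mean is an
    unknown nonzero value. Maximising the likelihood over that mean gives
    2 * log(Lambda) = n * mean**2 / sigma**2, which is chi-square(1) under H0.
    Returns one boolean per full window: True means "anomaly flagged".
    """
    flags = []
    for end in range(window, len(residuals) + 1):
        chunk = residuals[end - window:end]
        m = sum(chunk) / window
        flags.append(window * m * m / sigma ** 2 > threshold)
    return flags


# Hypothetical residuals: a fault adds a +1.0 offset from sample 40 onward.
residuals = [0.0] * 40 + [1.0] * 20
flags = glr_mean_shift(residuals, sigma=1.0)
```

The first alarm is raised only once enough faulty samples fill the window, which illustrates the usual trade-off between window length, detection delay and false-alarm rate.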
    • BICNet: A Bayesian Approach for Estimating Task Effects on Intrinsic Connectivity Networks in fMRI Data

      Tang, Meini; Ting, Chee-Ming; Ombao, Hernando (arXiv, 2021-07-19) [Preprint]
      Intrinsic connectivity networks (ICNs) are specific dynamic functional brain networks that are consistently found under various conditions, including rest and task. Studies have shown that some stimuli actually activate intrinsic connectivity through suppression, excitation, moderation or modification. Nevertheless, the structure of ICNs and task-related effects on ICNs are not yet fully understood. In this paper, we propose a Bayesian Intrinsic Connectivity Network (BICNet) model to identify the ICNs and quantify the task-related effects on the ICN dynamics. Using an extended Bayesian dynamic sparse latent factor model, the proposed BICNet has the following advantages: (1) it simultaneously identifies the individual ICNs and group-level ICN spatial maps; (2) it robustly identifies ICNs by jointly modeling resting-state functional magnetic resonance imaging (rfMRI) and task-related functional magnetic resonance imaging (tfMRI); (3) compared to independent component analysis (ICA)-based methods, it can quantify differences in ICN amplitudes across different states; (4) it automatically performs feature selection through the sparsity of the ICNs rather than ad-hoc thresholding. The proposed BICNet was applied to the rfMRI and language tfMRI data from the Human Connectome Project (HCP), and the analysis identified several ICNs related to distinct language processing functions.
    • Multivariate Conway-Maxwell-Poisson Distribution: Sarmanov Method and Doubly-Intractable Bayesian Inference

      Piancastelli, Luiza S. C.; Friel, Nial; Barreto-Souza, Wagner; Ombao, Hernando (arXiv, 2021-07-15) [Preprint]
      In this paper, a multivariate count distribution with Conway-Maxwell (COM)-Poisson marginals is proposed. To do this, we develop a modification of the Sarmanov method for constructing multivariate distributions. Our multivariate COM-Poisson (MultCOMP) model has desirable features such as (i) it admits a flexible covariance matrix allowing for both negative and positive non-diagonal entries; (ii) it overcomes the limitation of the existing bivariate COM-Poisson distributions in the literature that do not have COM-Poisson marginals; (iii) it allows for the analysis of multivariate counts and is not just limited to bivariate counts. The likelihood poses inferential challenges because it depends on a number of intractable normalizing constants involving the model parameters. These obstacles motivate us to propose a Bayesian inferential approach where the resulting doubly-intractable posterior is handled via the exchange algorithm and the Grouped Independence Metropolis-Hastings algorithm. Numerical experiments based on simulations are presented to illustrate the proposed Bayesian approach. We analyze the potential of the MultCOMP model through a real data application on the numbers of goals scored by the home and away teams in the Premier League from 2018 to 2021. Here, our interest is to assess the effect of a lack of crowds during the COVID-19 pandemic on the well-known home team advantage. A MultCOMP model fit shows that there is evidence of a decreased number of goals scored by the home team, not accompanied by a reduced score from the opponent. Hence, our analysis suggests a smaller home team advantage in the absence of crowds, which agrees with the opinion of several football experts.
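The intractable normalizing constants mentioned above already appear in the univariate COM-Poisson marginal. The sketch below evaluates that marginal pmf by truncating the infinite series for Z(λ, ν); it does not implement the Sarmanov construction or the exchange algorithm, and the function name and tolerance are assumptions. With ν = 1 the distribution reduces to the Poisson; ν > 1 gives underdispersion and ν < 1 overdispersion.

```python
import math


def com_poisson_pmf(y, lam, nu, tol=1e-16, max_terms=100000):
    """COM-Poisson pmf: P(Y = y) = lam**y / (y!)**nu / Z(lam, nu).

    The normalising constant Z(lam, nu) = sum_{j>=0} lam**j / (j!)**nu has no
    closed form in general, so it is approximated by truncating the series
    once terms fall below tol relative to the largest term seen so far.
    Requires lam > 0 and nu > 0.
    """
    def log_term(j):
        return j * math.log(lam) - nu * math.lgamma(j + 1)

    log_terms, top = [], -math.inf
    for j in range(max_terms):
        t = log_term(j)
        log_terms.append(t)
        top = max(top, t)
        # Log-terms are concave in j (unimodal), so once a term is negligible
        # relative to the peak, the remaining tail is negligible as well.
        if t - top < math.log(tol):
            break
    # Log-sum-exp for a numerically stable log Z.
    log_z = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return math.exp(log_term(y) - log_z)
```

Every likelihood evaluation needs such a truncated sum, and in a Bayesian sampler it changes with each proposed (λ, ν), which is part of why the posterior is called doubly intractable.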
    • Sex ratio at birth in Vietnam among six subnational regions during 1980–2050, estimation and probabilistic projection using a Bayesian hierarchical time series model with 2.9 million birth records

      Chao, Fengqing; Guilmoto, Christophe Z.; Ombao, Hernando (PLOS ONE, Public Library of Science (PLoS), 2021-07-14) [Article]
      The sex ratio at birth (SRB, i.e., the ratio of male to female births) in Vietnam has been imbalanced since the 2000s. Previous studies have revealed a rapid increase in the SRB over the past 15 years and the presence of important variations across regions. More recent studies suggested that the nation’s SRB may have plateaued during the 2010s. Given the lack of exhaustive birth registration data in Vietnam, it is necessary to estimate and project levels and trends in the regional SRBs in Vietnam based on a reproducible statistical approach. We compiled an extensive database on regional Vietnam SRBs based on all publicly available surveys and censuses and used a Bayesian hierarchical time series mixture model to estimate and project SRB in Vietnam by region from 1980 to 2050. The Bayesian model incorporates the uncertainties from the observations and year-by-year natural fluctuation. It includes a binary parameter to detect the existence of sex ratio transitions among Vietnamese regions. Furthermore, we model the SRB imbalance using a trapezoid function to capture the increase, stagnation, and decrease of the sex ratio transition by Vietnamese regions. The model results show that four out of six Vietnamese regions, namely, Northern Midlands and Mountain Areas, Northern Central and Central Coastal Areas, Red River Delta, and South East, have existing sex imbalances at birth. The rise in SRB in the Red River Delta was the fastest, as it took only 12 years and was more pronounced, with the SRB reaching the local maximum of 1.146 with a 95% credible interval (1.129, 1.163) in 2013. The model projections suggest that the current decade will record a sustained decline in sex imbalances at birth, and the SRB should be back to the national SRB baseline level of 1.06 in all regions by the mid-2030s.
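The trapezoid used above to describe an SRB transition (increase, stagnation, decrease) can be sketched as a piecewise-linear function added to a baseline. The parameterization here (start year, rise, plateau and fall durations, peak imbalance) is a hypothetical illustration; the paper's exact functional form, priors and the binary transition indicator are not reproduced, and the parameter values are illustrative, not estimates. Only the 1.06 baseline comes from the abstract.

```python
def trapezoid_imbalance(t, t0, rise, plateau, fall, peak):
    """Piecewise-linear trapezoid: zero until t0, a linear rise to `peak`
    over `rise` years, a flat stretch of `plateau` years, then a linear
    decline back to zero over `fall` years."""
    t1 = t0 + rise
    t2 = t1 + plateau
    t3 = t2 + fall
    if t <= t0 or t >= t3:
        return 0.0
    if t < t1:
        return peak * (t - t0) / rise
    if t <= t2:
        return peak
    return peak * (t3 - t) / fall


def srb(t, baseline=1.06, **shape):
    """SRB = baseline level plus the (hypothetical) transition imbalance."""
    return baseline + trapezoid_imbalance(t, **shape)


# Illustrative parameter values only; they are not estimates from the paper.
shape = dict(t0=2000, rise=10, plateau=4, fall=20, peak=0.08)
```

In a Bayesian hierarchical model, each of these shape parameters would receive a prior and be estimated per region, with year-by-year fluctuations layered on top.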
    • A Field Guide to Federated Optimization

      Wang, Jianyu; Charles, Zachary; Xu, Zheng; Joshi, Gauri; McMahan, H. Brendan; Arcas, Blaise Aguera y; Al-Shedivat, Maruan; Andrew, Galen; Avestimehr, Salman; Daly, Katharine; Data, Deepesh; Diggavi, Suhas; Eichner, Hubert; Gadhikar, Advait; Garrett, Zachary; Girgis, Antonious M.; Hanzely, Filip; Hard, Andrew; He, Chaoyang; Horvath, Samuel; Huo, Zhouyuan; Ingerman, Alex; Jaggi, Martin; Javidi, Tara; Kairouz, Peter; Kale, Satyen; Karimireddy, Sai Praneeth; Konecny, Jakub; Koyejo, Sanmi; Li, Tian; Liu, Luyang; Mohri, Mehryar; Qi, Hang; Reddi, Sashank J.; Richtarik, Peter; Singhal, Karan; Smith, Virginia; Soltanolkotabi, Mahdi; Song, Weikang; Suresh, Ananda Theertha; Stich, Sebastian U.; Talwalkar, Ameet; Wang, Hongyi; Woodworth, Blake; Wu, Shanshan; Yu, Felix X.; Yuan, Honglin; Zaheer, Manzil; Zhang, Mi; Zhang, Tong; Zheng, Chunxiang; Zhu, Chen; Zhu, Wennan (arXiv, 2021-07-14) [Preprint]
      Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.
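A toy sketch of the basic federated optimization loop discussed above, in the style of federated averaging (FedAvg): clients run a few local gradient steps on their own data, and the server averages the returned models weighted by client data size. The objective (estimating a global mean via a quadratic loss), the function name and all hyperparameters are hypothetical; the guide's recommended algorithms and simulation practices go well beyond this.

```python
def fedavg(client_data, rounds=50, local_steps=5, lr=0.1, w0=0.0):
    """Federated averaging on a toy objective: estimate the global mean.

    Each client minimises 0.5 * mean((w - x)**2) over its own data with a few
    local gradient steps; the server only sees the returned models and
    averages them, weighting each client by its number of examples.
    """
    total = sum(len(d) for d in client_data)
    w = w0
    for _ in range(rounds):
        updates = []
        for data in client_data:
            local_w = w
            local_mean = sum(data) / len(data)
            for _ in range(local_steps):
                local_w -= lr * (local_w - local_mean)  # gradient of local loss
            updates.append((len(data), local_w))
        # Server step: data-size-weighted average of client models.
        w = sum(n * lw for n, lw in updates) / total
    return w


# Hypothetical clients holding different amounts of non-identical data.
clients = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
w = fedavg(clients)
```

Because every client loss here has the same curvature, the fixed point is exactly the global mean; with more heterogeneous objectives, multiple local steps can make clients drift toward their own optima, one of the data-heterogeneity issues federated optimization has to handle.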
    • Spatial cluster detection with threshold quantile regression

      Lee, Junho; Sun, Ying; Judy Wang, Huixia (Environmetrics, Wiley, 2021-07-13) [Article]
      Spatial cluster detection, the identification of spatially adjacent units associated with distinctive patterns in the data of interest relative to background variation, is useful for discerning spatial heterogeneity in regression coefficients. Some empirical studies applying regression-based models to air quality data show that there exists not only spatial heterogeneity but also heteroscedasticity in the relationship between air pollution and its predictors. Since low air quality is a well-known risk factor for mortality, various cardiopulmonary diseases, and preterm birth, the tail of the air pollution distribution is of more interest than its center. In this article, we develop a spatial cluster detection approach using a threshold quantile regression model to capture the spatial heterogeneity and heteroscedasticity. We introduce two threshold variables in the quantile regression model to define a spatial cluster. The proposed test statistic for identifying the spatial cluster is the supremum of the Wald process over the space of threshold parameters. We establish the limiting distribution of the test statistic under the null hypothesis that the quantile regression coefficient is the same over the entire spatial domain at the given quantile level. The performance of our proposed method is assessed by simulation studies. The proposed method is also applied to analyze particulate matter (PM2.5) concentration and aerosol optical depth (AOD) data in the Northeastern United States in order to study geographical heterogeneity in the association between AOD and PM2.5 at different quantile levels.
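Quantile regression, the building block of the approach above, estimates coefficients by minimising the check loss ρ_τ(u) = u(τ − 1{u < 0}) instead of squared error, which is what lets it target the upper tail. The sketch below shows only that loss; with no covariates, minimising it over the data points recovers a τ-th sample quantile. The threshold variables and the supremum-Wald test are not shown, and the function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile_by_loss(values, tau):
    """Data point minimising total check loss, i.e. a tau-th sample quantile."""
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))
```

For example, on `[1, 2, 3, 4, 100]` the loss-minimising value is 3 at τ = 0.5 but 100 at τ = 0.9, showing how a high τ focuses the fit on the extreme observations.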
    • Stationary and Cyclostationary Processes for Time Series and Spatio-Temporal Data

      Das, Soumya (2021-07-10) [Dissertation]
      Advisor: Genton, Marc G.
      Committee members: Stenchikov, Georgiy L.; Ombao, Hernando; Pourahmadi, Mohsen
      Owing largely to the difficulty of obtaining explicit forms of the stationary marginal distributions of non-linear stationary processes, appropriate characterizations of such processes have received little attention. After presenting a detailed motivation for this thesis and the necessary preliminaries in Chapter 1, we characterize, in Chapter 2, the stationary marginal distributions of certain non-linear multivariate stationary processes. To do so, we show that the stationary marginal distributions of these processes belong to specific skew-distribution families, and that for a given skew-distribution from the corresponding family, a process whose stationary marginal distribution is identical to that skew-distribution can be found. While conventional time series analysis depends heavily on the assumption of stationarity, measurements taken from many physical systems, which consist of both periodicity and randomness, often exhibit cyclostationarity (i.e., a periodic structure in their first- and second-order moments). Chapter 3 identifies the hourly global horizontal irradiances (GHIs) collected at a solar monitoring station in Saudi Arabia as a cyclostationary process and, given the significant impact of GHIs on energy production in Saudi Arabia, provides a temporal model of GHIs. Chapter 4 extends the analysis to a spatio-temporal cyclostationary model of 45 solar monitoring stations across the Kingdom. Both proposed models are shown to produce better forecasts, more realistic simulations, and more reliable photovoltaic power estimates than a classical model that fails to recognize the GHI data as cyclostationary.
Chapter 5 extends the notion of cyclostationarity to a novel and flexible class of processes, coined evolving period and amplitude cyclostationary (EPACS) processes, that allows periods and amplitudes of the mean and covariance functions to evolve and, therefore, accommodates a much larger class of processes than the cyclostationary processes. Thereafter, we investigate its properties, provide methodologies for statistical inference, and illustrate the presented methods using a simulation study and a real data example, from the heavens, of the magnitudes of the light emitted from the variable star R Hydrae. Finally, Chapter 6 summarizes the findings of the thesis and discusses its significance and possible future extensions.
    • The Bayesian Learning Rule

      Khan, Mohammad Emtiyaz; Rue, Haavard (arXiv, 2021-07-09) [Preprint]
      We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms, and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
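A one-dimensional sketch of how Newton's method can arise from the rule described above, assuming a Gaussian candidate q = N(m, 1/s) with mean m and precision s. The natural-gradient updates used here (precision moved toward the expected Hessian, mean moved by the expected gradient scaled by the new precision) are the Gaussian form of the rule as we understand it; the quadratic loss, the function name and the numbers are hypothetical. For a quadratic loss, the expectations under q are exact, so no sampling is needed.

```python
def blr_gaussian_quadratic(m, s, a, b, rho, steps=1):
    """Bayesian learning rule with a 1-D Gaussian candidate q = N(m, 1/s).

    Loss: l(theta) = 0.5 * a * (theta - b)**2, so E_q[grad] = a * (m - b)
    and E_q[hessian] = a exactly. Natural-gradient updates:
        s <- (1 - rho) * s + rho * E_q[hessian]
        m <- m - rho * E_q[grad] / s      (using the updated s)
    """
    for _ in range(steps):
        s = (1 - rho) * s + rho * a
        m = m - rho * a * (m - b) / s
    return m, s
```

With step size ρ = 1, a single update sets s = a and m = b, which is exactly one Newton step on the quadratic; smaller ρ gives damped, Newton-like iterations that converge to the same point.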
    • Competition on Spatial Statistics for Large Datasets

      Huang, Huang; Abdulah, Sameh; Sun, Ying; Ltaief, Hatem; Keyes, David E.; Genton, Marc G. (Journal of Agricultural, Biological and Environmental Statistics, Springer Science and Business Media LLC, 2021-07-08) [Article]
      As spatial datasets become increasingly large and unwieldy, exact inference on spatial models becomes computationally prohibitive. Various approximation methods have been proposed to reduce the computational burden. Although comprehensive reviews of these approximation methods exist, comparisons of their performance are limited to small and medium-sized datasets and to a few selected methods. To achieve a comprehensive comparison comprising as many methods as possible, we organized the Competition on Spatial Statistics for Large Datasets. This competition had the following novel features: (1) we generated synthetic datasets with the ExaGeoStat software so that the number of generated realizations ranged from 100 thousand to 1 million; (2) we systematically designed the data-generating models to represent spatial processes with a wide range of statistical properties for both Gaussian and non-Gaussian cases; (3) the competition tasks included both estimation and prediction, and the results were assessed by multiple criteria; and (4) we have made all the datasets and competition results publicly available to serve as a benchmark for other approximation methods. In this paper, we disclose all the competition details and results along with some analysis of the competition outcomes.
    • Ridge-penalized adaptive Mantel test and its application in imaging genetics

      Pluta, Dustin; Shen, Tong; Xue, Gui; Chen, Chuansheng; Ombao, Hernando; Yu, Zhaoxia (Statistics in Medicine, Wiley, 2021-07-02) [Article]
      We propose a ridge-penalized adaptive Mantel test (AdaMant) for evaluating the association of two high-dimensional sets of features. By introducing a ridge penalty, AdaMant tests the association across many metrics simultaneously. We demonstrate how ridge penalization bridges Euclidean and Mahalanobis distances and their corresponding linear models from the perspective of association measurement and testing. This result is not only theoretically interesting but also has important implications in penalized hypothesis testing, especially in high-dimensional settings such as imaging genetics. Applying the proposed method to an imaging genetic study of visual working memory in healthy adults, we identified interesting associations of brain connectivity (measured by electroencephalogram coherence) with selected genetic features.
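For context on the test that AdaMant extends, here is a sketch of the classic permutation Mantel test: it correlates the upper-triangle entries of two distance matrices and builds a one-sided p-value by permuting the rows and columns of one matrix together. The ridge penalty and the adaptive combination across metrics that define AdaMant are not shown; the function name, permutation count and toy matrices are hypothetical.

```python
import random


def mantel_test(d1, d2, permutations=999, seed=0):
    """Classic Mantel test between two n x n distance matrices.

    Statistic: Pearson correlation of the upper-triangle entries.
    Assumes neither matrix has constant off-diagonal distances.
    """
    n = len(d1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [d1[i][j] for i, j in pairs]

    def corr(perm):
        # Relabel the objects of d2 by `perm`, keeping d1 fixed.
        y = [d2[perm[i]][perm[j]] for i, j in pairs]
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / (sxx * syy) ** 0.5

    rng = random.Random(seed)
    observed = corr(list(range(n)))
    count = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        if corr(perm) >= observed:
            count += 1
    return observed, count / (permutations + 1)


# Hypothetical example: two perfectly related distance matrices on 8 points.
pts = list(range(8))
d1 = [[abs(a - b) for b in pts] for a in pts]
d2 = [[2.0 * v for v in row] for row in d1]
r, p = mantel_test(d1, d2)
```

Permuting whole objects rather than individual matrix entries preserves the dependence among distances that share an object, which is why the Mantel test uses this scheme.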
    • Practical strategies for GEV-based regression models for extremes

      Castro-Camilo, Daniela; Huser, Raphaël; Rue, Haavard (arXiv, 2021-06-24) [Preprint]
      The generalised extreme value (GEV) distribution is a three-parameter family that describes the asymptotic behaviour of properly renormalised maxima of a sequence of independent and identically distributed random variables. If the shape parameter ξ is zero, the GEV distribution has unbounded support, whereas if ξ is positive, the limiting distribution is heavy-tailed with an infinite upper endpoint but a finite lower endpoint. In practical applications, we assume that the GEV family is a reasonable approximation for the distribution of maxima over blocks, and we fit it accordingly. This implies that GEV properties, such as a finite lower endpoint in the case ξ > 0, are inherited by the finite-sample maxima, which might not have bounded support. This is particularly problematic when predicting extreme observations based on multiple and interacting covariates. To tackle this usually overlooked issue, we propose a blended GEV (bGEV) distribution, which smoothly combines the left tail of a Gumbel distribution (GEV with ξ = 0) with the right tail of a Fréchet distribution (GEV with ξ > 0) and, therefore, has unbounded support. Using a Bayesian framework, we reparametrise the GEV distribution to offer a more natural interpretation of the (possibly covariate-dependent) model parameters. Independent priors over the new location and spread parameters induce a joint prior distribution for the original location and scale parameters. We introduce the concept of property-preserving penalised complexity (P3C) priors and apply it to the shape parameter to preserve first and second moments. We illustrate our methods with an application to NO2 pollution levels in California, which reveals the robustness of the bGEV distribution, as well as the suitability of the new parametrisation and the P3C prior framework.
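The support issue described above can be seen directly in the GEV distribution function G(z) = exp{−[1 + ξ(z − μ)/σ]^(−1/ξ)}: for ξ > 0 it is exactly zero at and below the lower endpoint μ − σ/ξ, while the Gumbel case (ξ = 0) is positive everywhere. The sketch below evaluates only the standard GEV cdf; the blending construction of the bGEV is not implemented, and the function name is hypothetical.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV cdf: G(z) = exp(-t(z)), t(z) = (1 + xi * (z - mu) / sigma) ** (-1 / xi).

    xi = 0 is the Gumbel limit, t(z) = exp(-(z - mu) / sigma).
    """
    s = (z - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s))
    arg = 1.0 + xi * s
    if arg <= 0.0:
        # Outside the support: below the lower endpoint mu - sigma / xi when
        # xi > 0 (cdf is 0), above the upper endpoint when xi < 0 (cdf is 1).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-arg ** (-1.0 / xi))
```

With μ = 0, σ = 1 and ξ = 0.5, any observation below −2 has probability zero under the fitted GEV, so a covariate-driven location that drifts upward can make a recorded maximum "impossible"; this is the failure mode that a distribution with unbounded support avoids.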
    • Smart Gradient -- An Adaptive Technique for Improving Gradient Estimation

      Fattah, Esmail Abdul; Niekerk, Janet Van; Rue, Haavard (arXiv, 2021-06-14) [Preprint]
      Computing the gradient of a function provides fundamental information about its behavior. This information is essential for several applications and algorithms across various fields. One common application that requires gradients is optimization, using techniques such as stochastic gradient descent, Newton's method and trust region methods. However, these methods usually require a numerical computation of the gradient at every iteration, which is prone to numerical errors. We propose a simple limited-memory technique for improving the accuracy of a numerically computed gradient in this gradient-based optimization framework by exploiting (1) a coordinate transformation of the gradient and (2) the history of previously taken descent directions. The method is verified empirically by extensive experimentation on both test functions and real data applications. The proposed method is implemented in the R package smartGrad and in C++.
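A sketch of the two ingredients named above, under our own assumptions: the coordinate transformation is taken to be an orthonormal basis built by Gram-Schmidt from recent descent directions (padded with unit vectors), and the gradient is estimated by central differences along those directions and mapped back to the original coordinates. This is not the smartGrad package or its API; all names and the step size are hypothetical.

```python
def descent_basis(directions, dim):
    """Gram-Schmidt on recent descent directions, padded with unit vectors."""
    candidates = list(directions) + [
        [1.0 if i == j else 0.0 for i in range(dim)] for j in range(dim)
    ]
    basis = []
    for v in candidates:
        w = list(v)
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > 1e-10:  # skip directions already spanned by the basis
            basis.append([wi / norm for wi in w])
        if len(basis) == dim:
            break
    return basis


def central_diff_gradient(f, x, basis, h=1e-5):
    """Estimate grad f(x) from central differences along orthonormal directions.

    Each directional derivative d_k approximates g_k . grad f(x); with an
    orthonormal basis {g_k}, the gradient is reconstructed as sum_k d_k * g_k.
    """
    grad = [0.0] * len(x)
    for g in basis:
        xp = [xi + h * gi for xi, gi in zip(x, g)]
        xm = [xi - h * gi for xi, gi in zip(x, g)]
        d = (f(xp) - f(xm)) / (2 * h)
        grad = [gr + d * gi for gr, gi in zip(grad, g)]
    return grad


# Hypothetical example: a correlated quadratic, previous step along (1, 1).
f = lambda v: v[0] ** 2 + 3 * v[0] * v[1] + 2 * v[1] ** 2
basis = descent_basis([[1.0, 1.0]], 2)
g = central_diff_gradient(f, [1.0, 2.0], basis)
```

The idea behind aligning the difference directions with recent descent directions is that, for correlated variables, these directions tend to match the function's geometry better than the raw coordinate axes.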
    • Lagrangian Spatio-Temporal Covariance Functions for Multivariate Nonstationary Random Fields

      Salvaña, Mary Lai O. (2021-06-14) [Thesis]
      Advisor: Genton, Marc G.
      Committee members: Ombao, Hernando; Sang, Huiyan; Stenchikov, Georgiy L.
      The modeling of spatio-temporal and multivariate spatial random fields has been an important and growing area of research due to the increasing availability of spacetime-referenced data in a large number of scientific applications. In geostatistics, the covariance function plays a crucial role in describing the spatio-temporal dependence in the data and is key to statistical modeling, inference, stochastic simulation and prediction. Therefore, the development of flexible covariance models, which can accommodate the inherent variability of real data, is necessary for effective modeling of random fields. This thesis is composed of four significant contributions in the development and applications of new covariance models for stationary multivariate spatial processes, and nonstationary spatial and spatio-temporal processes. The first focus of the thesis is on modeling of stationary multivariate spatial random fields through flexible multivariate covariance functions. Chapter 2 proposes a semiparametric approach for multivariate covariance function estimation with flexible specification of the cross-covariance functions via their spectral representations. The proposed method is applied to model and predict the bivariate data of particulate matter concentration (PM2.5) and wind speed (WS) in the United States. Chapter 3 introduces a parametric class of multivariate covariance functions with asymmetric cross-covariance functions. The proposed covariance model is applied to analyze the asymmetry and perform prediction in trivariate data of PM2.5, WS and relative humidity (RH) in the United States. The second focus of the thesis is on nonstationary spatial and spatio-temporal random fields. Chapter 4 presents a space deformation method which imparts nonstationarity to any stationary covariance function. The proposed method utilizes the functional data registration algorithm and classical multidimensional scaling to estimate the spatial deformation.
The application of the proposed method is demonstrated on precipitation data. Finally, Chapter 5 proposes a parametric class of time-varying spatio-temporal covariance functions, which are nonstationary in time. The proposed class is a time-varying generalization of an existing nonseparable stationary class of spatio-temporal covariance functions. The proposed time-varying model is then used to study the seasonality effect and perform space-time predictions in the daily PM2.5 data from Oregon, United States.
    • Markov-Switching State-Space Models with Applications to Neuroimaging

      Degras, David; Ting, Chee-Ming; Ombao, Hernando (arXiv, 2021-06-09) [Preprint]
      State-space models (SSM) with Markov switching offer a powerful framework for detecting multiple regimes in time series, analyzing mutual dependence and dynamics within regimes, and assessing transitions between regimes. However, these models present considerable computational challenges due to the exponential number of possible regime sequences to account for. In addition, the high dimensionality of time series can hinder likelihood-based inference. This paper proposes novel statistical methods for Markov-switching SSMs using maximum likelihood estimation, Expectation-Maximization (EM), and parametric bootstrap. We develop solutions for initializing the EM algorithm, accelerating convergence, and conducting inference that are ideally suited to massive spatio-temporal data such as brain signals. We evaluate these methods in simulations and present applications to EEG studies of epilepsy and of motor imagery. All proposed methods are implemented in a MATLAB toolbox available at https://github.com/ddegras/switch-ssm.
    • Copula-based multiple indicator kriging for non-Gaussian random fields

      Agarwal, Gaurav; Sun, Ying; Wang, Huixia J. (Spatial Statistics, Elsevier BV, 2021-06-09) [Article]
      In spatial statistics, the kriging predictor is the best linear predictor at unsampled locations, but it is not the optimal predictor for non-Gaussian processes. In this paper, we introduce a copula-based multiple indicator kriging model for the analysis of non-Gaussian spatial data by thresholding the spatial observations at a given set of quantile values. The proposed copula model allows for flexible marginal distributions while modeling the spatial dependence via copulas. We show that the covariances required by kriging have a direct link to the chosen copula function. We then develop a semiparametric estimation procedure. The proposed method provides the entire predictive distribution function at a new location, and thus allows for both point and interval predictions. In simulation studies, the proposed method demonstrates better predictive performance than the commonly used variogram approach and Gaussian kriging. We illustrate our methods on precipitation data in Spain during November 2019 and on a heavy metal dataset of topsoil along the river Meuse, and obtain probability exceedance maps.
    • A Data-Driven Soft Sensor for Swarm Motion Speed Prediction using Ensemble Learning Methods

      Khaldi, Belkacem; Harrou, Fouzi; Benslimane, Sidi Mohammed; Sun, Ying (IEEE Sensors Journal, IEEE, 2021-06-08) [Article]
      Machine Learning (ML) for swarm motion prediction is a relatively unexplored area that could help sustain and monitor everyday collective tasks in swarm robotics. This paper focuses on a specific application of swarm robotics, pattern formation, to demonstrate the ability of Ensemble Learning (EL) approaches to predict the motion speed of swarm robots. Specifically, the boosted trees (BST) and bagged trees (BT) algorithms are introduced to predict the motion speed of a swarm of miniature two-wheeled differential-drive mobile robots performing a circle formation via the viscoelastic control model. This choice is motivated by the ability of EL-based models to improve the performance of ML models by combining multiple learners rather than relying on a single regressor. The performance of both BST and BT is compared with that of ten commonly used prediction models based on Support Vector Regressors (SVRs) and Gaussian Process Regressors (GPRs) with different kernel functions. Using simulated measurements recorded every 0.1 second from the robots' sensors, we demonstrate the effectiveness of the developed methods over conventional ML models (SVR and GPR) in environments with and without obstacles. Results showed that the BST and BT regression models achieved the highest prediction performance for both fully and partially connected swarms, including across different swarm sizes.