### Recent Submissions

• #### INLA goes extreme: Bayesian tail regression for the estimation of high spatio-temporal quantiles

(Springer Nature, 2018-05-25)
This work is motivated by the challenge organized for the 10th International Conference on Extreme-Value Analysis (EVA2017) to predict daily precipitation quantiles at the 99.8% level for each month at observed and unobserved locations. Our approach is based on a Bayesian generalized additive modeling framework that is designed to estimate complex trends in marginal extremes over space and time. First, we estimate a high non-stationary threshold using a gamma distribution for precipitation intensities that incorporates spatial and temporal random effects. Then, we use the Bernoulli and generalized Pareto (GP) distributions to model the rate and size of threshold exceedances, respectively, which we also assume to vary in space and time. The latent random effects are modeled additively using Gaussian process priors, which provide high flexibility and interpretability. We develop a penalized complexity (PC) prior specification for the tail index that shrinks the GP model towards the exponential distribution, thus preventing unrealistically heavy tails. Fast and accurate estimation of the posterior distributions is performed thanks to the integrated nested Laplace approximation (INLA). We illustrate this methodology by modeling the daily precipitation data provided by the EVA2017 challenge, which consist of observations from 40 stations in the Netherlands recorded during the period 1972–2016. Capitalizing on INLA’s fast computational capacity and powerful distributed computing resources, we conduct an extensive cross-validation study to select the model parameters that govern the smoothness of trends. Our results clearly outperform simple benchmarks and are comparable to the best-scoring approaches of the other teams.
• #### A note on intrinsic conditional autoregressive models for disconnected graphs

(Elsevier BV, 2018-05-23)
In this note we discuss (Gaussian) intrinsic conditional autoregressive (CAR) models for disconnected graphs, with the aim of providing practical guidelines for how these models should be defined, scaled and implemented. We show how these suggestions can be implemented in two disease mapping examples.
• #### Vision-based Human Action Classification Using Adaptive Boosting Algorithm

(Institute of Electrical and Electronics Engineers (IEEE), 2018-05-07)
Precise recognition of human action is a key enabler for the development of many applications, including autonomous robots for medical diagnosis and surveillance of elderly people in home environments. This paper addresses human action recognition based on variation in body shape. Specifically, we divide the human body into five partitions that correspond to five partial occupancy areas. For each frame, we calculate area ratios and use them as input data for the recognition stage. Here, we consider six classes of activities, namely walking, standing, bending, lying, squatting, and sitting. We propose an efficient human action recognition scheme that takes advantage of the superior discrimination capacity of the AdaBoost algorithm. We validate the effectiveness of this approach using experimental data from two publicly available fall detection datasets, from the University of Rzeszow and the Universidad de Málaga. We compare the proposed approach with state-of-the-art classifiers based on the neural network, K-nearest neighbor, support vector machine and naïve Bayes, and show that it achieves better results in discriminating human gestures.
• #### Spectral synchronicity in brain signals

(Wiley, 2018-05-04)
This paper addresses the problem of identifying brain regions with similar oscillatory patterns detected from electroencephalograms. We introduce the hierarchical spectral merger (HSM) clustering method where the feature of interest is the spectral curve and the similarity metric used is the total variation distance. The HSM method is compared with clustering using features derived from independent-component analysis. Moreover, the HSM method is applied to two different electroencephalogram datasets. The first was recorded at resting state where the participant was not engaged in any cognitive task; the second was recorded during a spontaneous epileptic seizure. The results of the analyses using the HSM method demonstrate that clustering could evolve over the duration of the resting state and during epileptic seizure.
• #### Obstacle Detection for Intelligent Transportation Systems Using Deep Stacked Autoencoder and k-Nearest Neighbor Scheme

(Institute of Electrical and Electronics Engineers (IEEE), 2018-04-30)
Obstacle detection is an essential element for the development of intelligent transportation systems so that accidents can be avoided. In this study, we propose a stereovision-based method for detecting obstacles in urban environments. The proposed method uses a deep stacked autoencoder (DSA) model that combines greedy feature learning with dimensionality reduction capacity and employs an unsupervised k-nearest neighbors (KNN) algorithm to accurately and reliably detect the presence of obstacles. We consider obstacle detection as an anomaly detection problem. We evaluated the proposed method using practical data from three publicly available datasets: the Malaga stereovision urban dataset (MSVUD), the Daimler urban segmentation dataset (DUSD), and the Bahnhof dataset. We also compared the efficiency of the DSA-KNN approach with that of deep belief network (DBN)-based clustering schemes. Results show that the DSA-KNN approach is suitable for visually monitoring urban scenes.
• #### Linear factor copula models and their properties

(Wiley, 2018-04-25)
We consider a special case of factor copula models with additive common factors and independent components. These models are flexible and parsimonious with O(d) parameters where d is the dimension. The linear structure allows one to obtain closed form expressions for some copulas and their extreme‐value limits. These copulas can be used to model data with strong tail dependencies, such as extreme data. We study the dependence properties of these linear factor copula models and derive the corresponding limiting extreme‐value copulas with a factor structure. We show how parameter estimates can be obtained for these copulas and apply one of these copulas to analyse a financial data set.
• #### The Hierarchical Spectral Merger Algorithm: A New Time Series Clustering Procedure

(Springer Nature, 2018-04-12)
We present a new method for time series clustering which we call the Hierarchical Spectral Merger (HSM) method. This procedure is based on the spectral theory of time series and identifies series that share similar oscillations or waveforms. The extent of similarity between a pair of time series is measured using the total variation distance between their estimated spectral densities. At each step of the algorithm, every time two clusters merge, a new spectral density is estimated using the whole information present in both clusters, which is representative of all the series in the new cluster. The method is implemented in an R package HSMClust. We present two applications of the HSM method, one to data coming from wave-height measurements in oceanography and the other to electroencephalogram (EEG) data.
• #### Model-based Quantile Regression for Discrete Data

(arXiv, 2018-04-10)
Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework, quantile regression has typically been carried out by exploiting the Asymmetric Laplace Distribution as a working likelihood. Although this leads to a proper posterior for the regression coefficients, the resulting posterior variance is affected by an unidentifiable parameter, hence any inferential procedure besides point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for a proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We also extend it to the case of discrete responses, where there is no 1-to-1 relationship between the quantiles and the distribution's parameter, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.
• #### Directional outlyingness for multivariate functional data

(Elsevier BV, 2018-04-07)
The direction of outlyingness is crucial to describing the centrality of multivariate functional data. Motivated by this idea, classical depth is generalized to directional outlyingness for functional data. Theoretical properties of functional directional outlyingness are investigated and the total outlyingness can be naturally decomposed into two parts: magnitude outlyingness and shape outlyingness which represent the centrality of a curve for magnitude and shape, respectively. This decomposition serves as a visualization tool for the centrality of curves. Furthermore, an outlier detection procedure is proposed based on functional directional outlyingness. This criterion applies to both univariate and multivariate curves and simulation studies show that it outperforms competing methods. Weather and electrocardiogram data demonstrate the practical application of our proposed framework.
• #### Statistical Monitoring of Changes to Land Cover

(Institute of Electrical and Electronics Engineers (IEEE), 2018-04-06)
Accurate detection of changes in land cover leads to better understanding of the dynamics of landscapes. This letter reports the development of a reliable approach to detecting changes in land cover based on remote sensing and radiometric data. This approach integrates the multivariate exponentially weighted moving average (MEWMA) chart with support vector machines (SVMs) for accurate and reliable detection of changes to land cover. Here, we utilize the MEWMA scheme to identify features corresponding to changed regions. Unfortunately, MEWMA schemes cannot discriminate between real changes and false changes. If a change is detected by the MEWMA algorithm, then we execute the SVM algorithm that is based on features corresponding to detected pixels to identify the type of change. We assess the effectiveness of this approach by using the remote-sensing change detection database and the SZTAKI AirChange benchmark data set. Our results show the capacity of our approach to detect changes to land cover.
• #### Bayesian Modeling of Air Pollution Extremes Using Nested Multivariate Max-Stable Processes

(arXiv, 2018-03-18)
Capturing the potentially strong dependence among the peak concentrations of multiple air pollutants across a spatial region is crucial for assessing the related public health risks. In order to investigate the multivariate spatial dependence properties of air pollution extremes, we introduce a new class of multivariate max-stable processes. Our proposed model admits a hierarchical tree-based formulation, in which the data are conditionally independent given some latent nested $\alpha$-stable random factors. The hierarchical structure facilitates Bayesian inference and offers a convenient and interpretable characterization. We fit this nested multivariate max-stable model to the maxima of air pollution concentrations and temperatures recorded at a number of sites in the Los Angeles area, showing that the proposed model succeeds in capturing their complex tail dependence structure.
• #### Statistical methods and challenges in connectome genetics

(Elsevier BV, 2018-03-12)
The study of genetic influences on brain connectivity, known as connectome genetics, is an exciting new direction of research in imaging genetics. We here review recent results and current statistical methods in this area, and discuss some of the persistent challenges and possible directions for future work.
• #### Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs

(Cold Spring Harbor Laboratory, 2018-03-11)
Short Linear Motifs (SLiMs) contribute to almost every cellular function by connecting appropriate protein partners. Accurate prediction of SLiMs is difficult due to their shortness and sequence degeneracy. Leucine-aspartic acid (LD) motifs are SLiMs that link paxillin family proteins to factors controlling (cancer) cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family are poorly understood. To enable a proteome-wide assessment of these motifs, we developed an active-learning based framework that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome identified a dozen proteins that contain LD motifs, all being involved in cell adhesion and migration, and revealed a new type of inverse LD motif consensus. Our evolutionary analysis suggested that LD motif signalling originated in the common unicellular ancestor of opisthokonts and amoebozoa by co-opting nuclear export sequences. Inter-species comparison revealed a conserved LD signalling core and the emergence of species-specific adaptive connections, while maintaining a strong functional focus of the LD motif interactome. Collectively, our data elucidate the mechanisms underlying the origin and adaptation of an ancestral SLiM.
• #### Dynamic Classification using Multivariate Locally Stationary Wavelet Processes

(Elsevier BV, 2018-03-11)
Methods for the supervised classification of signals generally aim to assign a signal to one class for its entire time span. In this paper we present an alternative formulation for multivariate signals where the class membership is permitted to change over time. Our aim therefore changes from classifying the signal as a whole to classifying the signal at each time point to one of a fixed number of known classes. We assume that each class is characterised by a different stationary generating process; the signal as a whole will, however, be nonstationary due to class switching. To capture this nonstationarity we use the recently proposed Multivariate Locally Stationary Wavelet model. To account for uncertainty in class membership at each time point, our goal is not to assign a definite class membership but rather to calculate the probability of a signal belonging to a particular class. Under this framework we prove some asymptotic consistency results. This method is also shown to perform well when applied to both simulated and accelerometer data. In both cases our method is able to place a high probability on the correct class for the majority of time points.
• #### Reducing storage of global wind ensembles with stochastic generators

(Institute of Mathematical Statistics, 2018-03-09)
Wind has the potential to make a significant contribution to future energy resources. Locating the sources of this renewable energy on a global scale is, however, extremely challenging, given the difficulty of storing the very large data sets generated by modern computer models. We propose a statistical model that aims at reproducing the data-generating mechanism of an ensemble of runs via a Stochastic Generator (SG) of global annual wind data. We introduce an evolutionary spectrum approach with spatially varying parameters based on large-scale geographical descriptors such as altitude to better account for different regimes across the Earth’s orography. We consider a multi-step conditional likelihood approach to estimate the parameters that explicitly accounts for nonstationary features while also balancing memory storage and distributed computation. We apply the proposed model to more than 18 million points of yearly global wind speed. The proposed SG requires orders of magnitude less storage for generating surrogate ensemble members from wind than does creating additional wind fields from the climate model, even if an effective lossy data compression algorithm is applied to the simulation output.
• #### Scale and shape mixtures of multivariate skew-normal distributions

(Elsevier BV, 2018-02-26)
We introduce a broad and flexible class of multivariate distributions obtained by both scale and shape mixtures of multivariate skew-normal distributions. We present the probabilistic properties of this family of distributions in detail and lay down the theoretical foundations for subsequent inference with this model. In particular, we study linear transformations, marginal distributions, selection representations, stochastic representations and hierarchical representations. We also describe an EM-type algorithm for maximum likelihood estimation of the parameters of the model and demonstrate its implementation on a wind dataset. Our family of multivariate distributions unifies and extends many existing models of the literature that can be seen as submodels of our proposal.
• #### Principles for statistical inference on big spatio-temporal data from climate models

(Elsevier BV, 2018-02-24)
The vast increase in size of modern spatio-temporal datasets has prompted statisticians working in environmental applications to develop new and efficient methodologies that are still able to achieve inference for nontrivial models within an affordable time. Climate model outputs push the limits of inference for Gaussian processes, as their size can easily be larger than 10 billion data points. Drawing from our experience in a set of previous works, we provide three principles for the statistical analysis of such large datasets that leverage recent methodological and computational advances. These principles emphasize the need to embed distributed and parallel computing in the inferential process.
• #### Spatial modelling with R-INLA: A review

(arXiv, 2018-02-18)
Coming up with Bayesian models for spatial data is easy, but performing inference with them can be challenging. Writing fast inference code for a complex spatial model with realistically-sized datasets from scratch is time-consuming, and if changes are made to the model, there is little guarantee that the code performs well. The key advantages of R-INLA are the ease with which complex models can be created and modified, without the need to write complex code, and the speed at which inference can be done even for spatial problems with hundreds of thousands of observations. R-INLA handles latent Gaussian models, where fixed effects, structured and unstructured Gaussian random effects are combined linearly in a linear predictor, and the elements of the linear predictor are observed through one or more likelihoods. The structured random effects can be both standard areal models such as the Besag and the BYM models, and geostatistical models from a subset of the Matérn Gaussian random fields. In this review, we discuss the large success of spatial modelling with R-INLA and the types of spatial models that can be fitted, we give an overview of recent developments for areal models, and we give an overview of the stochastic partial differential equation (SPDE) approach and some of the ways it can be extended beyond the assumptions of isotropy and separability. In particular, we describe how slight changes to the SPDE approach lead to straightforward approaches for non-stationary spatial models and non-separable space-time models.
• #### Model-based fault detection algorithm for photovoltaic system monitoring

(IEEE, 2018-02-12)
Reliable detection of faults in PV systems plays an important role in improving their reliability, productivity, and safety. This paper addresses the detection of faults in the direct current (DC) side of photovoltaic (PV) systems using a statistical approach. Specifically, a simulation model that mimics the theoretical performances of the inspected PV system is designed. Residuals, which are the difference between the measured and estimated output data, are used as a fault indicator. Indeed, residuals are used as the input for the Multivariate CUmulative SUM (MCUSUM) algorithm to detect potential faults. We evaluated the proposed method by using data from an actual 20 MWp grid-connected PV system located in the province of Adrar, Algeria.
• #### Enhanced dynamic data-driven fault detection approach: Application to a two-tank heater system

(IEEE, 2018-02-12)
Principal components analysis (PCA) has been intensively studied and used in monitoring industrial systems. However, data generated from chemical processes are usually correlated in time due to process dynamics, which makes fault detection based on the PCA approach a challenging task. Accounting for the dynamic nature of data can also reflect the performance of the designed fault detection approaches. In PCA-based methods, this dynamic characteristic of the data can be accounted for by using dynamic PCA (DPCA), in which lagged variables are used in the PCA model to capture the time evolution of the process. This paper presents a new approach that combines DPCA, to account for autocorrelation in the data, with a generalized likelihood ratio (GLR) test to detect faults. A DPCA model is applied to perform dimension reduction while appropriately considering the temporal relationships in the data. Specifically, the proposed approach uses DPCA to generate residuals and then applies a GLR test to reveal any abnormality. The performance of the proposed method is evaluated through a continuous stirred tank heater system.
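
The lagged-variable idea behind DPCA in the last abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `lagged_matrix` and `dpca_residuals`, the standardization step, and the use of an SVD to obtain the principal directions are all assumptions made for illustration, and the GLR test applied to the residuals is not shown.

```python
import numpy as np


def lagged_matrix(X, lags):
    """Augment each observation x_t with its `lags` previous observations.

    X has shape (n, m). The result has shape (n - lags, m * (lags + 1)),
    and each row is [x_t, x_{t-1}, ..., x_{t-lags}].
    (Hypothetical helper, named here for illustration only.)
    """
    n, _ = X.shape
    return np.hstack([X[lags - k : n - k] for k in range(lags + 1)])


def dpca_residuals(X, lags, n_components):
    """Fit PCA on the lagged, standardized data and return the residuals.

    The residuals are the part of each lagged observation that is not
    explained by the first `n_components` principal directions.
    """
    Z = lagged_matrix(X, lags)
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # assumes no constant column
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Zs, full_matrices=False)
    P = Vt[:n_components].T
    return Zs - Zs @ P @ P.T
```

In this sketch a fault indicator could then be computed row by row from the returned residuals; which statistic to use, and how to set its threshold, depends on the detection test chosen.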