### Recent Submissions

• #### High Performance Multivariate Geospatial Statistics on Manycore Systems

(IEEE Transactions on Parallel and Distributed Systems, IEEE, 2021-04-06) [Article]
Modeling and inferring spatial relationships and predicting missing values of environmental data are some of the main tasks of geospatial statisticians. These routine tasks are accomplished using multivariate geospatial models and the cokriging technique, which requires the evaluation of the expensive Gaussian log-likelihood function. This large-scale cokriging challenge provides a fertile ground for supercomputing implementations for the geospatial statistics community as it is paramount to scale computational capability to match the growth in environmental data. In this paper, we develop large-scale multivariate spatial modeling and inference on parallel hardware architectures. To tackle the increasing complexity in matrix operations and the massive concurrency in parallel systems, we leverage low-rank matrix approximation techniques with task-based programming models and schedule the asynchronous computational tasks using a dynamic runtime system. The proposed framework provides both the dense and approximated computations of the Gaussian log-likelihood function. It demonstrates robust accuracy and scalable performance on a variety of computer systems. Using both synthetic and real datasets, the low-rank matrix approximation shows better performance than the exact computation, while preserving the application requirements in both parameter estimation and prediction accuracy. We also propose a novel algorithm to assess the prediction accuracy after the online parameter estimation.
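As a hypothetical illustration of why low-rank structure pays off in the Gaussian log-likelihood (a toy construction of ours, not the paper's framework): a covariance matrix that is exactly rank-k plus a nugget admits a log-determinant in O(nk) via the matrix determinant lemma, versus O(n³) for a dense factorization.

```python
import numpy as np

# Toy covariance: exact rank-k part plus a nugget (assumed structure for illustration).
rng = np.random.default_rng(0)
n, k, nugget = 500, 20, 0.5

U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis (n x k)
lam = np.linspace(5.0, 0.5, k)                     # assumed eigenvalues of the low-rank part
C = U @ np.diag(lam) @ U.T + nugget * np.eye(n)    # low-rank + nugget covariance

# Dense route: O(n^3) via a full factorization.
_, logdet_dense = np.linalg.slogdet(C)

# Low-rank route: det(tau2*I + U L U') = tau2^n * prod_i (1 + lam_i / tau2).
logdet_lowrank = n * np.log(nugget) + np.sum(np.log1p(lam / nugget))

print(abs(logdet_dense - logdet_lowrank))  # agrees to numerical precision
```

The same identity underlies the log-determinant and quadratic-form terms of the log-likelihood whenever the covariance is well approximated by a low-rank perturbation of a diagonal.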
• #### Spectral Dependence

(arXiv, 2021-03-31) [Preprint]
This paper presents a general framework for modeling dependence in multivariate time series. Its fundamental approach relies on decomposing each signal in a system into various frequency components and then studying the dependence properties through these oscillatory activities. The unifying theme across the paper is to explore the strength of dependence and possible lead-lag dynamics through filtering. The proposed framework is capable of representing both linear and non-linear dependencies that could occur instantaneously or after some delay (lagged dependence). Examples for studying dependence between oscillations are illustrated through multichannel electroencephalograms. These examples emphasize that some of the most prominent frequency domain measures such as coherence, partial coherence, and dual-frequency coherence can be derived as special cases under this general framework. This paper also introduces related approaches for modeling dependence through phase-amplitude coupling and causality of (one-sided) filtered signals.
• #### The Negative Binomial Process: A Tractable Model with Composite Likelihood-Based Inference

(Scandinavian Journal of Statistics, Wiley, 2021-03-24) [Article]
We propose a log-linear Poisson regression model driven by a stationary latent gamma autoregression. This process has negative binomial (NB) marginals to analyze overdispersed count time series data. Estimation and statistical inference are performed using a composite likelihood (CL) function. We establish theoretical properties of the proposed count model, in particular, the strong consistency and asymptotic normality of the maximum CL estimator. A procedure for calculating the standard error of the parameter estimator and confidence intervals is derived based on the parametric bootstrap. Monte Carlo experiments were conducted to study and compare the finite-sample properties of the proposed estimators. The simulations demonstrate that, compared to the approach that combines generalized linear models with the ordinary least squares method, the proposed composite likelihood approach provides satisfactory results for estimating the parameters related to the correlation structure of the process, even under model misspecification. An empirical illustration of the NB process is presented for the monthly number of viral hepatitis cases in Goiânia (capital and largest city of the Brazilian state of Goiás) from January 2001 to December 2018.
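A hedged sketch of the marginal property the model exploits (not the paper's latent gamma autoregression): a Poisson count whose rate is mixed over a gamma distribution is marginally negative binomial, hence overdispersed, with variance strictly exceeding the mean.

```python
import numpy as np

# Gamma-mixed Poisson simulation (illustrative parameters of our choosing).
rng = np.random.default_rng(1)
shape, scale, n = 2.0, 3.0, 200_000

lam = rng.gamma(shape, scale, size=n)   # latent gamma intensities
y = rng.poisson(lam)                    # marginally negative binomial counts

mean, var = y.mean(), y.var()
# Theory: E[Y] = shape*scale = 6, Var[Y] = shape*scale*(1 + scale) = 24 > E[Y].
print(mean, var)
```

The autoregression in the paper makes the latent gamma intensities serially dependent, so the NB marginals above acquire a correlation structure over time.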
• #### Conditional normal extreme-value copulas

(Extremes, Springer Science and Business Media LLC, 2021-03-19) [Article]
We propose a new class of extreme-value copulas which are extreme-value limits of conditional normal models. Conditional normal models are generalizations of conditional independence models, where the dependence among observed variables is modeled using one unobserved factor. Conditional on this factor, the distribution of these variables is given by the Gaussian copula. This structure allows one to build flexible and parsimonious models for data with complex dependence structures, such as data with spatial dependence or factor structure. We study the extreme-value limits of these models and show some interesting special cases of the proposed class of copulas. We develop estimation methods for the proposed models and conduct a simulation study to assess the performance of these algorithms. Finally, we apply these copula models to analyze data on monthly wind maxima and stock return minima.
• #### Sparse Functional Boxplots for Multivariate Curves

(arXiv, 2021-03-14) [Preprint]
This paper introduces the sparse functional boxplot and the intensity sparse functional boxplot as practical exploratory tools that make visualization possible for both complete and sparse functional data. These visualization tools can be used either in the univariate or multivariate functional setting. The sparse functional boxplot, which is based on the functional boxplot, depicts sparseness characteristics in the envelope of the 50% central region, the median curve, and the outliers. The proportion of missingness at each time index within the central region is colored in gray. The intensity sparse functional boxplot displays the relative intensity of sparse points in the central region, revealing where data are more or less sparse. The two-stage functional boxplot, a derivation from the functional boxplot to better detect outliers, is also extended to its sparse form. Several depth proposals for sparse multivariate functional data are evaluated and outlier detection is tested in simulations under various data settings and sparseness scenarios. The practical applications of the sparse functional boxplot and intensity sparse functional boxplot are illustrated with two public health datasets.
• #### A Generalized Heckman Model With Varying Sample Selection Bias and Dispersion Parameters

(Statistica Sinica, Statistica Sinica (Institute of Statistical Science), 2021-03-09) [Article]
Many proposals have emerged as alternatives to the Heckman selection model, mainly to address the non-robustness of its normal assumption. The 2001 Medical Expenditure Panel Survey data is often used to illustrate this non-robustness of the Heckman model. In this paper, we propose a generalization of the Heckman sample selection model by allowing the sample selection bias and dispersion parameters to depend on covariates. We show that the non-robustness of the Heckman model may be due to the assumption of the constant sample selection bias parameter rather than the normality assumption. Our proposed methodology allows us to understand which covariates are important to explain the sample selection bias phenomenon rather than to only form conclusions about its presence. Further, our approach may attenuate the non-identifiability and multicollinearity problems faced by the existing sample selection models. We explore the inferential aspects of the maximum likelihood estimators (MLEs) for our proposed generalized Heckman model. More specifically, we show that this model satisfies some regularity conditions such that it ensures consistency.
• #### Ridge-penalized adaptive Mantel test and its application in imaging genetics

(arXiv, 2021-03-03) [Preprint]
We propose a ridge-penalized adaptive Mantel test (AdaMant) for evaluating the association of two high-dimensional sets of features. By introducing a ridge penalty, AdaMant tests the association across many metrics simultaneously. We demonstrate how ridge penalization bridges Euclidean and Mahalanobis distances and their corresponding linear models from the perspective of association measurement and testing. This result is not only theoretically interesting but also has important implications in penalized hypothesis testing, especially in high dimensional settings such as imaging genetics. Applying the proposed method to an imaging genetic study of visual working memory in healthy adults, we identified interesting associations of brain connectivity (measured by EEG coherence) with selected genetic features.
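The bridging claim can be checked numerically in a few lines (our own minimal construction, not AdaMant's code): the ridge-regularized squared distance (x−y)ᵀ(S+λI)⁻¹(x−y) recovers the Mahalanobis distance as λ→0 and a rescaled Euclidean distance as λ→∞.

```python
import numpy as np

# Ridge-regularized distance interpolating between Mahalanobis and Euclidean.
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))
S = np.cov(X, rowvar=False)          # sample covariance of the features
d = X[0] - X[1]                      # difference of two observations

def ridge_dist2(d, S, lam):
    return d @ np.linalg.solve(S + lam * np.eye(len(d)), d)

maha2 = d @ np.linalg.solve(S, d)    # Mahalanobis squared distance
eucl2 = d @ d                        # Euclidean squared distance

print(ridge_dist2(d, S, 1e-8), maha2)        # lam -> 0: Mahalanobis
print(1e8 * ridge_dist2(d, S, 1e8), eucl2)   # lam -> inf: Euclidean (up to scale lam)
```

Intermediate λ values give the family of metrics over which the adaptive test searches.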
• #### Statistical Modeling of Non-Stationary Heatwave Hazard

(Copernicus GmbH, 2021-03-03) [Preprint]
The modeling of spatio-temporal trends in temperature extremes can help better understand the structure and frequency of heatwaves in a changing climate, as well as their environmental, societal, economic and global health-related risks. Here, we study annual temperature maxima over Southern Europe using a century-spanning dataset observed at 44 monitoring stations. Extending the spectral representation of max-stable processes, our modeling framework relies on a novel construction of max-infinitely divisible processes, which include covariates to capture spatio-temporal non-stationarities. Our new model keeps a popular max-stable process on the boundary of the parameter space, while flexibly capturing weakening extremal dependence at increasing quantile levels and asymptotic independence. It clearly outperforms natural alternative models. Results show that the spatial extent of heatwaves is smaller for more severe events at higher altitudes and that recent heatwaves are moderately wider. Our probabilistic assessment of the 2019 annual maxima confirms the severity of the 2019 heatwaves both spatially and at individual sites, especially when compared to climatic conditions prevailing in 1950-1975. Our applied results may be exploited in practice to understand the spatio-temporal dynamics, severity, and frequency of extreme heatwaves, and design suitable regional mitigation measures.
• #### Importance Sampling with the Integrated Nested Laplace Approximation

(arXiv, 2021-03-03) [Preprint]
The Integrated Nested Laplace Approximation (INLA) is a deterministic approach to Bayesian inference on latent Gaussian models (LGMs) and focuses on fast and accurate approximation of posterior marginals for the parameters in the models. Recently, methods have been developed to extend this class of models to those that can be expressed as conditional LGMs by fixing some of the parameters in the models to descriptive values. These methods differ in the manner descriptive values are chosen. This paper proposes to combine importance sampling with INLA (IS-INLA), and extends this approach with the more robust adaptive multiple importance sampling algorithm combined with INLA (AMIS-INLA). This paper gives a comparison between these approaches and existing methods on a series of applications with simulated and observed datasets and evaluates their performance based on accuracy, efficiency, and robustness. The approaches are validated by exact posteriors in a simple bivariate linear model; then, they are applied to a Bayesian lasso model, a Bayesian imputation of missing covariate values, and lastly, in parametric Bayesian quantile regression. The applications show that the AMIS-INLA approach, in general, outperforms the other methods, but the IS-INLA algorithm could be considered for faster inference when good proposals are available.
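For readers unfamiliar with the importance-sampling building block (this is the generic self-normalized scheme, not the IS-INLA machinery): draws from a tractable proposal are reweighted by the target-to-proposal density ratio, which only needs the target up to a normalizing constant.

```python
import numpy as np

# Self-normalized importance sampling: estimate the mean of a target known up
# to a constant, here N(2, 1), using a wider Gaussian proposal N(0, 3^2).
rng = np.random.default_rng(3)
n = 200_000

theta = rng.normal(0.0, 3.0, size=n)                          # proposal draws
log_target = -0.5 * (theta - 2.0) ** 2                        # up to a constant
log_proposal = -0.5 * (theta / 3.0) ** 2 - np.log(3.0)        # up to the same constant
log_w = log_target - log_proposal

w = np.exp(log_w - log_w.max())       # subtract max for numerical stability
w /= w.sum()                          # self-normalize: unknown constants cancel

post_mean = np.sum(w * theta)
print(post_mean)                      # close to the true mean, 2
```

In IS-INLA the "target" evaluations are themselves conditional-LGM fits produced by INLA, and AMIS-INLA additionally adapts the proposal across iterations.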
• #### Time-varying $\ell_0$ optimization for Spike Inference from Multi-Trial Calcium Recordings

(arXiv, 2021-03-02) [Preprint]
Optical imaging of genetically encoded calcium indicators is a powerful tool to record the activity of a large number of neurons simultaneously over a long period of time from freely behaving animals. However, determining the exact time at which a neuron spikes and estimating the underlying firing rate from calcium fluorescence data remains challenging, especially for calcium imaging data obtained from a longitudinal study. We propose a multi-trial time-varying $\ell_0$ penalized method to jointly detect spikes and estimate firing rates by robustly integrating evolving neural dynamics across trials. Our simulation study shows that the proposed method performs well in both spike detection and firing rate estimation. We demonstrate the usefulness of our method on calcium fluorescence trace data from two studies, with the first study showing differential firing rate functions between two behaviors and the second study showing evolving firing rate function across trials due to learning.
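A toy version of the generative calcium model that spike-inference methods invert (with a naive threshold detector standing in for the paper's time-varying ℓ₀-penalized solver): calcium decays autoregressively and jumps at spike times, so in the noiseless case the innovation c_t − γc_{t−1} recovers the spikes exactly.

```python
import numpy as np

gamma = 0.9                          # assumed calcium decay rate
spikes = np.zeros(200)
spikes[[20, 60, 61, 140]] = 1.0      # ground-truth spike times (toy example)

calcium = np.zeros_like(spikes)
for t in range(1, len(spikes)):
    calcium[t] = gamma * calcium[t - 1] + spikes[t]   # c_t = gamma*c_{t-1} + s_t

# Noiseless fluorescence equals calcium; a jump c_t - gamma*c_{t-1} flags a spike.
jumps = calcium[1:] - gamma * calcium[:-1]
detected = np.where(jumps > 0.5)[0] + 1
print(detected)   # -> [ 20  60  61 140]
```

With noisy fluorescence this naive rule breaks down, which is what motivates penalized formulations that trade data fit against the number of detected spikes.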
• #### Parallel Hierarchical Matrix Technique to Approximate Large Covariance Matrices, Likelihood Functions and Parameter Identification

(2021-03-01) [Presentation]
We develop the HLIBCov package, which uses parallel hierarchical (H-) matrices to: 1) approximate large dense inhomogeneous covariance matrices with a log-linear computational cost and storage requirement; 2) compute the matrix-vector product, Cholesky factorization, and inverse with a log-linear complexity; 3) identify unknown parameters of the covariance function (variance, smoothness, and covariance length). These unknown parameters are estimated by maximizing the joint Gaussian log-likelihood function. To demonstrate the numerical performance, we identify three unknown parameters in an example with 2,000,000 locations on a PC-desktop.
• #### A principled distance-based prior for the shape of the Weibull model

(Statistics & Probability Letters, Elsevier BV, 2021-03) [Article]
We propose a principled prior for the shape parameter of the Weibull model, based on a distance. It can be used as a default non-objective choice in complex models with a Weibull modeling component, and is implemented in the R-INLA package.
• #### Exact Simulation of Max-Infinitely Divisible Processes

(arXiv, 2021-02-28) [Preprint]
Max-infinitely divisible (max-id) processes play a central role in extreme-value theory and include the subclass of all max-stable processes. They allow for a constructive representation based on the componentwise maximum of random functions drawn from a Poisson point process defined on a suitable function space. Simulating from a max-id process is often difficult due to its complex stochastic structure, while calculating its joint density in high dimensions is often numerically infeasible. Therefore, exact and efficient simulation techniques for max-id processes are useful tools for studying the characteristics of the process and for drawing statistical inferences. Inspired by the simulation algorithms for max-stable processes, we here develop theory and algorithms to generalize simulation approaches tailored for certain flexible (existing or new) classes of max-id processes. Efficient simulation for a large class of models can be achieved by implementing an adaptive rejection sampling scheme to sidestep a numerical integration step in the algorithm. We present the results of a simulation study highlighting that our simulation algorithm works as expected and is highly accurate and efficient, such that it clearly outperforms customary approximate sampling schemes. As a byproduct, we also develop here new max-id models, which can be represented as pointwise maxima of general location-scale mixtures, and which possess flexible tail dependence structures capturing a wide range of asymptotic dependence scenarios.
• #### Statistical Inference for Local Granger Causality

(arXiv, 2021-02-27) [Preprint]
Granger causality has been employed to investigate causality relations between components of stationary multiple time series. Here, we generalize this concept by developing statistical inference for local Granger causality for multivariate locally stationary processes. Thus, our proposed local Granger causality approach captures time-evolving causality relationships in nonstationary processes. The proposed local Granger causality is well represented in the frequency domain and estimated based on the parametric time-varying spectral density matrix using the local Whittle likelihood. Under regularity conditions, we demonstrate that the estimators converge weakly to a Gaussian process. Additionally, the test statistic for the local Granger causality is shown to be asymptotically distributed as a quadratic form of a multivariate normal distribution. The finite sample performance is confirmed with several simulation studies for multivariate time-varying VAR models. For practical demonstration, the proposed local Granger causality method uncovered new functional connectivity relationships between channels in brain signals. Moreover, the method was able to identify structural changes of Granger causality in financial data.
• #### Vector Autoregressive Models with Spatially Structured Coefficients for Time Series on a Spatial Grid

(Journal of Agricultural, Biological and Environmental Statistics, Springer Science and Business Media LLC, 2021-02-26) [Article]
Motivated by the need to analyze readily available data collected in space and time, especially in environmental sciences, we propose a parsimonious spatiotemporal model for time series data on a spatial grid. In essence, our model is a vector autoregressive model that utilizes the spatial structure to achieve parsimony of autoregressive matrices at two levels. The first level ensures the sparsity of the autoregressive matrices using a lagged-neighborhood scheme. The second level performs a spatial clustering of the nonzero autoregressive coefficients such that within some subregions, nearby locations share the same autoregressive coefficients while across different subregions the coefficients may have distinct values. The model parameters are estimated using the penalized maximum likelihood with an adaptive fused Lasso penalty. The estimation procedure can be tailored to accommodate the need and prior knowledge of a modeler. Performance of the proposed estimation algorithm is examined in a simulation study. Our method gives reliable estimation results that are interpretable and especially useful to identify geographical subregions, within each of which, the time series have similar dynamical behavior with homogeneous autoregressive coefficients. We apply our model to a wind speed time series dataset generated from a climate model over Saudi Arabia to illustrate its power in explaining the dynamics by the spatially structured coefficients. Moreover, the estimated model can be useful for building stochastic weather generators as an approximation of the computationally expensive climate model.
• #### FjORD: Fair and Accurate Federated Learning under heterogeneous targets with Ordered Dropout

(arXiv, 2021-02-26) [Preprint]
Federated Learning (FL) has been gaining significant traction across different ML tasks, ranging from vision to keyboard predictions. In large-scale deployments, client heterogeneity is a fact, and constitutes a primary problem for fairness, training performance and accuracy. Although significant efforts have been made into tackling statistical data heterogeneity, the diversity in the processing capabilities and network bandwidth of clients, termed system heterogeneity, has remained largely unexplored. Current solutions either disregard a large portion of available devices or set a uniform limit on the model's capacity, restricted by the least capable participants. In this work, we introduce Ordered Dropout, a mechanism that achieves an ordered, nested representation of knowledge in Neural Networks and enables the extraction of lower footprint submodels without the need of retraining. We further show that for linear maps our Ordered Dropout is equivalent to SVD. We employ this technique, along with a self-distillation methodology, in the realm of FL in a framework called FjORD. FjORD alleviates the problem of client system heterogeneity by tailoring the model width to the client's capabilities. Extensive evaluation on both CNNs and RNNs across diverse modalities shows that FjORD consistently leads to significant performance gains over state-of-the-art baselines, while maintaining its nested structure.
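A minimal sketch of the nested-submodel idea behind Ordered Dropout (our own construction, not FjORD's code): if knowledge is ordered along the hidden units, keeping only the first p-fraction of a layer's units yields a lower-footprint submodel whose outputs agree with the leading units of the full model, with no retraining.

```python
import numpy as np

# Full-width linear layer; submodels are nested prefixes of its rows.
rng = np.random.default_rng(4)
d_in, d_hidden = 8, 16

W = rng.standard_normal((d_hidden, d_in))
x = rng.standard_normal(d_in)

def submodel(W, p):
    k = max(1, int(p * W.shape[0]))
    return W[:k]                     # keep the first k hidden units only

full = W @ x
half = submodel(W, 0.5) @ x          # half-width, lower-footprint submodel

print(np.allclose(half, full[: len(half)]))  # -> True
```

Training with Ordered Dropout samples the width p per step so that every prefix is itself a usable model, which is what lets a heterogeneous client pick the width matching its capabilities.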
• #### Hyperparameter Transfer Learning with Adaptive Complexity

(arXiv, 2021-02-25) [Preprint]
Bayesian optimization (BO) is a sample efficient approach to automatically tune the hyperparameters of machine learning models. In practice, one frequently has to solve similar hyperparameter tuning problems sequentially. For example, one might have to tune a type of neural network learned across a series of different classification problems. Recent work on multi-task BO exploits knowledge gained from previous tuning tasks to speed up a new tuning task. However, previous approaches do not account for the fact that BO is a sequential decision making procedure. Hence, there is in general a mismatch between the number of evaluations collected in the current tuning task compared to the number of evaluations accumulated in all previously completed tasks. In this work, we enable multi-task BO to compensate for this mismatch, such that the transfer learning procedure is able to handle different data regimes in a principled way. We propose a new multi-task BO method that learns a set of ordered, non-linear basis functions of increasing complexity via nested drop-out and automatic relevance determination. Experiments on a variety of hyperparameter tuning problems show that our method improves the sample efficiency.
• #### Smooth Online Parameter Estimation for time varying VAR models with application to rat's LFP data

(arXiv, 2021-02-24) [Preprint]
Multivariate time series data often appear as realizations of non-stationary processes where the covariance matrix or spectral matrix smoothly evolves over time. Most of the current approaches estimate the time-varying spectral properties only retrospectively - that is, after the entire data has been observed. Retrospective estimation is a major limitation in many adaptive control applications where it is important to estimate these properties and detect changes in the system as they happen in real-time. One major obstacle in online estimation is the computational cost due to the high-dimensionality of the parameters. Existing methods such as the Kalman filter or local least squares are feasible; however, they are not always suitable because they provide noisy estimates and can become prohibitively costly as the dimension of the time series increases. In our brain signal application, it is critical to develop a robust method that can estimate, in real-time, the properties of the underlying stochastic process, in particular, the spectral brain connectivity measures. For these reasons we propose a new smooth online parameter estimation approach (SOPE) that has the ability to control for the smoothness of the estimates with a reasonable computational complexity. Consequently, the models are fit in real-time even for high dimensional time series. We demonstrate that our proposed SOPE approach is as good as the Kalman filter in terms of mean-squared error for small dimensions. However, unlike the Kalman filter, the SOPE has lower computational cost and hence is scalable to higher dimensions. Finally, we apply the SOPE method to a rat's local field potential data during a hippocampus-dependent sequence-memory task. As demonstrated in the video, the proposed SOPE method is able to capture the dynamics of the connectivity as the rat performs the sequence of non-spatial working memory tasks.
• #### Brain Waves Analysis Via a Non-parametric Bayesian Mixture of Autoregressive Kernels

(arXiv, 2021-02-23) [Preprint]
The standard approach to analyzing brain electrical activity is to examine the spectral density function (SDF) and identify predefined frequency bands that have the most substantial relative contributions to the overall variance of the signal. However, a limitation of this approach is that the precise frequency and bandwidth of oscillations vary with cognitive demands. Thus they should not be arbitrarily defined a priori in an experiment. In this paper, we develop a data-driven approach that identifies (i) the number of prominent peaks, (ii) the frequency peak locations, and (iii) their corresponding bandwidths (or spread of power around the peaks). We propose a Bayesian mixture auto-regressive decomposition method (BMARD), which represents the standardized SDF as a Dirichlet process mixture based on a kernel derived from second-order auto-regressive processes which completely characterize the location (peak) and scale (bandwidth) parameters. We present a Metropolis-Hastings within Gibbs algorithm to sample from the posterior distribution of the mixture parameters. Simulation studies demonstrate the robustness and performance of the BMARD method. Finally, we use the proposed BMARD method to analyze local field potential (LFP) activity from the hippocampus of laboratory rats across different conditions in a non-spatial sequence memory experiment to identify the most interesting frequency bands and examine the link between specific patterns of activity and trial-specific cognitive demands.
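The AR(2) kernel fact the method builds on can be verified directly (our own numerical check, not the BMARD sampler): an AR(2) process with complex roots r·exp(±iψ) has a spectral peak near frequency ψ, and the root modulus r controls how concentrated the power is around that peak.

```python
import numpy as np

# AR(2) coefficients from an assumed complex root pair r*exp(+-i*psi).
r, psi = 0.95, np.pi / 4
phi1, phi2 = 2 * r * np.cos(psi), -r**2

# AR(2) spectral density (up to the innovation variance) on a frequency grid.
omega = np.linspace(0.0, np.pi, 10_000)
z = np.exp(-1j * omega)
spec = 1.0 / np.abs(1 - phi1 * z - phi2 * z**2) ** 2

peak = omega[np.argmax(spec)]
print(peak, psi)   # peak location is close to psi when r is near 1
```

This is why a mixture of AR(2) kernels can represent an SDF through interpretable peak-location and bandwidth parameters.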
• #### Joint tracking of multiple quantiles through conditional quantiles

(Information Sciences, Elsevier BV, 2021-02-21) [Article]
The estimation of quantiles is one of the most fundamental data mining tasks. As most real-time data streams vary dynamically over time, there is a quest for adaptive quantile estimators. The most well-known type of adaptive quantile estimator is the incremental one, which documents state-of-the-art performance in tracking quantiles. However, the vast majority of incremental quantile estimators fail to jointly estimate multiple quantiles in a consistent manner without violating the monotone property of quantiles. In this paper, first we introduce the concept of conditional quantiles that can be used to extend incremental estimators to jointly track multiple quantiles. Second, we resort to the concept of conditional quantiles to propose two new estimators. Extensive experimental results, based on both synthetic and real-life data, show that the proposed estimators clearly outperform legacy state-of-the-art joint quantile tracking algorithms in terms of accuracy while achieving faster adaptivity in the face of dynamically varying data streams.
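A textbook incremental quantile tracker, shown as background for the problem the paper addresses (this is the generic stochastic-approximation baseline, not the paper's conditional-quantile estimators): each observation nudges the estimate up with weight τ or down with weight 1−τ, and running several such trackers independently is exactly the setting in which monotonicity across quantiles can be violated.

```python
import numpy as np

# Incremental tracking of three quantiles of a Uniform(0, 1) stream.
rng = np.random.default_rng(5)
step, taus = 0.01, [0.25, 0.5, 0.75]
q = np.zeros(len(taus))

for x in rng.uniform(0.0, 1.0, size=50_000):
    for i, tau in enumerate(taus):
        q[i] += step * (tau - (x < q[i]))   # up by step*tau if x >= q, else down

print(q)   # roughly [0.25, 0.5, 0.75] for this stream
```

The equilibrium of each update is P(X < q) = τ, i.e. the τ-quantile; the paper's conditional-quantile construction couples such trackers so the joint estimates stay monotone.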