Recent Submissions

  • Wind power prediction using bootstrap aggregating trees approach to enabling sustainable wind power integration in a smart grid

    Harrou, Fouzi; Saidi, Ahmed; Sun, Ying (Energy Conversion and Management, Elsevier BV, 2019-10-09) [Article]
Precise prediction of wind power is important for sustainably integrating wind power into a smart grid. The need for short-term predictions increases with the growing installed capacity. The main contribution of this work is adopting a bagging ensemble of decision trees for wind power prediction. The choice of this regression approach is motivated by its ability to combine many relatively weak single trees to reach a higher prediction performance than single regressors; it also reduces the overall error and can merge numerous models. The performance of bagged trees for predicting wind power is compared to four commonly known prediction methods, namely multivariate linear regression, support vector regression, principal component regression, and partial least squares regression. Real measurements recorded every ten minutes from an actual wind turbine are used to illustrate the prediction quality of the studied methods. Results showed that the bagged trees regression approach reached the highest prediction performance, with a coefficient of determination of 0.982, followed by support vector regression with a Gaussian kernel and then with a quadratic kernel, while multivariate linear regression, partial least squares, and principal component regression gave the lowest prediction performance. The investigated models can also serve as a helpful tool for model-based anomaly detection in wind turbines.
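As a rough illustration of the bagging idea described in this abstract, the following Python sketch bags 50 piecewise-constant base learners (a crude stand-in for regression trees) on a synthetic power-curve dataset. The data, bin structure, and all parameters are invented for illustration; they are not the paper's turbine measurements or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 10-minute turbine records: wind speed (m/s) -> power (kW)
speed = rng.uniform(3.0, 25.0, 2000)
power = np.clip(0.5 * speed**3, None, 2000.0) + rng.normal(0.0, 20.0, 2000)

edges = np.linspace(3.0, 25.0, 33)  # fixed speed bins acting as tree leaves

def fit_tree(x, y):
    # Crude "tree": piecewise-constant mean of the target in each speed bin
    idx = np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)
    leaf = np.full(len(edges) - 1, y.mean())
    for b in range(len(edges) - 1):
        if np.any(idx == b):
            leaf[b] = y[idx == b].mean()
    return leaf

def predict(leaf, x):
    idx = np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)
    return leaf[idx]

x_tr, y_tr = speed[:1500], power[:1500]
x_te, y_te = speed[1500:], power[1500:]

# Bagging: fit each base learner on a bootstrap resample, then average predictions
preds = []
for _ in range(50):
    boot = rng.integers(0, len(x_tr), len(x_tr))
    preds.append(predict(fit_tree(x_tr[boot], y_tr[boot]), x_te))
ensemble = np.mean(preds, axis=0)

r2 = 1.0 - np.sum((y_te - ensemble) ** 2) / np.sum((y_te - y_te.mean()) ** 2)
```

Averaging over bootstrap resamples is what damps the variance of the individual weak learners, which is the property the abstract credits for the high coefficient of determination.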
  • Proteome-level assessment of origin, prevalence and function of Leucine-Aspartic Acid (LD) motifs.

    Alam, Tanvir; Alazmi, Meshari; Naser, Rayan Mohammad Mahmoud; Huser, Franceline; Momin, Afaque Ahmad Imtiyaz; Astro, Veronica; Hong, Seungbeom; Walkiewicz, Katarzyna Wiktoria; Canlas, Christian G; Huser, Raphaël; Ali, Amal J.; Merzaban, Jasmeen; Adamo, Antonio; Jaremko, Mariusz; Jaremko, Lukasz; Bajic, Vladimir B.; Gao, Xin; Arold, Stefan T. (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2019-10-05) [Article]
MOTIVATION: Leucine-aspartic acid (LD) motifs are short linear interaction motifs (SLiMs) that link paxillin family proteins to factors controlling cell adhesion, motility and survival. The existence and importance of LD motifs beyond the paxillin family are poorly understood. RESULTS: To enable a proteome-wide assessment of LD motifs, we developed an active-learning-based framework (LDmotif finder; LDMF) that iteratively integrates computational predictions with experimental validation. Our analysis of the human proteome revealed a dozen new proteins containing LD motifs. We found that LD motif signalling evolved in unicellular eukaryotes more than 800 Myr ago, with paxillin and vinculin as core constituents, and the nuclear export signal (NES) as a likely source of de novo LD motifs. We show that LD motif proteins form a functionally homogeneous group, all being involved in cell morphogenesis and adhesion. This functional focus is recapitulated in cells by GFP-fused LD motifs, suggesting that it is intrinsic to the LD motif sequence, possibly through their effect on binding partners. Our approach elucidated the origin and dynamic adaptations of an ancestral SLiM, and can serve as a guide for the identification of other SLiMs for which only few representatives are known. AVAILABILITY: LDMF and its source code are freely available online. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • A Multi-Domain Connectome Convolutional Neural Network for Identifying Schizophrenia from EEG Connectivity Patterns

    Phang, Chun-Ren; Noman, Fuad Mohammed; Hussain, Hadri; Ting, Chee-Ming; Ombao, Hernando (IEEE Journal of Biomedical and Health Informatics, Institute of Electrical and Electronics Engineers (IEEE), 2019-09-13) [Article]
Objective: We exploit altered patterns in brain functional connectivity as features for automatic discriminative analysis of neuropsychiatric patients. Deep learning methods have been introduced to functional network classification only very recently for fMRI, and the proposed architectures have essentially focused on a single type of connectivity measure. Methods: We propose a deep convolutional neural network (CNN) framework for classification of electroencephalogram (EEG)-derived brain connectomes in schizophrenia (SZ). To capture complementary aspects of disrupted connectivity in SZ, we explore combinations of various connectivity features, consisting of time- and frequency-domain metrics of effective connectivity based on a vector autoregressive model and partial directed coherence, and complex network measures of network topology. We design a novel multi-domain connectome CNN (MDC-CNN) based on a parallel ensemble of 1D and 2D CNNs to integrate the features from various domains and dimensions using different fusion strategies. We also consider an extension to dynamic brain connectivity using recurrent neural networks. Results: Hierarchical latent representations learned by the multiple convolutional layers from EEG connectivity reveal apparent group differences between SZ and healthy controls (HC). Results on a large resting-state EEG dataset show that the proposed CNNs significantly outperform traditional support vector machine classifiers. The MDC-CNN with combined connectivity features further improves performance over single-domain CNNs using individual features, achieving a remarkable accuracy of 91.69% with decision-level fusion. Conclusion: The proposed MDC-CNN, by integrating information from diverse brain connectivity descriptors, is able to accurately discriminate SZ from HC. Significance: The new framework is potentially useful for developing diagnostic tools for SZ and other disorders.
  • Inference on Long-Range Temporal Correlations in Human EEG Data

    Smith, Rachel J.; Ombao, Hernando; Shrey, Daniel W.; Lopour, Beth A. (IEEE Journal of Biomedical and Health Informatics, Institute of Electrical and Electronics Engineers (IEEE), 2019-08-29) [Article]
    Detrended Fluctuation Analysis (DFA) is a statistical estimation algorithm used to assess long-range temporal dependence in neural time series. The algorithm produces a single number, the DFA exponent, that reflects the strength of long-range temporal correlations in the data. No methods have been developed to generate confidence intervals for the DFA exponent for a single time series segment. Thus, we present a statistical measure of uncertainty for the DFA exponent in electroencephalographic (EEG) data via application of a moving-block bootstrap (MBB). We tested the effect of three data characteristics on the DFA exponent: (1) time series length, (2) the presence of artifacts, and (3) the presence of discontinuities. We found that signal lengths of ~5 minutes produced stable measurements of the DFA exponent and that the presence of artifacts positively biased DFA exponent distributions. In comparison, the impact of discontinuities was small, even those associated with artifact removal. We show that it is possible to combine a moving block bootstrap with DFA to obtain an accurate estimate of the DFA exponent as well as its associated confidence intervals in both simulated data and human EEG data. We applied the proposed method to human EEG data to (1) calculate a time-varying estimate of long-range temporal dependence during a sleep-wake cycle of a healthy infant and (2) compare pre- and post-treatment EEG data within individual subjects with pediatric epilepsy. Our proposed method enables dynamic tracking of the DFA exponent across the entire recording period and permits within-subject comparisons, expanding the utility of the DFA algorithm by providing a measure of certainty and formal tests of statistical significance for the estimation of long-range temporal dependence in neural data.
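The DFA-plus-moving-block-bootstrap procedure described above can be sketched in Python as follows. The scales, block length, and white-noise test signal (true exponent 0.5) are illustrative choices, not the settings used in the paper.

```python
import numpy as np

def dfa_exponent(x, scales):
    # DFA: integrate the mean-removed signal, linearly detrend it in windows of
    # size n, and regress log RMS fluctuation F(n) on log n; the slope is the exponent
    y = np.cumsum(x - x.mean())
    fluct = []
    for n in scales:
        segs = y[: (len(y) // n) * n].reshape(-1, n)
        A = np.vstack([np.arange(n), np.ones(n)]).T
        coef, *_ = np.linalg.lstsq(A, segs.T, rcond=None)
        resid = segs.T - A @ coef
        fluct.append(np.sqrt(np.mean(resid ** 2)))
    return np.polyfit(np.log(scales), np.log(fluct), 1)[0]

def mbb_interval(x, scales, block=250, n_boot=100, seed=0):
    # Moving-block bootstrap: rebuild surrogate series from random overlapping
    # blocks (preserving short-range structure), recompute the exponent each time
    rng = np.random.default_rng(seed)
    n_blocks = -(-len(x) // block)  # ceiling division
    estimates = []
    for _ in range(n_boot):
        starts = rng.integers(0, len(x) - block + 1, n_blocks)
        surrogate = np.concatenate([x[s:s + block] for s in starts])[: len(x)]
        estimates.append(dfa_exponent(surrogate, scales))
    return np.percentile(estimates, [2.5, 97.5])

rng = np.random.default_rng(1)
white = rng.normal(size=5000)          # white noise has true DFA exponent 0.5
scales = np.array([16, 32, 64, 128, 256])
alpha = dfa_exponent(white, scales)
lo, hi = mbb_interval(white, scales)
```

The bootstrap percentiles give the kind of confidence interval for the DFA exponent that the paper argues is otherwise unavailable for a single segment.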
  • Monitoring distillation column systems using improved nonlinear partial least squares-based strategies

    Madakyaru, Muddu; Harrou, Fouzi; Sun, Ying (IEEE Sensors Journal, Institute of Electrical and Electronics Engineers (IEEE), 2019-08-22) [Article]
Fault detection in industrial systems plays a core role in improving their safety and productivity and avoiding expensive maintenance. This paper proposes and verifies data-driven anomaly detection schemes based on a nonlinear latent variable model and statistical monitoring algorithms. Integrating the suitable characteristics of partial least squares (PLS) and the adaptive neuro-fuzzy inference system (ANFIS), a PLS-ANFIS model is employed to allow flexible modeling of multivariable nonlinear processes. Furthermore, PLS-ANFIS modeling is connected with k-nearest neighbors (kNN)-based data mining schemes and employed for nonlinear process monitoring. Specifically, residuals generated from the PLS-ANFIS model are used as the input to the kNN-based mechanism to uncover anomalies in the data. Moreover, kNN-based exponential smoothing with parametric and nonparametric thresholds is adopted for improved anomaly detection. The effectiveness of the proposed approach is evaluated using real measurements from an actual bubble cap distillation column.
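A minimal sketch of the kNN-based residual monitoring stage described above, omitting the PLS-ANFIS model and the exponential smoothing step. The Gaussian residual stand-ins, the fault shift, and the 99th-percentile nonparametric threshold are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for model residuals: small in normal operation, shifted under a fault
normal_train = rng.normal(0.0, 1.0, (300, 3))
normal_val = rng.normal(0.0, 1.0, (100, 3))
test = np.vstack([rng.normal(0.0, 1.0, (95, 3)),   # fault-free samples
                  rng.normal(6.0, 1.0, (5, 3))])   # faulty samples

def knn_score(ref, pts, k=5):
    # Anomaly score: mean Euclidean distance to the k nearest reference residuals
    d = np.linalg.norm(pts[:, None, :] - ref[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

# Nonparametric threshold: 99th percentile of scores on held-out normal residuals
limit = np.quantile(knn_score(normal_train, normal_val), 0.99)
flags = knn_score(normal_train, test) > limit
```

Using an empirical quantile as the control limit avoids distributional assumptions on the residuals, which is the motivation the abstract gives for the nonparametric threshold.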
  • An integrated vision-based approach for efficient human fall detection in a home environment

    Harrou, Fouzi; Zerrouki, Nabil; Sun, Ying; Houacine, Amrane (IEEE Access, Institute of Electrical and Electronics Engineers (IEEE), 2019-08-22) [Article]
Falls are an important healthcare problem for vulnerable people such as seniors. Response to potential emergencies can be hastened by timely detection and classification of falls. This paper addresses the detection of human falls using relevant pixel-based features reflecting variations in body shape. Specifically, the human body is divided into five partitions that correspond to five partial occupancy areas. For each frame, area ratios are calculated and used as input data for fall detection and classification. First, the detection of falls is addressed from a statistical point of view as an anomaly detection problem. Towards this end, an integrated approach merging a detection step with a classification step is proposed to enable efficient human fall detection in a home environment. Specifically, an effective fall detection approach using a generalized likelihood ratio (GLR) scheme is designed. However, a GLR scheme cannot discriminate between true falls and fall-like events, such as lying down. To mitigate this limitation, the support vector machine algorithm is applied to features of the detected fall to recognize its type. Tests on two publicly available datasets show the effectiveness of the proposed approach in appropriately detecting and identifying falls. Compared with neural network, k-nearest neighbor, decision tree, and naïve Bayes procedures, the two-step approach achieved better detection performance.
  • Parametric variogram matrices incorporating both bounded and unbounded functions

    Chen, Wanfang; Genton, Marc G. (Stochastic Environmental Research and Risk Assessment, Springer Science and Business Media LLC, 2019-07-30) [Article]
We construct a flexible class of parametric models for both traditional and pseudo variogram matrix-valued functions, where the off-diagonal elements are the traditional cross-variograms and pseudo cross-variograms, respectively, and the diagonal elements are the direct variograms, based on the method of latent dimensions and the linear model of coregionalization. The entries in the parametric variogram matrix allow for a smooth transition between boundedness and unboundedness by changing the values of parameters, and thus between jointly second-order and intrinsically stationary vector random fields, or between multivariate geometric Gaussian processes and multivariate Brown–Resnick processes in spatial extremes analysis.
  • Max-and-Smooth: a two-step approach for approximate Bayesian inference in latent Gaussian models

    Hrafnkelsson, Birgir; Jóhannesson, Árni V.; Siegert, Stefan; Bakka, Haakon; Huser, Raphaël (arXiv, 2019-07-27) [Preprint]
With modern high-dimensional data, complex statistical models are necessary, requiring computationally feasible inference schemes. We introduce Max-and-Smooth, an approximate Bayesian inference scheme for a flexible class of latent Gaussian models (LGMs) where one or more of the likelihood parameters are modeled by latent additive Gaussian processes. Max-and-Smooth consists of two steps. In the first step (Max), the likelihood function is approximated by a Gaussian density with mean and covariance equal to either (a) the maximum likelihood estimate and the inverse observed information, respectively, or (b) the mean and covariance of the normalized likelihood function. In the second step (Smooth), the latent parameters and hyperparameters are inferred and smoothed with the approximated likelihood function. The proposed method ensures that the uncertainty from the first step is correctly propagated to the second step. Since the approximated likelihood function is Gaussian, the approximate posterior density of the latent parameters of the LGM (conditional on the hyperparameters) is also Gaussian, thus facilitating efficient posterior inference in high dimensions. Furthermore, the approximate marginal posterior distribution of the hyperparameters is tractable, and as a result, the hyperparameters can be sampled independently of the latent parameters. In the case of a large number of independent data replicates, sparse precision matrices, and high-dimensional latent vectors, the speedup is substantial in comparison to an MCMC scheme that infers the posterior density from the exact likelihood function. The proposed inference scheme is demonstrated on one spatially referenced real dataset and on simulated data mimicking spatial, temporal, and spatio-temporal inference problems. Our results show that Max-and-Smooth is accurate and fast.
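The two steps can be illustrated in the conjugate Gaussian case, where the Max-step Gaussian approximation of the likelihood is exact, so the two-step posterior mean coincides with the full-data posterior mean. The hyperparameters are held fixed here for simplicity, although the paper infers them as well; all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 30                      # sites and data replicates per site
sigma, theta, tau = 1.0, 0.0, 2.0  # known noise scale and prior (illustrative)
mu_true = rng.normal(theta, tau, m)
y = rng.normal(mu_true[:, None], sigma, (m, n))

# Step 1 (Max): approximate each site's likelihood by a Gaussian centered at the
# MLE with variance from the observed information; exact here for Gaussian data
mle = y.mean(axis=1)
se2 = sigma**2 / n

# Step 2 (Smooth): conjugate update of the latent field against the Gaussian
# pseudo-observations, so the step-1 uncertainty (se2) propagates to the posterior
post_prec = 1.0 / se2 + 1.0 / tau**2
post_mean = (mle / se2 + theta / tau**2) / post_prec

# Reference: exact posterior mean from the full data (conjugate Gaussian case)
exact_mean = (y.sum(axis=1) / sigma**2 + theta / tau**2) / (n / sigma**2 + 1.0 / tau**2)
```

Because the pseudo-observation carries both the point estimate and its variance, no information is lost relative to the exact conjugate update, which is the uncertainty-propagation property the abstract emphasizes.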
  • Flexible and Efficient Topological Approaches for a Reliable Robots Swarm Aggregation

    Khaldi, Belkacem; Harrou, Fouzi; Cherif, Foudil; Sun, Ying (IEEE Access, IEEE, 2019-07-23) [Article]
Aggregation is a vital behavior for performing complex tasks in swarm systems such as swarm robotics. In this paper, three new aggregation methods, namely the Distance-Angular, Distance-Cosine, and Distance-Minkowski k-nearest neighbor (k-NN) methods, are introduced. These aggregation methods are built on well-known metrics, the Cosine, Angular, and Minkowski distance functions, which are used here to compute distances among neighboring robots. Relying on these methods, each robot identifies the k-nearest neighbor set it will interact with. Then, to achieve aggregation, the interactions among the set members are modeled using a virtual viscoelastic mesh. Analysis of results obtained with the ARGoS simulator shows a significant improvement in swarm aggregation performance compared to the conventional distance-weighted k-NN aggregation method. The aggregation performance of the methods is also robust to partially faulty robots and accurate under noisy sensors.
  • A Spliced Gamma-Generalized Pareto Model for Short-Term Extreme Wind Speed Probabilistic Forecasting

    Castro, Daniela; Huser, Raphaël; Rue, Haavard (Journal of Agricultural, Biological and Environmental Statistics, Springer Nature, 2019-07-23) [Article]
    Renewable sources of energy such as wind power have become a sustainable alternative to fossil fuel-based energy. However, the uncertainty and fluctuation of the wind speed derived from its intermittent nature bring a great threat to the wind power production stability, and to the wind turbines themselves. Lately, much work has been done on developing models to forecast average wind speed values, yet surprisingly little has focused on proposing models to accurately forecast extreme wind speeds, which can damage the turbines. In this work, we develop a flexible spliced Gamma-Generalized Pareto model to forecast extreme and non-extreme wind speeds simultaneously. Our model belongs to the class of latent Gaussian models, for which inference is conveniently performed based on the integrated nested Laplace approximation method. Considering a flexible additive regression structure, we propose two models for the latent linear predictor to capture the spatio-temporal dynamics of wind speeds. Our models are fast to fit and can describe both the bulk and the tail of the wind speed distribution while producing short-term extreme and non-extreme wind speed probabilistic forecasts. Supplementary materials accompanying this paper appear online.
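A sketch of a spliced Gamma-generalized Pareto density of the kind described above, with invented parameter values rather than the paper's fitted model. It checks numerically that the renormalized Gamma body (mass 1 - p) and the GP tail (mass p) together integrate to one. Note that this simple splice does not enforce continuity of the density at the threshold, a refinement the full model may impose.

```python
import numpy as np
from math import gamma as gamma_fn

# Illustrative parameters (not fitted values from the paper)
shape, rate = 2.0, 0.5        # Gamma body
u, p = 8.0, 0.1               # threshold and tail weight P(X > u)
xi, sigma = 0.1, 2.0          # generalized Pareto (GP) tail

def gamma_pdf(x):
    return rate**shape * x**(shape - 1.0) * np.exp(-rate * x) / gamma_fn(shape)

def gp_pdf(z):
    # Generalized Pareto density for exceedances z = x - u >= 0
    return (1.0 / sigma) * (1.0 + xi * z / sigma) ** (-1.0 / xi - 1.0)

def integrate(f, a, b, n=200000):
    # Midpoint-rule quadrature, accurate enough for this check
    xs = np.linspace(a, b, n + 1)
    mid = 0.5 * (xs[:-1] + xs[1:])
    return float(np.sum(f(mid)) * (b - a) / n)

F_u = integrate(gamma_pdf, 0.0, u)  # Gamma mass below the threshold

def spliced_pdf(x):
    # Renormalized Gamma body below u, weighted GP tail above u
    x = np.asarray(x, dtype=float)
    return np.where(x <= u,
                    (1.0 - p) * gamma_pdf(x) / F_u,
                    p * gp_pdf(np.clip(x - u, 0.0, None)))

total = integrate(spliced_pdf, 0.0, u + 400.0)
```

Splicing lets the Gamma part describe the bulk of the wind speed distribution while the GP tail governs the extremes, which is how the model forecasts both regimes simultaneously.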
  • A Machine Learning-Based Approach for Land Cover Change Detection Using Remote Sensing and Radiometric Measurements

    Zerrouki, Nabil; Harrou, Fouzi; Sun, Ying; Hocini, Lotfi (IEEE Sensors Journal, Institute of Electrical and Electronics Engineers (IEEE), 2019-07-15) [Article]
An approach combining the Hotelling $T^{2}$ control method with a weighted random forest classifier is proposed for detecting land cover changes from remote sensing and radiometric measurements. The Hotelling $T^{2}$ procedure is introduced to identify features corresponding to changed areas. However, the $T^{2}$ scheme cannot separate real from false changes. To tackle this limitation, the weighted random forest algorithm, an efficient classification technique for imbalanced problems, is applied to the features of the detected pixels to recognize the type of change. The feasibility of the proposed procedure is verified using the SZTAKI AirChange benchmark data. Results show that the proposed detection scheme effectively identifies land cover changes. Comparisons with other methods (i.e., neural network, random forest, support vector machine, and $k$-nearest neighbors) also highlight the superiority of the proposed method.
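The Hotelling $T^{2}$ detection stage can be sketched as follows. The four-dimensional Gaussian features, the empirical 99th-percentile control limit, and the simulated changed pixels are illustrative assumptions, and the second (classification) stage of the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(3)

# Reference features (e.g., radiometric differences over unchanged areas)
ref = rng.normal(0.0, 1.0, (500, 4))
mu = ref.mean(axis=0)
S_inv = np.linalg.inv(np.cov(ref, rowvar=False))

def hotelling_t2(x):
    # T^2 = (x - mu)^T S^{-1} (x - mu) for each feature vector (row of x)
    d = x - mu
    return np.einsum('ij,jk,ik->i', d, S_inv, d)

# Empirical control limit: 99th percentile of reference T^2 scores
limit = np.quantile(hotelling_t2(ref), 0.99)

changed = rng.normal(4.0, 1.0, (20, 4))   # simulated changed pixels
flags = hotelling_t2(changed) > limit
```

Pixels whose $T^{2}$ score exceeds the limit would then be handed to the classifier to separate real from false changes, as the abstract describes.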
  • HLIBCov: Parallel hierarchical matrix approximation of large covariance matrices and likelihoods with applications in parameter identification

    Litvinenko, Alexander; Kriemann, Ronald; Genton, Marc G.; Sun, Ying; Keyes, David E. (MethodsX, Elsevier BV, 2019-07-12) [Article]
We provide more technical details about the HLIBCov package, which uses parallel hierarchical (H-) matrices to:
• approximate large dense inhomogeneous covariance matrices with log-linear computational cost and storage requirements;
• compute matrix-vector products, the Cholesky factorization, and the inverse with log-linear complexity;
• identify unknown parameters of the covariance function (variance, smoothness, and covariance length).
These unknown parameters are estimated by maximizing the joint Gaussian log-likelihood function. To demonstrate the numerical performance, we identify three unknown parameters in an example with 2,000,000 locations on a desktop PC.
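The joint Gaussian log-likelihood being maximized can be sketched with a dense Cholesky factorization, the O(n^3) computation that HLIBCov replaces with its log-linear H-matrix version. The 1-D locations and exponential covariance below are simplifications for illustration, not the package's API or the paper's setup.

```python
import numpy as np

def exp_cov(locs, var, length):
    # Exponential covariance on 1-D locations (simple stand-in for a Matern model)
    d = np.abs(locs[:, None] - locs[None, :])
    return var * np.exp(-d / length)

def gauss_loglik(z, locs, var, length):
    # Dense Gaussian log-likelihood via Cholesky: log|C| from the diagonal of L,
    # quadratic form via a triangular solve
    C = exp_cov(locs, var, length) + 1e-8 * np.eye(len(z))
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L, z)
    return (-np.sum(np.log(np.diag(L))) - 0.5 * alpha @ alpha
            - 0.5 * len(z) * np.log(2.0 * np.pi))

rng = np.random.default_rng(4)
locs = np.sort(rng.uniform(0.0, 10.0, 400))
L_true = np.linalg.cholesky(exp_cov(locs, 1.0, 1.0) + 1e-8 * np.eye(400))
z = L_true @ rng.normal(size=400)

ll_true = gauss_loglik(z, locs, 1.0, 1.0)     # likelihood at the true parameters
ll_wrong = gauss_loglik(z, locs, 10.0, 0.05)  # far-off parameters score lower
```

Maximizing this function over (variance, length, smoothness) is the parameter-identification step; the H-matrix machinery only changes how the factorization inside it is carried out.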
  • Approximate Bayesian inference for spatial flood frequency analysis

    Johannesson, Árni V.; Hrafnkelsson, Birgir; Huser, Raphaël; Bakka, Haakon; Siegert, Stefan (arXiv, 2019-07-10) [Preprint]
Extreme floods cause casualties, widespread property damage, and damage to vital civil infrastructure. Predictions of extreme floods within gauged and ungauged catchments are crucial to mitigate these disasters. A Bayesian framework is proposed for predicting extreme floods using the generalized extreme-value (GEV) distribution. The methodological challenges consist of choosing a suitable parametrization for the GEV distribution when multiple covariates and/or latent spatial effects are involved, balancing model complexity and parsimony using an appropriate model selection procedure, and making inference based on a reliable and computationally efficient approach. We propose a latent Gaussian model with a novel multivariate link function for the location, scale and shape parameters of the GEV distribution. This link function is designed to separate the interpretation of the parameters at the latent level and to avoid unreasonable estimates of the shape parameter. Structured additive regression models are proposed for the three parameters at the latent level. Each of these regression models contains fixed linear effects for catchment descriptors. Spatial model components are added to the first two latent regression models to model the residual spatial structure unexplained by the catchment descriptors. To achieve computational efficiency for large datasets with these richly parametrized models, we exploit a Gaussian-based approximation to the posterior density. This approximation relies on site-wise estimates, but, contrary to typical plug-in approaches, the uncertainty in these initial estimates is properly propagated through to the final posterior computations. We applied the proposed modeling and inference framework to annual peak river flow data from 554 catchments across the United Kingdom. The framework performed well in terms of flood predictions for ungauged catchments.
  • Geostatistical modeling to capture seismic-shaking patterns from earthquake-induced landslides

    Lombardo, Luigi; Bakka, Haakon; Tanyas, Hakan; Westen, Cees; Mai, Paul Martin; Huser, Raphaël (Journal of Geophysical Research: Earth Surface, American Geophysical Union (AGU), 2019-07-05) [Article]
We investigate earthquake-induced landslides using a geostatistical model featuring a latent spatial effect (LSE). The LSE represents the spatially structured residuals in the data, which remain after adjusting for covariate effects. To determine whether the LSE captures the residual signal from a given trigger, we test the LSE in reproducing the pattern of seismic shaking from the distribution of seismically induced landslides, without prior knowledge of the earthquake being included in the model. We assessed the landslide intensity, i.e., the expected number of landslides per mapping unit, for the area in which landslides triggered by the Wenchuan and Lushan earthquakes overlap. We examined this area to test our method on landslide inventories located in the near and far fields of the earthquake. We generated three models for both earthquakes: i) seismic parameters only (a proxy for the trigger); ii) the LSE only; and iii) both seismic parameters and the LSE. The three configurations share the same morphometric covariates. This allowed us to study the LSE pattern and assess whether it approximated the seismic effects. Our results show that the LSE reproduced the shaking patterns for both earthquakes. In addition, the models including the LSE perform better than conventional models featuring seismic parameters only. Due to computational limitations, we carried out a detailed analysis for a relatively small area (2,112 km2) using a dataset with higher spatial resolution. Results were consistent with those of a subsequent analysis for a larger area (14,648 km2) using coarser-resolution data.
  • Secrecy Analysis in DF Relay over Generalized-K Fading Channels

    Zhao, Hui; Liu, Zhedong; Yang, Liang; Alouini, Mohamed-Slim (IEEE Transactions on Communications, Institute of Electrical and Electronics Engineers (IEEE), 2019-07-03) [Article]
In this paper, we analyze the secrecy performance of a decode-and-forward (DF) relay system in generalized-K fading channels. In a typical four-node communications model, a source (S) sends confidential information to a destination (D) via a relay (R) using the DF strategy in two time slots, while an eavesdropper (E) tries to overhear the information from S to D over generalized-K fading channels. To be more realistic, we assume that E can receive the signals of both time slots and that there is no direct link between S and D because of heavy fading. Based on these assumptions, we derive closed-form expressions for the secrecy outage probability (SOP) and ergodic secrecy capacity (ESC) by using a tight approximate probability density function of the generalized-K model. Asymptotic expressions for the SOP and ESC are also derived in the high signal-to-noise ratio region, both because they offer insights into the SOP and ESC and because they simplify the expressions significantly. The single-relay system is subsequently extended to a multi-relay system, where the asymptotic SOP analysis of three proposed relay selection strategies is investigated. Further, a security-reliability tradeoff analysis in the multi-relay system is presented, given that S adopts a constant code rate. Finally, Monte-Carlo simulations are used to demonstrate the accuracy of the derived closed-form expressions.
  • Deep learning approach for sustainable WWTP operation: A case study on data-driven influent conditions monitoring

    Dairi, Abdelkader; Cheng, Tuoyuan; Harrou, Fouzi; Sun, Ying; Leiknes, TorOve (Sustainable Cities and Society, Elsevier BV, 2019-06-30) [Article]
Wastewater treatment plants (WWTPs) are sustainable solutions to water scarcity. As the initial conditions offered to WWTPs, influent conditions (ICs) affect treatment unit states, ongoing process mechanisms, and product quality. Anomalies in ICs, often raised by abnormal events, need to be monitored and detected promptly to improve system resilience and provide smart environments. This paper proposes and verifies data-driven anomaly detection approaches based on deep learning methods and clustering algorithms. Combining the ability of recurrent neural networks (RNNs) to capture temporal auto-correlation among multivariate time series with the ability of restricted Boltzmann machines (RBMs) to delineate complex distributions, RNN-RBM models were employed and connected with various classifiers for anomaly detection. The effectiveness of RNN-based, RBM-based, RNN-RBM-based, and standalone individual detectors, including expectation-maximization clustering, K-means clustering, mean-shift clustering, one-class support vector machine (OCSVM), spectral clustering, and agglomerative clustering algorithms, was evaluated on seven years of IC data from a coastal municipal WWTP where more than 150 abnormal events occurred. Results demonstrated that the RNN-RBM-based OCSVM approach outperformed all other scenarios, with an area under the curve value up to 0.98, validating the superiority of RNN-RBM feature extraction and the robustness of the multivariate nonlinear kernels of OCSVM. The model is flexible, as it requires no assumptions on the data distribution, and can be shared and transferred among environmental data scientists.
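One of the standalone detectors listed above, K-means clustering, can be sketched as follows on synthetic influent-condition features (the RNN-RBM feature extraction and the OCSVM variant are omitted). The two operating regimes, the anomaly shift, and the 99th-percentile limit are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-ins for influent-condition features with two normal
# operating regimes (e.g., dry- and wet-weather loads)
def draw_normal_ops(n):
    side = rng.random(n) < 0.5
    centers = np.where(side[:, None], [3.0, 0.0], [-3.0, 0.0])
    return centers + rng.normal(0.0, 1.0, (n, 2))

train = draw_normal_ops(400)

def kmeans(X, k=2, iters=50, seed=0):
    # Plain Lloyd's algorithm fit on normal-operation features only
    r = np.random.default_rng(seed)
    C = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lbl = np.argmin(np.linalg.norm(X[:, None] - C[None], axis=2), axis=1)
        C = np.array([X[lbl == j].mean(axis=0) if np.any(lbl == j) else C[j]
                      for j in range(k)])
    return C

centroids = kmeans(train)

def score(X):
    # Anomaly score: distance to the nearest operating-regime centroid
    return np.min(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)

limit = np.quantile(score(train), 0.99)   # nonparametric control limit
test = np.vstack([draw_normal_ops(50), rng.normal([0.0, 8.0], 1.0, (5, 2))])
flags = score(test) > limit
```

Points far from every learned regime are flagged as abnormal events, which is the role the clustering detectors play downstream of the RNN-RBM features in the paper.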
  • A Hierarchical Spatiotemporal Statistical Model Motivated by Glaciology

    Gopalan, Giri; Hrafnkelsson, Birgir; Wikle, Christopher K.; Rue, Haavard; Aðalgeirsdóttir, Guðfinna; Jarosch, Alexander H.; Pálsson, Finnur (Journal of Agricultural, Biological and Environmental Statistics, Springer Science and Business Media LLC, 2019-06-12) [Article]
In this paper, we extend and analyze a Bayesian hierarchical spatiotemporal model for physical systems. A novelty is to model the discrepancy between the output of a computer simulator for a physical process and the actual process values with a multivariate random walk. For computational efficiency, linear algebra for bandwidth-limited matrices is utilized, and first-order emulator inference allows for the fast emulation of a numerical partial differential equation (PDE) solver. A test scenario from a physical system motivated by glaciology is used to examine the speed and accuracy of the computational methods used, in addition to the viability of modeling assumptions. We conclude by discussing how the model and associated methodology can be applied in other physical contexts besides glaciology.
  • System- and Unit-Level Care Quality Outcome Improvements After Integrating Clinical Nurse Leaders Into Frontline Care Delivery.

    Bender, Miriam; Murphy, Elizabeth A; Cruz, Maricela; Ombao, Hernando (The Journal of nursing administration, Ovid Technologies (Wolters Kluwer Health), 2019-05-29) [Article]
OBJECTIVE: This study determined whether one health system's frontline nursing model redesign to integrate clinical nurse leaders (CNLs) improved care quality and outcome score consistency. METHODS: An interrupted time-series design was used to measure patient satisfaction with 7 metrics before and after formally integrating CNLs into a Michigan healthcare system. Analysis generated estimates of quality outcome: a) change point; b) level change; and c) variance, pre- and post-implementation. RESULTS: The lowest-performing unit showed significant increases in quality scores, but there were no significant increases at the hospital level. Quality metric consistency increased significantly for every indicator at the hospital and unit levels. CONCLUSIONS: To our knowledge, this is the first study quantifying quality outcome consistency before and after nursing care delivery redesign with CNLs. The significant improvement suggests the CNL care model is associated with production of stable clinical microsystem practices that help to reduce clinical variability, thus improving care quality.
  • Spatial cluster detection of regression coefficients in a mixed-effects model

    Lee, Junho; Sun, Ying; Chang, Howard H. (Environmetrics, Wiley, 2019-05-22) [Article]
Identifying spatial clusters of different regression coefficients is a useful tool for discerning the distinctive relationship between a response and covariates in space. Most of the existing cluster detection methods aim to identify spatial similarity in responses, and the standard cluster detection algorithm assumes independent spatial units. However, the response variables are spatially correlated in many environmental applications. We propose a mixed-effects model for spatial cluster detection that takes spatial correlation into account. Compared to a fixed-effects model, the introduced random effects explain extra variability among the spatial responses beyond the cluster effect, thus reducing the false positive rate. The developed method exploits a sequential searching scheme and is able to identify multiple potentially overlapping clusters. We use simulation studies to evaluate the performance of our proposed method in terms of the true and false positive rates of a known cluster and the identification of multiple known clusters. We apply our proposed methodology to particulate matter (PM2.5) concentration data from the Northeastern United States in order to study the weather effect on PM2.5 and to investigate the association between the simulations from a numerical model and the satellite-derived aerosol optical depth data. We find geographical hot spots that show distinct features compared to the background.
  • Asymmetric tail dependence modeling, with application to cryptocurrency market data

    Gong, Yan; Huser, Raphaël (arXiv, 2019-05-13) [Preprint]
Since the inception of Bitcoin in 2008, cryptocurrencies have played an increasing role in the world of e-commerce, but the recent turbulence in the cryptocurrency market in 2018 has raised some concerns about their stability and associated risks. For investors, it is crucial to uncover the dependence relationships between cryptocurrencies for a more resilient portfolio diversification. Moreover, the stochastic behavior in both tails is important, as long positions are sensitive to a decrease in prices (lower tail), while short positions are sensitive to an increase in prices (upper tail). In order to assess both risk types, we develop in this paper a flexible copula model which is able to distinctively capture asymptotic dependence or independence in its lower and upper tails. Our proposed model is parsimonious and smoothly bridges (in each tail) both extremal dependence classes in the interior of the parameter space. Inference is performed using a full or censored likelihood approach, and we investigate by simulation the estimators' efficiency under three different censoring schemes which reduce the impact of non-extreme observations. We also develop a local likelihood approach to capture the temporal dynamics of extremal dependence among two leading cryptocurrencies. We here apply our model to historical closing prices of Bitcoin and Ethereum, which share most of the cryptocurrency market capitalizations. The results show that our proposed copula model outperforms alternative copula models and that the lower tail dependence level between Bitcoin and Ethereum has become stronger over time, smoothly transitioning from an asymptotic independence regime to an asymptotic dependence regime in recent years, whilst the upper tail has been more stable at a moderate dependence level.
