Recent Submissions

  • Tractable Bayes of skew-elliptical link models for correlated binary data

    Zhang, Zhongwei; Arellano-Valle, Reinaldo B; Genton, Marc G.; Huser, Raphaël (Biometrics, Wiley, 2022-08-11) [Article]
    Correlated binary response data with covariates are ubiquitous in longitudinal or spatial studies. Among the existing statistical models the most well-known one for this type of data is the multivariate probit model, which uses a Gaussian link to model dependence at the latent level. However, a symmetric link may not be appropriate if the data are highly imbalanced. Here, we propose a multivariate skew-elliptical link model for correlated binary responses, which includes the multivariate probit model as a special case. Furthermore, we perform Bayesian inference for this new model and prove that the regression coefficients have a closed-form unified skew-elliptical posterior with an elliptical prior. The new methodology is illustrated by an application to COVID-19 data from three different counties of the state of California, USA. By jointly modeling extreme spikes in weekly new cases, our results show that the spatial dependence cannot be neglected. Furthermore, the results also show that the skewed latent structure of our proposed model improves the flexibility of the multivariate probit model and provides a better fit to our highly imbalanced dataset.
  • Nonseparable Space-Time Stationary Covariance Functions on Networks cross Time

    Porcu, Emilio; White, Philip A.; Genton, Marc G. (arXiv, 2022-08-09) [Preprint]
    The advent of data science has provided an increasing number of challenges with high data complexity. This paper addresses the challenge of space-time data where the spatial domain is not a planar surface, a sphere, or a linear network, but a generalized network (termed a graph with Euclidean edges). Additionally, data are repeatedly measured over different temporal instants. We provide new classes of nonseparable space-time stationary covariance functions where space can be a generalized network, a Euclidean tree, or a linear network, and where time can be linear or circular (seasonal). Because the construction principles are technical, we focus on illustrations that guide the reader through the construction of statistically interpretable examples. A simulation study demonstrates that we can recover the correct model when compared to misspecified models. In addition, our simulation studies show that we effectively recover simulation parameters. In our data analysis, we consider a traffic accident dataset that shows improved model performance based on covariance specifications and network-based metrics.
  • Maximum Principle Preserving Space and Time Flux Limiting for Diagonally Implicit Runge–Kutta Discretizations of Scalar Convection-diffusion Equations

    Quezada de Luna, Manuel; Ketcheson, David I. (Journal of Scientific Computing, Springer Science and Business Media LLC, 2022-08-01) [Article]
    We provide a framework for high-order discretizations of nonlinear scalar convection-diffusion equations that satisfy a discrete maximum principle. The resulting schemes can have arbitrarily high order accuracy in time and space, and can be stable and maximum-principle-preserving (MPP) with no step size restriction. The schemes are based on a two-tiered limiting strategy, starting with a high-order limiter-based method that may have small oscillations or maximum-principle violations, followed by an additional limiting step that removes these violations while preserving high order accuracy. The desirable properties of the resulting schemes are demonstrated through several numerical examples.
  • High-Performance Spatial Data Compression for Scientific Applications

    Kriemann, Ronald; Ltaief, Hatem; Luong, Minh Bau; Hernandez Perez, Francisco; Im, Hong G.; Keyes, David E. (Springer International Publishing, 2022-08-01) [Book Chapter]
    We implement an efficient data compression algorithm that reduces the memory footprint of spatial datasets generated during scientific simulations. Regularly storing these datasets is typically needed for checkpoint/restart or for post-processing purposes. Our lossy compression approach, codenamed HLRcompress (https://gitlab.mis.mpg.de/rok/HLRcompress), combines a hierarchical low-rank approximation technique with binary compression. This novel hybrid method is agnostic to the particular domain of application. We study the impact of HLRcompress on accuracy using synthetic datasets to demonstrate the software capabilities, including robustness and versatility. We assess different algebraic compression methods and report performance results on various parallel architectures. We then integrate it into a workflow of a direct numerical simulation solver for turbulent combustion on distributed-memory systems. We compress the generated snapshots during time integration using accuracy thresholds for each individual chemical species, without degrading the practical accuracy of the overall pressure and temperature. Finally, we compare against state-of-the-art compression software. Our implementation achieves on average greater than 100-fold compression of the original size of the datasets.
  • Multivariate Functional Outlier Detection using the FastMUOD Indices

    Ojo, Oluwasegun Taiwo; Anta, Antonio Fernández; Genton, Marc G.; Lillo, Rosa E. (arXiv, 2022-07-26) [Preprint]
    We present definitions and properties of the fast massive unsupervised outlier detection (FastMUOD) indices, used for outlier detection (OD) in functional data. FastMUOD detects outliers by computing, for each curve, an amplitude, magnitude and shape index meant to target the corresponding types of outliers. Some methods adapting FastMUOD to outlier detection in multivariate functional data are then proposed. These include applying FastMUOD on the components of the multivariate data and using random projections. Moreover, these techniques are tested on various simulated and real multivariate functional datasets. Compared with the state of the art in multivariate functional OD, the use of random projections showed the most effective results with similar, and in some cases improved, OD performance.
  • Large-Scale Low-Rank Gaussian Process Prediction with Support Points

    Song, Yan; Dai, Wenlin; Genton, Marc G. (arXiv, 2022-07-26) [Preprint]
    Low-rank approximation is a popular strategy to tackle the "big n problem" associated with large-scale Gaussian process regressions. Basis functions for developing low-rank structures are crucial and should be carefully specified. Predictive processes simplify the problem by inducing basis functions with a covariance function and a set of knots. The existing literature suggests certain practical implementations of knot selection and covariance estimation; however, theoretical foundations explaining the influence of these two factors on predictive processes are lacking. In this paper, the asymptotic prediction performance of the predictive process and Gaussian process predictions is derived and the impacts of the selected knots and estimated covariance are studied. We suggest the use of support points as knots, which best represent data locations. Extensive simulation studies demonstrate the superiority of support points and verify our theoretical results. Real data of precipitation and ozone are used as examples, and the efficiency of our method over other widely used low-rank approximation methods is verified.
  • Phase equilibrium in the hydrogen energy chain

    Zhang, Tao; Zhang, Yanhui; Katterbauer, Klemens; Al Shehri, Abdallah; Sun, Shuyu; Hoteit, Ibrahim (Fuel, Elsevier BV, 2022-07-22) [Article]
    In this paper, a thorough review of the current state of the hydrogen phase equilibrium approaches is presented. Potential applications of phase equilibrium calculations for the accurate simulation of the entire process are then identified. Based on the first and second laws of thermodynamics, an advanced constant (N), volume (V), and temperature (T) (NVT) flash calculation scheme is developed for fluid mixtures containing hydrogen, which can be used to calculate the phase equilibrium for various feed compositions. We produce reasonable predictions of the phase transition under various environmental conditions for a number of engineering scenarios during the hydrogen production and storage processes, thus demonstrating the effectiveness and robustness of the proposed phase equilibrium calculation scheme.
  • Parallel space-time likelihood optimization for air pollution prediction on large-scale systems

    Salvaña, Mary Lai O.; Abdulah, Sameh; Ltaief, Hatem; Sun, Ying; Genton, Marc G.; Keyes, David E. (ACM, 2022-07-12) [Conference Paper]
    Gaussian geostatistical space-time modeling is an effective tool for performing statistical inference of field data evolving in space and time, generalizing spatial modeling alone at the cost of the greater complexity of operations and storage, and pushing geostatistical modeling even further into the arms of high-performance computing. It makes inferences for missing data by leveraging space-time measurements of one or more fields. We propose a high-performance implementation of a widely applied space-time model for large-scale systems using a two-level parallelization technique. At the inner level, we rely on state-of-the-art dense linear algebra libraries and parallel runtime systems to perform complex matrix operations required to evaluate the maximum likelihood estimation (MLE). At the outer level, we parallelize the optimization process using a distributed implementation of the particle swarm optimization (PSO) algorithm. At this level, parallelization is accomplished using MPI sub-communicators, such that the nodes in each sub-communicator perform a single MLE iteration at a time. To evaluate the effectiveness of the proposed methodology, we assess the accuracy of the newly implemented space-time model on a set of large-scale synthetic space-time datasets. Moreover, we use the proposed implementation to model two air pollution datasets from the Middle East and US regions with 550 spatial locations × 730 time slots and 945 spatial locations × 500 time slots, respectively. The evaluation shows that the proposed approach satisfies high prediction accuracy on both synthetic datasets and real particulate matter (PM) datasets in the context of the air pollution problem. We achieve up to 757.16 TFLOP/s using 1024 nodes (75% of the peak performance) using 490K geospatial locations on the Shaheen-II Cray XC40 system.
  • Are You All Normal? It Depends!

    Chen, Wanfang; Genton, Marc G. (International Statistical Review, Wiley, 2022-07-07) [Article]
    The assumption of normality has underlain much of the development of statistics, including spatial statistics, and many tests have been proposed. In this work, we focus on the multivariate setting and first review the recent advances in multivariate normality tests for i.i.d. data, with emphasis on the skewness and kurtosis approaches. We show through simulation studies that some of these tests cannot be used directly for testing normality of spatial data. We further review briefly the few existing univariate tests under dependence (time or space), and then propose a new multivariate normality test for spatial data by accounting for the spatial dependence. The new test utilises the union-intersection principle to decompose the null hypothesis into intersections of univariate normality hypotheses for projection data, and it rejects the multivariate normality if any individual hypothesis is rejected. The individual hypotheses for univariate normality are conducted using a Jarque–Bera type test statistic that accounts for the spatial dependence in the data. We also show in simulation studies that the new test has a good control of the type I error and a high empirical power, especially for large sample sizes. We further illustrate our test on bivariate wind data over the Arabian Peninsula.
  • Physical forcing of phytoplankton dynamics in the Al-Wajh lagoon (Red Sea)

    Zhan, Peng; Krokos, Georgios; Gittings, John; Raitsos, Dionysios E.; Guo, Daquan; Papagiannopoulos, Nikolaos; Hoteit, Ibrahim (Limnology and Oceanography Letters, Wiley, 2022-07-05) [Article]
    Coastal lagoons are biodiversity hotspots that support neighboring ecosystems and various services. They can exhibit distinct biophysical characteristics compared to the adjacent open sea and act paradoxically as autonomous ecosystems. Using remotely sensed observations and state-of-the-art numerical simulations, the role of water column hydrodynamics in shaping the seasonal succession of phytoplankton biomass was investigated for a non-estuarine coastal lagoon situated in the northeastern Red Sea. Observations reveal that seasonal phytoplankton blooms inside the lagoon occur during a distinctively different period compared to the adjacent open sea. We provide evidence that this striking difference is due to the contrasting hydrodynamic conditions between inside and outside the lagoon, through their effects on stratification that regulate nutrient availability and hence favorable conditions to sustain rapid phytoplankton growth. The proposed mechanism may offer new insights into understanding the biophysical dynamics of non-estuarine coastal lagoons in other tropical regions of the global oceans.
  • Mangrove distribution and afforestation potential in the Red Sea

    Blanco Sacristan, Javier; Johansen, Kasper; Duarte, Carlos M.; Daffonchio, Daniele; Hoteit, Ibrahim; McCabe, Matthew (Science of The Total Environment, Elsevier BV, 2022-06-30) [Article]
    Mangrove ecosystems represent one of the most effective natural environments for fixing and storing carbon (C). Mangroves also offer significant co-benefits, serving as nurseries for marine species, providing nutrients and food to support marine ecosystems, and stabilizing coastlines from erosion and extreme events. Given these considerations, mangrove afforestation and associated C sequestration has gained considerable attention as a nature-based solution to climate adaptation (e.g., protect against more frequent storm surges) and mitigation (e.g., offsetting other C-producing activities). To advance our understanding and description of these important ecosystems, we leverage Landsat-8 and Sentinel-2 satellite data to provide a current assessment of mangrove extent within the Red Sea region and also explore the effect of spatial resolution on mapping accuracy. We establish that Sentinel-2 provides a more precise spatial record of extent and subsequently use these data together with a maximum entropy (MaxEnt) modeling approach to: i) map the distribution of Red Sea mangrove systems, and ii) identify potential areas for future afforestation. From these current and potential mangrove distribution maps, we then estimate the carbon sequestration rate for the Red Sea (as well as for each bordering country) using a meta-analysis of sequestration values surveyed from the available literature. For the mangrove classification, we obtained mapping accuracies of 98%, with a total Red Sea mangrove extent estimated at approximately 175 km². Based on the MaxEnt approach, which used soil physical and environmental variables to identify the key factors limiting mangrove growth and distribution, an area of nearly 410 km² was identified for potential mangrove afforestation expansion. The factors constraining the potential distribution of mangroves were related to soil physical properties, likely reflecting the low sediment load and limited nutrient input of the Red Sea. The current rate of carbon sequestration was calculated as 1034.09 ± 180.53 Mg C yr⁻¹, and the potential sequestration rate as 2424.49 ± 423.26 Mg C yr⁻¹. While our results confirm the maintenance of a positive trend in mangrove growth over the last few decades, they also provide the upper bounds on above ground carbon sequestration potential for the Red Sea mangroves.
  • Multi-task learning for low-frequency extrapolation and elastic model building from seismic data

    Ovcharenko, Oleg; Kazei, Vladimir; Alkhalifah, Tariq Ali; Peter, Daniel (IEEE Transactions on Geoscience and Remote Sensing, Institute of Electrical and Electronics Engineers (IEEE), 2022-06-23) [Article]
    Low-frequency signal content in seismic data as well as a realistic initial model are key ingredients for robust and efficient full-waveform inversions. However, acquiring low-frequency data is challenging in practice for active seismic surveys. Data-driven solutions show promise to extrapolate low-frequency data given a high-frequency counterpart. While being established for synthetic acoustic examples, the application of bandwidth extrapolation to field datasets remains non-trivial. Rather than aiming to reach superior accuracy in bandwidth extrapolation, we propose to jointly reconstruct low-frequency data and a smooth background subsurface model within a multi-task deep learning framework. We automatically balance data, model and trace-wise correlation loss terms in the objective functional and show that this approach improves the extrapolation capability of the network. We also design a pipeline for generating synthetic data suitable for field data applications. Finally, we apply the same trained network to synthetic and real marine streamer datasets and run an elastic full-waveform inversion from the extrapolated dataset.
  • A class of high-order weighted compact central schemes for solving hyperbolic conservation laws

    Shen, Hua; Al Jahdali, Rasha; Parsani, Matteo (Journal of Computational Physics, Elsevier BV, 2022-06-23) [Article]
    We propose a class of weighted compact central schemes for solving hyperbolic conservation laws. The linear version can be considered as a high-order extension of the central Lax–Friedrichs scheme and the central conservation element and solution element scheme. On every cell, the solution is approximated by a Pth-order polynomial of which all the DOFs are stored and updated separately. The cell average is updated by a classical finite volume scheme which is constructed based on space-time staggered meshes such that the fluxes are continuous across the interfaces of the adjacent control volumes and, therefore, the local Riemann problem is bypassed. The kth-order spatial derivatives are updated by a central difference of the (k-1)th-order spatial derivatives at cell vertices. All the space-time information is calculated by the Cauchy–Kovalewski procedure. By doing so, the schemes are able to achieve arbitrarily uniform space-time high-order on a compact stencil consisting of only neighboring cells with only one explicit time step. In order to capture discontinuities without spurious oscillations, a weighted essentially non-oscillatory type limiter is tailor-made for the schemes. The limiter preserves the compactness and high-order accuracy of the schemes. The schemes' accuracy, robustness, and efficiency are verified by several numerical examples of scalar conservation laws and the compressible Euler equations.
  • Curing Effect on Durability of Cement Mortar with GGBS: Experimental and Numerical Study

    Ghostine, Rabih; Bur, Nicolas; Feugeas, Françoise; Hoteit, Ibrahim (Materials (Basel, Switzerland), MDPI AG, 2022-06-21) [Article]
    In this paper, supplementary cementitious materials are used as a substitute for cement to decrease carbon dioxide emissions. A by-product of the iron manufacturing industry, ground granulated blast-furnace slag (GGBS), known to improve some performance characteristics of concrete, is used as an effective cement replacement to manufacture mortar samples. Here, the influence of curing conditions on the durability of samples including various amounts of GGBS is investigated experimentally and numerically. Twelve high-strength Portland cement CEM I 52.5 N samples were prepared, in which 0%, 45%, 60%, and 80% of cement were substituted by GGBS. In addition, three curing conditions (standard, dry, and cold curing) were applied to the samples. Durability aspects were studied through porosity, permeability, and water absorption. Experimental results indicate that samples cured in standard conditions gave the best performance in comparison to other curing conditions. Furthermore, samples incorporating 45% of GGBS have superior durability properties. Permeability and water absorption were improved by 17% and 18%, respectively, compared to the reference sample. Thereafter, data from capillary suction experiments were used to numerically determine the hydraulic properties based on a Bayesian inversion approach, namely the Markov Chain Monte Carlo method. Finally, the developed numerical model accurately estimates the hydraulic characteristics of mortar samples and greatly matches the measured water inflow over time through the samples.
  • A Nonlinear Elimination Preconditioned Inexact Newton Algorithm

    Liu, Lulu; Hwang, Feng-Nan; Luo, Li; Cai, Xiao-Chuan; Keyes, David E. (SIAM Journal on Scientific Computing, Society for Industrial & Applied Mathematics (SIAM), 2022-06-21) [Article]
    A nonlinear elimination preconditioned inexact Newton (NEPIN) algorithm is proposed for problems with localized strong nonlinearities. Due to unbalanced nonlinearities (“nonlinear stiffness”), the traditional inexact Newton method often exhibits a long plateau in the norm of the nonlinear residual or even fails to converge. NEPIN implicitly removes the components causing trouble for the global convergence through a correction based on nonlinear elimination within a subspace that provides a modified direction for the global Newton iteration. Numerical experiments show that NEPIN can be more robust than global inexact Newton algorithms and maintain fast convergence even for challenging problems, such as full potential transonic flows. NEPIN complements several previously studied nonlinear preconditioners with which it compares favorably experimentally on a classic shocked duct flow problem considered herein. NEPIN is shown to be fairly insensitive to mesh resolution and “bad” subproblem identification based on the local Mach number or the local nonlinear residual for transonic flow over a wing.
  • Streaming Overlay Architecture for Lightweight LSTM Computation on FPGA SoCs

    Ioannou, Lenos; Fahmy, Suhaib A. (ACM Transactions on Reconfigurable Technology and Systems, Association for Computing Machinery (ACM), 2022-06-21) [Article]
    Long Short-Term Memory (LSTM) networks, and Recurrent Neural Networks (RNNs) in general, have demonstrated their suitability in many time series data applications, especially in Natural Language Processing (NLP). Computationally, LSTMs introduce dependencies on previous outputs in each layer that complicate their computation and the design of custom computing architectures, compared to traditional feed-forward networks. Most neural network acceleration work has focused on optimising the core matrix-vector operations on highly capable FPGAs in server environments. Research that considers the embedded domain has often been unsuitable for streaming inference, relying heavily on batch processing to achieve high throughput. Moreover, many existing accelerator architectures have not focused on fully exploiting the underlying FPGA architecture, resulting in designs that achieve lower operating frequencies than the theoretical maximum. This paper presents a flexible overlay architecture for LSTMs on FPGA SoCs that is built around a streaming dataflow arrangement, uses DSP block capabilities directly, and is tailored to keep parameters within the architecture while moving input data serially to mitigate external memory access overheads. The architecture is designed as an overlay that can be configured to implement alternative models or update model parameters at runtime. It achieves higher operating frequency and demonstrates higher performance than other lightweight LSTM accelerators, as demonstrated in an FPGA SoC implementation.
  • Reduction of wind-turbine-generated seismic noise with structural measures

    Abreu, Rafael; Peter, Daniel; Thomas, Christine (Wind Energy Science, Copernicus GmbH, 2022-06-20) [Article]
    Reducing wind turbine noise recorded at seismological stations promises to lower the conflict between renewable energy producers and seismologists. Seismic noise generated by the movement of wind turbines has been shown to travel large distances, affecting seismological stations used for seismic monitoring and/or the detection of seismic events. In this study, we use advanced 3D numerical techniques to study the possibility of using structural changes in the ground on the wave path between the wind turbine and the seismic station in order to reduce or mitigate the noise generated by the wind turbine. Testing a range of structural changes around the foundation of the wind turbine, such as open and filled cavities, we show that we are able to considerably reduce the seismic noise recorded by placing empty circular trenches approx. 10 m away from the wind turbines. We show the expected effects of filling the trenches with water. In addition, we study how relatively simple topographic elevations influence the propagation of the seismic energy generated by wind turbines and find that topography does help to reduce wind-turbine-induced seismic noise.
  • Simulated co-optimization of renewable energy and desalination systems in Neom, Saudi Arabia

    Riera, Jefferson A.; Lima, Ricardo; Hoteit, Ibrahim; Knio, Omar (Nature Communications, Springer Science and Business Media LLC, 2022-06-18) [Article]
    The interdependence between the water and power sectors is a growing concern as the need for desalination increases globally. Therefore, co-optimizing interdependent systems is necessary to understand the impact of one sector on another. We propose a framework to identify the optimal investment mix for a co-optimized water-power system and apply it to Neom, Saudi Arabia. Our results show that investment strategies that consider the co-optimization of both systems result in total cost savings for the power sector compared to independent approaches. Analysis results suggest that systems with higher shares of non-dispatchable renewables experience the most significant cost reductions.
  • Stochastic Multi-Dimensional Deconvolution

    Ravasi, Matteo; Selvan, Tamin; Luiken, Nick (IEEE Transactions on Geoscience and Remote Sensing, Institute of Electrical and Electronics Engineers (IEEE), 2022-06-01) [Article]
    Geophysical measurements such as seismic datasets contain valuable information that originates from areas of interest in the subsurface; these seismic reflections are however inevitably contaminated by other events created by waves reverberating in the overburden. Multi-Dimensional Deconvolution (MDD) is a powerful technique used at various stages of the seismic processing sequence to create ideal datasets deprived of such overburden effects. Whilst the underlying forward problem holds for a single source, a successful inversion of the MDD equations requires availability of a large number of sources alongside prior information, possibly introduced in the form of physical constraints (e.g., reciprocity and causality). In this work, we present a novel formulation of time-domain MDD based on a finite-sum functional. The associated inverse problem is then solved by means of stochastic gradient descent algorithms, where the gradients at each iteration are computed using a small subset of randomly selected sources. Through synthetic and field data examples, we show that the proposed method converges more stably than the conventional approach based on full gradients. Stochastic MDD represents a novel, efficient, and robust strategy to deconvolve seismic wavefields in a multi-dimensional fashion.
  • High throughput multidimensional tridiagonal system solvers on FPGAs

    Kamalakkannan, Kamalavasan; Mudalige, Gihan R.; Reguly, Istvan Z.; Fahmy, Suhaib A. (ACM, 2022-06) [Conference Paper]
    We present a high performance tridiagonal solver library for Xilinx FPGAs optimized for multiple multi-dimensional systems common in real-world applications. An analytical performance model is developed and used to explore the design space and obtain rapid performance estimates that are over 85% accurate. This library achieves an order of magnitude better performance when solving large batches of systems than previous FPGA work. A detailed comparison with a current state-of-the-art GPU library for multi-dimensional tridiagonal systems on an Nvidia V100 GPU shows the FPGA achieving competitive or better runtime and significant energy savings of over 30%. Through this design, we learn lessons about the types of applications where FPGAs can challenge the current dominance of GPUs.
