Recent Submissions

  • H2Opus: A distributed-memory multi-GPU software package for non-local operators

    Zampini, Stefano; Boukaram, Wagih Halim; Turkiyyah, George; Knio, Omar; Keyes, David E. (arXiv, 2021-09-12) [Preprint]
    Hierarchical $\mathcal{H}^2$-matrices are asymptotically optimal representations for the discretizations of non-local operators such as those arising in integral equations or from kernel functions. Their $O(N)$ complexity in both memory and operator application makes them particularly suited for large-scale problems. As a result, there is a need for software that provides support for distributed operations on these matrices to allow large-scale problems to be represented. In this paper, we present high-performance, distributed-memory GPU-accelerated algorithms and implementations for matrix-vector multiplication and matrix recompression of hierarchical matrices in the $\mathcal{H}^2$ format. The algorithms are a new module of H2Opus, a performance-oriented package that supports a broad variety of $\mathcal{H}^2$-matrix operations on CPUs and GPUs. Performance in the distributed GPU setting is achieved by marshaling the tree data of the hierarchical matrix representation to allow batched kernels to be executed on the individual GPUs. MPI is used for inter-process communication. We optimize the communication data volume and hide much of the communication cost with local compute phases of the algorithms. Results show near-ideal scalability up to 1024 NVIDIA V100 GPUs on Summit, with performance exceeding 2.3 Tflop/s/GPU for the matrix-vector multiplication, and 670 Gflops/s/GPU for matrix compression, which involves batched QR and SVD operations. We illustrate the flexibility and efficiency of the library by solving a 2D variable diffusivity integral fractional diffusion problem with an algebraic multigrid-preconditioned Krylov solver and demonstrate scalability up to 16M degrees of freedom problems on 64 GPUs.
  • Cycle-skipping mitigation using misfit measurements based on differentiable dynamic time warping

    chen, fuqiang; Peter, Daniel; Ravasi, Matteo (arXiv, 2021-09-09) [Preprint]
    The dynamic time warping (DTW) distance has been used as a misfit function for wave-equation inversion to mitigate the local minima issue. However, the original DTW distance is not smooth; therefore it can yield a strong discontinuity in the adjoint source. Such a weakness does not help nonlinear inverse problems converge to a plausible minimum by any means. We therefore introduce for the first time in geophysics the smooth DTW distance, which has demonstrated its performance in time series classification, clustering, and prediction as the loss function. The fundamental idea of such a distance is to replace the $\min$ operator with its smooth relaxation. Then it becomes possible to define the analytic derivative of DTW distance. The new misfit function is entitled to the differentiable DTW distance. Moreover, considering that the warping path is an indicator of the traveltime difference between the observed and synthetic trace, a penalization term is constructed based on the warping path such that the misfit accumulation by the penalized differentiable DTW distance is weighted in favor of the traveltime difference. Numerical examples demonstrate the advantage of the penalized differentiable DTW misfit function over the conventional non-differentiable one.
  • Dual-band generative learning for low-frequency extrapolation in seismic land data

    Ovcharenko, Oleg; Kazei, Vladimir; Peter, Daniel; Silvestrov, Ilya; Bakulin, Andrey; Alkhalifah, Tariq Ali (Society of Exploration Geophysicists, 2021-09-01) [Conference Paper]
    The presence of low-frequency energy in seismic data can help mitigate cycle-skipping problems in full-waveform inversion. Unfortunately, the generation and recording of low-frequency signals in seismic exploration remains a non-trivial task. Extrapolation of missing low-frequency content in field data might be addressed in a data-driven framework. In particular, deep learning models trained on synthetic data could be used for inference on the field data. Such an implementation of switching application domains remains challenging. We, therefore, propose the concept of generative dual-band learning to facilitate the knowledge transfer between synthetic and field seismic data applications of low-frequency data extrapolation. We first explain the two-step procedure for training a generative adversarial network (GAN) that extrapolates low frequencies. Then, we describe the workflow for synthetic dataset generation. Finally, we explore the feasibility of the dual-band learning concept on real near-surface land data acquired in Saudi Arabia. The presence of low-frequency energy in seismic data can help mitigate cycle-skipping problems in full-waveform inversion. Unfortunately, the generation and recording of low-frequency signals in seismic exploration remains a non-trivial task. Extrapolation of missing low-frequency content in field data might be addressed in a data-driven framework. In particular, deep learning models trained on synthetic data could be used for inference on the field data. Such an implementation of switching application domains remains challenging. We, therefore, propose the concept of generative dual-band learning to facilitate the knowledge transfer between synthetic and field seismic data applications of low-frequency data extrapolation. We first explain the two-step procedure for training a generative adversarial network (GAN) that extrapolates low frequencies. Then, we describe the workflow for synthetic dataset generation. Finally, we explore the feasibility of the dual-band learning concept on real near-surface land data acquired in Saudi Arabia.
  • An O(N) algorithm for computing expectation of N-dimensional truncated multi-variate normal distribution I: fundamentals

    Huang, Jingfang; Cao, Jian; Fang, Fuhui; Genton, Marc G.; Keyes, David E.; Turkiyyah, George (Advances in Computational Mathematics, Springer Science and Business Media LLC, 2021-09-01) [Article]
    In this paper, we present the fundamentals of a hierarchical algorithm for computing the N-dimensional integral ϕ(a,b;A)=∫abH(x)f(x|A)dx representing the expectation of a function H(X) where f(x|A) is the truncated multi-variate normal (TMVN) distribution with zero mean, x is the vector of integration variables for the N-dimensional random vector X, A is the inverse of the covariance matrix Σ, and a and b are constant vectors. The algorithm assumes that H(x) is “low-rank” and is designed for properly clustered X so that the matrix A has “low-rank” blocks and “low-dimensional” features. We demonstrate the divide-and-conquer idea when A is a symmetric positive definite tridiagonal matrix and present the necessary building blocks and rigorous potential theory–based algorithm analysis when A is given by the exponential covariance model. The algorithm overall complexity is O(N) for N-dimensional problems, with a prefactor determined by the rank of the off-diagonal matrix blocks and number of effective variables. Very high accuracy results for N as large as 2048 are obtained on a desktop computer with 16G memory using the fast Fourier transform (FFT) and non-uniform FFT to validate the analysis. The current paper focuses on the ideas using the simple yet representative examples where the off-diagonal matrix blocks are rank 1 and the number of effective variables is bounded by 2, to allow concise notations and easier explanation. In a subsequent paper, we discuss the generalization of current scheme using the sparse grid technique for higher rank problems and demonstrate how all the moments of kth order or less (a total of O(Nk) integrals) can be computed using O(Nk) operations for k ≥ 2 and O(NlogN) operations for k = 1.
  • Misfit functions based on differentiable dynamic time warping for waveform inversion

    chen, fuqiang; Peter, Daniel; Ravasi, Matteo (Society of Exploration Geophysicists, 2021-09-01) [Conference Paper]
    Misfit functions based on differentiable dynamic time warping (DTW) have demonstrated an excellent performance in various sequence-related tasks. We introduce this concept in the context of waveform inversion and discuss a fast algorithm to calculate the first and second derivatives of such misfits. The DTW distance is originally calculated by solving a series of min sorting problems, thus it is not differentiable. The fundamental idea of differentiable DTW is replacing the min operator with its smooth relaxation. It is then straightforward to solve the derivatives following the differentiation rules, which results in both the differentiable misfit measurements and warping path. This path weights the traveltime mismatch more than its amplitude counterpart. We can construct a penalization term based on this warping path. The penalized misfit function is adaptable to measure traveltime and amplitude separately. Numerical examples show the flexibility and the performance of the proposed misfit function in mitigating local minima issues and retrieving the long-wavelength model features
  • Multicycle Simulation of Strike-Slip Earthquake Rupture for Use in Near-Source Ground-Motion Simulations

    Galvez, Percy; Petukhin, Anatoly; Somerville, Paul; Ampuero, Jean-Paul; Miyakoshi, Ken; Peter, Daniel; Irikura, Kojiro (Bulletin of the Seismological Society of America, Seismological Society of America (SSA), 2021-08-31) [Article]
    ABSTRACT Realistic dynamic rupture modeling validated by observed earthquakes is necessary for estimating parameters that are poorly resolved by seismic source inversion, such as stress drop, rupture velocity, and slip rate function. Source inversions using forward dynamic modeling are increasingly used to obtain earthquake rupture models. In this study, to generate a large number of physically self-consistent rupture models, rupture process of which is consistent with the spatiotemporal heterogeneity of stress produced by previous earthquakes on the same fault, we use multicycle simulations under the rate and state (RS) friction law. We adopt a one-way coupling from multicycle simulations to dynamic rupture simulations; the quasidynamic solver QDYN is used to nucleate the seismic events and the spectral element dynamic solver SPECFEM3D to resolve their rupture process. To simulate realistic seismicity, with a wide range of magnitudes and irregular recurrence, several realizations of 2D-correlated heterogeneous random distributions of characteristic weakening distance (Dc) in RS friction are tested. Other important parameters are the normal stress, which controls the stress drop and rupture velocity during an earthquake, and the maximum value of Dc, which controls rupture velocity but not stress drop. We perform a parametric study on a vertical planar fault and generate a set of a hundred spontaneous rupture models in a wide magnitude range (Mw 5.5–7.4). We validate the rupture models by comparison of source scaling, ground motion (GM), and surface slip properties to observations. We compare the source-scaling relations between rupture area, average slip, and seismic moment of the modeled events with empirical ones derived from source inversions. Near-fault GMs are computed from the source models. Their peak ground velocities and peak ground accelerations agree well with the ground-motion prediction equation values. We also obtain good agreement of the surface fault displacements with observed values.
  • Space-Fractional Diffusion with Variable Order and Diffusivity: Discretization and Direct Solution Strategies

    Alzahrani, Hasnaa H.; Turkiyyah, George; Knio, Omar; Keyes, David E. (arXiv, 2021-08-29) [Preprint]
    We consider the multidimensional space-fractional diffusion equations with spatially varying diffusivity and fractional order. Significant computational challenges are encountered when solving these equations due both to the kernel singularity in the fractional integral operator and to the resulting dense discretized operators, which quickly become prohibitively expensive to handle because of their memory and arithmetic complexities. In this work, we present a singularity-aware discretization scheme that regularizes the singular integrals through a singularity subtraction technique adapted to the spatial variability of diffusivity and fractional order. This regularization strategy is conveniently formulated as a sparse matrix correction that is added to the dense operator, and is applicable to different formulations of fractional diffusion equations. We also present a block low rank representation to handle the dense matrix representations, by exploiting the ability to approximate blocks of the resulting formally dense matrix by low rank factorizations. A Cholesky factorization solver operates directly on this representation using the low rank blocks as its atomic computational tiles, and achieves high performance on multicore hardware. Numerical results show that the singularity treatment is robust, substantially reduces discretization errors, and attains the first-order convergence rate allowed by the regularity of the solutions. They also show that considerable savings are obtained in storage ($O(N^{1.5})$) and computational cost ($O(N^2)$) compared to dense factorizations. This translates to orders-of-magnitude savings in memory and time on multi-dimensional problems, and shows that the proposed methods offer practical tools for tackling large nonlocal fractional diffusion simulations.
  • H2OPUS-TLR: High Performance Tile Low Rank Symmetric Factorizations using Adaptive Randomized Approximation

    Boukaram, Wagih Halim; Zampini, Stefano; Turkiyyah, George; Keyes, David E. (arXiv, 2021-08-26) [Preprint]
    Tile low rank representations of dense matrices partition them into blocks of roughly uniform size, where each off-diagonal tile is compressed and stored as its own low rank factorization. They offer an attractive representation for many data-sparse dense operators that appear in practical applications, where substantial compression and a much smaller memory footprint can be achieved. TLR matrices are a compromise between the simplicity of a regular perfectly-strided data structure and the optimal complexity of the unbalanced trees of hierarchically low rank matrices, and provide a convenient performance-tuning parameter through their tile size that can be proportioned to take into account the cache size where the tiles reside in the memory hierarchy. There are currently no high-performance algorithms that can generate Cholesky and $LDL^T$ factorizations, particularly on GPUs. The difficulties in achieving high performance when factoring TLR matrices come from the expensive compression operations that must be performed during the factorization process and the adaptive rank distribution of the tiles that causes an irregular work pattern for the processing cores. In this work, we develop a dynamic batching operation and combine it with batched adaptive randomized approximations to achieve high performance both on GPUs and CPUs. Our implementation attains over 1.2 TFLOP/s in double precision on the V100 GPU, and is limited by the performance of batched GEMM operations. The Cholesky factorization of covariance matrix of size $N = 131K$ arising in spatial statistics can be factored to an accuracy $\epsilon=10^{-2}$ in just a few seconds. We believe the proposed GEMM-centric algorithm allows it to be readily ported to newer hardware such as the tensor cores that are optimized for small GEMM operations.
  • A Rotated Characteristic Decomposition Technique for High-Order Reconstructions in Multi-dimensions

    Shen, Hua; Parsani, Matteo (Journal of Scientific Computing, Springer Science and Business Media LLC, 2021-08-11) [Article]
    When constructing high-order schemes for solving hyperbolic conservation laws, the corresponding high-order reconstructions are commonly performed in characteristic spaces to eliminate spurious oscillations as much as possible. For multi-dimensional finite volume (FV) schemes, we need to perform the characteristic decomposition several times in different normal directions of the target cell, which is very time-consuming. In this paper, we propose a rotated characteristic decomposition technique which requires only one-time decomposition for multi-dimensional reconstructions. The rotated direction depends only on the gradient of a specific physical quantity which is cheap to calculate. This technique not only reduces the computational cost remarkably, but also controls spurious oscillations effectively. We take a third-order weighted essentially non-oscillatory finite volume (WENO-FV) scheme for solving the Euler equations as an example to demonstrate the efficiency of the proposed technique.
  • Towards autonomous event identifications in wave-equation traveltime inversion

    chen, fuqiang; Peter, Daniel (GEOPHYSICS, Society of Exploration Geophysicists, 2021-08-03) [Article]
    We present a method to automatically relate events between observed and synthetic data for wave-equation traveltime (WT) inversion. The scheme starts with local similarity measurements by applying cross-correlation to localized traces. We then develop a differentiable alternative of the argmax function. By applying the differentiable argmax to each time slice of the local similarity map, we build a traveltime difference function but keep this process differentiable. WT inversion requires only the traveltime difference of related events through a phase shift. Thus, we must reject events that are not apparently related between observed and synthetic data. The local similarity map implies the possibility of doing so but shows abrupt changes with time and offset. To mitigate this problem, we introduce a dynamic programming algorithm to define a warping function. Wave packets between observed and synthetic data are assumed to be related if they are connected by this warping function, and they also exhibit high local similarity. Only such related events are considered in the subsequent calculation of misfit and adjoint source. Numerical examples demonstrate that the proposed method successfully retrieves an informative model from roughly selected data. In contrast, WT inversion based on cross-correlation or deconvolution fails to do so.
  • Skewness effects on the turbulence structure in a high-speed compressible and multi-component inert mixing layers

    Boukharfane, Radouan; Er-Raiy, Aimad; Parsani, Matteo; Hadri, Bilel (American Institute of Aeronautics and Astronautics, 2021-07-28) [Conference Paper]
    This work presents the analysis of the effects of the misalignment angle between two asymptotic streams of fluid, $\zeta$, whose interaction leads to a turbulent mixing region. In fact, spatially evolving mixing layers may see their turbulent structure statistics altered in the presence of the skew angle $\zeta$. The investigation is conducted by analyzing a new set of direct numerical simulations of spatially-developing compressible non-reactive hydrogen--air shear layers. To assess the effects associated to misalignment angle, the turbulent structure statistics of a skewed configuration with $\zeta=15^{\circ}$ are compared to the reference case where no skewness is introduced. The analysis of the mixing layer time-averaged statistics reveals the ability of the skewness to accelerate the inlet structures growth which, consequently, yields to a substantial enhancement of the mixing efficiency.
  • Structure and dynamics of small-scale turbulence in vaporizing two-phase flows.

    Boukharfane, Radouan; Er-Raiy, Aimad; Parsani, Matteo; Chakraborty, Nilanjan (Scientific reports, Springer Science and Business Media LLC, 2021-07-28) [Article]
    Improving our fundamental understanding of multiphase turbulent flows will be beneficial for analyses of a wide range of industrial and geophysical processes. Herein, we investigate the topology of the local flow in vaporizing forced homogeneous isotropic turbulent two-phase flows. The invariants of the velocity-gradient, rate-of-strain, rate-of-rotation tensors, and scalar gradient were computed and conditioned for different distances from the liquid-gas surface. A Schur decomposition of the velocity gradient tensor into a normal and non-normal parts was undertaken to supplement the classical double decomposition into rotation and strain tensors. Using direct numerical simulations results, we show that the joint probability density functions of the second and third invariants have classical shapes in all carrier-gas regions but gradually change as they approach the carrier-liquid interface. Near the carrier-liquid interface, the distributions of the invariants are remarkably similar to those found in the viscous sublayer of turbulent wall-bounded flows. Furthermore, the alignment of both vorticity and scalar gradient with the strain-rate field changes spatially such that its universal behaviour occurs far from the liquid-gas interface. We found also that the non-normal effects of the velocity gradient tensor play a crucial role in explaining the preferred alignment.
  • Bayesian seismic inversion: A fast sampling Langevin dynamics Markov chain Monte Carlo method

    Izzatullah, Muhammad; van Leeuwen, Tristan; Peter, Daniel (Geophysical Journal International, Oxford University Press (OUP), 2021-07-22) [Article]
    Summary In this study, we aim to solve the seismic inversion in the Bayesian framework by generating samples from the posterior distribution. This distribution incorporates the uncertainties in the seismic data, forward model, and prior information about the subsurface model parameters; thus, we obtain more information through sampling than through a point estimate (e.g., Maximum a Posteriori method). Based on the numerical cost of solving the forward problem and the dimensions of the subsurface model parameters and observed data, sampling with Markov chain Monte Carlo (MCMC) algorithms can be prohibitively expensive. Herein, we consider the promising Langevin dynamics MCMC algorithm. However, this algorithm has two central challenges: (1) the step size requires prior tuning to achieve optimal performance, and (2) the Metropolis-Hastings acceptance step is computationally demanding. We approach these challenges by proposing an adaptive step-size rule and considering the suppression of the Metropolis-Hastings acceptance step. We highlight the proposed method’s potential through several numerical examples and rigorously validate it via qualitative and quantitative evaluation of the sample quality based on the kernelized Stein discrepancy (KSD) and other MCMC diagnostics such as trace and autocorrelation function (ACF) plots. We conclude that, by suppressing the Metropolis-Hastings step, the proposed method provides fast sampling at efficient computational costs for large-scale seismic Bayesian inference; however, this inflates the second statistical moment (variance) due to asymptotic bias. Nevertheless, the proposed method reliably recovers important aspects of the posterior, including means, variances, skewness, and one- and-twodimensional marginals. With larger computational budget, exact MCMC methods (i.e., with a Metropolis-Hastings step) should be favored. The results thus obtained can be considered a feasibility study for promoting the approximate Langevin dynamics MCMC method for Bayesian seismic inversion on limited computational resources.
  • A scheduling policy to save 10% of communication time in parallel fast Fourier transform

    Aseeri, Samar A.; Gopal Chatterjee, Anando; Verma, Mahendra K.; Keyes, David E. (Concurrency and Computation: Practice and Experience, Wiley, 2021-07-18) [Article]
    The fast Fourier transform (FFT) has applications in almost every frequency related study, for example, in image and signal processing, and radio astronomy. It is also used as a Poisson operator inversion kernel in partial differential equations in fluid flows, in density functional theory, many-body theory, and others. The three-dimensional (Formula presented.) FFT has large time complexity (Formula presented.). Hence, parallelization is used to compute such FFTs. Popular libraries perform slab division or pencil decomposition of (Formula presented.) data. None of the existing libraries achieve perfect inverse scaling of time with (Formula presented.) cores because FFT requires all-to-all communication and clusters hitherto do not have physical all-to-all connections. Dragonfly, one of the popular topologies for the interconnect, supports hierarchical connections among the components. We show that if we align the all-to-all communication of FFT with the physical connections of Dragonfly topology we will achieve a better scaling and reduce communication time.
  • Approximate Error Bounds on Solutions of Nonlinearly Preconditioned PDEs

    Liu, Lulu; Keyes, David E. (SIAM Journal on Scientific Computing, Society for Industrial & Applied Mathematics (SIAM), 2021-07-15) [Article]
    In many multiphysics applications, an ultimate quantity of interest can be written as a linear functional of the solution to the discretized governing nonlinear partial differential equations and finding a sufficiently accurate pointwise solution may be regarded as a step toward that end. In this paper, we derive a posteriori approximate error bounds for linear functionals corresponding to quantities of interest using two kinds of nonlinear preconditioning techniques. Nonlinear preconditioning, such as the inexact Newton with backtracking and nonlinear elimination algorithm and the multiplicative Schwarz preconditioned inexact Newton algorithm, may be effective in improving global convergence for Newton's method. It may prevent stagnation of the nonlinear residual norm and reduce the number of solutions of large ill-conditioned linear systems involving a global Jacobian required at each nonlinear iteration. We illustrate the effectiveness of the new bounds using canonical nonlinear PDE models: a flame sheet model and a nonlinear coupled lid-driven cavity problem.
  • Competition on Spatial Statistics for Large Datasets

    Huang, Huang; Abdulah, Sameh; Sun, Ying; Ltaief, Hatem; Keyes, David E.; Genton, Marc G. (Journal of Agricultural, Biological and Environmental Statistics, Springer Science and Business Media LLC, 2021-07-08) [Article]
    As spatial datasets are becoming increasingly large and unwieldy, exact inference on spatial models becomes computationally prohibitive. Various approximation methods have been proposed to reduce the computational burden. Although comprehensive reviews on these approximation methods exist, comparisons of their performances are limited to small and medium sizes of datasets for a few selected methods. To achieve a comprehensive comparison comprising as many methods as possible, we organized the Competition on Spatial Statistics for Large Datasets. This competition had the following novel features: (1) we generated synthetic datasets with the ExaGeoStat software so that the number of generated realizations ranged from 100 thousand to 1 million; (2) we systematically designed the data-generating models to represent spatial processes with a wide range of statistical properties for both Gaussian and non-Gaussian cases; (3) the competition tasks included both estimation and prediction, and the results were assessed by multiple criteria; and (4) we have made all the datasets and competition results publicly available to serve as a benchmark for other approximation methods. In this paper, we disclose all the competition details and results along with some analysis of the competition outcomes.
  • Accelerating Seismic Redatuming Using Tile Low-Rank Approximations on NEC SX-Aurora TSUBASA

    Hong, Yuxi; Ltaief, Hatem; Ravasi, Matteo; Gatineau, Laurent; Keyes, David E. (Submitted to SUPERCOMPUTING FRONTIERS AND INNOVATIONS, Submitted to South Ural State University (Chelyabinsk, Russia), 2021-07-07) [Preprint]
    With the aim of imaging subsurface discontinuities, seismic data recorded at the surface of the Earth must be numerically re-positioned at locations in the subsurface where reflections have originated, a process generally referred to as redatuming by the geophysical community. Historically, this process has been carried out by numerically time-reversing the data recorded along an open boundary of surface receivers into the subsurface. Despite its simplicity, such an approach is only able to handle seismic energy from primary arrivals (i.e., waves that interact only once with the medium discontinuities), failing to explain multi-scattering in the subsurface. As a result, seismic images are contaminated by artificial reflectors if data are not pre-processed prior to imaging such that multiples are removed from the data. In the last decade, a novel family of methods has emerged under the name of Marchenko redatuming; such methods allow for accurate redatuming of the full-wavefield recorded seismic data including multiple arrivals. This is achieved by solving an inverse problem, whose adjoint modeling can be shown to be equivalent to the standard single-scattering redatuming method for primary-only data. A downside of this application is that the so-called multi-dimensional convolution operator must be repeatedly evaluated as part of the inversion. Such an operator requires the application of multiple dense matrix-vector multiplications (MVM), which represent the most time-consuming operations in the forward and adjoint processes. We identify and leverage the data sparsity structure for each of the frequency matrices during the MVM operation, and propose to accelerate the MVM step using tile low-rank (TLR) matrix approximations. We study the TLR impact on time-to-solution for the MVM using different accuracy thresholds whilst at the same time assessing the quality of the resulting subsurface seismic wavefields and show that TLR leads to a minimal degradation in terms of signal-to-noise ratio on a 3D synthetic dataset. We mitigate the load imbalance overhead and provide performance evaluation on two distributed-memory systems. Our MPI+OpenMP TLR-MVM implementation reaches up to 3X performance speedup against the dense MVM counterpart from NEC scientific library on 128 NEC SX-Aurora TSUBASA cards. Thanks to the second generation of high bandwidth memory technology, it further attains up to 67X performance speedup (i.e., 110 TB/s) compared to the dense MVM from Intel MKL when running on 128 dual-socket 20-core Intel Cascade Lake nodes with DDR4 memory, without deteriorating the quality of the reconstructed seismic wavefields.
  • Nonlinear Preconditioning Strategies for Two-Phase Flows in Porous Media Discretized by a Fully Implicit Discontinuous Galerkin Method

    Luo, Li; Cai, Xiao-Chuan; Keyes, David E. (SIAM Journal on Scientific Computing, Society for Industrial & Applied Mathematics (SIAM), 2021-06-09) [Article]
    We consider numerical simulation of two-phase flows in porous media using implicit methods. Because of the complex features involving heterogeneous permeability and nonlinear capillary effects, the nonlinear algebraic systems arising from the discretization are very difficult to solve. The traditional Newton method suffers from slow convergence in the form of a long stagnation or sometimes does not converge at all. In this paper, we develop nonlinear preconditioning strategies for the system of two-phase flows discretized by a fully implicit discontinuous Galerkin method. The preconditioners identify and approximately eliminate the local high nonlinearities that cause the Newton method to take small updates. Specifically, we propose two elimination strategies: one is based on exploring the unbalanced nonlinearities of the pressure and the saturation fields, and the other is based on identifying certain elements of the finite element space that have much higher nonlinearities than the rest of the elements. We compare the performance and robustness of the proposed algorithms with an existing single-field elimination approach and the classical inexact Newton method with respect to some physical and numerical parameters. Experiments on three-dimensional porous media applications show that the proposed algorithms are superior to other methods in terms of robustness and parallel efficiency.
  • Crustal and upper-mantle structure below Central and Southern Mexico

    Espindola-Carmona, A.; Peter, Daniel; Ortiz-Aleman, C. (Journal of Geophysical Research: Solid Earth, American Geophysical Union (AGU), 2021-06-03) [Article]
    The Mexican subduction zone is a complex convergent margin characterized by a slab with an unusual morphology, an abnormal location of the volcanic arc, and the existence of slow earthquakes (slow slip events and tectonic tremors). The number of seismic imaging studies of the Mexican subduction zone has increased over the past decade. These studies have revealed the seismic wave velocity structure beneath Central and Southern Mexico. However, the existing tomographic models are either confined to specific areas or have coarse resolution. As a consequence, they do not capture the lateral and finer-scale variations in the mantle and the crust structure. Here, we fit frequency-dependent traveltime differences between observed and synthetic seismograms in a three-dimensional model of Central and Southern Mexico to constrain crustal and upper mantle seismic velocity structures jointly. We use ∼3,300 seismic records, filtered between 5 and 50 s, from 74 regional earthquakes. We perform point spread function tests to assess the resolution and trade-offs between model parameters. Our tomographic model reveals several seismic wave velocity features. These features include the geometry of the Cocos slab, the partial melting zone beneath the Trans-Mexican Volcanic Belt, and the Yucatan slab diving below Los Tuxtlas Volcanic Field that are consistent with previous studies. We also identify horizontal structures in the lower crust and the shallow upper mantle that were not previously identified. These structures include slow seismic wave velocity anomalies that may be correlated with ultra-slow seismic velocity zones and high conductivity regions, where slow earthquakes have been identified.
  • Mapping full seismic waveforms to vertical velocity profiles by deep learning

    Kazei, Vladimir; Ovcharenko, Oleg; Plotnitskii, Pavel; Peter, Daniel; Zhang, Xiangliang; Alkhalifah, Tariq Ali (GEOPHYSICS, Society of Exploration Geophysicists, 2021-05-28) [Article]
    Building realistic and reliable models of the subsurface is the primary goal of seismic imaging. Here we construct an ensemble of convolutional neural networks (CNNs) to build velocity models directly from the data. Most other approaches attempt to map full data into 2D labels. We exploit the regularity of seismic acquisition and train CNNs to map gathers of neighboring common midpoints (CMPs) to vertical 1D velocity logs. This allows us to integrate well-log data into the inversion, simplify the mapping by using the 1D labels, and accommodate larger dips relative to using single CMP inputs. We dynamically generate the training data in parallel with training the CNNs, which reduces overfitting. Data generation and training of the CNNs is more computationally expensive than conventional full-waveform inversion (FWI). However, once the network is trained, data sets with similar acquisition parameters can be inverted much faster than with FWI. The multiCMP CNN ensemble is tested on multiple realistic synthetic models, performs well, and was combined with FWI for even better performance.

View more