Extreme Computing Research Center
Recent Submissions

Nonlinear Preconditioning Strategies for Two-Phase Flows in Porous Media Discretized by a Fully Implicit Discontinuous Galerkin Method (SIAM Journal on Scientific Computing, Society for Industrial & Applied Mathematics (SIAM), 2021-06-09) [Article] We consider numerical simulation of two-phase flows in porous media using implicit methods. Because of the complex features involving heterogeneous permeability and nonlinear capillary effects, the nonlinear algebraic systems arising from the discretization are very difficult to solve. The traditional Newton method suffers from slow convergence in the form of a long stagnation, or sometimes does not converge at all. In this paper, we develop nonlinear preconditioning strategies for the system of two-phase flows discretized by a fully implicit discontinuous Galerkin method. The preconditioners identify and approximately eliminate the local high nonlinearities that cause the Newton method to take small updates. Specifically, we propose two elimination strategies: one is based on exploring the unbalanced nonlinearities of the pressure and the saturation fields, and the other is based on identifying certain elements of the finite element space that have much higher nonlinearities than the rest of the elements. We compare the performance and robustness of the proposed algorithms with an existing single-field elimination approach and the classical inexact Newton method with respect to some physical and numerical parameters. Experiments on three-dimensional porous media applications show that the proposed algorithms are superior to other methods in terms of robustness and parallel efficiency.
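The elimination idea can be illustrated on a toy two-unknown system (a hypothetical stand-in, not the paper's DG-discretized flow equations): one component is made artificially stiff so that plain Newton stagnates, and an inner solve on that component before each global update restores fast convergence.

```python
import numpy as np

def residual(x):
    # Toy coupled system: x[1] carries a much stronger nonlinearity,
    # mimicking the "locally high nonlinearity" that stalls plain Newton.
    return np.array([x[0] + 0.1 * x[1] - 1.0,
                     np.exp(5.0 * x[1]) - 1.0 + 0.1 * x[0]])

def jacobian(x):
    return np.array([[1.0, 0.1],
                     [0.1, 5.0 * np.exp(5.0 * x[1])]])

def newton(x, tol=1e-10, maxit=50):
    # Plain Newton: takes many small, stagnating updates from a bad start.
    for it in range(maxit):
        r = residual(x)
        if np.linalg.norm(r) < tol:
            return x, it
        x = x - np.linalg.solve(jacobian(x), r)
    return x, maxit

def eliminated_newton(x, tol=1e-10, maxit=50):
    # Nonlinear elimination: before each global step, solve the "bad"
    # component to local convergence with the other field frozen.
    for it in range(maxit):
        for _ in range(20):  # inner 1D Newton on the stiff component
            g = np.exp(5.0 * x[1]) - 1.0 + 0.1 * x[0]
            x[1] -= g / (5.0 * np.exp(5.0 * x[1]))
        r = residual(x)
        if np.linalg.norm(r) < tol:
            return x, it
        x = x - np.linalg.solve(jacobian(x), r)
    return x, maxit
```

Starting from `[0, 2]`, the eliminated variant typically needs far fewer outer (global) Newton iterations than the plain method.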

Crustal and upper-mantle structure below Central and Southern Mexico (Journal of Geophysical Research: Solid Earth, American Geophysical Union (AGU), 2021-06-03) [Article] The Mexican subduction zone is a complex convergent margin characterized by a slab with an unusual morphology, an abnormal location of the volcanic arc, and the existence of slow earthquakes (slow slip events and tectonic tremors). The number of seismic imaging studies of the Mexican subduction zone has increased over the past decade. These studies have revealed the seismic wave velocity structure beneath Central and Southern Mexico. However, the existing tomographic models are either confined to specific areas or have coarse resolution. As a consequence, they do not capture the lateral and finer-scale variations in the mantle and the crust structure. Here, we fit frequency-dependent traveltime differences between observed and synthetic seismograms in a three-dimensional model of Central and Southern Mexico to constrain crustal and upper-mantle seismic velocity structures jointly. We use ∼3,300 seismic records, filtered between 5 and 50 s, from 74 regional earthquakes. We perform point spread function tests to assess the resolution and trade-offs between model parameters. Our tomographic model reveals several seismic wave velocity features. These features include the geometry of the Cocos slab, the partial melting zone beneath the Trans-Mexican Volcanic Belt, and the Yucatan slab diving below Los Tuxtlas Volcanic Field, all consistent with previous studies. We also identify horizontal structures in the lower crust and the shallow upper mantle that were not previously identified. These structures include slow seismic wave velocity anomalies that may be correlated with ultra-slow seismic velocity zones and high-conductivity regions where slow earthquakes have been identified.

Mapping full seismic waveforms to vertical velocity profiles by deep learning (GEOPHYSICS, Society of Exploration Geophysicists, 2021-05-28) [Article] Building realistic and reliable models of the subsurface is the primary goal of seismic imaging. Here we construct an ensemble of convolutional neural networks (CNNs) to build velocity models directly from the data. Most other approaches attempt to map full data into 2D labels. We exploit the regularity of seismic acquisition and train CNNs to map gathers of neighboring common midpoints (CMPs) to vertical 1D velocity logs. This allows us to integrate well-log data into the inversion, simplify the mapping by using the 1D labels, and accommodate larger dips relative to using single CMP inputs. We dynamically generate the training data in parallel with training the CNNs, which reduces overfitting. Data generation and training of the CNNs is more computationally expensive than conventional full-waveform inversion (FWI). However, once the network is trained, data sets with similar acquisition parameters can be inverted much faster than with FWI. The multi-CMP CNN ensemble is tested on multiple realistic synthetic models, performs well, and was combined with FWI for even better performance.

A rotated characteristic decomposition technique for high-order reconstructions in multidimensions (arXiv, 2021-05-21) [Preprint] When constructing high-order schemes for solving hyperbolic conservation laws, the corresponding high-order reconstructions are commonly performed in characteristic spaces to eliminate spurious oscillations as much as possible. For multidimensional finite volume (FV) schemes, we need to perform the characteristic decomposition several times in different normal directions of the target cell, which is very time-consuming. In this paper, we propose a rotated characteristic decomposition technique which requires only a one-time decomposition for multidimensional reconstructions. The rotated direction depends only on the gradient of a specific physical quantity, which is cheap to calculate. This technique not only reduces the computational cost remarkably, but also controls spurious oscillations effectively. We take a third-order weighted essentially non-oscillatory finite volume (WENO-FV) scheme for solving the Euler equations as an example to demonstrate the efficiency of the proposed technique.
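The rotation step itself is inexpensive. A minimal 2D sketch (hypothetical helper names, assuming the gradient of density as the chosen physical quantity) of building the rotated frame and expressing a vector quantity in it:

```python
import numpy as np

def rotation_from_gradient(grad_rho):
    # Unit direction from the gradient of a chosen physical quantity
    # (e.g. density); fall back to the x-axis if the gradient vanishes.
    g = np.asarray(grad_rho, dtype=float)
    norm = np.linalg.norm(g)
    n = g / norm if norm > 1e-12 else np.array([1.0, 0.0])
    t = np.array([-n[1], n[0]])        # in-plane tangent direction
    return np.vstack([n, t])           # rows form the rotated basis

def rotate_velocity(u, R):
    # Express the velocity in the rotated frame; the characteristic
    # decomposition is then performed once, along R's first axis,
    # instead of once per cell-face normal.
    return R @ np.asarray(u, dtype=float)
```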

Efficiency assessment of approximated spatial predictions for large datasets (Spatial Statistics, Elsevier BV, 2021-05-14) [Article] Due to the well-known computational showstopper of exact Maximum Likelihood Estimation (MLE) for large geospatial observations, a variety of approximation methods have been proposed in the literature, which usually require tuning certain inputs. For example, the recently developed Tile Low-Rank approximation (TLR) method involves many tuning parameters, including numerical accuracy. To properly choose the tuning parameters, it is crucial to adopt a meaningful criterion for the assessment of the prediction efficiency with different inputs. Unfortunately, the most commonly used Mean Square Prediction Error (MSPE) criterion cannot directly assess the loss of efficiency when the spatial covariance model is approximated. Though the Kullback–Leibler Divergence criterion can provide the information loss of the approximated model, it cannot give more detailed information that one may be interested in, e.g., the accuracy of the computed MSE. In this paper, we present three other criteria, the Mean Loss of Efficiency (MLOE), Mean Misspecification of the Mean Square Error (MMOM), and Root mean square MOM (RMOM), and show numerically that, in comparison with the common MSPE criterion and the Kullback–Leibler Divergence criterion, our criteria are more informative, and thus more adequate to assess the loss of prediction efficiency when using approximated or misspecified covariance models. Hence, our suggested criteria are more useful for the determination of tuning parameters for sophisticated approximation methods of spatial model fitting. To illustrate this, we investigate the trade-off between the execution time, estimation accuracy, and prediction efficiency for the TLR method with extensive simulation studies and suggest proper settings of the TLR tuning parameters.
We then apply the TLR method to a large spatial dataset of soil moisture in the area of the Mississippi River basin, and compare the TLR with the Gaussian predictive process and the composite likelihood method, showing that our suggested criteria can successfully be used to choose the tuning parameters that maintain the estimation or prediction accuracy in applications.
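The idea behind an efficiency-loss criterion can be sketched in a few lines (a simplified illustration of the underlying principle, not the paper's exact MLOE/MMOM definitions): build kriging weights once from the true covariance model and once from a misspecified one, then compare their mean square errors under the true model.

```python
import numpy as np

def exp_cov(d, range_):
    # Exponential covariance with unit variance (an assumed toy model).
    return np.exp(-d / range_)

# 1D locations; predict at s0 from n observed sites.
rng = np.random.default_rng(0)
s = np.sort(rng.uniform(0, 1, 30))
s0 = 0.5
D = np.abs(s[:, None] - s[None, :])
d0 = np.abs(s - s0)

true_r, approx_r = 0.2, 0.1              # misspecified range parameter
S, c = exp_cov(D, true_r), exp_cov(d0, true_r)
Sa, ca = exp_cov(D, approx_r), exp_cov(d0, approx_r)

lam_opt = np.linalg.solve(S, c)          # optimal simple-kriging weights
lam_apx = np.linalg.solve(Sa, ca)        # weights from the wrong model

def true_mse(lam):
    # MSE of the linear predictor lam'Z for Z(s0) under the TRUE model.
    return 1.0 - 2.0 * lam @ c + lam @ S @ lam

# Relative loss of efficiency (>= 0) from using the wrong weights.
loe = true_mse(lam_apx) / true_mse(lam_opt) - 1.0
```

Averaging such a loss over many prediction sites gives a mean-loss-of-efficiency style summary; the paper additionally tracks how much the misspecified model's *reported* MSE deviates from the truth.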

Seismic Velocity Variations in a 3D Martian Mantle: Implications for the InSight Measurements (Journal of Geophysical Research: Planets, American Geophysical Union (AGU), 2021-05-14) [Article] We use a large data set of 3D thermal evolution models to predict the distribution of present-day seismic velocities in the Martian interior. Our models show a difference between maximum and minimum S-wave velocity of up to 10% either below the crust, where thermal variations are largest, or at the depth of the olivine-to-wadsleyite phase transition, located at around 1000-1200 km depth. Models with thick lithospheres on average have weak low-velocity zones that extend deeper than 400 km, and seismic velocity variations in the uppermost 400-600 km that closely follow the crustal thickness pattern. For these cases the crust contains more than half of the total amount of heat-producing elements. Models with limited crustal heat production have thinner lithospheres and shallower but prominent low-velocity zones that are incompatible with InSight observations. Seismic events suggested to originate in Cerberus Fossae indicate the absence of S-wave shadow zones at 25°-30° epicentral distance. This result is compatible with previous best-fit models that require a large average lithospheric thickness and a crust containing more than half of the bulk amount of heat-producing elements to be compatible with geological and geophysical constraints. Ongoing and future InSight measurements that will determine the existence of a weak low-velocity zone will directly bear on the crustal heat production.

High-Performance Partial Spectrum Computation for Symmetric Eigenvalue Problems and the SVD (arXiv, 2021-04-29) [Preprint] Current dense symmetric eigenvalue (EIG) and singular value decomposition (SVD) implementations may suffer from a lack of concurrency during the tridiagonal and bidiagonal reductions, respectively. This performance bottleneck is typical for two-sided transformations due to the Level-2 BLAS memory-bound calls. Therefore, the current state-of-the-art EIG and SVD implementations may achieve only a small fraction of the system's sustained peak performance. The QR-based Dynamically Weighted Halley (QDWH) algorithm may be used as a preprocessing step toward the EIG and SVD solvers, while mitigating the aforementioned bottleneck. QDWH-EIG and QDWH-SVD expose more parallelism, while relying on compute-bound matrix operations. Both run closer to the sustained peak performance of the system, but at the expense of performing more FLOPS than the standard EIG and SVD algorithms. In this paper, we introduce a new QDWH-based solver for computing the partial spectrum for EIG (QDWH-partial-EIG) and SVD (QDWH-partial-SVD) problems. By optimizing the rational function underlying the algorithms only in the desired part of the spectrum, the QDWH-partial-EIG and QDWH-partial-SVD algorithms efficiently compute a fraction (say 1-20%) of the corresponding spectrum. We develop high-performance implementations of QDWH-partial-EIG and QDWH-partial-SVD on distributed-memory manycore systems and demonstrate their numerical robustness. Experimental results using up to 36K MPI processes show performance speedups for QDWH-partial-SVD of up to 6X and 2X against PDGESVD from ScaLAPACK and KSVD, respectively. QDWH-partial-EIG outperforms PDSYEVD from ScaLAPACK by up to 3.5X but remains slower compared to ELPA. QDWH-partial-EIG achieves, however, a better occupancy of the underlying hardware by extracting higher sustained peak performance than ELPA, which is critical moving forward with accelerator-based supercomputers.
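The QDWH machinery is beyond a short snippet, but the task it accelerates, computing only a small fraction of the spectrum of a symmetric matrix via compute-bound matrix operations, can be sketched with plain subspace iteration (an illustrative stand-in, not the paper's algorithm):

```python
import numpy as np

def partial_spectrum(A, k, iters=200, seed=0):
    # Subspace iteration: a simple, Level-3-BLAS-dominated way to get
    # only the k largest-magnitude eigenpairs of a symmetric matrix,
    # instead of reducing the whole matrix to tridiagonal form.
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(iters):
        Q, _ = np.linalg.qr(A @ Q)       # matrix-matrix multiply + QR
    T = Q.T @ A @ Q                      # small k-by-k Rayleigh quotient
    w, V = np.linalg.eigh(T)
    return w, Q @ V
```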

Optimized Runge-Kutta Methods with Automatic Step Size Control for Compressible Computational Fluid Dynamics (arXiv, 2021-04-14) [Preprint] We develop error-control based time integration algorithms for compressible fluid dynamics (CFD) applications and show that they are efficient and robust in both the accuracy-limited and stability-limited regimes. Focusing on discontinuous spectral element semidiscretizations, we design new controllers for existing methods and for some new embedded Runge-Kutta pairs. We demonstrate the importance of choosing adequate controller parameters and provide a means to obtain these in practice. We compare a wide range of error-control-based methods, along with the common approach in which step size control is based on the Courant-Friedrichs-Lewy (CFL) number. The optimized methods give improved performance and naturally adopt a step size close to the maximum stable CFL number at loose tolerances, while additionally providing control of the temporal error at tighter tolerances. The numerical examples include challenging industrial CFD applications.
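The core of such error-based step size control can be sketched with the simplest embedded pair, Heun-Euler of orders 2(1), and a plain proportional controller (an illustrative sketch for a scalar ODE; the paper's optimized pairs and PID-style controllers refine the same mechanism):

```python
import numpy as np

def integrate(f, t0, y0, tf, tol=1e-6):
    # Heun-Euler embedded pair: a 2nd-order step with an embedded
    # 1st-order solution whose difference estimates the local error.
    t, y, h = t0, float(y0), 0.1 * (tf - t0)
    while t < tf:
        h = min(h, tf - t)               # do not step past the end time
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_hi = y + 0.5 * h * (k1 + k2)   # 2nd-order solution
        err = abs(0.5 * h * (k2 - k1))   # difference vs 1st-order solution
        if err <= tol:                   # accept the step
            t, y = t + h, y_hi
        # Proportional controller: grow/shrink h from the error estimate
        # (exponent 1/2 matches the O(h^2) error estimator), with safety
        # factor 0.9 and growth/shrink limits.
        h *= min(5.0, max(0.2, 0.9 * (tol / max(err, 1e-16)) ** 0.5))
    return y
```

On a stiff-free test problem such as y' = -y, the controller settles into a nearly constant step size set by the tolerance rather than by a CFL-style heuristic.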

A class of high-order weighted compact central schemes for solving hyperbolic conservation laws (arXiv, 2021-04-09) [Preprint] We propose a class of weighted compact central (WCC) schemes for solving hyperbolic conservation laws. The linear version can be considered as a high-order extension of the central Lax-Friedrichs (LxF) scheme and the central conservation element and solution element (CESE) scheme. On every cell, the solution is approximated by a Pth-order polynomial of which all the DOFs are stored and updated separately. The cell average is updated by a classical finite volume scheme which is constructed based on space-time staggered meshes, such that the fluxes are continuous across the interfaces of the adjacent control volumes and, therefore, the local Riemann problem is bypassed. The kth-order spatial derivatives are updated by a central difference of (k-1)th-order spatial derivatives at cell vertices. All the space-time information is calculated by the Cauchy-Kovalewski procedure. By doing so, the schemes are able to achieve arbitrarily uniform space-time high order on a super-compact stencil with only one explicit time step. In order to capture discontinuities without spurious oscillations, a weighted essentially non-oscillatory (WENO) type limiter is tailor-made for the schemes. The limiter preserves the compactness and high-order accuracy of the schemes. The accuracy, robustness, and efficiency of the schemes are verified by several numerical examples of scalar conservation laws and the compressible Euler equations.
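For reference, the first-order central Lax-Friedrichs update that such schemes extend can be written in a few lines (a generic textbook sketch on a periodic grid, not the paper's scheme):

```python
import numpy as np

def lax_friedrichs_step(u, flux, dt, dx):
    # Classical first-order Lax-Friedrichs update on a periodic grid:
    # average of the neighbors plus a centered flux difference. Being
    # central, it needs no Riemann solver at cell interfaces.
    up = np.roll(u, -1)   # u_{i+1}
    um = np.roll(u, 1)    # u_{i-1}
    return 0.5 * (up + um) - 0.5 * dt / dx * (flux(up) - flux(um))
```

A basic sanity check for any such conservative update is that the cell-average sum (the discrete integral) is preserved on a periodic domain.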

High Performance Multivariate Geospatial Statistics on Manycore Systems (IEEE Transactions on Parallel and Distributed Systems, Institute of Electrical and Electronics Engineers (IEEE), 2021-04-06) [Article] Modeling and inferring spatial relationships and predicting missing values of environmental data are some of the main tasks of geospatial statisticians. These routine tasks are accomplished using multivariate geospatial models and the cokriging technique, which requires the evaluation of the expensive Gaussian log-likelihood function. This large-scale cokriging challenge provides a fertile ground for supercomputing implementations for the geospatial statistics community, as it is paramount to scale computational capability to match the growth in environmental data. In this paper, we develop large-scale multivariate spatial modeling and inference on parallel hardware architectures. To tackle the increasing complexity in matrix operations and the massive concurrency in parallel systems, we leverage low-rank matrix approximation techniques with task-based programming models and schedule the asynchronous computational tasks using a dynamic runtime system. The proposed framework provides both the dense and approximated computations of the Gaussian log-likelihood function. It demonstrates accuracy robustness and performance scalability on a variety of computer systems. Using both synthetic and real datasets, the low-rank matrix approximation shows better performance compared to exact computation, while preserving the application requirements in both parameter estimation and prediction accuracy. We also propose a novel algorithm to assess the prediction accuracy after the online parameter estimation.
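The cost being attacked here is that of the exact Gaussian log-likelihood, which in dense form requires a Cholesky factorization with O(n^3) flops and O(n^2) memory (a standard sketch, independent of the paper's low-rank framework):

```python
import numpy as np

def gaussian_loglik(y, Sigma):
    # Exact zero-mean Gaussian log-likelihood via Cholesky:
    #   -0.5 * (n log(2*pi) + log det(Sigma) + y' Sigma^{-1} y).
    # The O(n^3) factorization is the bottleneck that low-rank
    # approximations replace with cheaper structured operations.
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, y)                 # forward substitution: L z = y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log det from Cholesky diag
    n = len(y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)
```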

The Arab world prepares the exascale workforce (Communications of the ACM, Association for Computing Machinery (ACM), 2021-04) [Article] The Arab world is currently host to eight supercomputers in the Top500 globally, including the current #10 and a former #7. Hardware can become a honeypot for talent attraction: senior talent from abroad, and rising talent from within. Good return on investment from leading-edge hardware motivates forging collaborative ties to global supercomputing leaders, which leads to integration into the global campaigns at which supercomputing excels, such as predicting climate change and developing sustainable energy resources for its mitigation, positing properties of new materials and catalysis by design, repurposing already-certified drugs and discovering new ones, and applying big data analytics and machine learning to science and to society. While the petroleum industry, with its workloads of seismic imaging and reservoir modeling, has been the historical motivation for supercomputing in the Arab world, the attraction today is universal.

Constraints on the upper mantle structure beneath the Pacific from 3D anisotropic waveform modelling (Journal of Geophysical Research: Solid Earth, American Geophysical Union (AGU), 2021-03-24) [Article] Seismic radial anisotropy is a crucial tool to help constrain flow in the Earth's mantle. However, Earth structure beneath the oceans imaged by current 3D radially anisotropic mantle models shows large discrepancies. In this study, we provide constraints on the radially anisotropic upper mantle structure beneath the Pacific by waveform modelling and subsequent inversion. Specifically, we objectively evaluate three 3D tomography mantle models which exhibit varying distributions of radial anisotropy, through comparisons of independent real datasets with synthetic seismograms computed with the spectral-element method. The data require an asymmetry at the East Pacific Rise (EPR), with stronger positive radial anisotropy (ξ = 1.13-1.16) at ∼100 km depth to the west of the East Pacific Rise than to the east (ξ = 1.11-1.13). This suggests that the anisotropy in this region is due to the lattice-preferred orientation (LPO) of anisotropic mantle minerals produced by shear-driven asthenospheric flow beneath the South Pacific Superswell. Our new radial anisotropy constraints in the Pacific show three distinct positive linear anomalies at ∼100 km depth. These anomalies are possibly related to mantle entrainment at the Nazca-South America subduction zone, flow at the East Pacific Rise and from the South Pacific Superswell, and the shape-preferred orientation (SPO) of melt beneath Hawaii. Radial anisotropy reduces with lithospheric age to ξ < 1.05 in the west at ∼100 km depth, which possibly reflects a deviation from horizontal flow as the mantle is entrained with subducting slabs, a change in temperature or water content that could alter the anisotropic olivine fabric, or the shape-preferred orientation of melt.

The influence of depth-varying elastic properties in controlling the dynamic rupture of megathrust earthquakes and upper-plate coseismic deformation (Copernicus GmbH, 2021-03-03) [Presentation] It has been recently proposed that the depth-varying rupture properties of megathrust earthquakes can be explained by the depth distribution of elastic properties of the rocks overlying the megathrust fault. Here we demonstrate that such a relationship is mechanically viable by using 3D dynamic rupture simulations. We compare results from two subduction zone scenarios with different depth distributions of elastic properties to explore the influence of realistic upper-plate elasticity on rupture characteristics such as slip, rupture duration, and frequency content. The first scenario has a homogeneous distribution of elastic properties, with values of Vp, Vs, and density typical of rocks overlying the megathrust fault at 25 km depth. The second scenario includes the typical depth distribution of elastic properties overlying the megathrust fault inferred from worldwide tomographic models of the upper plate. For both scenarios, we simulate three cases with ruptures confined to the shallow domain (0-5 km depth), transitional domain (5-10 km depth), and regular domain (10-25 km depth), respectively. We assume the same friction properties for both scenarios. Results show that the realistic distribution of elastic properties accounts for increasing slip and decreasing high-frequency content trenchwards, and that slip may be 8 times larger and corner frequency 2 times lower in the shallow domain than in the regular domain. Rupture times along depth show that the rupture through a realistic elastic model may be 2.5-3 times slower in the shallow domain than in the regular domain.
Depth variations of slip, frequency content, and rupture time quantitatively agree with previous predictions, confirming that depletion of high-frequency content and slow rupture are inherent to ruptures propagating through the shallow domain, where elastic properties drop more rapidly than in the regular and transitional domains. Depth-dependent elastic properties also affect the dynamics of slip rate. Peak slip rate values in the heterogeneous model anticorrelate with rigidity variations and are 3-4 times higher than those observed in the homogeneous model in the shallow domain. Increasing peak slip-rate difference trenchwards correlates with increasing local ground motion differences between models. We also find important differences in permanent coseismic deformation of the upper plate. We show that coseismic deformation is significantly larger in the shallow domain in the heterogeneous models, where uplift ratios may be up to 2 times larger and along-dip displacement of the seafloor may be >6 times larger than displacement values from the homogeneous model. We use the permanent uplift seafloor deformation from both models to model the corresponding tsunamis with the Tsunami-HySEA software. The results show that, at the coast, the maximum amplitude of the tsunami generated by the heterogeneous model may be up to 25% larger than that excited by the homogeneous model. This study demonstrates the relevant role of upper-plate elasticity in controlling not only rupture characteristics, but also coseismic upper-plate deformation and tsunamigenesis. Neglecting the distribution of these properties may result in important underestimation of slip, rupture time, and local ground motion, as well as of seafloor coseismic deformation in the shallow domain, which in turn may lead to underestimation of tsunami size.

Triple Decomposition of Velocity Gradient Tensor in Compressible Turbulence (Fluids, MDPI AG, 2021-03-02) [Article] The decomposition of the local motion of a fluid into straining, shearing, and rigid-body rotation is examined in this work for compressible isotropic turbulence by means of direct numerical simulations. The triple decomposition is closely associated with a basic reference frame (BRF), in which the extraction of the biasing effect of shear is maximized. In this study, a new, computationally inexpensive procedure is proposed to identify the BRF for a three-dimensional flow field. In addition, the influence of compressibility effects on some statistical properties of the turbulent structures is addressed. The direct numerical simulations are carried out at a Taylor-microscale Reynolds number of Reλ = 100 for various turbulent Mach numbers ranging from Mat = 0.12 to Mat = 0.89. The DNS database is generated with an improved seventh-order accurate weighted essentially non-oscillatory scheme to discretize the nonlinear advective terms, and an eighth-order accurate centered finite difference scheme is retained for the diffusive terms. One of the major findings of this analysis is that regions featuring strong rigid-body rotations or straining motions are highly spatially intermittent, while most of the flow regions exhibit moderately strong shearing motions in the absence of rigid-body rotations and straining motions. The majority of compressibility effects can be estimated if the scaling laws for compressible turbulence are rescaled by only considering the solenoidal contributions.
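The starting point of the triple decomposition is the classical symmetric/antisymmetric split of the velocity gradient tensor, shown below as a minimal sketch; extracting the additional pure-shear part in the BRF requires the frame search proposed in the paper and is not attempted here.

```python
import numpy as np

def double_decomposition(A):
    # Standard split of the velocity gradient tensor A = grad(u) into
    # the strain-rate tensor (symmetric part) and the rotation-rate
    # tensor (antisymmetric part). The triple decomposition further
    # separates a pure-shear tensor after rotating into the basic
    # reference frame (BRF).
    S = 0.5 * (A + A.T)   # strain-rate tensor
    W = 0.5 * (A - A.T)   # rotation-rate tensor
    return S, W
```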

A sharp interface method using enriched finite elements for elliptic interface problems (Numerische Mathematik, Springer Nature, 2021-03-02) [Article] We present an immersed boundary method for the solution of elliptic interface problems with discontinuous coefficients which provides a second-order approximation of the solution. The proposed method can be categorised as an extended or enriched finite element method. In contrast to other extended FEM approaches, the new shape functions are projected in order to satisfy the Kronecker-delta property with respect to the interface. The resulting combination of projection and restriction was already derived in Höllbacher and Wittum (TBA, 2019a) for application to particulate flows. The crucial benefits are the preservation of the symmetry and positive definiteness of the continuous bilinear operator. Besides, no additional stabilisation terms are necessary. Furthermore, since our enrichment can be interpreted as adaptive mesh refinement, the standard integration schemes can be applied to the cut elements. Finally, small cut elements do not impair the conditioning of the scheme, and we propose a simple procedure to ensure good conditioning independent of the location of the interface. The stability and convergence of the solution are proven and the numerical tests demonstrate optimal order of convergence.

Parallel Hierarchical Matrix Technique to Approximate Large Covariance Matrices, Likelihood Functions and Parameter Identification (2021-03-01) [Presentation] We develop the HLIBCov package, which uses parallel hierarchical (H) matrices to: 1) approximate large, dense, inhomogeneous covariance matrices with log-linear computational cost and storage requirements; 2) compute the matrix-vector product, Cholesky factorization, and inverse with log-linear complexity; 3) identify unknown parameters of the covariance function (variance, smoothness, and covariance length). These unknown parameters are estimated by maximizing the joint Gaussian log-likelihood function. To demonstrate the numerical performance, we identify three unknown parameters in an example with 2,000,000 locations on a desktop PC.

KSPHPDDM and PCHPDDM: Extending PETSc with advanced Krylov methods and robust multilevel overlapping Schwarz preconditioners (Computers and Mathematics with Applications, Elsevier BV, 2021-01-22) [Article] Contemporary applications in computational science and engineering often require the solution of linear systems which may be of different sizes, shapes, and structures. The goal of this paper is to explain how two libraries, PETSc and HPDDM, have been interfaced in order to offer end-users robust overlapping Schwarz preconditioners and advanced Krylov methods featuring recycling and the ability to deal with multiple right-hand sides. The flexibility of the implementation is showcased and explained with minimalist, easy-to-run, and reproducible examples, to ease the integration of these algorithms into more advanced frameworks. The examples provided cover applications from eigenanalysis, elasticity, combustion, and electromagnetism.
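In PETSc's option-driven style, selecting these components is typically a matter of runtime flags rather than code changes. A hypothetical invocation sketch (the application name is a placeholder, and the exact option names should be checked against the installed PETSc/HPDDM version's documentation):

```shell
# Assumed PETSc runtime options: use HPDDM's recycling Krylov solver
# (GCRODR) together with the multilevel overlapping Schwarz
# preconditioner exposed as PCHPDDM.
./my_app -ksp_type hpddm -ksp_hpddm_type gcrodr \
         -pc_type hpddm -pc_hpddm_levels_1_eps_nev 20
```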

Exploiting low-rank covariance structures for computing high-dimensional normal and Student-t probabilities (Statistics and Computing, Springer Nature, 2021-01-12) [Article] We present a preconditioned Monte Carlo method for computing high-dimensional multivariate normal and Student-t probabilities arising in spatial statistics. The approach combines a tile-low-rank representation of covariance matrices with a block-reordering scheme for efficient quasi-Monte Carlo simulation. The tile-low-rank representation decomposes the high-dimensional problem into many diagonal-block-size problems and low-rank connections. The block-reordering scheme reorders between and within the diagonal blocks to reduce the impact of integration variables from right to left, thus improving the Monte Carlo convergence rate. Simulations up to dimension 65,536 suggest that the new method can improve the run time by one order of magnitude compared with the hierarchical quasi-Monte Carlo method and by two orders of magnitude compared with the dense quasi-Monte Carlo method. Our method also forms a strong substitute for the approximate conditioning methods as a more robust estimation with error guarantees. An application study to wind stochastic generators is provided to illustrate that the new computational method makes maximum likelihood estimation feasible for high-dimensional skew-normal random fields.
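The baseline such methods improve on is a plain Monte Carlo estimator of the multivariate normal probability (a minimal sketch; the paper replaces the random points with quasi-Monte Carlo sequences, stores the covariance in tile-low-rank form, and reorders the variables by block to speed up convergence):

```python
import numpy as np

def mvn_prob_mc(b, Sigma, n_samples=200_000, seed=1):
    # Plain Monte Carlo estimate of P(Z <= b) for Z ~ N(0, Sigma):
    # draw correlated samples through the Cholesky factor and count
    # how many fall inside the orthant bounded above by b.
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)
    Z = rng.standard_normal((n_samples, len(b))) @ L.T
    return np.mean(np.all(Z <= b, axis=1))
```

For two independent standard normals and b = 0, the exact probability is 0.25, which the estimator should approach at the usual O(n^{-1/2}) Monte Carlo rate.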

Entropy-Stable No-Slip Wall Boundary Conditions for the Eulerian Model for Viscous and Heat Conducting Compressible Flows (American Institute of Aeronautics and Astronautics (AIAA), 2021-01-11) [Conference Paper] Nonlinear (entropy) stability analysis is used to derive entropy-stable no-slip wall boundary conditions at the continuous and semi-discrete levels for the Eulerian model proposed by Svärd (Physica A: Statistical Mechanics and its Applications, 2018). The spatial discretization is based on discontinuous Galerkin summation-by-parts operators of any order for unstructured grids. We provide a set of two-dimensional numerical results for laminar and turbulent flows simulated with both the Eulerian and classical Navier-Stokes models. These results are computed with a high-performance h-entropy-stable solver that also features explicit and implicit entropy-stable time integration schemes.

A Comparison of Parallel Profiling Tools for Programs Utilizing the FFT (Association for Computing Machinery (ACM), 2021-01-06) [Conference Paper] Performance monitoring is an important component of code optimization. It is also important for the beginning user, but can be difficult to configure appropriately. The overhead of the performance monitoring tools CrayPat, FPMP, mpiP, Scalasca, and TAU is measured using default configurations likely to be chosen by a novice user, and shown to be small when profiling Fast Fourier Transform based solvers for the Klein-Gordon equation built on 2decomp&FFT and on FFTE. The performance measurements help explain why, despite FFTE having a more efficient parallel algorithm, it is not always faster than 2decomp&FFT: the compiled single-core FFT is not as fast as that in FFTW, which is used in 2decomp&FFT.
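A minimal example of the kind of measurement such profilers automate (a generic sketch, unrelated to the paper's test systems): wall-clock timing of an FFT call, sanity-checked against a naive O(n^2) DFT.

```python
import time
import numpy as np

def naive_dft(x):
    # O(n^2) reference DFT via an explicit transform matrix; useful to
    # sanity-check a profiled FFT call, but far slower for large n.
    n = len(x)
    k = np.arange(n)
    F = np.exp(-2j * np.pi * np.outer(k, k) / n)
    return F @ x

x = np.random.default_rng(0).standard_normal(1024)

t0 = time.perf_counter()
y_fft = np.fft.fft(x)                 # O(n log n) fast transform
t_fft = time.perf_counter() - t0

t0 = time.perf_counter()
y_dft = naive_dft(x)                  # O(n^2) reference
t_dft = time.perf_counter() - t0

print(f"fft: {t_fft:.2e}s, naive dft: {t_dft:.2e}s")
```

Both paths compute the same transform; a profiler reports where such hot spots spend their time without hand-inserted timers.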