Recent Submissions

  • A Multilayer Nonlinear Elimination Preconditioned Inexact Newton Method for Steady-State Incompressible Flow Problems in Three Dimensions

    Luo, Li; Cai, Xiao Chuan; Yan, Zhengzheng; Xu, Lei; Keyes, David E. (SIAM Journal on Scientific Computing, Society for Industrial & Applied Mathematics (SIAM), 2020-11-24) [Article]
    We develop a multilayer nonlinear elimination preconditioned inexact Newton method for a nonlinear algebraic system of equations, and a target application is the three-dimensional steady-state incompressible Navier--Stokes equations at high Reynolds numbers. Nonlinear steadystate problems are often more difficult to solve than time-dependent problems because the Jacobian matrix is less diagonally dominant, and a good initial guess from the previous time step is not available. For such problems, Newton-like methods may suffer from slow convergence or stagnation even with globalization techniques such as line search. In this paper, we introduce a cascadic multilayer nonlinear elimination approach based on feedback from intermediate solutions to improve the convergence of Newton iteration. Numerical experiments show that the proposed algorithm is superior to the classical inexact Newton method and other single layer nonlinear elimination approaches in terms of the robustness and efficiency. Using the proposed nonlinear preconditioner with a highly parallel domain decomposition framework, we demonstrate that steady solutions of the Navier--Stokes equations with Reynolds numbers as large as 7,500 can be obtained for the lid-driven cavity flow problem in three dimensions without the use of any continuation methods.
  • NodePy: A package for the analysis of numerical ODE solvers

    Ketcheson, David I.; Ranocha, Hendrik; Parsani, Matteo; Waheed, Umair bin; Hadjimichael, Yiannis (Journal of Open Source Software, The Open Journal, 2020-11-17) [Article]
    Ordinary differential equations (ODEs) are used to model a vast range of physical and other phenomena. They also arise in the discretization of partial differential equations. In most cases, solutions of differential equations must be approximated by numerical methods. The study of the properties of numerical methods for ODEs comprises an important and large body of knowledge. NodePy (available from, with documentation at is a software package for designing and studying the properties of numerical ODE solvers. For the most important classes of methods, NodePy can automatically assess their stability, accuracy, and many other properties. NodePy has also been used as a catalog of coefficients for time integration methods in PDE solver codes.
  • Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems

    Cao, Qinglei; Pei, Yu; Akbudak, Kadir; Bosilca, George; Ltaief, Hatem; Keyes, David E.; Dongarra, Jack (IEEE, 2020-11-01) [Conference Paper]
    The task-based programming model associated with dynamic runtime systems has gained popularity for challenging problems because of workload imbalance, heterogeneous resources, or extreme concurrency. During the last decade, lowrank matrix approximations, where the main idea consists of exploiting data sparsity typically by compressing off-diagonal tiles up to an application-specific accuracy threshold, have been adopted to address the curse of dimensionality at extreme scale. In this paper, we create a bridge between the runtime and the linear algebra by communicating knowledge of the data sparsity to the runtime. We design and implement this synergistic approach with high user productivity in mind, in the context of the PaRSEC runtime system and the HiCMA numerical library. This requires to extend PaRSEC with new features to integrate rank information into the dataflow so that proper decisions can be taken at runtime. We focus on the tile low-rank (TLR) Cholesky factorization for solving 3D data-sparse covariance matrix problems arising in environmental applications. In particular, we employ the 3D exponential model of Matern matrix kernel, which exhibits challenging nonuniform ´high ranks in off-diagonal tiles. We first provide a dynamic data structure management driven by a performance model to reduce extra floating-point operations. Next, we optimize the memory footprint of the application by relying on a dynamic memory allocator, and supported by a rank-aware data distribution to cope with the workload imbalance. Finally, we expose further parallelism using kernel recursive formulations to shorten the critical path. Our resulting high-performance implementation outperforms existing data-sparse TLR Cholesky factorization by up to 7-fold on a large-scale distributed-memory system, while minimizing the memory footprint up to a 44-fold factor. This multidisciplinary work highlights the need to empower runtime systems beyond their original duty of task scheduling for servicing next-generation low-rank matrix algebra libraries.
  • High Performance Asynchronous Reverse Time Migration for Oil and Gas Exploration

    Qu, Long; Abdelkhalak, Rached; Ltaief, Hatem; Said, Issam; Keyes, David E. (IEEE, 2020-11-01) [Conference Paper]
    Reverse Time Migration (RTM) is a state-of-the-art algorithm used in seismic depth imaging in complex geological environments for the oil and gas exploration industry. It calculates high-resolution images by solving the three-dimensional acoustic wave equation using seismic datasets recorded at various receiver locations. Using a finite-difference time-domain (FDTD) scheme, the time integration follows an adjoint-state formulation with two successive phases, i.e., forward modeling and backward integration. Each subsurface image is then generated during the imaging condition that combines a forward propagated source wavefield with a backward propagated receiver wavefield. RTM’s computational phases are predominantly composed of stencil computational kernels for the FDTD, applying the absorbing boundary conditions, and I/O operations needed for the imaging condition. In fact, RTM can be considered as an out-of-core algorithm, which requires offloading to disk snapshots of the domain solution at specific time intervals during the forward modeling phase. During the backward time integration, these snapshots are read back and synchronized at the corresponding times. As far as optimizing the stencil computation, spatial blocking represents the widely-adopted vendor-agnostic technique for increasing data reuse in the high-level of the memory subsystem. In this paper, we integrate in RTM the asynchronous Multicore Wavefront Diamond (MWD) tiling approach that permits to further increase data reuse by leveraging spatial with Temporal Blocking (TB) during the stencil computations. This integration engenders new challenges with a snowball effect on the legacy synchronous RTM workflow as it requires to deeply rethink of how the absorbing boundary conditions, the I/O operations, and the imaging condition operate. These disruptive changes in cascade are necessary to maintain the performance superiority of asynchronous stencil execution throughout the time integration, while ensuring the quality of the subsurface image does not deteriorate. We assess the overall performance of the new MWD-based RTM and compare against traditional SB-based RTM on various shared-memory systems using the SEG Salt3D model. The MWD-based RTM is able to achieve up to 60% performance speedup compared to SB-based RTM. To our knowledge, this paper highlights for the first time the applicability of asynchronous RTM executions, which results in a higher simulation throughput and may eventually create new research opportunities in improving the hydrocarbon extraction for the petroleum industry.
  • RK-Opt: A package for the design of numerical ODE solvers

    Ketcheson, David I.; Parsani, Matteo; Grant, Zachary; Ahmadia, Aron; Ranocha, Hendrik (Journal of Open Source Software, The Open Journal, 2020-10-30) [Article]
    Ordinary and partial differential equations (ODEs and PDEs) are used to model many important phenomena. In most cases, solutions of these models must be approximated by numerical methods. Most of the relevant algorithms fall within a few classes of methods, with the properties of individual methods determined by their coefficients. The choice of appropriate coefficients in the design of methods for specific applications is an important area of research. RK-Opt is a software package for designing numerical ODE solvers with coefficients optimally chosen to provide desired properties. It is available from, with documentation at The primary focus of the package is on the design of Runge-Kutta methods, but some routines for designing other classes of methods such as multistep Runge-Kutta and general linear methods are also included.
  • On the robustness and performance of entropy stable collocated discontinuous Galerkin methods

    Rojas, Diego B.; Boukharfane, Radouan; Dalcin, Lisandro; Del Rey Fernández, David C.; Ranocha, Hendrik; Keyes, David E.; Parsani, Matteo (Journal of Computational Physics, Elsevier BV, 2020-10-22) [Article]
    In computational fluid dynamics, the demand for increasingly multidisciplinary reliable simulations, for both analysis and design optimization purposes, requires transformational advances in individual components of future solvers. At the algorithmic level, hardware compatibility and efficiency are of paramount importance in determining viability at exascale and beyond. However, equally important (if not more so) is algorithmic robustness with minimal user intervention, which becomes progressively more challenging to achieve as problem size and physics complexity increase. We numerically show that low and high order entropy stable collocated discontinuous Galerkin discretizations based on summation-by-part operators and simultaneous-approximationterms technique provide an essential step toward a truly enabling technology in terms of reliability and robustness for both under-resolved turbulent flow simulations and flows with discontinuities.
  • Bayes Meets Tikhonov: Understanding Uncertainty Within Gaussian Framework for Seismic Inversion

    Izzatullah, Muhammad; Peter, Daniel; Kabanikhin, Sergey; Shishlenin, Maxim (Springer Singapore, 2020-10-22) [Book Chapter]
    In this chapter, we demonstrate the sound connection between the Bayesian approach and the Tikhonov regularisation within Gaussian framework. We provide a thorough uncertainty analysis to answer the following two fundamental questions: (1) How well is the estimate determined by a posteriori PDF, i.e. by the combination of observed data and a priori information? (2) What are the respective contributions of observed data and a priori information? To support the proposed methodology, we demonstrate it through numerical applications in seismic inversions.
  • Hierarchical Matrix Approximations of Hessians Arising in Inverse Problems Governed by PDEs

    Ambartsumyan, Ilona; Boukaram, Wagih Halim; Bui-Thanh, Tan; Ghattas, Omar; Keyes, David E.; Stadler, Georg; Turkiyyah, George; Zampini, Stefano (SIAM Journal on Scientific Computing, Society for Industrial & Applied Mathematics (SIAM), 2020-10-22) [Article]
    Hessian operators arising in inverse problems governed by partial differential equations (PDEs) play a critical role in delivering efficient, dimension-independent convergence for Newton solution of deterministic inverse problems, as well as Markov chain Monte Carlo sampling of posteriors in the Bayesian setting. These methods require the ability to repeatedly perform operations on the Hessian such as multiplication with arbitrary vectors, solving linear systems, inversion, and (inverse) square root. Unfortunately, the Hessian is a (formally) dense, implicitly defined operator that is intractable to form explicitly for practical inverse problems, requiring as many PDE solves as inversion parameters. Low rank approximations are effective when the data contain limited information about the parameters but become prohibitive as the data become more informative. However, the Hessians for many inverse problems arising in practical applications can be well approximated by matrices that have hierarchically low rank structure. Hierarchical matrix representations promise to overcome the high complexity of dense representations and provide effective data structures and matrix operations that have only log-linear complexity. In this work, we describe algorithms for constructing and updating hierarchical matrix approximations of Hessians, and illustrate them on a number of representative inverse problems involving time-dependent diffusion, advection-dominated transport, frequency domain acoustic wave propagation, and low frequency Maxwell equations, demonstrating up to an order of magnitude speedup compared to globally low rank approximations.
  • Stochastic Levenberg-Marquardt for Solving Optimization Problems on Hardware Accelerators

    Hong, Yuxi; Bergou, Houcine; Doucet, Nicolas; Zhang, Hao; Cranney, Jesse; Ltaief, Hatem; Gratadour, Damien; Rigaut, Francois; Keyes, David E. (Submitted to IEEE, 2020-10-19) [Preprint]
    We present a new Stochastic Levenberg-Marquardt (SLM) algorithm for efficiently solving large-scale nonlinear least-squares optimization problems. The SLM method incorporates stochasticity into the traditional Levenberg-Marquardt (LM) method. While the traditional LM operates on the full objective function, SLM randomly evaluates part of the objective to compute the corresponding derivatives and function values. Hence, SLM reduces the algorithmic complexity per iteration and speeds up the overall time to solution, while maintaining the numerical robustness of second-order methods. We assess the SLM method on standard datasets from LIBSVM, as well as on a large-scale optimization problem found in ground-based astronomy applications, and in particular adaptive optics systems of the next generation of instruments for the European Very Large and Extremely Large Telescopes. We implement SLM and deploy it on a shared-memory system equipped with multiple GPU hardware accelerators. We demonstrate the performance superiority of the SLM method over not only the traditional LM algorithm but also the state-of-the-art first-order methods. SLM finishes optimization process in less than 1 second on large datasets from the adaptive optics application, where LM and other methods require more than a few minutes. This enables to identify of the system parameters (e.g., atmospheric turbulence and wind speed) and to capture their evolution required during a night of observations with a close to real-time throughput.
  • Extrapolating low-frequency prestack land data with deep learning

    Ovcharenko, Oleg; Kazei, Vladimir; Plotnitskiy, Pavel; Peter, Daniel; Silvestrov, Ilya; Bakulin, Andrey; Alkhalifah, Tariq Ali (Society of Exploration Geophysicists, 2020-10-01) [Conference Paper]
    Missing low-frequency content in seismic data is a common challenge for seismic inversion. Long wavelengths are necessary to reveal large structures in the subsurface and to build an acceptable starting point for later iterations of full-waveform inversion (FWI). High-frequency land seismic data are particularly challenging due to the elastic nature of the Earth contrasting with acoustic air at the typically rugged free surface, which makes the use of low frequencies even more vital to the inversion. We propose a supervised deep learning framework for bandwidth extrapolation of prestack elastic data in the time domain. We utilize a Convolutional Neural Network (CNN) with a UNet-inspired architecture to convert portions of band-limited shot gathers from 5-15 Hz to 0-5 Hz band. In the synthetic experiment, we train the network on 192x192 patches of wavefields simulated for different cross-sections of the elastic SEAM Arid model with free-surface. Then, we test the network on unseen shot gathers from the same model to demonstrate the viability of the approach. The results show promise for future field data applications.
  • Elastic near-surface model estimation from full waveforms by deep learning

    Kazei, Vladimir; Ovcharenko, Oleg; Plotnitskii, Pavel; Peter, Daniel; Alkhalifah, Tariq Ali; Silvestrov, Ilya; Bakulin, Andrey; Zwartjes, Paul (Society of Exploration Geophysicists, 2020-09-30) [Conference Paper]
    Strong near-surface heterogeneity poses a major challenge for seismic imaging of deep targets in arid environments. Inspired by the initial success of deep learning applications to inverse problems, we investigate the possibility of building nearsurface models directly from raw elastic data including surface and body waves in arid conditions. Namely, we train a convolutional neural network to map the data into the model directly in a supervised way on a part SEAM Arid synthetic dataset and evaluate its performance on a different part of the same dataset. The main feature of our approach is that we estimate the model as a set of 1D vertical velocity profiles, utilizing relevant subsets of input data from neighboring locations. This effectively reduces the data and label spaces for a more practical neural network application.
  • Explicit coupling of acoustic and elastic wave propagation in finite-difference simulations

    Gao, Longfei; Keyes, David E. (Geophysics, Society of Exploration Geophysicists, 2020-09-30) [Article]
    We present a mechanism to explicitly couple the finite-difference discretizations of 2D acoustic and isotropic elastic-wave systems that are separated by straight interfaces. Such coupled simulations allow for the application of the elastic model to geological regions that are of special interest for seismic exploration studies (e.g., the areas surrounding salt bodies), with the computationally more tractable acoustic model still being applied in the background regions. Specifically, the acoustic wave system is expressed in terms of velocity and pressure while the elastic wave system is expressed in terms of velocity and stress. Both systems are posed in first-order forms and are discretized on staggered grids. Special variants of the standard finite-difference operators, namely, operators that possess the summation-by-parts property, are used for the approximation of spatial derivatives. Penalty terms, which are also referred to as the simultaneous approximation terms, are designed to weakly impose the elastic-acoustic interface conditions in the finite-difference discretizations and couple the elastic and acoustic wave simulations together. With the presented mechanism, we are able to perform the coupled elastic-acoustic wave simulations stably and accurately. Moreover, it is shown that the energy-conserving property in the continuous systems can be preserved in the discretized systems with carefully designed penalty terms.
  • Preconditioned BFGS-based Uncertainty Quantification in elastic Full Waveform Inversion

    Liu, Qiancheng; Beller, Stephen; Lei, Wenjie; Peter, Daniel; Tromp, Jeroen (arXiv, 2020-09-26) [Preprint]
    Full Waveform Inversion (FWI) has become an essential technique for mapping geophysical subsurface structures. However, proper uncertainty quantification is often lacking in current applications. In theory, uncertainty quantification is related to the inverse Hessian (or the posterior covariance matrix). Even for common geophysical inverse problems is beyond the computational and storage capacities of the largest high-performance computing systems. In this study, we amend the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm to perform uncertainty quantification for large-scale applications. For seismic inverse problems, the limited-memory BFGS (L-BFGS) method prevails as the most efficient quasi-Newton method. We aim to further augment it to obtain an approximate inverse Hessian for uncertainty quantification in FWI. To facilitate retrieval of the inverse Hessian, we combine BFGS (essentially a full-history L-BFGS) with randomized singular value decomposition to determine a low-rank approximation of the inverse Hessian. Setting the rank number equal to the number of iterations makes this solution efficient and memory-affordable even for large-scale problems. Furthermore, based on the Gauss-Newton method, we formulate different initial, diagonal Hessian matrices as preconditioners for the inverse scheme and compare their performances in elastic FWI applications. We highlight our approach with the elastic Marmousi benchmark, demonstrating the applicability of preconditioned BFGS for large-scale FWI and uncertainty quantification.
  • High-order accurate entropy-stable discontinuous collocated Galerkin methods with the summation-by-parts property for compressible CFD frameworks: Scalable SSDC algorithms and flow solver

    Parsani, Matteo; Boukharfane, Radouan; Nolasco, Irving Reyna; Del Rey Fernández, David C.; Zampini, Stefano; Hadri, Bilel; Dalcin, Lisandro (Journal of Computational Physics, Elsevier BV, 2020-09-22) [Article]
    This work reports on the performances of a fully-discrete hp-adaptive entropy stable discontinuous collocated Galerkin method for the compressible Naiver–Stokes equations. The resulting code framework is denoted by SSDC, the first S for entropy, the second for stable, and DC for discontinuous collocated. The method is endowed with the summation-by-parts property, allows for arbitrary spatial and temporal order, and is implemented in an unstructured high performance solver. The considered class of fully-discrete algorithms are systematically designed with mimetic and structure preserving properties that allow the transfer of continuous proofs to the fully discrete setting. Our goal is to provide numerical evidence of the adequacy and maturity of these high-order methods as potential base schemes for the next generation of unstructured computational fluid dynamics tools. We provide a series of test cases of increased difficulty, ranging from non-smooth to turbulent flows, in order to evaluate the numerical performance of the algorithms. Results on weak and strong scaling of the distributed memory implementation demonstrate that the parallel SSDC solver can scale efficiently over 100,000 processes.
  • Effects of Composition Heterogeneities on Flame Kernel Propagation: A DNS Study

    Er-raiy, Aimad; Boukharfane, Radouan; Parsani, Matteo (Fluids, MDPI AG, 2020-09-04) [Article]
    In this study, a new set of direct numerical simulations is generated and used to examine the influence of mixture composition heterogeneities on the propagation of a premixed iso-octane/air spherical turbulent flame, with a representative chemical description. The dynamic effects of both turbulence and combustion heterogeneities are considered, and their competition is assessed. The results of the turbulent homogeneous case are compared with those of heterogeneous cases which are characterized by multiple stratification length scales and segregation rates in the regime of a wrinkled flame. The comparison reveals that stratification does not alter turbulent flame behaviors such as the preferential alignment of the convex flame front with the direction of the compression. However, we find that the overall flame front propagation is slower in the presence of heterogeneities because of the differential on speed propagation. Furthermore, analysis of different displacement speed components is performed by taking multi-species formalism into account. This analysis shows that the global flame propagation front slows down due to the heterogeneities caused by the reaction mechanism and the differential diffusion accompanied by flame surface density variations. Quantification of the effects of each of these mechanisms shows that their intensity increases with the increase in stratification’s length scale and segregation rate.
  • Maximizing I/O bandwidth for reverse time migration on heterogeneous large-scale systems

    Alturkestani, Tariq Lutfallah Mohammed; Ltaief, Hatem; Keyes, David E. (Springer International Publishing, 2020-08-18) [Conference Paper]
    Reverse Time Migration (RTM) is an important scientific application for oil and gas exploration. The 3D RTM simulation generates terabytes of intermediate data that does not fit in main memory. In particular, RTM has two successive computational phases, i.e., the forward modeling and the backward propagation, that necessitate to write and then to read the state of the computed solution grid at specific time steps of the time integration. Advances in memory architecture have made it feasible and affordable to integrate hierarchical storage media on large-scale systems, starting from the traditional Parallel File Systems (PFS) to intermediate fast disk technologies (e.g., node-local and remote-shared Burst Buffer) and up to CPU main memory. To address the trend of heterogeneous HPC systems deployment, we introduce an extension to our Multilayer Buffer System (MLBS) framework to further maximize RTM I/O bandwidth in presence of GPU hardware accelerators. The main idea is to leverage the GPU’s High Bandwidth Memory (HBM) as an additional storage media layer. The objective of MLBS is ultimately to hide the application’s I/O overhead by enabling a buffering mechanism operating across all the hierarchical storage media layers. MLBS is therefore able to sustain the I/O bandwidth at each storage media layer. By asynchronously performing expensive I/O operations and creating opportunities for overlapping data motion with computations, MLBS may transform the original I/O bound behavior of the RTM application into a compute-bound regime. In fact, the prefetching strategy of MLBS allows the RTM application to believe that it has access to a larger memory capacity on the GPU, while transparently performing the necessary housekeeping across the storage layers. We demonstrate the effectiveness of MLBS on the Summit supercomputer using 2048 compute nodes equipped with a total of 12288 GPUs by achieving up to 1.4X performance speedup compared to the reference PFS-based RTM implementation for large 3D solution grid.
  • Characterization of pressure fluctuations within a controlled-diffusion blade boundary layer using the equilibrium wall-modelled LES.

    Boukharfane, Radouan; Parsani, Matteo; Bodart, Julien (Scientific reports, Springer Science and Business Media LLC, 2020-07-29) [Article]
    In this study, the generation of airfoil trailing edge broadband noise that arises from the interaction of turbulent boundary layer with the airfoil trailing edge is investigated. The primary objectives of this work are: (i) to apply a wall-modelled large-eddy simulation (WMLES) approach to predict the flow of air passing a controlled-diffusion blade, and (ii) to study the blade broadband noise that is generated from the interaction of a turbulent boundary layer with a lifting surface trailing edge. This study is carried out for two values of the Mach number, [Formula: see text] and 0.5, two values of the chord Reynolds number, [Formula: see text] and [Formula: see text], and two angles of attack, AoA [Formula: see text] and [Formula: see text]. To examine the influence of the grid resolution on aerodynamic and aeroacoustic quantities, we compare our results with experimental data available in the literature. We also compare our results with two in-house numerical solutions generated from two wall-resolved LES (WRLES) calculations, one of which has a DNS-like resolution. We show that WMLES accurately predicts the mean pressure coefficient distribution, velocity statistics (including the mean velocity), and the traces of Reynolds tensor components. Furthermore, we observe that the instantaneous flow structures computed by the WMLES resemble those found in the reference WMLES database, except near the leading edge region. Some of the differences observed in these structures are associated with tripping and the transition to a turbulence mechanism near the leading edge, which are significantly affected by the grid resolution. The aeroacoustic noise calculations indicate that the power spectral density profiles obtained using the WMLES compare well with the experimental data.
  • MFEM: A modular finite element methods library

    Anderson, Robert; Andrej, Julian; Barker, Andrew; Bramwell, Jamie; Camier, Jean Sylvain; Cerveny, Jakub; Dobrev, Veselin; Dudouit, Yohann; Fisher, Aaron; Kolev, Tzanio; Pazner, Will; Stowell, Mark; Tomov, Vladimir; Akkerman, Ido; Dahm, Johann; Medina, David; Zampini, Stefano (Computers and Mathematics with Applications, Elsevier BV, 2020-07-11) [Article]
    MFEM is an open-source, lightweight, flexible and scalable C++ library for modular finite element methods that features arbitrary high-order finite element meshes and spaces, support for a wide variety of discretization approaches and emphasis on usability, portability, and high-performance computing efficiency. MFEM's goal is to provide application scientists with access to cutting-edge algorithms for high-order finite element meshing, discretizations and linear solvers, while enabling researchers to quickly and easily develop and test new algorithms in very general, fully unstructured, high-order, parallel and GPU-accelerated settings. In this paper we describe the underlying algorithms and finite element abstractions provided by MFEM, discuss the software implementation, and illustrate various applications of the library.
  • Fully discrete explicit locally entropy-stable schemes for the compressible Euler and Navier–Stokes equations

    Ranocha, Hendrik; Dalcin, Lisandro; Parsani, Matteo (Computers & Mathematics with Applications, Elsevier BV, 2020-07-09) [Article]
    Recently, relaxation methods have been developed to guarantee the preservation of a single global functional of the solution of an ordinary differential equation. Here, we generalize this approach to guarantee local entropy inequalities for finitely many convex functionals (entropies) and apply the resulting methods to the compressible Euler and Navier–Stokes equations. Based on the unstructured hp-adaptive SSDC framework of entropy conservative or dissipative semidiscretizations using summation-by-parts and simultaneous-approximation-term operators, we develop the first discretizations for compressible computational fluid dynamics that are primary conservative, locally entropy stable in the fully discrete sense under a usual CFL condition, explicit except for the parallelizable solution of a single scalar equation per element, and arbitrarily highorder accurate in space and time. We demonstrate the accuracy and the robustness of the fully-discrete explicit locally entropy-stable solver for a set of test cases of increasing complexity.
  • Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications

    Cao, Qinglei; Pei, Yu; Akbudak, Kadir; Mikhalev, Aleksandr; Bosilca, George; Ltaief, Hatem; Keyes, David E.; Dongarra, Jack (ACM, 2020-06-18) [Conference Paper]
    Climate and weather can be predicted statistically via geospatial Maximum Likelihood Estimates (MLE), as an alternative to running large ensembles of forward models. The MLE-based iterative optimization procedure requires the solving of large-scale linear systems that performs a Cholesky factorization on a symmetric positive-definite covariance matrix—a demanding dense factorization in terms of memory footprint and computation. We propose a novel solution to this problem: at the mathematical level, we reduce the computational requirement by exploiting the data sparsity structure of the matrix off-diagonal tiles by means of low-rank approximations; and, at the programming-paradigm level, we integrate PaRSEC, a dynamic, task-based runtime to reach unparalleled levels of efficiency for solving extreme-scale linear algebra matrix operations. The resulting solution leverages fine-grained computations to facilitate asynchronous execution while providing a flexible data distribution to mitigate load imbalance. Performance results are reported using 3D synthetic datasets up to 42M geospatial locations on 130, 000 cores, which represent a cornerstone toward fast and accurate predictions of environmental applications.

View more