### Recent Submissions

• #### Triple Decomposition of Velocity Gradient Tensor in Compressible Turbulence

(Fluids, MDPI AG, 2021-03-02) [Article]
The decomposition of the local motion of a fluid into straining, shearing, and rigid-body rotation is examined in this work for a compressible isotropic turbulence by means of direct numerical simulations. The triple decomposition is closely associated with a basic reference frame (BRF), in which the extraction of the biasing effect of shear is maximized. In this study, a new computational and inexpensive procedure is proposed to identify the BRF for a three-dimensional flow field. In addition, the influence of compressibility effects on some statistical properties of the turbulent structures is addressed. The direct numerical simulations are carried out with a Reynolds number that is based on the Taylor micro-scale of $Re_\lambda = 100$ for various turbulent Mach numbers that range from $Ma_t = 0.12$ to $Ma_t = 0.89$. The DNS database is generated with an improved seventh-order accurate weighted essentially non-oscillatory scheme to discretize the non-linear advective terms, and an eighth-order accurate centered finite difference scheme is retained for the diffusive terms. One of the major findings of this analysis is that regions featuring strong rigid-body rotations or straining motions are highly spatially intermittent, while most of the flow regions exhibit moderately strong shearing motions in the absence of rigid-body rotations and straining motions. The majority of compressibility effects can be estimated if the scaling laws in the case of compressible turbulence are rescaled by only considering the solenoidal contributions.
• #### Parallel Hierarchical Matrix Technique to Approximate Large Covariance Matrices, Likelihood Functions and Parameter Identification

(2021-03-01) [Presentation]
We develop the HLIBCov package, which uses parallel hierarchical (H-) matrices to: 1) Approximate large dense inhomogeneous covariance matrices with a log-linear computational cost and storage requirement. 2) Compute the matrix-vector product, Cholesky factorization, and inverse with a log-linear complexity. 3) Identify unknown parameters of the covariance function (variance, smoothness, and covariance length). These unknown parameters are estimated by maximizing the joint Gaussian log-likelihood function. To demonstrate the numerical performance, we identify three unknown parameters in an example with 2,000,000 locations on a desktop PC.
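As a rough illustration of the likelihood step described above, the following sketch evaluates the joint Gaussian log-likelihood with a dense Cholesky factorization and scans one parameter. It does not use HLIBCov or hierarchical matrices; the exponential covariance, the small nugget, the grid search, and all function names are assumptions made for this example only.

```python
import numpy as np

def exp_covariance(points, variance, length, nugget=1e-10):
    """Dense exponential covariance C_ij = variance * exp(-|x_i - x_j| / length).

    The nugget is an assumed tiny diagonal shift for numerical safety.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return variance * np.exp(-dist / length) + nugget * np.eye(len(points))

def gaussian_loglik(z, cov):
    """Joint Gaussian log-likelihood of zero-mean data z with covariance cov."""
    n = z.size
    chol = np.linalg.cholesky(cov)              # cov = L L^T
    alpha = np.linalg.solve(chol, z)            # alpha = L^{-1} z
    logdet = 2.0 * np.log(np.diag(chol)).sum()  # log det(cov)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + alpha @ alpha)

# Synthetic data drawn from a known covariance (hypothetical test setup).
rng = np.random.default_rng(0)
points = rng.random((200, 2))
true_cov = exp_covariance(points, variance=1.0, length=0.3)
z = np.linalg.cholesky(true_cov) @ rng.standard_normal(200)

# Crude grid search over the covariance length; a real code would use an optimizer.
lengths = np.linspace(0.05, 1.0, 20)
logliks = [gaussian_loglik(z, exp_covariance(points, 1.0, ell)) for ell in lengths]
best_length = lengths[int(np.argmax(logliks))]
print("estimated covariance length:", best_length)
```

The dense factorization costs cubic time in the number of locations, which is the cost that a hierarchical-matrix approximation is meant to avoid.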
• #### The PetscSF Scalable Communication Layer

(arXiv, 2021-02-25) [Preprint]
PetscSF, the communication component of the Portable, Extensible Toolkit for Scientific Computation (PETSc), is being used to gradually replace the direct MPI calls in the PETSc library. PetscSF provides a simple application programming interface (API) for managing common communication patterns in scientific computations by using a star-forest graph representation. PetscSF supports several implementations whose selection is based on the characteristics of the application or the target architecture. An efficient and portable model for network and intra-node communication is essential for implementing large-scale applications. The Message Passing Interface, which has been the de facto standard for distributed memory systems, has developed into a large complex API that does not yet provide high performance on the emerging heterogeneous CPU-GPU-based exascale systems. In this paper, we discuss the design of PetscSF, how it can overcome some difficulties of working directly with MPI with GPUs, and we demonstrate its performance, scalability, and novel features.
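The star-forest idea mentioned above can be pictured with a single-process sketch: every leaf points at exactly one root, a broadcast copies root values to leaves, and a reduction accumulates leaf values back onto roots. This is not the PETSc API and involves no MPI; the array names and helper functions are hypothetical.

```python
import numpy as np

# Each leaf references exactly one root, so the graph is a forest of stars.
leaf_to_root = np.array([0, 0, 2, 1, 2])
root_values = np.array([10.0, 20.0, 30.0])

def broadcast(root_values, leaf_to_root):
    """Root-to-leaf update: every leaf receives the value of its root."""
    return root_values[leaf_to_root]

def reduce_sum(leaf_values, leaf_to_root, n_roots):
    """Leaf-to-root update: every root accumulates the values of its leaves."""
    out = np.zeros(n_roots)
    np.add.at(out, leaf_to_root, leaf_values)  # unbuffered add handles repeated roots
    return out

leaf_values = broadcast(root_values, leaf_to_root)
root_sums = reduce_sum(leaf_values, leaf_to_root, root_values.size)
print(leaf_values)  # [10. 10. 30. 20. 30.]
print(root_sums)    # [20. 20. 60.]
```

In a distributed setting the roots and leaves would live on different processes, and the two operations would become communication phases.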
• #### KSPHPDDM and PCHPDDM: Extending PETSc with advanced Krylov methods and robust multilevel overlapping Schwarz preconditioners

(Computers and Mathematics with Applications, Elsevier BV, 2021-01-22) [Article]
Contemporary applications in computational science and engineering often require the solution of linear systems which may be of different sizes, shapes, and structures. The goal of this paper is to explain how two libraries, PETSc and HPDDM, have been interfaced in order to offer end-users robust overlapping Schwarz preconditioners and advanced Krylov methods featuring recycling and the ability to deal with multiple right-hand sides. The flexibility of the implementation is showcased and explained with minimalist, easy-to-run, and reproducible examples, to ease the integration of these algorithms into more advanced frameworks. The examples provided cover applications from eigenanalysis, elasticity, combustion, and electromagnetism.
• #### Exploiting low-rank covariance structures for computing high-dimensional normal and Student-t probabilities

(Statistics and Computing, Springer Science and Business Media LLC, 2021-01-12) [Article]
We present a preconditioned Monte Carlo method for computing high-dimensional multivariate normal and Student-t probabilities arising in spatial statistics. The approach combines a tile-low-rank representation of covariance matrices with a block-reordering scheme for efficient quasi-Monte Carlo simulation. The tile-low-rank representation decomposes the high-dimensional problem into many diagonal-block-size problems and low-rank connections. The block-reordering scheme reorders between and within the diagonal blocks to reduce the impact of integration variables from right to left, thus improving the Monte Carlo convergence rate. Simulations up to dimension 65,536 suggest that the new method can improve the run time by an order of magnitude compared with the hierarchical quasi-Monte Carlo method and two orders of magnitude compared with the dense quasi-Monte Carlo method. Our method also forms a strong substitute for the approximate conditioning methods as a more robust estimation with error guarantees. An application study to wind stochastic generators is provided to illustrate that the new computational method makes the maximum likelihood estimation feasible for high-dimensional skew-normal random fields.
• #### Entropy–Stable No–Slip Wall Boundary Conditions for the Eulerian Model for Viscous and Heat Conducting Compressible Flows

(American Institute of Aeronautics and Astronautics, 2021-01-11) [Conference Paper]
Nonlinear (entropy) stability analysis is used to derive entropy–stable no–slip wall boundary conditions at the continuous and semi–discrete levels for the Eulerian model proposed by Svärd in 2018 (Physica A: Statistical Mechanics and its Applications, 2018). The spatial discretization is based on discontinuous Galerkin summation-by-parts operators of any order for unstructured grids. We provide a set of two–dimensional numerical results for laminar and turbulent flows simulated with both the Eulerian and classical Navier–Stokes models. These results are computed with a high-performance ℎ–entropy–stable solver, that also features explicit and implicit entropy–stable time integration schemes.
• #### A Comparison of Parallel Profiling Tools for Programs utilizing the FFT

(ACM, 2021-01-06) [Conference Paper]
Performance monitoring is an important component of code optimization. Performance monitoring is also important for the beginning user, but can be difficult to configure appropriately. The overheads of the performance monitoring tools Craypat, FPMP, mpiP, Scalasca, and TAU are measured using default configurations likely to be chosen by a novice user and shown to be small when profiling Fast Fourier Transform based solvers for the Klein-Gordon equation based on 2decomp&FFT and on FFTE. Performance measurements help explain that, despite FFTE having a more efficient parallel algorithm, it is not always faster than 2decomp&FFT because the compiled single-core FFT is not as fast as that in FFTW, which is used in 2decomp&FFT.
• #### A robust explicit asynchronous time integration method for hyperbolic conservation laws

(American Institute of Aeronautics and Astronautics, 2021-01-04) [Conference Paper]
Hyperbolic conservation laws are of great practical importance as they model diverse multiscale phenomena (highly turbulent flows, combustion, etc.). To solve these equations, explicit time integration methods are used, for which the Courant–Friedrichs–Lewy (CFL) condition has to be satisfied everywhere in the computational domain. Therefore, the global time step will be dictated by the cells that require the smallest time step, resulting in an unnecessarily expensive computational approach. To overcome this difficulty, a conservative asynchronous method for explicit time integration schemes is developed and implemented for flux-based spatial schemes. The concept of the developed method is using dynamically variable time steps for classes of cells, while ensuring the time coherence of the time integration and flux conservation. In this paper, we present the classification of computational cells in classes based on their local stability criterion. Two versions of the (asynchronous) synchronization sequence are proposed, which are designed regardless of equation model and spatial scheme. In the context of hyperbolic conservation laws, we numerically investigate the conservation, accuracy and stability properties of the proposed method for one-dimensional linear convection and Euler equations. We show that the proposed asynchronous approach can be more accurate than its synchronous counterpart through the limitation of the diffusion errors by locally increasing the CFL number and thus, the local time step.
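To make the cost argument above concrete, here is a minimal sketch of grouping cells by their local CFL-limited time step. The power-of-two class spacing, the 1D setting, and the function names are assumptions for illustration; the abstract does not specify the classification rule actually used.

```python
import numpy as np

def classify_cells(dx, wave_speed, cfl=0.9):
    """Assign each cell a class k so that it may advance with 2**k * dt_min."""
    dt_local = cfl * dx / np.abs(wave_speed)   # local stable step from the CFL condition
    dt_min = dt_local.min()                    # step a global (synchronous) scheme must use
    level = np.floor(np.log2(dt_local / dt_min)).astype(int)
    return dt_min, level

def update_counts(level):
    """Cell updates needed to advance all cells by one step of the coarsest class."""
    k_max = int(level.max())
    synchronous = int(level.size * 2 ** k_max)        # every cell uses dt_min
    asynchronous = int(np.sum(2 ** (k_max - level)))  # each cell uses its class step
    return synchronous, asynchronous

# Hypothetical nonuniform 1D mesh with unit wave speed.
dx = np.array([1.0, 1.5, 3.0, 5.0, 9.0])
speed = np.ones_like(dx)
dt_min, level = classify_cells(dx, speed)
sync_updates, async_updates = update_counts(level)
print(level, sync_updates, async_updates)  # [0 0 1 2 3] 40 23
```

The sketch only counts updates; it does not address the flux conservation and time coherence at class interfaces that the method is designed to ensure.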
• #### Optimized explicit Runge–Kutta schemes for entropy stable discontinuous collocated methods applied to the Euler and Navier–Stokes equations

(American Institute of Aeronautics and Astronautics, 2021-01-04) [Conference Paper]
In this work, we design a new set of optimized explicit Runge–Kutta schemes for the integration of systems of ordinary differential equations arising from the spatial discretization of wave propagation problems with entropy stable collocated discontinuous Galerkin methods. The optimization of the new time integration schemes is based on the spectrum of the discrete spatial operator for the advection equation. To demonstrate the efficiency and accuracy of the new schemes compared to some widely used classic explicit Runge–Kutta methods, we report the wall-clock time versus the error for the simulation of the two-dimensional advection equation and the propagation of an isentropic vortex with the compressible Euler equations. The efficiency and robustness of the proposed optimized schemes for more complex flow problems are presented for the three-dimensional Taylor–Green vortex at a Reynolds number of $Re = 1.6 \times 10^3$ and Mach number $Ma = 0.1$, and the flow past two identical spheres in tandem at a Reynolds number of $Re = 3.9 \times 10^3$ and Mach number $Ma = 0.1$.
• #### Effects of differential diffusion and stratification characteristic length-scale on the propagation of a spherical methane-air flame kernel

(American Institute of Aeronautics and Astronautics, 2021-01-04) [Conference Paper]
Early flame kernel development and propagation in globally lean stratified fuel–air mixtures is of importance in various practical devices such as internal combustion engines. In this work, three-dimensional direct numerical simulation (DNS) is used to study the influence of differential diffusion effects in globally lean methane–air mixtures in the presence of mixture heterogeneities, with the goal of understanding the flame kernel behavior in such conditions. The typical DNS configuration corresponds to a homogeneous isotropic flow with an expanding spherical flame kernel. The local forced ignition of the kernel is performed by appending a source term to the sensible enthalpy transport equation that emulates spark ignition by energy deposit for a prescribed duration. The combustion chemistry is described with a skeletal methane–air mechanism, which i) features 14 species and 38 reactions, and ii) uses a multicomponent approach to evaluate transport coefficients. To assess the joint effects of differential diffusion and the stratification characteristic length-scale $L_{\Phi}$ on the flame kernel development, we considered cases with constant (unitary) and variable fuel Lewis number, both with different values for $L_{\Phi}$.
• #### Simulation of Turbulent Flows Using a Fully Discrete Explicit hp-nonconforming Entropy Stable Solver of Any Order on Unstructured Grids

(American Institute of Aeronautics and Astronautics, 2021-01-04) [Conference Paper]
We report the numerical solution of two challenging turbulent flow test cases simulated with the SSDC framework, a compressible, fully discrete hp-nonconforming entropy stable solver based on the summation-by-parts discontinuous collocation Galerkin discretizations and the relaxation Runge–Kutta methods. The algorithms at the core of the solver are systematically designed with mimetic and structure-preserving techniques that transfer fundamental properties from the continuous level to the discrete one. We aim at providing numerical evidence of the robustness and maturity of these entropy stable scale-resolving methods for the new generation of adaptive unstructured computational fluid dynamics tools. The two selected turbulent flows are i) the flow past two spheres in tandem at a Reynolds number based on the sphere diameter of $Re_D = 3.9 \times 10^3$ and $10^4$, and a Mach number of $Ma_\infty = 0.1$, and ii) the NASA junction flow experiment at a Reynolds number based on the crank chord length of $Re_\ell = 2.4 \times 10^6$ and $Ma_\infty = 0.189$.
• #### Compressibility effects on homogeneous isotropic turbulence using Schur decomposition of the velocity gradient tensor.

(American Institute of Aeronautics and Astronautics, 2021-01-04) [Conference Paper]
The study of compressibility effects on the dynamics and the structure of turbulence is an important, but difficult, topic in turbulence modeling. Taking advantage of a recently proposed Schur decomposition approach (Keylock, C. J., The Schur decomposition of the velocity gradient tensor for turbulent flows, Journal of Fluid Mechanics, 2018) to decompose the velocity gradient tensor into its normal and non-normal parts, here we evaluate the influence of the compressibility on some statistical properties of the turbulent structures. We perform a set of direct numerical simulations of decaying compressible turbulence at six turbulent Mach numbers between Mt = 0.12 and Mt = 0.89 and a Reynolds number based on the Taylor micro-scale of Ret = 100. All the simulations have been carried out using an improved seventh-order accurate WENO scheme to discretize the non-linear advective terms, and an eighth-order accurate centered finite difference scheme is retained for the diffusive terms. In the double decomposition, the normal parts of the velocity gradient tensor (represented by the eigenvalues) are separated explicitly from non-normal components. The two-dimensional space defined by the second and third invariants of the velocity gradient tensor is subdivided into six regions and the contribution of each regional term to the Schur decomposition of the velocity gradient tensor is analyzed. Our preliminary findings show the difficulty of understanding the non-local effects without taking into account both the normal contribution (represented by the eigenvalues) and the non-normal component computed with the Schur decomposition.
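The normal/non-normal split described above can be sketched for a single 3×3 velocity gradient tensor using the complex Schur form $A = Q T Q^{*}$, where the diagonal of $T$ holds the eigenvalues and the strictly upper-triangular remainder carries the non-normality. The helper name and the sample tensors are assumptions; this is an illustration of the decomposition, not the authors' analysis code.

```python
import numpy as np
from scipy.linalg import schur

def schur_split(grad_u):
    """Split a velocity gradient tensor into normal and non-normal parts."""
    T, Q = schur(grad_u, output="complex")   # grad_u = Q T Q^H, T upper triangular
    eigen_part = np.diag(np.diag(T))         # eigenvalues on the diagonal
    normal = Q @ eigen_part @ Q.conj().T
    non_normal = Q @ (T - eigen_part) @ Q.conj().T
    return normal, non_normal

# Hypothetical sample tensor.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
normal, non_normal = schur_split(A)

# The two parts sum to A, and their squared Frobenius norms add up to that of A.
assert np.allclose(normal + non_normal, A)
assert np.isclose(np.linalg.norm(A) ** 2,
                  np.linalg.norm(normal) ** 2 + np.linalg.norm(non_normal) ** 2)

# A pure shear has only zero eigenvalues, so it is entirely non-normal.
shear = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
shear_normal, shear_non_normal = schur_split(shear)
print(np.linalg.norm(shear_normal), np.linalg.norm(shear_non_normal))
```

The Schur form is not unique (the eigenvalue ordering can vary), but the Frobenius norms of the two parts do not depend on that ordering.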
• #### Sum of Kronecker products representation and its Cholesky factorization for spatial covariance matrices from large grids

(Computational Statistics & Data Analysis, Elsevier BV, 2021-01) [Article]
The sum of Kronecker products (SKP) representation for spatial covariance matrices from gridded observations and a corresponding adaptive-cross-approximation-based framework for building the Kronecker factors are investigated. The time cost for constructing an -dimensional covariance matrix is and the total memory footprint is , where is the number of Kronecker factors. The memory footprint under the SKP representation is compared with that under the hierarchical representation and found to be one order of magnitude smaller. A Cholesky factorization algorithm under the SKP representation is proposed and shown to factorize a one-million dimensional covariance matrix in under 600 seconds on a standard scientific workstation. With the computed Cholesky factor, simulations of Gaussian random fields in one million dimensions can be achieved at a low cost for a wide range of spatial covariance functions.
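One reason a sum-of-Kronecker-products representation saves memory is that matrix-vector products never require the full matrix. A minimal sketch, assuming row-major flattening and using the identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^{T})$; the function name and the random factors are hypothetical, and the adaptive-cross-approximation construction of the factors is not shown.

```python
import numpy as np

def skp_matvec(factors, x):
    """Apply sum_k kron(A_k, B_k) to x without forming the large matrix.

    factors: list of (A_k, B_k) pairs, A_k of shape (p, p), B_k of shape (q, q).
    x: vector of length p * q, flattened row-major from a (p, q) array.
    """
    p = factors[0][0].shape[1]
    q = factors[0][1].shape[1]
    X = x.reshape(p, q)
    return sum(A @ X @ B.T for A, B in factors).ravel()

# Hypothetical small example: two Kronecker terms on a 4 x 5 grid.
rng = np.random.default_rng(2)
factors = [(rng.standard_normal((4, 4)), rng.standard_normal((5, 5))) for _ in range(2)]
x = rng.standard_normal(20)

dense = sum(np.kron(A, B) for A, B in factors)   # formed only to check the result
assert np.allclose(skp_matvec(factors, x), dense @ x)
```

Storing the factors takes on the order of $p^2 + q^2$ numbers per term, compared with $(pq)^2$ for the dense matrix.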
• #### Energy-conserving 3D elastic wave simulation with finite difference discretization on staggered grids with nonconforming interfaces

(arXiv, 2020-12-27) [Preprint]
In this work, we describe an approach to stably simulate the 3D isotropic elastic wave propagation using finite difference discretization on staggered grids with nonconforming interfaces. Specifically, we consider simulation domains composed of layers of uniform grids with different grid spacings, separated by planar interfaces. This discretization setting is motivated by the observation that wave speeds of earth media tend to increase with depth due to sedimentation and consolidation processes. We demonstrate that the layer-wise finite difference discretization approach has the potential to significantly reduce the simulation cost, compared to its counterpart that uses holistically uniform grids. Such discretizations are enabled by summation-by-parts finite difference operators, which are standard finite difference operators with special adaptations near boundaries or interfaces, and simultaneous approximation terms, which are penalty terms appended to the discretized system to weakly impose boundary or interface conditions. Combined with specially designed interpolation operators, the discretized system is shown to preserve the energy-conserving property of the continuous elastic wave equation, and a fortiori ensure the stability of the simulation. Numerical examples are presented to corroborate these analytical developments.
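The summation-by-parts property mentioned above can be checked numerically on the classical second-order first-derivative operator $D = H^{-1} Q$ with $Q + Q^{T} = \mathrm{diag}(-1, 0, \dots, 0, 1)$. This sketch uses a collocated 1D grid for simplicity, which is an assumption; the staggered-grid operators, interpolation operators, and simultaneous approximation terms of the paper are not reproduced here.

```python
import numpy as np

def sbp_first_derivative(n, h):
    """Second-order SBP first-derivative operator on n equally spaced points."""
    H = h * np.eye(n)                  # diagonal norm (quadrature) matrix
    H[0, 0] = H[-1, -1] = h / 2.0
    Q = 0.5 * (np.eye(n, k=1) - np.eye(n, k=-1))
    Q[0, 0], Q[-1, -1] = -0.5, 0.5     # boundary closures
    D = np.linalg.solve(H, Q)          # D = H^{-1} Q
    return H, Q, D

n = 11
x = np.linspace(0.0, 1.0, n)
H, Q, D = sbp_first_derivative(n, x[1] - x[0])

# SBP property: Q + Q^T picks out the boundary values only.
B = np.zeros((n, n))
B[0, 0], B[-1, -1] = -1.0, 1.0
assert np.allclose(Q + Q.T, B)

# Discrete integration by parts: u^T H (D v) + (D u)^T H v = u_N v_N - u_0 v_0.
u, v = np.sin(x), np.cos(3.0 * x)
lhs = u @ H @ (D @ v) + (D @ u) @ H @ v
assert np.isclose(lhs, u[-1] * v[-1] - u[0] * v[0])

# The operator differentiates linear functions exactly.
assert np.allclose(D @ x, np.ones(n))
```

This discrete analogue of integration by parts is what allows energy estimates from the continuous problem to carry over to the discretization.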
• #### Validating advanced wavefront control techniques on the SCExAO testbed/instrument

(SPIE, 2020-12-13) [Conference Paper]
The Subaru Coronagraphic Extreme Adaptive Optics (SCExAO) serves both as a science instrument in operation and as a prototyping platform for integrating and validating advanced wavefront control techniques. It provides a modular hardware and software environment optimized for flexible prototyping, reducing the time from concept formulation to on-sky operation and validation. This approach also enables external research groups to deploy and test new hardware and algorithms. The hardware architecture allows for multiple subsystems to run concurrently, sharing starlight by means of dichroics. The multiplexing lends itself to running parallel experiments simultaneously, and developing sensor fusion approaches for increased wavefront sensing sensitivity and reliability. Thanks to a modular realtime control software architecture designed around the CACAO package, users can deploy WFS/C routines with full low-latency access to all camera data streams. Algorithms can easily be shared with other cacao-based AO systems at Magellan (MagAO-X) and Keck. We highlight recent achievements and ongoing activities that are particularly relevant to the development of high contrast imaging instruments for future large ground-based telescopes (ELT, TMT, GMT) and space telescopes (HabEx, LUVOIR). These include predictive control and sensor fusion, PSF reconstruction from AO telemetry, integrated coronagraph/WFS development, focal plane speckle control with a photon-counting MKIDS camera, and fiber interferometry. We also describe upcoming upgrades to the WFS/C architecture: a new 64x64 actuator first stage DM, deployment of a beam switcher for concurrent operation of SCExAO with other science instruments, and the ULTIMATE upgrade including deployment of multiple LGS WFSs and an adaptive secondary mirror.
• #### Predictive learn and apply: MAVIS application-apply

(SPIE, 2020-12-13) [Conference Paper, Poster, Presentation]
The Learn and Apply tomographic reconstructor coupled with the pseudo open-loop control scheme shows promising results in simulation for multi-conjugate adaptive optics systems. We motivate, derive, and demonstrate the inclusion of a predictive step in the Learn and Apply tomographic reconstructor based on the frozen-flow turbulence assumption. The addition of this predictive step provides an additional gain in performance, especially at larger wave-front sensor exposure periods, with no increase in online computational burden. We provide results using end-to-end numerical simulations for a multi-conjugate adaptive optics system for an 8m telescope based on the MAVIS system design.
• #### Adaptive optics real-time control with the compute and control for adaptive optics (Cacao) software framework

(SPIE, 2020-12-13) [Conference Paper]
The Compute and control for adaptive optics (Cacao) is an open source software package providing a flexible framework for deploying real-time adaptive optics control. Cacao leverages CPU and GPU computational resources to meet the demands of modern AO systems with thousands of degrees of freedom running at kHz speed or faster. Cacao adopts a modular approach, where individual processes operate over a standardized data stream structure. Advanced control loops integrating multiple sensors and DMs are built by assembling multiple such processes. High-level constructs are provided for sensor fusion, where multiple sensors can drive a single physical DM. The common data stream format is at the heart of Cacao, holding data content in shared memory and timing information as semaphores. Cacao is currently in operation on the general-purpose Subaru AO188 system and on the SCExAO and MagAO-X extreme-AO instruments. Its data stream format has been adopted at Keck, within the COMPASS AO simulation tool, and in the COSMIC modular RTC platform. We describe Cacao's software architecture and toolset, and provide simple examples for users to build a real-time control loop. Advanced features are discussed, including on-sky results and experience with predictive control and sensor fusion. Future development plans will include leveraging machine learning algorithms for real-time PSF calibration and more optimal AO control, for which early on-sky demonstration will be presented.
• #### Seismic Velocities Distribution in a 3D Mantle: Implications for InSight Measurements

(Wiley, 2020-11-24) [Preprint]
• #### A Multilayer Nonlinear Elimination Preconditioned Inexact Newton Method for Steady-State Incompressible Flow Problems in Three Dimensions

(SIAM Journal on Scientific Computing, Society for Industrial & Applied Mathematics (SIAM), 2020-11-24) [Article]
We develop a multilayer nonlinear elimination preconditioned inexact Newton method for a nonlinear algebraic system of equations, and a target application is the three-dimensional steady-state incompressible Navier–Stokes equations at high Reynolds numbers. Nonlinear steady-state problems are often more difficult to solve than time-dependent problems because the Jacobian matrix is less diagonally dominant, and a good initial guess from the previous time step is not available. For such problems, Newton-like methods may suffer from slow convergence or stagnation even with globalization techniques such as line search. In this paper, we introduce a cascadic multilayer nonlinear elimination approach based on feedback from intermediate solutions to improve the convergence of Newton iteration. Numerical experiments show that the proposed algorithm is superior to the classical inexact Newton method and other single layer nonlinear elimination approaches in terms of the robustness and efficiency. Using the proposed nonlinear preconditioner with a highly parallel domain decomposition framework, we demonstrate that steady solutions of the Navier–Stokes equations with Reynolds numbers as large as 7,500 can be obtained for the lid-driven cavity flow problem in three dimensions without the use of any continuation methods.
• #### NodePy: A package for the analysis of numerical ODE solvers

(Journal of Open Source Software, The Open Journal, 2020-11-17) [Article]
Ordinary differential equations (ODEs) are used to model a vast range of physical and other phenomena. They also arise in the discretization of partial differential equations. In most cases, solutions of differential equations must be approximated by numerical methods. The study of the properties of numerical methods for ODEs comprises an important and large body of knowledge. NodePy (available from https://github.com/ketch/nodepy, with documentation at https://nodepy.readthedocs.io/en/latest/) is a software package for designing and studying the properties of numerical ODE solvers. For the most important classes of methods, NodePy can automatically assess their stability, accuracy, and many other properties. NodePy has also been used as a catalog of coefficients for time integration methods in PDE solver codes.
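As an illustration of the kind of property such a package assesses automatically, the sketch below checks the classical Runge–Kutta order conditions up to order four directly from a Butcher tableau using exact rational arithmetic. It deliberately does not call NodePy; the function names are hypothetical and the check stops at order four.

```python
from fractions import Fraction as F

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _matvec(M, v):
    return [_dot(row, v) for row in M]

def _had(u, v):
    return [a * b for a, b in zip(u, v)]

def order_of_accuracy(A, b):
    """Highest order p <= 4 for which all Runge-Kutta order conditions hold."""
    c = [sum(row) for row in A]            # abscissae from the row-sum condition
    ones = [1] * len(b)
    Ac, c2 = _matvec(A, c), _had(c, c)
    conditions = {
        1: [(_dot(b, ones), F(1))],
        2: [(_dot(b, c), F(1, 2))],
        3: [(_dot(b, c2), F(1, 3)), (_dot(b, Ac), F(1, 6))],
        4: [(_dot(b, _had(c2, c)), F(1, 4)),
            (_dot(b, _had(c, Ac)), F(1, 8)),
            (_dot(b, _matvec(A, c2)), F(1, 12)),
            (_dot(b, _matvec(A, Ac)), F(1, 24))],
    }
    order = 0
    for p in (1, 2, 3, 4):
        if not all(lhs == rhs for lhs, rhs in conditions[p]):
            break
        order = p
    return order

# Classical four-stage Runge-Kutta method.
A_rk4 = [[0, 0, 0, 0], [F(1, 2), 0, 0, 0], [0, F(1, 2), 0, 0], [0, 0, 1, 0]]
b_rk4 = [F(1, 6), F(1, 3), F(1, 3), F(1, 6)]

# Explicit midpoint method and forward Euler for comparison.
A_mid, b_mid = [[0, 0], [F(1, 2), 0]], [0, 1]
A_fe, b_fe = [[0]], [1]

print(order_of_accuracy(A_rk4, b_rk4))  # 4
print(order_of_accuracy(A_mid, b_mid))  # 2
print(order_of_accuracy(A_fe, b_fe))    # 1
```

Exact fractions avoid the tolerance choices that floating-point checks of these conditions would require.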