### Recent Submissions

• #### A Reduced-order Variational Bayesian Approach for Efficient Subsurface Imaging

(Geophysical Journal International, Oxford University Press (OUP), 2021-12-15) [Article]
This work considers the reconstruction of a subsurface model from seismic observations, which is known to be a high-dimensional and ill-posed inverse problem. Two approaches are combined to tackle this problem: the Discrete Cosine Transform (DCT) approach, used in the forward modeling step, and the Variational Bayesian (VB) approach, used in the inverse reconstruction step. VB can provide not only point estimates but also closed forms of the full posterior probability distributions. To efficiently compute such estimates of the full joint posterior distributions of large-scale seismic inverse problems, we resort to a DCT order-reduction scheme with a VB approximation of the posteriors, avoiding the need for costly Bayesian sampling methods. More specifically, we first reduce the model parameters through truncation of their DCT coefficients. This helps regularize our seismic inverse problem and alleviates its computational complexity. Then, we apply VB inference in the reduced-DCT space to estimate the dominant (retained) DCT coefficients together with the variance of the observational noise. We also present an efficient implementation of the derived VB-based algorithm for further cost reduction. The performance of the proposed scheme is evaluated through extensive numerical experiments for both linear and non-linear forward models. In the former, the subsurface reflectivity model is reconstructed with estimation accuracy comparable to that of the optimal weighted least-squares solution. In the latter, the main structural features of the squared-slowness model are well reconstructed.
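As a minimal illustration of the order-reduction step (a sketch under assumed parameters, not the authors' implementation), the snippet below truncates a 2D model to its lowest-order DCT coefficients with `scipy.fft.dctn`/`idctn` and measures the reconstruction error; the synthetic gradient model and the truncation size are hypothetical choices:

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_reduce(model, keep):
    """Truncate a 2D model to its lowest `keep` x `keep` DCT coefficients."""
    coeffs = dctn(model, norm="ortho")
    return coeffs[:keep, :keep]          # retained (dominant) coefficients

def dct_reconstruct(reduced, shape):
    """Zero-pad the retained coefficients and invert the DCT."""
    full = np.zeros(shape)
    k0, k1 = reduced.shape
    full[:k0, :k1] = reduced
    return idctn(full, norm="ortho")

# Smooth synthetic "subsurface" model: a vertical velocity gradient
nz, nx = 64, 64
z = np.linspace(0, 1, nz)[:, None]
model = 1500.0 + 2000.0 * z * np.ones((1, nx))

reduced = dct_reduce(model, keep=8)      # 64x64 model -> 8x8 parameters
approx = dct_reconstruct(reduced, model.shape)
rel_err = np.linalg.norm(approx - model) / np.linalg.norm(model)
```

A smooth model is captured well by a handful of coefficients, which is what makes the reduced-DCT space an effective, much smaller parameterization for the subsequent VB inference.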
• #### SphGLLTools: A toolbox for visualization of large seismic model files based on 3D spectral-element meshes

(Computers & Geosciences, Elsevier BV, 2021-12) [Article]
Adjoint tomography, a full-waveform inversion technique based on 3D wave simulations, has become a commonly used tool in passive-source seismology, drawing on advances in computational power and numerical methods. From global to reservoir scales, seismic models can be iteratively updated in adjoint inversions by extracting information from full seismic waveforms. Seismic models are typically constructed on the numerical mesh used for wave simulations. Thus, the size of the model files depends on the minimum resolvable period achieved in the simulations, which is controlled by the numerical mesh (the higher the mesh resolution, the larger the model and mesh files). This is specifically a concern for recent global-scale adjoint tomographic models, where the size and format of numerical meshes pose challenges for model visualization, analysis, interpretation, and sharing. Here, we present SphGLLTools, an open-source toolbox that aims to ease these challenges by expanding global adjoint models onto spherical harmonic functions, which are widely used in global seismology. Our tools are initially designed for spectral-element meshes used in recent global adjoint tomography studies. SphGLLTools facilitates many commonplace tasks for model visualization and analysis, including the spherical harmonic expansion of models sampled on spectral-element meshes, together with associated tools for easy sharing, visualization, and interpretation of large-scale seismic model files. All the developed routines are accompanied by user instructions and are available through GitHub. For transparency, reproducibility, and educational purposes, we also include Colab notebooks, which provide an intuitive and comprehensive review of the principles and methods for spectral-element meshes, spherical harmonic expansion, and other model analysis tools.
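To illustrate the expansion idea only (this is not SphGLLTools' API), the sketch below projects a zonal field, i.e., one depending only on colatitude, onto Legendre polynomials, which are the m = 0 spherical harmonics, via Gauss-Legendre quadrature; the test field is an assumed synthetic example:

```python
import numpy as np
from numpy.polynomial import legendre

def zonal_coefficients(f, lmax, nquad=64):
    """Project f(cos(theta)) onto Legendre polynomials P_0..P_lmax using
    Gauss-Legendre quadrature: c_l = (2l+1)/2 * integral of f(x) P_l(x) dx."""
    x, w = legendre.leggauss(nquad)          # nodes and weights on [-1, 1]
    fx = f(x)
    coeffs = []
    for l in range(lmax + 1):
        basis = np.zeros(l + 1)
        basis[l] = 1.0                       # coefficient vector selecting P_l
        Pl = legendre.legval(x, basis)
        coeffs.append((2 * l + 1) / 2.0 * np.sum(w * fx * Pl))
    return np.array(coeffs)

# A field that is exactly P_2(cos(theta)): the expansion should recover c_2 = 1
f = lambda x: 0.5 * (3 * x ** 2 - 1)
c = zonal_coefficients(f, lmax=4)
```

The full toolbox additionally handles longitude dependence (m != 0) and interpolation from the spectral-element mesh onto quadrature points, but the projection step follows this same pattern.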
• #### Optimized Runge-Kutta Methods with Automatic Step Size Control for Compressible Computational Fluid Dynamics

(Communications on Applied Mathematics and Computation, Springer Science and Business Media LLC, 2021-11-22) [Article]
We develop error-control based time integration algorithms for compressible computational fluid dynamics (CFD) applications and show that they are efficient and robust in both the accuracy-limited and stability-limited regimes. Focusing on discontinuous spectral element semidiscretizations, we design new controllers for existing methods and for some new embedded Runge-Kutta pairs. We demonstrate the importance of choosing adequate controller parameters and provide a means to obtain these in practice. We compare a wide range of error-control-based methods, along with the common approach in which step size control is based on the Courant-Friedrichs-Lewy (CFL) number. The optimized methods give improved performance and naturally adopt a step size close to the maximum stable CFL number at loose tolerances, while additionally providing control of the temporal error at tighter tolerances. The numerical examples include challenging industrial CFD applications.
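The embedded-pair/error-controller mechanism can be sketched on a toy ODE as follows; this uses a textbook Heun/Euler pair with one common PI parameterization, not the optimized methods and controller parameters developed in the paper:

```python
def heun_euler_adaptive(f, y0, t0, t1, tol, h0=0.1):
    """Integrate y' = f(t, y) with the embedded Heun/Euler pair (orders 2/1)
    and a PI step-size controller: h_new = h * err^(-b1) * err_prev^(-b2)."""
    beta1, beta2 = 0.7 / 2, -0.4 / 2     # common PI coefficients, scaled by order
    t, y, h = t0, float(y0), h0
    err_prev = 1.0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_high = y + 0.5 * h * (k1 + k2)     # 2nd-order (Heun) solution
        y_low = y + h * k1                   # 1st-order (Euler) solution
        err = abs(y_high - y_low) / tol + 1e-16   # scaled error estimate
        if err <= 1.0:                       # accept the step
            t, y = t + h, y_high
            factor = err ** (-beta1) * err_prev ** (-beta2)
            err_prev = err
        else:                                # reject and retry with smaller h
            factor = err ** (-beta1)
        h *= min(2.0, max(0.2, 0.9 * factor))
    return y

y_end = heun_euler_adaptive(lambda t, y: -y, 1.0, 0.0, 1.0, tol=1e-6)
```

At loose tolerances such a controller settles onto the largest stable step, while at tight tolerances the step size is governed by the error estimate, which is the behavior the paper optimizes for CFD semidiscretizations.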
• #### The influence of depth-varying elastic properties of the upper plate on megathrust earthquake rupture dynamics and tsunamigenesis

(Journal of Geophysical Research: Solid Earth, American Geophysical Union (AGU), 2021-11-15) [Article]
Megathrust earthquakes are strongly influenced by the elastic properties of the rocks surrounding the fault. In contrast to friction, these properties can be derived in situ from geophysical measurements along the seismogenic zone. However, they are often overestimated in numerical simulations, particularly in the shallow megathrust. Here, we explore the influence that realistic depth-varying upper-plate elastic properties along the megathrust have on earthquake rupture dynamics and tsunamigenesis, using 3D dynamic rupture and tsunami simulations. We compare results from three subduction zone scenarios with homogeneous elastic media, heterogeneous elastic media, and a bimaterial fault. Elastic properties in the heterogeneous model follow a realistic depth distribution derived from controlled-source tomography models of subduction zones. We assume the same friction properties for all scenarios. Simulations in the heterogeneous and homogeneous models show that rigidity depth variations explain the depth-varying behavior of slip, slip rate, frequency content, and rupture time. The depth-varying behavior of slip, frequency content, and rupture duration quantitatively agrees with previous predictions based on worldwide data compilations, explaining the main depth-dependent traits of tsunami earthquakes and large shallow megathrust earthquakes. The large slip, slow rupture, and slip-rate amplification in the bimaterial simulations are largely controlled by the elastic rock properties of the most compliant side of the fault, which in subduction zones is the upper plate. The large shallow slip and trenchward-increasing upper-plate compliance of the heterogeneous model lead to the largest co-seismic seafloor deformation and tsunami amplitude. This highlights the importance of considering realistic upper-plate rigidity variations to properly assess the tsunamigenic potential of megathrust earthquakes.
• #### A direct numerical simulation study of skewed three-dimensional spatially evolving compressible mixing layer

(Physics of Fluids, AIP Publishing, 2021-11) [Article]
The turbulent flow of spatially developing, high-speed hydrogen/air mixing layers subject to a small skew angle φ is systematically investigated by means of direct numerical simulation. The present database features both detailed chemistry and detailed transport (i.e., Soret, Dufour, and bulk viscosity effects). The angle φ measures the misalignment of the two asymptotic streams of fluid, whose interaction creates the turbulent mixing region. Numerical simulations have been carried out either in the absence of skewing, namely, perfectly parallel streams (φ = 0°), or with skew angles φ = 5°, 10°, and 15°. The streamwise evolution and the self-similar state of the turbulence statistics of the skewed cases are reported and compared to the unskewed reference case. The present computations indicate that the transitional region and the fully developed turbulence region depend strongly on the degree of flow skewing at the inlet. In particular, we find that skewing yields faster growth of the inlet structures, thus leading to mixing enhancement. The underlying mechanisms responsible for turbulence modulation are analyzed through the transport equations of the Reynolds stresses. One possible perspective of the present work concerns mixing control and reliable comparisons between experiments, simulations, and turbulence modeling.
• #### Assessment of local and non–local turbulent flow components on turbulence–flame interaction

(IOP Publishing, 2021-11-01) [Conference Paper]
In the framework of turbulence-flame interaction, the flame is characterized by the gradient of a reactive scalar, such as the progress variable, whereas the turbulence is represented by the vorticity and the strain rate. Quantitative assessment of this interaction is performed through the study of the coupled transport between these quantities, which are subject to the effects of heat release and chemical reactions. The present analysis aims at improving the understanding of small-scale turbulence-flame interaction properties through the introduction of an additive decomposition of the strain rate and vorticity fields into their local and non-local components. The respective roles of the local and non-local effects are studied for a broad range of Karlovitz numbers by means of direct numerical simulations (DNS) of turbulent, premixed, lean, and statistically planar methane-air flames. In the conditions of the present study, the alignment between flame-front normals and the strain rate is found to be dominated by the local contribution from the strain rate tensor.
• #### Development and analysis of entropy stable no-slip wall boundary conditions for the Eulerian model for viscous and heat conducting compressible flows

(arXiv, 2021-10-19) [Preprint]
Nonlinear entropy stability analysis is used to derive entropy stable no-slip wall boundary conditions for the Eulerian model proposed by Svärd (Physica A: Statistical Mechanics and its Applications, 2018) and for its spatial discretization based on entropy stable collocated discontinuous Galerkin operators with the summation-by-parts property for unstructured grids. A set of viscous test cases of increasing complexity is simulated using both the Eulerian and the classic compressible Navier-Stokes models. The numerical results obtained with the two models are compared, and differences and similarities are then highlighted.
• #### Towards autonomous event identifications in wave-equation traveltime inversion

(GEOPHYSICS, Society of Exploration Geophysicists, 2021-09-29) [Article]
We present a method to automatically relate events between observed and synthetic data for wave-equation traveltime (WT) inversion. The scheme starts with local similarity measurements obtained by applying cross-correlation to localized traces. We then develop a differentiable alternative to the argmax function. By applying this differentiable argmax to each time slice of the local similarity map, we build a traveltime difference function while keeping the process differentiable. WT inversion requires only the traveltime difference of related events through a phase shift. Thus, we must reject events that are not clearly related between the observed and synthetic data. The local similarity map suggests the possibility of doing so but shows abrupt changes with time and offset. To mitigate this problem, we introduce a dynamic programming algorithm to define a warping function. Wave packets in the observed and synthetic data are assumed to be related if they are connected by this warping function and also exhibit high local similarity. Only such related events are considered in the subsequent calculation of the misfit and adjoint source. Numerical examples demonstrate that the proposed method successfully retrieves an informative model from roughly selected data. In contrast, WT inversion based on cross-correlation or deconvolution fails to do so.
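A differentiable alternative to argmax can be illustrated as a softmax-weighted average of indices (a common relaxation; the exact form used in the paper may differ, and the temperature `tau` and the synthetic similarity slice are assumed for illustration):

```python
import numpy as np

def soft_argmax(s, tau=0.05):
    """Differentiable relaxation of argmax: a softmax-weighted average of
    indices. As tau -> 0 it approaches the hard argmax of s."""
    s = np.asarray(s, dtype=float)
    w = np.exp((s - s.max()) / tau)      # numerically stabilized softmax
    w /= w.sum()
    return np.dot(np.arange(len(s)), w)  # expected index under the weights

# Synthetic local-similarity slice peaking at lag index 7
sim = np.exp(-0.5 * ((np.arange(21) - 7) / 1.5) ** 2)
lag = soft_argmax(sim)                   # close to 7, but differentiable in sim
```

Because the output is a smooth function of the similarity values, gradients can flow through the picked lag, which is what allows the traveltime difference function to be used inside a gradient-based inversion.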
• #### Preconditioned BFGS-based Uncertainty Quantification in Elastic Full Waveform Inversion

(Geophysical Journal International, Oxford University Press (OUP), 2021-09-21) [Article]
Full Waveform Inversion (FWI) has become an essential technique for mapping geophysical subsurface structures. However, proper uncertainty quantification is often lacking in current applications. In theory, uncertainty quantification is related to the inverse Hessian (or the posterior covariance matrix), but even for common geophysical inverse problems, its calculation is beyond the computational and storage capacities of the largest high-performance computing systems. In this study, we amend the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm to perform uncertainty quantification for large-scale applications. For seismic inverse problems, the limited-memory BFGS (L-BFGS) method prevails as the most efficient quasi-Newton method; we aim to augment it further to obtain an approximate inverse Hessian for uncertainty quantification in FWI. To facilitate retrieval of the inverse Hessian, we combine BFGS (essentially a full-history L-BFGS) with randomized singular value decomposition to determine a low-rank approximation of the inverse Hessian. Setting the rank equal to the number of iterations makes this solution efficient and memory-affordable even for large-scale problems. Furthermore, based on the Gauss-Newton method, we formulate different initial, diagonal Hessian matrices as preconditioners for the inverse scheme and compare their performances in elastic FWI applications. We highlight our approach with the elastic Marmousi benchmark model, demonstrating the applicability of preconditioned BFGS for large-scale FWI and uncertainty quantification.
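The randomized low-rank step can be sketched generically as follows; this is standard randomized range-finding applied to a synthetic symmetric operator accessed only through matrix-vector products, not the paper's BFGS-specific construction (the spectrum and rank are assumed parameters):

```python
import numpy as np

def randomized_low_rank(apply_H, n, rank, n_oversample=10, seed=0):
    """Rank-`rank` eigen-approximation of a symmetric operator given only
    matrix-vector products, via randomized range finding."""
    rng = np.random.default_rng(seed)
    k = rank + n_oversample
    omega = rng.standard_normal((n, k))
    Y = apply_H(omega)                   # sample the range of H
    Q, _ = np.linalg.qr(Y)               # orthonormal basis of the sample range
    B = Q.T @ apply_H(Q)                 # small k x k projected operator
    B = 0.5 * (B + B.T)                  # symmetrize against round-off
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(np.abs(vals))[::-1][:rank]
    U = Q @ vecs[:, idx]
    return U, vals[idx]                  # H is approximately U diag(vals) U^T

# Synthetic SPD "inverse Hessian" with a fast-decaying spectrum
n = 200
rng = np.random.default_rng(1)
Qf, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = 1.0 / (1.0 + np.arange(n)) ** 2
H = (Qf * eigs) @ Qf.T

U, lam = randomized_low_rank(lambda X: H @ X, n, rank=20)
H_approx = (U * lam) @ U.T
rel_err = np.linalg.norm(H_approx - H) / np.linalg.norm(H)
```

Storing only `U` and the leading eigenvalues costs O(n * rank) memory, which is what makes a low-rank posterior-covariance surrogate affordable at FWI scale.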
• #### H2Opus: A distributed-memory multi-GPU software package for non-local operators

(arXiv, 2021-09-12) [Preprint]
Hierarchical $\mathcal{H}^2$-matrices are asymptotically optimal representations for the discretizations of non-local operators such as those arising in integral equations or from kernel functions. Their $O(N)$ complexity in both memory and operator application makes them particularly suited for large-scale problems. As a result, there is a need for software that provides support for distributed operations on these matrices to allow large-scale problems to be represented. In this paper, we present high-performance, distributed-memory GPU-accelerated algorithms and implementations for matrix-vector multiplication and matrix recompression of hierarchical matrices in the $\mathcal{H}^2$ format. The algorithms are a new module of H2Opus, a performance-oriented package that supports a broad variety of $\mathcal{H}^2$-matrix operations on CPUs and GPUs. Performance in the distributed GPU setting is achieved by marshaling the tree data of the hierarchical matrix representation to allow batched kernels to be executed on the individual GPUs. MPI is used for inter-process communication. We optimize the communication data volume and hide much of the communication cost with local compute phases of the algorithms. Results show near-ideal scalability up to 1024 NVIDIA V100 GPUs on Summit, with performance exceeding 2.3 Tflop/s/GPU for the matrix-vector multiplication, and 670 Gflop/s/GPU for matrix compression, which involves batched QR and SVD operations. We illustrate the flexibility and efficiency of the library by solving a 2D variable diffusivity integral fractional diffusion problem with an algebraic multigrid-preconditioned Krylov solver and demonstrate scalability up to problems with 16M degrees of freedom on 64 GPUs.
• #### Toward performance-portable PETSc for GPU-based exascale systems

(Parallel Computing, Elsevier BV, 2021-09-10) [Article]
The Portable Extensible Toolkit for Scientific computation (PETSc) library delivers scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization. The PETSc design for performance portability addresses fundamental GPU accelerator challenges and stresses flexibility and extensibility by separating the programming model used by the application from that used by the library, and it enables application developers to use their preferred programming model, such as Kokkos, RAJA, SYCL, HIP, CUDA, or OpenCL, on upcoming exascale systems. A blueprint for using GPUs from PETSc-based codes is provided, and case studies emphasize the flexibility and high performance achieved on current GPU-based systems.
• #### Cycle-skipping mitigation using misfit measurements based on differentiable dynamic time warping

(arXiv, 2021-09-09) [Preprint]
The dynamic time warping (DTW) distance has been used as a misfit function for wave-equation inversion to mitigate the local-minima issue. However, the original DTW distance is not smooth; it can therefore yield strong discontinuities in the adjoint source, which hinders the convergence of nonlinear inverse problems to a plausible minimum. We therefore introduce to geophysics, for the first time, the smooth DTW distance, which has demonstrated its performance as a loss function in time series classification, clustering, and prediction. The fundamental idea of this distance is to replace the $\min$ operator with its smooth relaxation, which makes it possible to define the analytic derivative of the DTW distance. We call the resulting misfit function the differentiable DTW distance. Moreover, considering that the warping path is an indicator of the traveltime difference between the observed and synthetic traces, we construct a penalization term based on the warping path such that the misfit accumulated by the penalized differentiable DTW distance is weighted in favor of the traveltime difference. Numerical examples demonstrate the advantage of the penalized differentiable DTW misfit function over the conventional non-differentiable one.
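A minimal sketch of the smooth relaxation and the resulting soft recursion (in the style of Cuturi and Blondel's soft-DTW; the sequences and the smoothing parameter `gamma` are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def soft_min(values, gamma):
    """Smooth relaxation of min: -gamma * log(sum(exp(-v / gamma)))."""
    v = np.asarray(values, dtype=float)
    m = v.min()                              # stabilize the log-sum-exp
    return m - gamma * np.log(np.sum(np.exp(-(v - m) / gamma)))

def soft_dtw(x, y, gamma=0.1):
    """Soft-DTW distance between 1D sequences: the classic DTW dynamic
    program with the hard min replaced by soft_min."""
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i, j] = cost + soft_min(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]

t = np.linspace(0, 1, 50)
obs = np.sin(2 * np.pi * t)                  # "observed" trace
d_same = soft_dtw(obs, np.sin(2 * np.pi * t))
d_shift = soft_dtw(obs, np.sin(2 * np.pi * (t - 0.1)))  # time-shifted trace
```

Because every operation in the recursion is smooth, the distance admits analytic derivatives with respect to the synthetic trace, which is exactly what the adjoint source requires.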
• #### An O(N) algorithm for computing expectation of N-dimensional truncated multi-variate normal distribution I: fundamentals

(Advances in Computational Mathematics, Springer Science and Business Media LLC, 2021-09-01) [Article]
In this paper, we present the fundamentals of a hierarchical algorithm for computing the N-dimensional integral $\phi(a,b;A)=\int_a^b H(x)f(x|A)\,dx$ representing the expectation of a function H(X), where f(x|A) is the truncated multi-variate normal (TMVN) distribution with zero mean, x is the vector of integration variables for the N-dimensional random vector X, A is the inverse of the covariance matrix Σ, and a and b are constant vectors. The algorithm assumes that H(x) is "low-rank" and is designed for properly clustered X, so that the matrix A has "low-rank" blocks and "low-dimensional" features. We demonstrate the divide-and-conquer idea when A is a symmetric positive definite tridiagonal matrix and present the necessary building blocks and rigorous potential theory-based algorithm analysis when A is given by the exponential covariance model. The overall algorithm complexity is O(N) for N-dimensional problems, with a prefactor determined by the rank of the off-diagonal matrix blocks and the number of effective variables. Very high accuracy results for N as large as 2048 are obtained on a desktop computer with 16 GB of memory, using the fast Fourier transform (FFT) and non-uniform FFT to validate the analysis. The current paper focuses on the ideas using simple yet representative examples, where the off-diagonal matrix blocks are rank 1 and the number of effective variables is bounded by 2, to allow concise notation and easier explanation. In a subsequent paper, we discuss the generalization of the current scheme using the sparse grid technique for higher-rank problems and demonstrate how all the moments of order k or less (a total of $O(N^k)$ integrals) can be computed using $O(N^k)$ operations for k ≥ 2 and $O(N\log N)$ operations for k = 1.
• #### Misfit functions based on differentiable dynamic time warping for waveform inversion

(Society of Exploration Geophysicists, 2021-09-01) [Conference Paper]
Misfit functions based on differentiable dynamic time warping (DTW) have demonstrated excellent performance in various sequence-related tasks. We introduce this concept in the context of waveform inversion and discuss a fast algorithm to calculate the first and second derivatives of such misfits. The DTW distance is originally calculated by solving a series of min problems; thus, it is not differentiable. The fundamental idea of differentiable DTW is to replace the min operator with its smooth relaxation. It is then straightforward to obtain the derivatives following the differentiation rules, which results in both differentiable misfit measurements and a differentiable warping path. This path weights the traveltime mismatch more than its amplitude counterpart. We can construct a penalization term based on this warping path. The penalized misfit function is adaptable to measure traveltime and amplitude separately. Numerical examples show the flexibility and the performance of the proposed misfit function in mitigating local-minima issues and retrieving the long-wavelength model features.
• #### Traveltime computation using a supervised learning approach

(Society of Exploration Geophysicists, 2021-09-01) [Conference Paper]
In real-time microseismic monitoring, the ability to efficiently compute source-receiver traveltimes can help in significantly speeding up the model calibration and hypocenter determination processes, thus ensuring timely information about the subsurface fractures for use in effective decision making. Here, we present a supervised-learning based traveltime computation approach for layered 1D velocity models. First, we generate numerous synthetic traveltime examples from a combination of source locations and layered subsurface models, covering a broad range of realistic P-wave velocities (2500–5000 m/s). Next, we train a multi-layered feed-forward neural network using the training set containing source locations and velocities as input and traveltimes as corresponding labels. By doing so, we aim for a neural-network model that is trained only once and can be applied to a wide range of subsurface velocities as well as source-receiver positions to predict fast and accurate traveltimes. We apply the trained model on numerous test examples to validate the accuracy and speed of the proposed method. Based on comparisons with acoustic finite-difference modeling and a ray-shooting method, we show that the trained model can provide faster and reasonably accurate traveltimes for any realistic model scenario within the trained velocity range.
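For intuition about how traveltime labels for layered 1D models can be generated, here is the simplest possible case, a vertically propagating direct wave (an illustrative assumption; the paper's comparisons use acoustic finite-difference modeling and ray shooting, not this shortcut):

```python
def vertical_traveltime(depths, velocities, z_src):
    """First-arrival traveltime from a source at depth z_src to a surface
    receiver directly above it, assuming vertical propagation through
    horizontal layers. depths: sorted layer-bottom depths (m);
    velocities: corresponding layer velocities (m/s)."""
    t, z_top = 0.0, 0.0
    for z_bot, v in zip(depths, velocities):
        if z_src <= z_top:
            break                            # source already passed
        thickness = min(z_bot, z_src) - z_top
        t += thickness / v                   # time spent in this layer
        z_top = z_bot
    return t

# Two layers: 0-1000 m at 2500 m/s, 1000-3000 m at 4000 m/s; source at 2000 m
t = vertical_traveltime([1000.0, 3000.0], [2500.0, 4000.0], z_src=2000.0)
```

Varying the layer depths, velocities, and source positions over such a generator yields (input, label) pairs of exactly the kind the feed-forward network is trained on, although realistic labels must also account for oblique and refracted paths.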
• #### Dual-band generative learning for low-frequency extrapolation in seismic land data

(Society of Exploration Geophysicists, 2021-09-01) [Conference Paper]
The presence of low-frequency energy in seismic data can help mitigate cycle-skipping problems in full-waveform inversion. Unfortunately, the generation and recording of low-frequency signals in seismic exploration remains a non-trivial task. Extrapolation of missing low-frequency content in field data might be addressed in a data-driven framework. In particular, deep learning models trained on synthetic data could be used for inference on the field data. Such an implementation of switching application domains remains challenging. We, therefore, propose the concept of generative dual-band learning to facilitate the knowledge transfer between synthetic and field seismic data applications of low-frequency data extrapolation. We first explain the two-step procedure for training a generative adversarial network (GAN) that extrapolates low frequencies. Then, we describe the workflow for synthetic dataset generation. Finally, we explore the feasibility of the dual-band learning concept on real near-surface land data acquired in Saudi Arabia.
• #### Multicycle Simulation of Strike-Slip Earthquake Rupture for Use in Near-Source Ground-Motion Simulations

(Bulletin of the Seismological Society of America, Seismological Society of America (SSA), 2021-08-31) [Article]
Realistic dynamic rupture modeling validated by observed earthquakes is necessary for estimating parameters that are poorly resolved by seismic source inversion, such as stress drop, rupture velocity, and slip rate function. Source inversions using forward dynamic modeling are increasingly used to obtain earthquake rupture models. In this study, to generate a large number of physically self-consistent rupture models, whose rupture processes are consistent with the spatiotemporal heterogeneity of stress produced by previous earthquakes on the same fault, we use multicycle simulations under the rate-and-state (RS) friction law. We adopt a one-way coupling from multicycle simulations to dynamic rupture simulations; the quasidynamic solver QDYN is used to nucleate the seismic events, and the spectral-element dynamic solver SPECFEM3D is used to resolve their rupture process. To simulate realistic seismicity, with a wide range of magnitudes and irregular recurrence, several realizations of 2D-correlated heterogeneous random distributions of the characteristic weakening distance (Dc) in RS friction are tested. Other important parameters are the normal stress, which controls the stress drop and rupture velocity during an earthquake, and the maximum value of Dc, which controls rupture velocity but not stress drop. We perform a parametric study on a vertical planar fault and generate a set of one hundred spontaneous rupture models in a wide magnitude range (Mw 5.5–7.4). We validate the rupture models by comparing their source scaling, ground motion (GM), and surface slip properties to observations. We compare the source-scaling relations between rupture area, average slip, and seismic moment of the modeled events with empirical ones derived from source inversions. Near-fault GMs are computed from the source models. Their peak ground velocities and peak ground accelerations agree well with ground-motion prediction equation values. We also obtain good agreement of the surface fault displacements with observed values.
• #### Mapping full seismic waveforms to vertical velocity profiles by deep learning

(GEOPHYSICS, Society of Exploration Geophysicists, 2021-08-31) [Article]
Building realistic and reliable models of the subsurface is the primary goal of seismic imaging. Here, we construct an ensemble of convolutional neural networks (CNNs) to build velocity models directly from the data. Most other approaches attempt to map the full data into 2D labels. We instead exploit the regularity of seismic acquisition and train CNNs to map gathers of neighboring common midpoints (CMPs) to vertical 1D velocity logs. This allows us to integrate well-log data into the inversion, simplify the mapping by using 1D labels, and accommodate larger dips relative to using single-CMP inputs. We dynamically generate the training data in parallel with training the CNNs, which reduces overfitting. Data generation and training of the CNNs are more computationally expensive than conventional full-waveform inversion (FWI). However, once the network is trained, data sets with similar acquisition parameters can be inverted much faster than with FWI. The multi-CMP CNN ensemble is tested on multiple realistic synthetic models, performs well, and is combined with FWI for even better performance.
• #### Space-Fractional Diffusion with Variable Order and Diffusivity: Discretization and Direct Solution Strategies

(arXiv, 2021-08-29) [Preprint]
We consider the multidimensional space-fractional diffusion equations with spatially varying diffusivity and fractional order. Significant computational challenges are encountered when solving these equations due both to the kernel singularity in the fractional integral operator and to the resulting dense discretized operators, which quickly become prohibitively expensive to handle because of their memory and arithmetic complexities. In this work, we present a singularity-aware discretization scheme that regularizes the singular integrals through a singularity subtraction technique adapted to the spatial variability of diffusivity and fractional order. This regularization strategy is conveniently formulated as a sparse matrix correction that is added to the dense operator, and is applicable to different formulations of fractional diffusion equations. We also present a block low rank representation to handle the dense matrix representations, by exploiting the ability to approximate blocks of the resulting formally dense matrix by low rank factorizations. A Cholesky factorization solver operates directly on this representation using the low rank blocks as its atomic computational tiles, and achieves high performance on multicore hardware. Numerical results show that the singularity treatment is robust, substantially reduces discretization errors, and attains the first-order convergence rate allowed by the regularity of the solutions. They also show that considerable savings are obtained in storage ($O(N^{1.5})$) and computational cost ($O(N^2)$) compared to dense factorizations. This translates to orders-of-magnitude savings in memory and time on multi-dimensional problems, and shows that the proposed methods offer practical tools for tackling large nonlocal fractional diffusion simulations.
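The singularity-subtraction idea can be shown on a 1D model problem (this illustrates the generic technique on an assumed kernel, not the paper's variable-order fractional scheme): subtracting the leading singular part leaves a bounded integrand for standard quadrature, while the subtracted part integrates analytically.

```python
import numpy as np

def integrate_singular(f, n=200):
    """Approximate the integral of f(x)/sqrt(x) on (0, 1] by singularity
    subtraction: (f(x) - f(0))/sqrt(x) is bounded near x = 0 and is handled
    by the midpoint rule, while the subtracted part f(0)/sqrt(x) integrates
    analytically to 2*f(0)."""
    x = (np.arange(n) + 0.5) / n             # midpoint-rule nodes on (0, 1)
    regular = np.sum((f(x) - f(0)) / np.sqrt(x)) / n
    return regular + 2.0 * f(0)

# Integral of cos(x)/sqrt(x) on (0, 1]; exact value is about 1.80905
val = integrate_singular(np.cos)
```

In the paper's setting the same principle is applied to the fractional kernel, with the correction conveniently expressed as a sparse matrix added to the dense discretized operator.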
• #### H2OPUS-TLR: High Performance Tile Low Rank Symmetric Factorizations using Adaptive Randomized Approximation

(arXiv, 2021-08-26) [Preprint]
Tile low rank (TLR) representations of dense matrices partition them into blocks of roughly uniform size, where each off-diagonal tile is compressed and stored as its own low rank factorization. They offer an attractive representation for many data-sparse dense operators that appear in practical applications, where substantial compression and a much smaller memory footprint can be achieved. TLR matrices are a compromise between the simplicity of a regular, perfectly-strided data structure and the optimal complexity of the unbalanced trees of hierarchically low rank matrices, and they provide a convenient performance-tuning parameter through their tile size, which can be proportioned to the cache size where the tiles reside in the memory hierarchy. There are currently no high-performance algorithms that can generate Cholesky and $LDL^T$ factorizations of TLR matrices, particularly on GPUs. The difficulties in achieving high performance when factoring TLR matrices come from the expensive compression operations that must be performed during the factorization process and from the adaptive rank distribution of the tiles, which causes an irregular work pattern for the processing cores. In this work, we develop a dynamic batching operation and combine it with batched adaptive randomized approximations to achieve high performance on both GPUs and CPUs. Our implementation attains over 1.2 TFLOP/s in double precision on the V100 GPU, and is limited by the performance of batched GEMM operations. The Cholesky factorization of a covariance matrix of size $N = 131K$ arising in spatial statistics can be computed to an accuracy $\epsilon=10^{-2}$ in just a few seconds. We believe the proposed GEMM-centric algorithm can be readily ported to newer hardware, such as tensor cores optimized for small GEMM operations.
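A minimal sketch of the TLR representation itself (compression and matrix-vector product; the tile size, tolerance, and exponential kernel are illustrative assumptions, and this is far from the batched GPU implementation described above):

```python
import numpy as np

def tlr_compress(A, tile, tol):
    """Tile low rank compression: keep diagonal tiles dense and store each
    off-diagonal tile as a truncated SVD factorization."""
    n = A.shape[0]
    nt = n // tile
    tiles = {}
    for i in range(nt):
        for j in range(nt):
            blk = A[i*tile:(i+1)*tile, j*tile:(j+1)*tile]
            if i == j:
                tiles[i, j] = ("dense", blk.copy())
            else:
                U, s, Vt = np.linalg.svd(blk, full_matrices=False)
                r = max(1, int(np.sum(s > tol * s[0])))  # adaptive rank per tile
                tiles[i, j] = ("lowrank", U[:, :r] * s[:r], Vt[:r, :])
    return tiles, nt, tile

def tlr_matvec(tlr, x):
    """y = A @ x computed tile by tile on the compressed representation."""
    tiles, nt, tile = tlr
    y = np.zeros_like(x)
    for (i, j), t in tiles.items():
        xj = x[j*tile:(j+1)*tile]
        if t[0] == "dense":
            y[i*tile:(i+1)*tile] += t[1] @ xj
        else:
            y[i*tile:(i+1)*tile] += t[1] @ (t[2] @ xj)  # U @ (V^T @ x)
    return y

# Exponential covariance kernel: off-diagonal tiles are numerically low rank
pts = np.linspace(0, 1, 256)
A = np.exp(-np.abs(pts[:, None] - pts[None, :]))
tlr = tlr_compress(A, tile=32, tol=1e-8)
x = np.random.default_rng(0).standard_normal(256)
rel_err = np.linalg.norm(tlr_matvec(tlr, x) - A @ x) / np.linalg.norm(A @ x)
```

For this kernel every off-diagonal tile compresses to rank 1, so the tiles store far fewer entries than the dense matrix while reproducing its action to near machine precision; the factorization algorithms in the paper operate on exactly such tiles as their atomic computational units.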