Extreme Computing Research Center
Recent Submissions

A Reduced-order Variational Bayesian Approach for Efficient Subsurface Imaging (Geophysical Journal International, Oxford University Press (OUP), 2021-12-15) [Article] This work considers the reconstruction of a subsurface model from seismic observations, which is known to be a high-dimensional and ill-posed inverse problem. Two approaches are combined to tackle this problem: the Discrete Cosine Transform (DCT) approach, used in the forward modeling step, and the Variational Bayesian (VB) approach, used in the inverse reconstruction step. VB can provide not only point estimates but also closed forms of the full posterior probability distributions. To efficiently compute such estimates of the full joint posterior distributions of large-scale seismic inverse problems, we resort to a DCT order-reduction scheme with a VB approximation of the posteriors, avoiding the need for costly Bayesian sampling methods. More specifically, we first reduce the model parameters through truncation of their DCT coefficients. This helps regularize our seismic inverse problem and alleviates its computational complexity. Then, we apply VB inference in the reduced DCT space to estimate the dominant (retained) DCT coefficients together with the variance of the observational noise. We also present an efficient implementation of the derived VB-based algorithm for further cost reduction. The performance of the proposed scheme is evaluated through extensive numerical experiments for both linear and nonlinear forward models. In the former, the subsurface reflectivity model was reconstructed at an estimation accuracy comparable to the optimal weighted least-squares solution. In the latter, the main structural features of the squared-slowness model were well reconstructed.
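The DCT truncation step can be sketched as follows. This is an illustrative stand-alone example, not the authors' code: the smooth synthetic model and the truncation rank `k` are hypothetical, but the sketch shows how retaining only the low-order DCT coefficients compresses a smooth model with little loss.

```python
import numpy as np
from scipy.fft import dctn, idctn

def reduce_model(model, k):
    """Truncate a 2D model to its k x k lowest-order DCT coefficients."""
    coeffs = dctn(model, norm="ortho")
    reduced = np.zeros_like(coeffs)
    reduced[:k, :k] = coeffs[:k, :k]  # retain the dominant low-frequency terms
    return idctn(reduced, norm="ortho")

# A smooth synthetic "velocity" model is captured well by a few coefficients.
x = np.linspace(0.0, 1.0, 64)
model = 2.0 + np.outer(np.sin(np.pi * x), np.cos(np.pi * x))
approx = reduce_model(model, k=8)
rel_err = np.linalg.norm(model - approx) / np.linalg.norm(model)
```

In an inversion, the retained `k * k` coefficients (rather than the full 64 x 64 grid) become the unknowns, which is what shrinks the parameter space for the VB step.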

SphGLLTools: A toolbox for visualization of large seismic model files based on 3D spectral-element meshes (Computers & Geosciences, Elsevier BV, 2021-12) [Article] Adjoint tomography, a full-waveform inversion technique based on 3D wave simulations, has become a commonly used tool in passive-source seismology, drawing on advances in computational power and numerical methods. From global to reservoir scales, seismic models can be iteratively updated in adjoint inversions by extracting information from full seismic waveforms. Seismic models are typically constructed on the numerical mesh used for wave simulations. Thus, the size of model files depends on the minimum resolvable period achieved in simulations, which is controlled by the numerical mesh (the higher the mesh resolution, the larger the model and mesh files). This is a particular concern for recent global-scale adjoint tomographic models, where the size and format of numerical meshes pose challenges for model visualization, analysis, interpretation, and sharing. Here, we present SphGLLTools, an open-source toolbox that aims to diminish these challenges by expanding global adjoint models onto spherical harmonic functions, which are widely used in global seismology. Our tools are initially designed for the spectral-element meshes used in recent global adjoint tomography studies. SphGLLTools facilitates many commonplace tasks for model visualization and analysis, including spherical harmonic expansion of models sampled on spectral-element meshes, together with associated tools for easy sharing, visualization, and interpretation of large-scale seismic model files. All the developed routines are accompanied by user instructions and are available through GitHub. For transparency, reproducibility, and educational purposes, we also include Colab notebooks, which provide an intuitive and comprehensive review of the principles and methods for spectral-element meshes, spherical harmonic expansion, and other model analysis tools.

Optimized Runge-Kutta Methods with Automatic Step Size Control for Compressible Computational Fluid Dynamics (Communications on Applied Mathematics and Computation, Springer Science and Business Media LLC, 2021-11-22) [Article] We develop error-control-based time integration algorithms for compressible fluid dynamics (CFD) applications and show that they are efficient and robust in both the accuracy-limited and stability-limited regimes. Focusing on discontinuous spectral element semidiscretizations, we design new controllers for existing methods and for some new embedded Runge-Kutta pairs. We demonstrate the importance of choosing adequate controller parameters and provide a means to obtain these in practice. We compare a wide range of error-control-based methods, along with the common approach in which step size control is based on the Courant-Friedrichs-Lewy (CFL) number. The optimized methods give improved performance and naturally adopt a step size close to the maximum stable CFL number at loose tolerances, while additionally providing control of the temporal error at tighter tolerances. The numerical examples include challenging industrial CFD applications.
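A minimal sketch of error-control-based step size selection, using the classic embedded Heun/Euler pair and a simple integral (I) controller rather than the optimized controllers of the paper; the function name, safety factor, and tolerances are hypothetical illustrations of the general mechanism:

```python
import numpy as np

def heun_euler_adaptive(f, t0, y0, t_end, tol=1e-6, h=0.1):
    """Integrate y' = f(t, y) with an embedded Heun (2nd order) / Euler
    (1st order) pair; the difference of the two solutions estimates the
    local error, which drives a simple I step-size controller."""
    t, y = t0, np.asarray(y0, dtype=float)
    while t < t_end:
        h = min(h, t_end - t)
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_high = y + 0.5 * h * (k1 + k2)   # 2nd-order Heun step
        y_low = y + h * k1                 # 1st-order Euler step
        err = np.linalg.norm(y_high - y_low) + 1e-16
        if err <= tol:                     # accept the step
            t, y = t + h, y_high
        # I controller: scale h by (tol/err)^(1/order), clipped, with safety factor
        h *= min(2.0, max(0.2, 0.9 * (tol / err) ** 0.5))
    return t, y

# y' = -y, y(0) = 1  =>  y(1) = exp(-1)
t, y = heun_euler_adaptive(lambda t, y: -y, 0.0, [1.0], 1.0, tol=1e-8)
```

The paper's point is that the controller parameters (here the hard-coded 0.9, 0.2, 2.0) materially affect efficiency and robustness; production codes use more refined PI/PID-style controllers.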

The influence of depth-varying elastic properties of the upper plate on megathrust earthquake rupture dynamics and tsunamigenesis (Journal of Geophysical Research: Solid Earth, American Geophysical Union (AGU), 2021-11-15) [Article] Megathrust earthquakes are strongly influenced by the elastic properties of the rocks surrounding the fault. In contrast to friction, these properties can be derived in situ from geophysical measurements along the seismogenic zone. However, they are often overestimated in numerical simulations, particularly in the shallow megathrust. Here we explore the influence that realistic depth-varying upper-plate elastic properties along the megathrust have on earthquake rupture dynamics and tsunamigenesis using 3D dynamic rupture and tsunami simulations. We compare results from three subduction zone scenarios: with homogeneous elastic media, with heterogeneous elastic media, and with a bimaterial fault. Elastic properties in the heterogeneous model follow a realistic depth distribution derived from controlled-source tomography models of subduction zones. We assume the same friction properties for all scenarios. Simulations in the heterogeneous and homogeneous models show that rigidity depth variations explain the depth-varying behavior of slip, slip rate, frequency content, and rupture time. The depth-varying behavior of slip, frequency content, and rupture duration quantitatively agrees with previous predictions based on worldwide data compilations, explaining the main depth-dependent traits of tsunami earthquakes and large shallow megathrust earthquakes. Large slip, slow rupture, and slip rate amplification in bimaterial simulations are largely controlled by the elastic rock properties of the more compliant side of the fault, which in subduction zones is the upper plate. Large shallow slip and trenchward-increasing upper-plate compliance in the heterogeneous model lead to the largest coseismic seafloor deformation and tsunami amplitude. This highlights the importance of considering realistic upper-plate rigidity variations to properly assess the tsunamigenic potential of megathrust earthquakes.

A direct numerical simulation study of a skewed three-dimensional spatially evolving compressible mixing layer (Physics of Fluids, AIP Publishing, 2021-11) [Article] The turbulent flow of spatially developing, high-speed hydrogen/air mixing layers subject to a small skew angle φ is systematically investigated by means of direct numerical simulation. The present database features both detailed chemistry and detailed transport (i.e., Soret, Dufour, and bulk viscosity effects). The angle φ measures the misalignment of the two asymptotic streams of fluid, whose interaction creates the turbulent mixing region. Numerical simulations have been carried out either in the absence of skewing, namely, perfectly parallel streams (φ = 0), or at skew angles φ = 5°, 10°, and 15°. The streamwise evolution and the self-similar state of the turbulence statistics of the skewed cases are reported and compared to the unskewed reference case. The present computations indicate that the transitional region and the fully developed turbulence region depend strongly on the degree of flow skewing at the inlet. In particular, we find that skewing yields faster growth of the inlet structures, thus leading to mixing enhancement. The underlying mechanisms responsible for turbulence modulation are analyzed through the transport equation of the Reynolds stresses. One possible perspective of the present work concerns mixing control and a reliable comparison between experiments, simulations, and turbulence modeling.

Assessment of local and non-local turbulent flow components on turbulence-flame interaction (IOP Publishing, 2021-11-01) [Conference Paper] In the framework of turbulence-flame interaction, the flame is characterized by the gradient of a reactive scalar such as the progress variable, whereas the turbulence is represented by the vorticity and the strain rate. Quantitative assessment of this interaction is performed through the study of the coupled transport between these quantities, which are subject to the effects of heat release and chemical reactions. The present analysis aims at improving the understanding of small-scale turbulence-flame interaction properties through the introduction of an additive decomposition of the strain rate and vorticity fields into their local and non-local components. The respective roles of the local and non-local effects are studied for a broad range of Karlovitz numbers by virtue of direct numerical simulations (DNS) of turbulent, premixed, lean, and statistically planar methane-air flames. In the conditions of the present study, the alignment between the flame front normals and the strain rate is found to be dominated by the local contribution from the strain rate tensor.

Development and analysis of entropy stable no-slip wall boundary conditions for the Eulerian model for viscous and heat conducting compressible flows (arXiv, 2021-10-19) [Preprint] Nonlinear entropy stability analysis is used to derive entropy stable no-slip wall boundary conditions for the Eulerian model proposed by Svärd (Physica A: Statistical Mechanics and its Applications, 2018) and its spatial discretization based on entropy stable collocated discontinuous Galerkin operators with the summation-by-parts property for unstructured grids. A set of viscous test cases of increasing complexity is simulated using both the Eulerian and the classic compressible Navier-Stokes models. The numerical results obtained with the two models are compared, and differences and similarities are then highlighted.

Towards autonomous event identifications in wave-equation traveltime inversion (GEOPHYSICS, Society of Exploration Geophysicists, 2021-09-29) [Article] We present a method to automatically relate events between observed and synthetic data for wave-equation traveltime (WT) inversion. The scheme starts with local similarity measurements obtained by applying cross-correlation to localized traces. We then develop a differentiable alternative to the argmax function. By applying this differentiable argmax to each time slice of the local similarity map, we build a traveltime difference function while keeping the process differentiable. WT inversion requires only the traveltime difference of related events through a phase shift. Thus, we must reject events that are not apparently related between the observed and synthetic data. The local similarity map suggests the possibility of doing so but shows abrupt changes with time and offset. To mitigate this problem, we introduce a dynamic programming algorithm to define a warping function. Wave packets in the observed and synthetic data are assumed to be related if they are connected by this warping function and also exhibit high local similarity. Only such related events are considered in the subsequent calculation of the misfit and adjoint source. Numerical examples demonstrate that the proposed method successfully retrieves an informative model from roughly selected data. In contrast, WT inversion based on cross-correlation or deconvolution fails to do so.
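One common differentiable relaxation of argmax is the softmax-weighted index average (soft-argmax). The sketch below is an illustrative stand-in for the paper's construction, not its actual formulation; the sharpness parameter `beta` and the toy similarity slice are hypothetical:

```python
import numpy as np

def soft_argmax(x, beta=100.0):
    """Differentiable relaxation of argmax: a softmax-weighted average of
    indices. Larger beta sharpens the weights toward the hard argmax."""
    w = np.exp(beta * (x - x.max()))   # subtract the max for numerical stability
    w /= w.sum()
    return np.dot(w, np.arange(len(x)))

similarity = np.array([0.1, 0.2, 0.9, 0.3, 0.1])  # one time slice of a similarity map
idx = soft_argmax(similarity)                      # close to the hard argmax, index 2
```

Because `soft_argmax` is a smooth function of `x`, the resulting traveltime-difference estimate admits analytic gradients, which is what the adjoint-state computation requires.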

Preconditioned BFGS-based Uncertainty Quantification in Elastic Full Waveform Inversion (Geophysical Journal International, Oxford University Press (OUP), 2021-09-21) [Article] Full Waveform Inversion (FWI) has become an essential technique for mapping geophysical subsurface structures. However, proper uncertainty quantification is often lacking in current applications. In theory, uncertainty quantification is related to the inverse Hessian (or the posterior covariance matrix), yet even for common geophysical inverse problems its calculation is beyond the computational and storage capacities of the largest high-performance computing systems. In this study, we amend the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm to perform uncertainty quantification for large-scale applications. For seismic inverse problems, the limited-memory BFGS (L-BFGS) method prevails as the most efficient quasi-Newton method, and we aim to augment it further to obtain an approximate inverse Hessian for uncertainty quantification in FWI. To facilitate retrieval of the inverse Hessian, we combine BFGS (essentially a full-history L-BFGS) with randomized singular value decomposition to determine a low-rank approximation of the inverse Hessian. Setting the rank equal to the number of iterations makes this solution efficient and memory-affordable even for large-scale problems. Furthermore, based on the Gauss-Newton method, we formulate different initial, diagonal Hessian matrices as preconditioners for the inverse scheme and compare their performance in elastic FWI applications. We highlight our approach with the elastic Marmousi benchmark model, demonstrating the applicability of preconditioned BFGS for large-scale FWI and uncertainty quantification.

H2Opus: A distributed-memory multi-GPU software package for non-local operators (arXiv, 2021-09-12) [Preprint] Hierarchical $\mathcal{H}^2$-matrices are asymptotically optimal representations for the discretizations of non-local operators such as those arising in integral equations or from kernel functions. Their $O(N)$ complexity in both memory and operator application makes them particularly suited for large-scale problems. As a result, there is a need for software that provides support for distributed operations on these matrices, allowing large-scale problems to be represented. In this paper, we present high-performance, distributed-memory, GPU-accelerated algorithms and implementations for matrix-vector multiplication and matrix recompression of hierarchical matrices in the $\mathcal{H}^2$ format. The algorithms are a new module of H2Opus, a performance-oriented package that supports a broad variety of $\mathcal{H}^2$-matrix operations on CPUs and GPUs. Performance in the distributed GPU setting is achieved by marshaling the tree data of the hierarchical matrix representation so that batched kernels can be executed on the individual GPUs. MPI is used for interprocess communication. We optimize the communication data volume and hide much of the communication cost behind the local compute phases of the algorithms. Results show near-ideal scalability up to 1024 NVIDIA V100 GPUs on Summit, with performance exceeding 2.3 Tflop/s/GPU for matrix-vector multiplication and 670 Gflop/s/GPU for matrix compression, which involves batched QR and SVD operations. We illustrate the flexibility and efficiency of the library by solving a 2D variable-diffusivity integral fractional diffusion problem with an algebraic multigrid-preconditioned Krylov solver and demonstrate scalability up to problems with 16M degrees of freedom on 64 GPUs.

Toward performance-portable PETSc for GPU-based exascale systems (Parallel Computing, Elsevier BV, 2021-09-10) [Article] The Portable Extensible Toolkit for Scientific computation (PETSc) library delivers scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization. The PETSc design for performance portability addresses fundamental GPU accelerator challenges and stresses flexibility and extensibility by separating the programming model used by the application from that used by the library; it enables application developers to use their preferred programming model, such as Kokkos, RAJA, SYCL, HIP, CUDA, or OpenCL, on upcoming exascale systems. A blueprint for using GPUs from PETSc-based codes is provided, and case studies emphasize the flexibility and high performance achieved on current GPU-based systems.

Cycle-skipping mitigation using misfit measurements based on differentiable dynamic time warping (arXiv, 2021-09-09) [Preprint] The dynamic time warping (DTW) distance has been used as a misfit function for wave-equation inversion to mitigate the local-minima issue. However, the original DTW distance is not smooth; it can therefore yield strong discontinuities in the adjoint source, which does not help nonlinear inverse problems converge to a plausible minimum. We therefore introduce into geophysics, for the first time, the smooth DTW distance, which has demonstrated its performance as a loss function in time series classification, clustering, and prediction. The fundamental idea of such a distance is to replace the $\min$ operator with its smooth relaxation, which makes it possible to define the analytic derivative of the DTW distance. The new misfit function is termed the differentiable DTW distance. Moreover, considering that the warping path is an indicator of the traveltime difference between the observed and synthetic traces, a penalization term is constructed based on the warping path so that the misfit accumulated by the penalized differentiable DTW distance is weighted in favor of the traveltime difference. Numerical examples demonstrate the advantage of the penalized differentiable DTW misfit function over the conventional non-differentiable one.
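The smooth relaxation of the $\min$ operator at the heart of this approach is the soft-min (a negative log-sum-exp). The following is a minimal sketch of that ingredient alone, not of the full smooth-DTW recursion; the smoothing parameter `gamma` and the toy cost values are hypothetical:

```python
import numpy as np

def soft_min(values, gamma=1.0):
    """Smooth relaxation of min: -gamma * log(sum(exp(-v / gamma))).
    As gamma -> 0 it approaches the hard minimum while staying
    differentiable everywhere."""
    v = np.asarray(values, dtype=float)
    z = -v / gamma
    m = z.max()                                # log-sum-exp stabilization
    return -gamma * (m + np.log(np.exp(z - m).sum()))

costs = np.array([3.0, 1.0, 2.5])
sharp = soft_min(costs, gamma=0.01)   # nearly the hard min, 1.0
smooth = soft_min(costs, gamma=1.0)   # smoother, lies below the hard min
```

In smooth DTW, this operator replaces the hard `min` over the three predecessor cells of the dynamic programming table, so the whole distance becomes differentiable with respect to the input traces.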

An O(N) algorithm for computing expectation of N-dimensional truncated multivariate normal distribution I: fundamentals (Advances in Computational Mathematics, Springer Science and Business Media LLC, 2021-09-01) [Article] In this paper, we present the fundamentals of a hierarchical algorithm for computing the N-dimensional integral $\phi(a,b;A)=\int_a^b H(x)\, f(x|A)\, dx$, representing the expectation of a function H(X), where f(x|A) is the truncated multivariate normal (TMVN) distribution with zero mean, x is the vector of integration variables for the N-dimensional random vector X, A is the inverse of the covariance matrix Σ, and a and b are constant vectors. The algorithm assumes that H(x) is "low-rank" and is designed for properly clustered X so that the matrix A has "low-rank" blocks and "low-dimensional" features. We demonstrate the divide-and-conquer idea when A is a symmetric positive definite tridiagonal matrix and present the necessary building blocks and a rigorous potential-theory-based algorithm analysis when A is given by the exponential covariance model. The overall algorithm complexity is O(N) for N-dimensional problems, with a prefactor determined by the rank of the off-diagonal matrix blocks and the number of effective variables. Very high accuracy results for N as large as 2048 are obtained on a desktop computer with 16 GB of memory using the fast Fourier transform (FFT) and non-uniform FFT to validate the analysis. The current paper focuses on the ideas using simple yet representative examples, where the off-diagonal matrix blocks have rank 1 and the number of effective variables is bounded by 2, to allow concise notation and an easier explanation. In a subsequent paper, we discuss the generalization of the current scheme using the sparse grid technique for higher-rank problems and demonstrate how all the moments of kth order or less (a total of $O(N^k)$ integrals) can be computed with $O(N^k)$ operations for k ≥ 2 and $O(N \log N)$ operations for k = 1.

Misfit functions based on differentiable dynamic time warping for waveform inversion (Society of Exploration Geophysicists, 2021-09-01) [Conference Paper] Misfit functions based on differentiable dynamic time warping (DTW) have demonstrated excellent performance in various sequence-related tasks. We introduce this concept in the context of waveform inversion and discuss a fast algorithm to calculate the first and second derivatives of such misfits. The DTW distance is originally calculated by solving a series of min-sorting problems, and thus it is not differentiable. The fundamental idea of differentiable DTW is to replace the min operator with its smooth relaxation. It is then straightforward to obtain the derivatives following the rules of differentiation, which yields both differentiable misfit measurements and a differentiable warping path. This path weights the traveltime mismatch more than its amplitude counterpart. We can construct a penalization term based on this warping path, and the penalized misfit function is adaptable to measure traveltime and amplitude separately. Numerical examples show the flexibility and performance of the proposed misfit function in mitigating local-minima issues and retrieving long-wavelength model features.

Traveltime computation using a supervised learning approach (Society of Exploration Geophysicists, 2021-09-01) [Conference Paper] In real-time microseismic monitoring, the ability to efficiently compute source-receiver traveltimes can help significantly speed up the model calibration and hypocenter determination processes, thus ensuring timely information about subsurface fractures for use in effective decision making. Here, we present a supervised-learning-based traveltime computation approach for layered 1D velocity models. First, we generate numerous synthetic traveltime examples from a combination of source locations and layered subsurface models, covering a broad range of realistic P-wave velocities (2500–5000 m/s). Next, we train a multilayered feed-forward neural network using the training set containing source locations and velocities as input and traveltimes as the corresponding labels. By doing so, we aim for a neural-network model that is trained only once and can be applied to a wide range of subsurface velocities as well as source-receiver positions to predict fast and accurate traveltimes. We apply the trained model to numerous test examples to validate the accuracy and speed of the proposed method. Based on comparisons with acoustic finite-difference modeling and a ray-shooting method, we show that the trained model can provide faster and reasonably accurate traveltimes for any realistic model scenario within the trained velocity range.
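The training-label generation for a layered 1D model can be illustrated with an idealized vertical straight-ray traveltime (the paper itself uses finite-difference and ray-shooting modeling; this sketch is only a simplified stand-in, and the layer counts, thickness ranges, and helper name are hypothetical):

```python
import numpy as np

def vertical_traveltime(thicknesses, velocities, depth):
    """Straight-ray vertical traveltime from the surface down to `depth`
    through a stack of horizontal layers (thickness in m, velocity in m/s)."""
    t, top = 0.0, 0.0
    for h, v in zip(thicknesses, velocities):
        seg = min(h, max(0.0, depth - top))   # path length inside this layer
        t += seg / v
        top += h
    return t

# One synthetic training example: a random layered model sampled from the
# realistic P-wave velocity range quoted in the abstract.
rng = np.random.default_rng(0)
thick = rng.uniform(100.0, 500.0, size=5)     # layer thicknesses, m
vel = rng.uniform(2500.0, 5000.0, size=5)     # layer velocities, m/s
label = vertical_traveltime(thick, vel, depth=800.0)
```

Repeating this for many random models and source depths yields (velocity, location) -> traveltime pairs of the kind used to train the network once over the whole velocity range.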

Dual-band generative learning for low-frequency extrapolation in seismic land data (Society of Exploration Geophysicists, 2021-09-01) [Conference Paper] The presence of low-frequency energy in seismic data can help mitigate cycle-skipping problems in full-waveform inversion. Unfortunately, the generation and recording of low-frequency signals in seismic exploration remains a nontrivial task. Extrapolation of missing low-frequency content in field data might be addressed in a data-driven framework. In particular, deep learning models trained on synthetic data could be used for inference on field data, although such switching of application domains remains challenging. We therefore propose the concept of generative dual-band learning to facilitate the transfer of knowledge between synthetic- and field-data applications of low-frequency extrapolation. We first explain the two-step procedure for training a generative adversarial network (GAN) that extrapolates low frequencies. Then, we describe the workflow for synthetic dataset generation. Finally, we explore the feasibility of the dual-band learning concept on real near-surface land data acquired in Saudi Arabia.

Multicycle Simulation of Strike-Slip Earthquake Rupture for Use in Near-Source Ground-Motion Simulations (Bulletin of the Seismological Society of America, Seismological Society of America (SSA), 2021-08-31) [Article] Realistic dynamic rupture modeling validated by observed earthquakes is necessary for estimating parameters that are poorly resolved by seismic source inversion, such as stress drop, rupture velocity, and slip rate function. Source inversions using forward dynamic modeling are increasingly used to obtain earthquake rupture models. In this study, to generate a large number of physically self-consistent rupture models whose rupture process is consistent with the spatiotemporal heterogeneity of stress produced by previous earthquakes on the same fault, we use multicycle simulations under the rate-and-state (RS) friction law. We adopt a one-way coupling from multicycle simulations to dynamic rupture simulations: the quasi-dynamic solver QDYN is used to nucleate the seismic events, and the spectral element dynamic solver SPECFEM3D to resolve their rupture process. To simulate realistic seismicity, with a wide range of magnitudes and irregular recurrence, several realizations of 2D-correlated heterogeneous random distributions of the characteristic weakening distance (Dc) in RS friction are tested. Other important parameters are the normal stress, which controls the stress drop and rupture velocity during an earthquake, and the maximum value of Dc, which controls rupture velocity but not stress drop. We perform a parametric study on a vertical planar fault and generate a set of one hundred spontaneous rupture models over a wide magnitude range (Mw 5.5–7.4). We validate the rupture models by comparing their source scaling, ground motion (GM), and surface slip properties to observations. We compare the source-scaling relations between rupture area, average slip, and seismic moment of the modeled events with empirical ones derived from source inversions. Near-fault GMs are computed from the source models; their peak ground velocities and peak ground accelerations agree well with ground-motion prediction equation values. We also obtain good agreement of the surface fault displacements with observed values.

Mapping full seismic waveforms to vertical velocity profiles by deep learning (GEOPHYSICS, Society of Exploration Geophysicists, 2021-08-31) [Article] Building realistic and reliable models of the subsurface is the primary goal of seismic imaging. Here we construct an ensemble of convolutional neural networks (CNNs) to build velocity models directly from the data. Most other approaches attempt to map full data into 2D labels. We exploit the regularity of seismic acquisition and train CNNs to map gathers of neighboring common midpoints (CMPs) to vertical 1D velocity logs. This allows us to integrate well-log data into the inversion, simplify the mapping by using 1D labels, and accommodate larger dips than are possible with single-CMP inputs. We dynamically generate the training data in parallel with training the CNNs, which reduces overfitting. Data generation and training of the CNNs are more computationally expensive than conventional full-waveform inversion (FWI). However, once the network is trained, data sets with similar acquisition parameters can be inverted much faster than with FWI. The multi-CMP CNN ensemble is tested on multiple realistic synthetic models, performs well, and is combined with FWI for even better performance.

Space-Fractional Diffusion with Variable Order and Diffusivity: Discretization and Direct Solution Strategies (arXiv, 2021-08-29) [Preprint] We consider multidimensional space-fractional diffusion equations with spatially varying diffusivity and fractional order. Significant computational challenges are encountered when solving these equations, due both to the kernel singularity in the fractional integral operator and to the resulting dense discretized operators, which quickly become prohibitively expensive to handle because of their memory and arithmetic complexities. In this work, we present a singularity-aware discretization scheme that regularizes the singular integrals through a singularity subtraction technique adapted to the spatial variability of the diffusivity and fractional order. This regularization strategy is conveniently formulated as a sparse matrix correction added to the dense operator and is applicable to different formulations of fractional diffusion equations. We also present a block low-rank representation to handle the dense matrices, exploiting the ability to approximate blocks of the resulting formally dense matrix by low-rank factorizations. A Cholesky factorization solver operates directly on this representation, using the low-rank blocks as its atomic computational tiles, and achieves high performance on multicore hardware. Numerical results show that the singularity treatment is robust, substantially reduces discretization errors, and attains the first-order convergence rate allowed by the regularity of the solutions. They also show that considerable savings are obtained in storage ($O(N^{1.5})$) and computational cost ($O(N^2)$) compared to dense factorizations. This translates into orders-of-magnitude savings in memory and time on multidimensional problems and shows that the proposed methods offer practical tools for tackling large non-local fractional diffusion simulations.

H2OPUS-TLR: High Performance Tile Low Rank Symmetric Factorizations using Adaptive Randomized Approximation (arXiv, 2021-08-26) [Preprint] Tile low rank (TLR) representations of dense matrices partition them into blocks of roughly uniform size, where each off-diagonal tile is compressed and stored as its own low-rank factorization. They offer an attractive representation for many data-sparse dense operators that appear in practical applications, where substantial compression and a much smaller memory footprint can be achieved. TLR matrices are a compromise between the simplicity of a regular, perfectly strided data structure and the optimal complexity of the unbalanced trees of hierarchically low rank matrices, and they provide a convenient performance-tuning parameter through their tile size, which can be proportioned to the cache size where the tiles reside in the memory hierarchy. There are currently no high-performance algorithms that can generate Cholesky and $LDL^T$ factorizations of TLR matrices, particularly on GPUs. The difficulties in achieving high performance when factoring TLR matrices come from the expensive compression operations that must be performed during the factorization process and from the adaptive rank distribution of the tiles, which causes an irregular work pattern for the processing cores. In this work, we develop a dynamic batching operation and combine it with batched adaptive randomized approximations to achieve high performance on both GPUs and CPUs. Our implementation attains over 1.2 TFLOP/s in double precision on the V100 GPU and is limited by the performance of batched GEMM operations. A covariance matrix of size $N = 131K$ arising in spatial statistics can be Cholesky-factored to an accuracy of $\epsilon=10^{-2}$ in just a few seconds. We believe the proposed GEMM-centric algorithm can be readily ported to newer hardware, such as tensor cores, that is optimized for small GEMM operations.
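The per-tile compression underlying TLR storage can be illustrated with a deterministic truncated SVD; this is only a stand-in for the batched adaptive randomized approximation the paper develops, and the kernel, tolerance, and helper name are hypothetical:

```python
import numpy as np

def compress_tile(tile, eps):
    """Compress one off-diagonal tile by truncated SVD, keeping singular
    values above eps * sigma_max; the tile is then stored as two thin
    factors so that tile ~= U_k @ V_k.T."""
    U, s, Vt = np.linalg.svd(tile, full_matrices=False)
    k = max(1, int(np.sum(s > eps * s[0])))
    return U[:, :k] * s[:k], Vt[:k].T

# An off-diagonal tile of a kernel matrix between two well-separated
# point clusters is numerically low rank.
x = np.linspace(0.0, 1.0, 64)
y = np.linspace(2.0, 3.0, 64)
tile = 1.0 / np.abs(x[:, None] - y[None, :])
Uk, Vk = compress_tile(tile, eps=1e-8)
err = np.linalg.norm(tile - Uk @ Vk.T) / np.linalg.norm(tile)
```

The adaptive part of the paper's scheme corresponds to `k` varying from tile to tile, which is exactly what produces the irregular work pattern the dynamic batching is designed to handle.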