Recent Submissions

  • A nonlinear elimination preconditioned inexact Newton method for blood flow problems in human artery with stenosis

    Luo, Li; Shiu, Wen Shin; Chen, Rongliang; Cai, Xiao Chuan (Journal of Computational Physics, Elsevier BV, 2019-09-09) [Article]
    Simulation of blood flows in the human artery is a promising tool for understanding the hemodynamics. The blood flow is often smooth in a healthy artery, but may become locally chaotic in a diseased artery with stenosis, and as a result, a traditional solver may take many iterations to converge or does not converge at all. To overcome the problem, we develop a nonlinearly preconditioned Newton method in which the variables associated with the stenosis are iteratively eliminated and then a global Newton method is applied to the smooth part of the system. More specifically, we model the blood flow in a patient-specific artery based on the unsteady incompressible Navier-Stokes equations with resistive boundary conditions discretized by a fully implicit finite element method. The resulting nonlinear system at each time step is solved by using an inexact Newton method with a domain decomposition based Jacobian solver. To improve the convergence and robustness of the Newton method for arteries with stenosis, we develop an adaptive restricted region-based nonlinear elimination preconditioner which performs subspace correction to remove the local high nonlinearities. Numerical experiments for several cerebral arteries are presented to demonstrate the superiority of the proposed algorithm over the classical method with respect to some physical and numerical parameters. We also report the parallel scalability of the proposed algorithm on a supercomputer with thousands of processor cores.
  • A rezoning-free CESE Scheme for solving the Compressible Euler Equations on Moving Unstructured Meshes

    Shen, Hua; Parsani, Matteo (Journal of Computational Physics, Elsevier BV, 2019-08-02) [Article]
    We construct a space-time conservation element and solution element (CESE) scheme for solving the compressible Euler equations on moving meshes (CESE-MM) which allow an arbitrary motion for each of the mesh points. The scheme is a direct extension of a purely Eulerian CESE scheme that was previously implemented on hybrid unstructured meshes (Shen et al., J. Comput. Phys., 2015). It adopts a staggered mesh in space and time such that the physical variables are continuous across the interfaces of the adjacent space-time control volumes and, therefore, a Riemann solver is not required to calculate interface fluxes or the node velocities. Moreover, the staggered mesh can significantly alleviate mesh tangles so that the time step can be kept at an acceptable level without using any rezoning operation. The discretization of the integral space-time conservation law is completely based on the physical space-time control volume, thereby satisfying the physical and geometrical conservation laws. Plenty of numerical examples are carried out to validate the accuracy and robustness of the CESE-MM scheme.
  • Conservative and entropy stable solid wall boundary conditions for the compressible Navier–Stokes equations: Adiabatic wall and heat entropy transfer

    Dalcin, Lisandro; Rojas, Diego B.; Zampini, Stefano; Del Rey Fernández, David C.; Carpenter, Mark H.; Parsani, Matteo (Journal of Computational Physics, Elsevier BV, 2019-08-01) [Article]
    We present a novel technique for the imposition of non-linear entropy conservative and entropy stable solid wall boundary conditions for the compressible Navier–Stokes equations in the presence of an adiabatic wall, or a wall with a prescribed heat entropy flow. The procedure relies on the formalism and mimetic properties of diagonal-norm, summation-by-parts and simultaneous-approximation-term operators, and is a generalization of previous works on discontinuous interface coupling [1] and solid wall boundary conditions [2]. Using the method of lines, a semi-discrete entropy estimate for the entire domain is obtained when the proposed numerical imposition of boundary conditions are coupled with an entropy-conservative or entropy-stable discrete interior operator. The resulting estimate mimics the global entropy estimate obtained at the continuous level. The boundary data at the wall are weakly imposed using a penalty flux approach and a simultaneous-approximation-term technique for both the conservative variables and the gradient of the entropy variables. Discontinuous spectral collocation operators (mass lumped nodal discontinuous Galerkin operators), on high-order unstructured grids, are used for the purpose of demonstrating the robustness and efficacy of the new procedure for weakly enforcing boundary conditions. Numerical simulations confirm the non-linear stability of the proposed technique, with applications to three-dimensional subsonic and supersonic flows. The procedure described is compatible with any diagonal-norm summation-by-parts spatial operator, including finite element, finite difference, finite volume, discontinuous Galerkin, and flux reconstruction schemes.
  • HLIBCov: Parallel hierarchical matrix approximation of large covariance matrices and likelihoods with applications in parameter identification

    Litvinenko, Alexander; Kriemann, Ronald; Genton, Marc G.; Sun, Ying; Keyes, David E. (MethodsX, Elsevier BV, 2019-07-12) [Article]
    We provide more technical details about the HLIBCov package, which is using parallel hierarchical (H-) matrices to: • approximates large dense inhomogeneous covariance matrices with a log-linear computational cost and storage requirement; •computes matrix-vector product, Cholesky factorization and inverse with a log-linear complexity; •identify unknown parameters of the covariance function (variance, smoothness, and covariance length); These unknown parameters are estimated by maximizing the joint Gaussian log-likelihood function. To demonstrate the numerical performance, we identify three unknown parameters in an example with 2,000,000 locations on a PC-desktop.
  • Solution of the 3D density-driven groundwater flow problem with uncertain porosity and permeability

    Litvinenko, Alexander; Logashenko, Dmitry; Tempone, Raul; Wittum, Gabriel; Keyes, David E. (arXiv, 2019-06-01) [Preprint]
    As groundwater is an essential nutrition and irrigation resource, its pollution may lead to catastrophic consequences. Therefore, accurate modeling of the pollution of the soil and groundwater aquifer is highly important. As a model, we consider a density-driven groundwater flow problem with uncertain porosity and permeability. This problem may arise in geothermal reservoir simulation, natural saline-disposal basins, modeling of contaminant plumes, and subsurface flow. This strongly nonlinear time-dependent problem describes the convection of the two-phase flow. This liquid streams under the gravity force, building so-called ``fingers''. The accurate numerical solution requires fine spatial resolution with an unstructured mesh and, therefore, high computational resources. Consequently, we run the parallelized simulation toolbox \myug with the geometric multigrid solver on Shaheen II supercomputer. The parallelization is done in physical and stochastic spaces. Additionally, we demonstrate how the \myug toolbox can be run in a black-box fashion for testing different scenarios in the density-driven flow. As a benchmark, we solve the Elder-like problem in a 3D domain. For approximations in the stochastic space, we use the generalized polynomial chaos expansion. We compute the mean, variance, and exceedance probabilities of the mass fraction. As a reference solution, we use the solution, obtained from the quasi-Monte Carlo method.
  • Rotational test spaces for a fully-implicit FVM and FEM for the DNS of fluid-particle interaction

    Hollbacher, Susanne; Wittum, Gabriel (Journal of Computational Physics, Academic Press, 2019-05-23) [Article]
    The paper presents a fully-implicit and stable finite element and finite volume scheme for the simulation of freely moving particles in a fluid. The developed method is based on the Petrov-Galerkin formulation of a vertex-centered finite volume method (PG-FVM) on unstructured grids. Appropriate extension of the ansatz and test spaces lead to a formulation comparable to a fictitious domain formulation. The purpose of this work is to introduce a new concept of numerical modeling reducing the mathematical overhead which many other methods require. It exploits the identification of the PG-FVM with a corresponding finite element bilinear form. The surface integrals of the finite volume scheme enable a natural incorporation of the interface forces purely based on the original bilinear operator for the fluid. As a result, there is no need to expand the system of equations to a saddle-point problem. Like for fictitious domain methods the extended scheme treats the particles as rigid parts of the fluid. The distinguishing feature compared to most existing fictitious domain methods is that there is no need for an additional Lagrange multiplier or other artificial external forces for the fluid-solid coupling. Consequently, only one single solve for the derived linear system for the fluid together with the particles is necessary and the proposed method does not require any fractional time stepping scheme to balance the interaction forces between fluid and particles. For the linear Stokes problem we will prove the stability of both schemes. Moreover, for the stationary case the conservation of mass and momentum is not violated by the extended scheme, i.e. conservativity is accomplished within the range of the underlying, unconstrained discretization scheme. The scheme is applicable for problems in two and three dimensions.
  • Square-Root Variable Metric based elastic full-waveform inversion – Part 1: Theory and validation

    Liu, Qiancheng; Peter, Daniel; Tape, Carl (Geophysical Journal International, Oxford University Press (OUP), 2019-05-16) [Article]
    Full-waveform inversion (FWI) has become a powerful tool in inverting subsurface geophysical properties. The estimation of uncertainty in the resulting Earth models and parameter trade-offs, although equally important to the inversion result, has however often been neglected or became prohibitive for large-scale inverse problems. Theoretically, the uncertainty estimation is linked to the inverse Hessian (or posterior covariance matrix), which for massive inverse problems becomes impossible to store and compute. In this study, we investigate the application of the square-root variable metric (SRVM) method, a quasi-Newton optimisation algorithm, to FWI in a vector version. This approach allows us to reconstruct the final inverse Hessian at an affordable storage memory cost. We conduct SRVM based elastic FWI on several elastic models in regular, free-surface and practical cases. Comparing the results with those obtained by the state-of-the-art L-BFGS algorithm, we find that the proposed SRVM method performs on a similar, highly-efficient level as L-BFGS, with the advantage of providing additional information such as the inverse Hessian needed for uncertainty quantification.
  • Square-root variable metric based elastic full-waveform inversion-Part 2: Uncertainty estimation

    Liu, Qiancheng; Peter, Daniel (Geophysical Journal International, Oxford University Press (OUP), 2019-05-02) [Article]
    In our first paper (Part 1) about the square-root variable metric (SRVM) method we presented the basic theory and validation of the inverse algorithm applicable to large-scale seismic data inversions. In this second paper (Part 2) about the SRVM method, the objective is to estimate the resolution and uncertainty of the inverted resulting geophysical model. Bayesian inference allows estimating the posterior model distribution from its prior distribution and likelihood function. These distributions, when being linear and Gaussian, can be mathematically characterized by their covariance matrices. However, it is prohibitive to explicitly construct and store the covariance in large-scale practical problems. In Part 1, we applied the SRVM method to elastic full-waveform inversion in a matrix-free vector version. This new algorithm allows accessing the posterior covariance by reconstructing the inverseHessian with memory-Affordable vector series. The focus of this paper is on extracting quantitative and statistical information from the inverse Hessian for quality assessment of the inverted seismic model by FWI. To operate on the inverse Hessian more efficiently, we compute its eigenvalues and eigenvectors with randomized singular value decomposition. Furthermore, we collect point-spread functions from the Hessian in an efficient way. The posterior standard deviation quantitatively measures the uncertainties of the posterior model. 2-D Gaussian random samplers will help to visually compare both the prior and posterior distributions. We highlight our method on several numerical examples and demonstrate an uncertainty estimation analysis applicable to large-scale inversions.
  • Massively Parallel Polar Decomposition on Distributed-memory Systems

    Ltaief, Hatem; Sukkari, Dalal E.; Esposito, Aniello; Nakatsukasa, Yuji; Keyes, David E. (ACM Transactions on Parallel Computing, Association for Computing, 2019-05-01) [Article]
    We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon on the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to further accelerate the QDWH convergence. Based on the Zolotarev rational functions-introduced by Zolotarev (ZOLO) in 1877-this new PD algorithm ZOLO-PD converges within two iterations even for ill-conditioned matrices, instead of the original six iterations needed for QDWH. ZOLO-PD uses the property of Zolotarev functions that optimality is maintained when two functions are composed in an appropriate manner. The resulting ZOLO-PD has a convergence rate up to 17, in contrast to the cubic convergence rate for QDWH. This comes at the price of higher arithmetic costs and memory footprint. These extra floating-point operations can, however, be processed in an embarrassingly parallel fashion. We demonstrate performance using up to 102,400 cores on two supercomputers. We demonstrate that, in the presence of a large number of processing units, ZOLO-PD is able to outperform QDWH by up to 2.3× speedup, especially in situations where QDWH runs out of work, for instance, in the strong scaling mode of operation.
  • A k-mean characteristic function for optimizing STA/LTA based detection of microseismic events

    Akram, Jubran; Peter, Daniel; Eaton, David (GEOPHYSICS, Society of Exploration Geophysicists, 2019-04-13) [Article]
    Event detection is an essential component of microseismic data analysis. This process is typically carried out using a short- and long-term average ratio (STA/LTA) method, which is simple and computationally efficient but often yields inconsistent results for noisy datasets. Here, we aim to optimize the performance of the STA/LTA method by testing different input forms of three-component waveform data and different characteristic functions (CFs), including a proposed k-mean CF. These tests are evaluated using receiver operating characteristic (ROC) analysis and compared based on synthetic and field data examples. Our analysis shows that the STA/LTA method using a k-mean CF improves the detection sensitivity and yields more robust event detection on noisy datasets than some previous approaches. In addition, microseismic events are detected efficiently on field data examples using the same detection threshold obtained from the ROC analysis on synthetic data examples. We recommend the use of Youden index based on ROC analysis using a training subset, extracted from the continuous data, to further improve the detection threshold for field microseismic data.
  • Reproducibility in Benchmarking Parallel Fast Fourier Transform based Applications

    Aseeri, Samar; Muite, Benson K.; Takahashi, Daisuke (Companion of the 2019 ACM/SPEC International Conference on Performance Engineering - ICPE '19, Association for Computing Machinery (ACM), 2019-04-05) [Conference Paper]
    An overview of concerns observed in allowing for reproducibility in parallel applications that heavily depend on the three dimensional distributed memory fast Fourier transform are summarized. Suggestions for reproducibility categories for benchmark results are given.
  • mpi4py-fft: Parallel Fast Fourier Transforms with MPI for Python

    Mortensen, Mikael; Dalcin, Lisandro; Keyes, David E. (Journal of Open Source Software, The Open Journal, 2019-04-02) [Article]
  • A QDWH-based SVD software framework on distributed-memory manycore systems

    Sukkari, Dalal E.; Ltaief, Hatem; Esposito, Aniello; Keyes, David E. (ACM Transactions on Mathematical Software, Association for Computing Machinery (ACM), 2019-04-01) [Article]
    This article presents a high-performance software framework for computing a dense SVD on distributed-memory manycore systems. Originally introduced by Nakatsukasa et al. (2010) and Nakatsukasa and Higham (2013), the SVD solver relies on the polar decomposition using the QR Dynamically Weighted Halley algorithm (QDWH). Although the QDWH-based SVD algorithm performs a significant amount of extra floating-point operations compared to the traditional SVD with the one-stage bidiagonal reduction, the inherent high level of concurrency associated with Level 3 BLAS compute-bound kernels ultimately compensates for the arithmetic complexity overhead. Using the ScaLAPACK two-dimensional block cyclic data distribution with a rectangular processor topology, the resulting QDWH-SVD further reduces excessive communications during the panel factorization, while increasing the degree of parallelism during the update of the trailing submatrix, as opposed to relying on the default square processor grid. After detailing the algorithmic complexity and the memory footprint of the algorithm, we conduct a thorough performance analysis and study the impact of the grid topology on the performance by looking at the communication and computation profiling trade-offs. We report performance results against state-of-the-art existing QDWH software implementations (e.g., Elemental) and their SVD extensions on large-scale distributed-memory manycore systems based on commodity Intel x86 Haswell processors and Knights Landing (KNL) architecture. The QDWH-SVD framework achieves up to 3/8-fold speedups on the Haswell/KNL-based platforms, respectively, against ScaLAPACK PDGESVD and turns out to be a competitive alternative for well- and ill-conditioned matrices. We finally come up herein with a performance model based on these empirical results. Our QDWH-based polar decomposition and its SVD extension are freely available at and, respectively, and have been integrated into the Cray Scientific numerical library LibSci v17.11.1.
  • Batched triangular dense linear algebra kernels for very small matrix sizes on GPUs

    Charara, Ali; Keyes, David E.; Ltaief, Hatem (ACM Transactions on Mathematical Software, Association for Computing Machinery, 2019-04-01) [Article]
    Batched dense linear algebra kernels are becoming ubiquitous in scientific applications, ranging from tensor contractions in deep learning to data compression in hierarchical low-rank matrix approximation. Within a single API call, these kernels are capable of simultaneously launching up to thousands of similar matrix computations, removing the expensive overhead of multiple API calls while increasing the occupancy of the underlying hardware. A challenge is that for the existing hardware landscape (x86, GPUs, etc.), only a subset of the required batched operations is implemented by the vendors, with limited support for very small problem sizes. We describe the design and performance of a new class of batched triangular dense linear algebra kernels on very small data sizes (up to 256) using single and multiple GPUs. By deploying recursive formulations, stressing the register usage, maintaining data locality, reducing threads synchronization, and fusing successive kernel calls, the new batched kernels outperform existing state-of-the-art implementations.
  • Application of High Performance Asynchronous Acoustic Wave Equation Stencil Solver into a Land Survey

    Abdelkhalak, Rached; Akbudak, Kadir; Etienne, Vincent; Ltaief, Hatem; Tonellot, Thierry; Keyes, David E. (SPE Middle East Oil and Gas Show and Conference, Society of Petroleum Engineers (SPE), 2019-03-13) [Conference Paper]
    This paper describes the application of high performance asynchronous stencil computations for 3D acoustic modeling on a synthetic land survey. Using the Finite-Difference Time-Domain (FDTD) method, a parallel Multicore Wavefront Diamond-tiling (MWD) stencil kernel (Malas et al. 2015, Malas et al. 2017) drives the high performance execution using temporal blocking to maximize data locality, while reducing the expensive horizontal data movement. As absorbing boundary conditions, we use Convolutional Perfectly Matched Layer (CPML), which have to be redesigned to not interrupt the asynchronous execution flow engendered by the MWD stencil kernel for the inner-domain points. The main idea consists in weakening the data dependencies by moving the CPML computations into the inner-computational loop of the MWD stencil kernel (Akbudak et al. 2019). In addition to handling the absorbing boundary conditions, applying the asynchronous MWD with CPML kernels to a realistic land survey requires the extraction of the wavefield value at each receiver position. We revisit the default extraction process and make it also compliant with the overall asynchrony of the 3D acoustic modeling. We report performance improvement up to 24% against the standard spatial blocking algorithm on Intel multicore chips using the synthetic land survey, which is representative of an area of interest in Saudi Arabia. While these results concur with previous performance campaign assessment, we can actually produce and assess the resulting 3D shot gather accuracy. To our knowledge, this is the first time the effectiveness of asynchronous MWD stencil kernel with CPML absorbing boundary conditions is demonstrated in an industrial seismic application.
  • Fast parallel multidimensional FFT using advanced MPI

    Dalcin, Lisandro; Mortensen, Mikael; Keyes, David E. (Journal of Parallel and Distributed Computing, Elsevier BV, 2019-03-11) [Article]
    We present a new method for performing global redistributions of multidimensional arrays essential to parallel fast Fourier (or similar) transforms. Traditional methods use standard all-to-all collective communication of contiguous memory buffers, thus necessarily requiring local data realignment steps intermixed in-between redistribution and transform steps. Instead, our method takes advantage of subarray datatypes and generalized all-to-all scatter/gather from the MPI-2 standard to communicate discontiguous memory buffers, effectively eliminating the need for local data realignments. Despite generalized all-to-all communication of discontiguous data being generally slower, our proposal economizes in local work. For a range of strong and weak scaling tests, we found the overall performance of our method to be on par and often better than well-established libraries like MPI-FFTW, P3DFFT, and 2DECOMP&FFT. We provide compact routines implemented at the highest possible level using the MPI bindings for the C programming language. These routines apply to any global redistribution, over any two directions of a multidimensional array, decomposed on arbitrary Cartesian processor grids (1D slabs, 2D pencils, or even higher-dimensional decompositions). The high level implementation makes the code easy to read, maintain, and eventually extend. Our approach enables for future speedups from optimizations in the internal datatype handling engines within MPI implementations.
  • Hierarchical Matrix Operations on GPUs

    Boukaram, Wagih Halim; Turkiyyah, George; Keyes, David E. (ACM Transactions on Mathematical Software, Association for Computing Machinery (ACM), 2019-02-25) [Article]
    Hierarchical matrices are space- and time-efficient representations of dense matrices that exploit the low-rank structure of matrix blocks at different levels of granularity. The hierarchically low-rank block partitioning produces representations that can be stored and operated on in near-linear complexity instead of the usual polynomial complexity of dense matrices. In this article, we present high-performance implementations of matrix vector multiplication and compression operations for the H2 variant of hierarchical matrices on GPUs. The H2 variant exploits, in addition to the hierarchical block partitioning, hierarchical bases for the block representations and results in a scheme that requires only O(n) storage and O(n) complexity for the mat-vec and compression kernels. These two operations are at the core of algebraic operations for hierarchical matrices, the mat-vec being a ubiquitous operation in numerical algorithms while compression/recompression represents a key building block for other algebraic operations, which require periodic recompression during execution. The difficulties in developing efficient GPU algorithms come primarily from the irregular tree data structures that underlie the hierarchical representations, and the key to performance is to recast the computations on flattened trees in ways that allow batched linear algebra operations to be performed. This requires marshaling the irregularly laid out data in a way that allows them to be used by the batched routines. Marshaling operations only involve pointer arithmetic with no data movement and as a result have minimal overhead. Our numerical results on covariance matrices from 2D and 3D problems from spatial statistics show the high efficiency our routines achieve over 550GB/s for the bandwidth-limited matrix-vector operation and over 850GFLOPS/s in sustained performance for the compression operation on the P100 Pascal GPU.
  • Scaling Distributed Machine Learning with In-Network Aggregation

    Sapio, Amedeo; Canini, Marco; Ho, Chen-Yu; Nelson, Jacob; Kalnis, Panos; Kim, Changhoon; Krishnamurthy, Arvind; Moshref, Masoud; Ports, Dan R. K.; Richtarik, Peter (arXiv, 2019-02-22) [Preprint]
    Training complex machine learning models in parallel is an increasinglyimportant workload. We accelerate distributed parallel training by designing acommunication primitive that uses a programmable switch dataplane to execute akey step of the training process. Our approach, SwitchML, reduces the volume ofexchanged data by aggregating the model updates from multiple workers in thenetwork. We co-design the switch processing with the end-host protocols and MLframeworks to provide a robust, efficient solution that speeds up training byup to 300%, and at least by 20% for a number of real-world benchmark models.
  • Advanced Hepatitis C Virus Replication PDE Models within a Realistic Intracellular Geometric Environment

    Knodel, Markus; Targett-Adams, Paul; Grillo, Alfio; Herrmann, Eva; Wittum, Gabriel (International Journal of Environmental Research and Public Health, MDPI AG, 2019-02-13) [Article]
    The hepatitis C virus (HCV) RNA replication cycle is a dynamic intracellular process occurring in three-dimensional space (3D), which is difficult both to capture experimentally and to visualize conceptually. HCV-generated replication factories are housed within virus-induced intracellular structures termed membranous webs (MW), which are derived from the Endoplasmatic Reticulum (ER). Recently, we published 3D spatiotemporal resolved diffusion–reaction models of the HCV RNA replication cycle by means of surface partial differential equation (sPDE) descriptions. We distinguished between the basic components of the HCV RNA replication cycle, namely HCV RNA, non-structural viral proteins (NSPs), and a host factor. In particular, we evaluated the sPDE models upon realistic reconstructed intracellular compartments (ER/MW). In this paper, we propose a significant extension of the model based upon two additional parameters: different aggregate states of HCV RNA and NSPs, and population dynamics inspired diffusion and reaction coefficients instead of multilinear ones. The combination of both aspects enables realistic modeling of viral replication at all scales. Specifically, we describe a replication complex state consisting of HCV RNA together with a defined amount of NSPs. As a result of the combination of spatial resolution and different aggregate states, the new model mimics a cis requirement for HCV RNA replication. We used heuristic parameters for our simulations, which were run only on a subsection of the ER. Nevertheless, this was sufficient to allow the fitting of core aspects of virus reproduction, at least qualitatively. Our findings should help stimulate new model approaches and experimental directions for virology.
  • Likelihood approximation with hierarchical matrices for large spatial datasets

    Litvinenko, Alexander; Sun, Ying; Genton, Marc G.; Keyes, David E. (Computational Statistics & Data Analysis, Elsevier BV, 2019-02-12) [Article]
    The unknown parameters (variance, smoothness, and covariance length) of a spatial covariance function can be estimated by maximizing the joint Gaussian log-likelihood function. To overcome cubic complexity in the linear algebra, the discretized covariance function is approximated in the hierarchical (H-) matrix format. The H-matrix format has a log-linear computational cost and O(knlogn) storage, where the rank k is a small integer, and n is the number of locations. The H-matrix technique can approximate general covariance matrices (also inhomogeneous) discretized on a fairly general mesh that is not necessarily axes-parallel, and neither the covariance matrix itself nor its inverse has to be sparse. It is investigated how the H-matrix approximation error influences the estimated parameters. Numerical examples with Monte Carlo simulations, where the true values of the unknown parameters are given, and an application to soil moisture data with unknown parameters are presented. The C, C++ codes and data are freely available.

View more