## Search

Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

Litvinenko, Alexander; Sun, Ying; Genton, Marc G.; Keyes, David E. (2017-11-01) [Poster]

The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community.
We describe the HLIBCov package, an extension of HLIBpro for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M\times 2M$ can be computed on a modern multi-core desktop in a few minutes.
Further, HLIBCov is used to estimate unknown parameters, such as the covariance length, variance, and smoothness of a Matérn covariance function, by maximizing the joint Gaussian log-likelihood function. The computational bottleneck is the expensive linear algebra on large, dense covariance matrices. These matrices are therefore approximated in the hierarchical ($\mathcal{H}$-) matrix format, with computational cost $\mathcal{O}(k^2n \log^2 n/p)$ and storage $\mathcal{O}(kn \log n)$, where the rank $k$ is a small integer (typically $k<25$), $p$ is the number of cores, and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example in which the true parameter values are known.
For reproducibility we provide the C++ code, the documentation, and the synthetic data.
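The workflow the abstract describes can be illustrated with a minimal, dense NumPy sketch (not the HLIBCov code itself): build a Matérn covariance matrix over scattered locations and evaluate the negative Gaussian log-likelihood via a Cholesky factor. For simplicity the sketch fixes the smoothness at $\nu = 3/2$, where the Matérn kernel has a closed form; all function names here are illustrative, not HLIBCov's API.

```python
import numpy as np

def matern32_cov(locs, length, variance):
    """Matern covariance matrix (smoothness nu = 3/2, closed form)."""
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    s = np.sqrt(3.0) * d / length
    return variance * (1.0 + s) * np.exp(-s)

def neg_log_likelihood(params, locs, z):
    """Negative joint Gaussian log-likelihood via a Cholesky factor.
    The dense factorization costs O(n^3); the poster's H-matrix
    approximation reduces this to quasi-linear complexity."""
    length, variance = params
    C = matern32_cov(locs, length, variance)
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(z)))  # tiny jitter for stability
    alpha = np.linalg.solve(L, z)                       # L^{-1} z
    log_det = 2.0 * np.sum(np.log(np.diag(L)))          # log det C from the factor
    return 0.5 * (log_det + alpha @ alpha + len(z) * np.log(2.0 * np.pi))
```

Minimizing this function over `(length, variance)` with any standard optimizer reproduces, in miniature, the maximum-likelihood estimation the poster performs at scale.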

Risk assessment of salt contamination of groundwater under uncertain aquifer properties

Litvinenko, Alexander; Keyes, David E.; Logashenko, Dmitry; Tempone, Raul; Wittum, Gabriel (2017-10-01) [Poster]

One of the central topics in hydrogeology and environmental science is the investigation of salinity-driven groundwater flow in heterogeneous porous media. Our goals are to model and to predict pollution of water resources.
We simulate density-driven groundwater flow with uncertain porosity and permeability. This strongly nonlinear model describes the unstable transport of salt water, which forms characteristic 'finger'-shaped patterns. The computation requires
a very fine unstructured mesh and, therefore, substantial computational resources.
We run a highly parallel multigrid solver, based on ug4, on the Shaheen II supercomputer. An MPI-based parallelization is applied in both the geometric and the stochastic spaces. Each scenario is computed on 32 cores and
requires a mesh with ~8M grid points and 1500 or more time steps; 200 scenarios are computed concurrently, for a total of 200x32=6400 cores. The main goals of this work are to estimate the propagation of uncertainties through the model and to investigate the sensitivity of the solution to the uncertain input parameters. Additionally, we demonstrate how the ug4-based multigrid solver can be applied as a black box in an uncertainty quantification framework.
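The "black-box solver in a UQ framework" pattern can be sketched in a few lines: draw samples of the uncertain inputs, run one (here toy) solver per scenario, and aggregate statistics of a quantity of interest. The solver stand-in, the parameter ranges, and the quantity of interest below are all hypothetical placeholders for the ug4 runs described above; since scenarios are independent, each would in practice run concurrently (200 scenarios x 32 cores in the poster).

```python
import numpy as np

def solve_scenario(porosity, log_perm):
    """Stand-in for one black-box solver run (ug4 in the poster).
    Returns a toy scalar quantity of interest."""
    return porosity * np.exp(0.1 * log_perm)

def monte_carlo_uq(n_scenarios=200, seed=0):
    """Sample uncertain inputs, run each scenario, return QoI statistics."""
    rng = np.random.default_rng(seed)
    porosity = rng.uniform(0.2, 0.4, n_scenarios)   # assumed uncertainty range
    log_perm = rng.normal(0.0, 1.0, n_scenarios)    # assumed log-permeability law
    qoi = np.array([solve_scenario(p, k) for p, k in zip(porosity, log_perm)])
    return qoi.mean(), qoi.std(ddof=1)
```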

Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

Litvinenko, Alexander; Sun, Ying; Genton, Marc G.; Keyes, David E. (2017-03-13) [Poster]

Batched Triangular DLA for Very Small Matrices on GPUs

Charara, Ali; Keyes, David E.; Ltaief, Hatem (2017-03-13) [Poster]

In several scientific applications, like tensor contractions in deep learning computation or data compression in hierarchical low rank matrix approximation, the bulk of computation typically resides in performing thousands of independent dense linear algebra operations on very small matrix sizes (usually less than 100). Batched dense linear algebra kernels are becoming ubiquitous for such scientific computations. Within a single API call, these kernels are capable of simultaneously launching a large number of similar matrix computations, removing the expensive overhead of multiple API calls while increasing the utilization of the underlying hardware.
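The batching idea is easy to demonstrate with NumPy's stacked linear algebra, which broadcasts one call over a leading batch dimension just as a batched GPU kernel amortizes launch overhead over thousands of small problems. This is a CPU-side sketch of the concept, not the GPU kernels themselves.

```python
import numpy as np

# A "batch" of 256 independent 16x16 linear systems, solved in one stacked
# call rather than 256 separate calls -- the pattern batched DLA kernels
# exploit to remove per-call overhead and keep the hardware busy.
rng = np.random.default_rng(42)
batch, n = 256, 16

# Lower-triangular systems with a fixed diagonal so every system is
# well-conditioned (np.tril operates on the last two axes of the stack).
A = np.tril(rng.standard_normal((batch, n, n)), k=-1) + 2.0 * np.eye(n)
B = rng.standard_normal((batch, n, n))

# One vectorized call over the whole batch; NumPy broadcasts over the
# leading dimension. On a GPU this maps to a single batched kernel launch.
X = np.linalg.solve(A, B)
```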

Earthquake Ground Motion Analysis and extreme computing on multi-Petaflops machine

De Martin, Florent; Dupros, Fabrice; Thierry, Philippe; Paciucci, Gabriele; Sochala, Pierre; Boulahya, Faïza; Benaichouche, Abed; Chaljub, Emmanuel; Hadri, Bilel; Ltaief, Hatem; Keyes, David E. (2017-03-13) [Poster]

Implicit Unstructured Aerodynamics on Emerging Multi- and Many-Core HPC Architectures

Al Farhan, Mohammed; Kaushik, Dinesh K.; Keyes, David E. (2017-03-13) [Poster]

Shared memory parallelization of PETSc-FUN3D, an unstructured tetrahedral mesh Euler code previously characterized for distributed memory Single Program, Multiple Data (SPMD) for thousands of nodes, is hybridized with shared memory Single Instruction, Multiple Data (SIMD) for hundreds of threads per node. We explore thread-level performance optimizations on state-of-the-art multi- and many-core Intel processors, including the second generation of Xeon Phi, Knights Landing (KNL). We study the performance on the KNL with different configurations of memory and cluster modes, with code optimizations to minimize indirect addressing and enhance cache locality. The optimizations employed are expected to be of value to other unstructured applications as many-core architectures evolve.
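The "indirect addressing" the abstract refers to is the gather/scatter access pattern of unstructured-mesh kernels, where nodal values are reached through an edge-to-node index array; renumbering nodes so that edge endpoints sit close in memory is a classic locality optimization. A minimal NumPy illustration of the access pattern (the per-edge quantity here is a made-up placeholder, not FUN3D's flux):

```python
import numpy as np

def edge_accumulate(node_vals, edges):
    """Gather nodal values through an edge->node index array, compute a toy
    per-edge quantity, and scatter-add it back to the nodes."""
    left = node_vals[edges[:, 0]]          # gather (indirect load)
    right = node_vals[edges[:, 1]]
    flux = right - left                    # hypothetical edge quantity
    out = np.zeros_like(node_vals)
    np.add.at(out, edges[:, 0], flux)      # scatter-add (indirect store)
    np.add.at(out, edges[:, 1], -flux)
    return out
```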

Scalable Hierarchical Algorithms for stochastic PDEs and UQ

Litvinenko, Alexander; Chavez Chavez, Gustavo Ivan; Keyes, David E.; Ltaief, Hatem; Yokota, Rio (2015-01-07) [Poster]

H-matrices and the Fast Multipole Method (FMM) are powerful techniques for approximating linear operators arising from partial differential and integral equations, reducing the computational cost from quadratic or cubic to log-linear (O(n log n)), where n is the number of degrees of freedom in the discretization. Storage is likewise reduced to log-linear. This hierarchical structure is a good starting point for parallel algorithms. Parallelization on shared- and distributed-memory systems was pioneered by Kriemann [1,2]. Since 2005, the area of parallel architectures and software has been developing very fast. Progress in GPUs and many-core systems (e.g. Xeon Phi with 64 cores) motivated us to extend the work started in [1,2,7,8].

Scalable Hierarchical Algorithms for stochastic PDEs and Uncertainty Quantification

Litvinenko, Alexander; Chavez Chavez, Gustavo Ivan; Keyes, David E.; Ltaief, Hatem; Yokota, Rio (2015-01-05) [Poster]

H-matrices and the Fast Multipole Method (FMM) are powerful techniques for approximating linear operators arising from partial differential and integral equations, reducing the computational cost from quadratic or cubic to log-linear (O(n log n)), where n is the number of degrees of freedom in the discretization. Storage is likewise reduced to log-linear. This hierarchical structure is a good starting point for parallel algorithms. Parallelization on shared- and distributed-memory systems was pioneered by R. Kriemann (2005). Since then, the area of parallel architectures and software has been developing very fast. Progress in GPUs and many-core systems (e.g. Xeon Phi with 64 cores) motivated us to extend the work started in [1,2,7,8].
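The compression at the heart of both H-matrices and the FMM can be shown in miniature: an admissible off-diagonal block of a kernel matrix, linking two well-separated point clusters, is replaced by a rank-k factorization, cutting storage from O(mn) to O(k(m+n)). A truncated SVD gives the optimal rank-k approximation; the kernel and cluster geometry below are illustrative.

```python
import numpy as np

def truncated_svd(block, k):
    """Optimal rank-k factorization of a matrix block: block ~ A @ B,
    with A of shape (m, k) and B of shape (k, n)."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]

# A smooth 1/r kernel evaluated between two well-separated 1-D clusters has
# rapidly decaying singular values, so a small rank already gives high accuracy.
x = np.linspace(0.0, 1.0, 100)[:, None]   # source cluster on [0, 1]
y = np.linspace(5.0, 6.0, 80)[None, :]    # target cluster on [5, 6]
block = 1.0 / np.abs(x - y)
A, B = truncated_svd(block, k=12)
rel_err = np.linalg.norm(block - A @ B) / np.linalg.norm(block)
```

Storage for the factors is 12 x (100 + 80) = 2160 entries versus 8000 for the dense block, and the accuracy improves geometrically as the clusters separate further.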

Enabling High Performance Large Scale Dense Problems through KBLAS

Abdelfattah, Ahmad; Keyes, David E.; Ltaief, Hatem (2014-05-04) [Poster]

KBLAS (KAUST BLAS) is a small library that provides highly optimized BLAS routines on systems accelerated with GPUs. KBLAS is entirely written in CUDA C, and targets NVIDIA GPUs with compute capability 2.0 (Fermi) or higher. The current focus is on level-2 BLAS routines, namely the general matrix-vector multiplication (GEMV) kernel and the symmetric/hermitian matrix-vector multiplication (SYMV/HEMV) kernel. KBLAS provides these two kernels in all four precisions (s, d, c, and z), with support for multi-GPU systems. Through advanced optimization techniques targeting latency hiding and pushing memory bandwidth to the limit, KBLAS outperforms state-of-the-art kernels by 20-90%. Competitors include CUBLAS-5.5, MAGMABLAS-1.4.0, and CULA R17. The SYMV/HEMV kernel from KBLAS has been adopted by NVIDIA and should appear in CUBLAS-6.0. KBLAS has been used in large-scale simulations of multi-object adaptive optics.
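What makes SYMV a good optimization target is that a symmetric matrix need only be read from one triangle, so each loaded element can be used twice, roughly halving memory traffic for a bandwidth-bound kernel. A NumPy sketch of the arithmetic (only the data-reuse idea, not the CUDA implementation):

```python
import numpy as np

def symv_lower(alpha, A_lower, x, beta, y):
    """y <- alpha*A*x + beta*y, reading only the lower triangle of a
    symmetric A. Optimized SYMV/HEMV kernels exploit exactly this: each
    stored element contributes to two output rows."""
    A_full = A_lower + A_lower.T - np.diag(np.diag(A_lower))
    return alpha * (A_full @ x) + beta * y
```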

Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System

Charara, Ali; Ltaief, Hatem; Gratadour, Damien; Keyes, David E.; Sevin, Arnaud; Abdelfattah, Ahmad; Gendron, Eric; Morel, Carine; Vidal, Fabrice (2014-05-04) [Poster]

The European Extremely Large Telescope (E-ELT) is a high-priority project in ground-based astronomy that aims to construct the largest telescope ever built. MOSAIC is an instrument proposed for the E-ELT that uses the Multi-Object Adaptive Optics (MOAO) technique, which compensates for the effects of atmospheric turbulence on image quality and operates on patches across a large field of view (FoV).
