dc.contributor.author Abdulah, Sameh
dc.contributor.author Ltaief, Hatem
dc.contributor.author Sun, Ying
dc.contributor.author Genton, Marc G.
dc.contributor.author Keyes, David E.
dc.date.accessioned 2019-04-28T13:12:24Z
dc.date.available 2019-04-28T13:12:24Z
dc.date.issued 2018-04-24
dc.identifier.uri http://hdl.handle.net/10754/632517
dc.description.abstract Maximum likelihood estimation is an important statistical technique for estimating missing data, for example in climate and environmental applications, which are usually large and feature data points that are irregularly spaced. In particular, the Gaussian log-likelihood function is the de facto model, which operates on the resulting sizable dense covariance matrix. The advent of high performance systems with advanced computing power and memory capacity has enabled full simulations only for rather small dimensional climate problems, solved at machine precision accuracy. The challenge for high dimensional problems lies in the computation requirements of the log-likelihood function, which necessitates ${\mathcal O}(n^2)$ storage and ${\mathcal O}(n^3)$ operations, where $n$ represents the number of given spatial locations. This prohibitive computational cost may be reduced by using approximation techniques that not only enable large-scale simulations otherwise intractable but also maintain the accuracy and the fidelity of the spatial statistics model. In this paper, we extend the Exascale GeoStatistics software framework (i.e., ExaGeoStat) to support the Tile Low-Rank (TLR) approximation technique, which exploits the data sparsity of the dense covariance matrix by compressing the off-diagonal tiles up to a user-defined accuracy threshold. The underlying linear algebra operations may then be carried out on this data compression format, which may ultimately reduce the arithmetic complexity of the maximum likelihood estimation and the corresponding memory footprint. Performance results of TLR-based computations on shared and distributed-memory systems attain up to 13X and 5X speedups, respectively, compared to full accuracy simulations using synthetic and real datasets (up to 2M), while ensuring adequate prediction accuracy.
dc.publisher arXiv
dc.relation.url http://arxiv.org/abs/1804.09137v1
dc.relation.url http://arxiv.org/pdf/1804.09137v1
dc.rights Archived with thanks to arXiv
dc.title Tile Low-Rank Approximation of Large-Scale Maximum Likelihood Estimation on Manycore Architectures
dc.type Preprint
dc.contributor.department Extreme Computing Research Center
dc.contributor.department Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department Statistics Program
dc.contributor.department Applied Mathematics and Computational Science Program
dc.eprint.version Pre-print
dc.identifier.arxivid 1804.09137
kaust.person Abdulah, Sameh
kaust.person Ltaief, Hatem
kaust.person Sun, Ying
kaust.person Genton, Marc G.
kaust.person Keyes, David E.
refterms.dateFOA 2019-04-29T06:53:30Z

Name: 1804.09137v1.pdf
Size: 2.731Mb
Format: PDF
Description: Preprint