
dc.contributor.author: Salvaña, Mary Lai O.
dc.contributor.author: Abdulah, Sameh
dc.contributor.author: Ltaief, Hatem
dc.contributor.author: Sun, Ying
dc.contributor.author: Genton, Marc G.
dc.contributor.author: Keyes, David E.
dc.date.accessioned: 2022-04-17T06:55:53Z
dc.date.available: 2022-04-17T06:55:53Z
dc.date.issued: 2022
dc.identifier.uri: http://hdl.handle.net/10754/676271
dc.description.abstract: Gaussian geostatistical space-time modeling is an effective tool for performing statistical inference on field data evolving in space and time. It generalizes purely spatial modeling at the cost of greater computational and storage complexity, pushing geostatistical modeling even further into the arms of high-performance computing. It makes inferences for missing data by leveraging space-time measurements of one or more fields. We propose a high-performance implementation of a widely applied space-time model for large-scale systems using a two-level parallelization technique. At the inner level, we rely on state-of-the-art dense linear algebra libraries and parallel runtime systems to perform the complex matrix operations required for maximum likelihood estimation (MLE). At the outer level, we parallelize the optimization process using a distributed implementation of the particle swarm optimization (PSO) algorithm. At this level, parallelization is accomplished using MPI sub-communicators, such that the nodes in each sub-communicator perform a single MLE iteration at a time. To evaluate the effectiveness of the proposed methodology, we assess the accuracy of the newly implemented space-time model on a set of large-scale synthetic space-time datasets. Moreover, we use the proposed implementation to model two air pollution datasets from the Middle East and US regions with 550 spatial locations × 730 time slots and 945 spatial locations × 500 time slots, respectively. The evaluation shows that the proposed approach achieves high prediction accuracy on both synthetic datasets and real particulate matter (PM) datasets in the context of the air pollution problem. We achieve up to 757.16 TFLOP/s on 1024 nodes (75% of the peak performance) with 490K geospatial locations on a Cray XC40 system.
dc.language.iso: en
dc.publisher: ACM
dc.relation.url: https://pasc22.pasc-conference.org/
dc.rights: This is the accepted version of a paper accepted to The Platform for Advanced Scientific Computing (PASC) Conference. Archived with thanks to ACM.
dc.title: Parallel Space-Time Likelihood Optimization for Air Pollution Prediction on Large-Scale Systems
dc.type: Conference Paper
dc.contributor.department: Extreme Computing Research Center (ECRC), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
dc.contributor.department: Extreme Computing Research Center
dc.contributor.department: Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
dc.contributor.department: Statistics Program
dc.contributor.department: Applied Mathematics and Computational Science Program
dc.contributor.department: Office of the President
dc.conference.date: June 27 to 29, 2022
dc.conference.name: The Platform for Advanced Scientific Computing (PASC) Conference
dc.conference.location: Basel, Switzerland
dc.eprint.version: Post-print
dc.contributor.affiliation: King Abdullah University of Science and Technology (KAUST)
pubs.publication-status: Accepted
kaust.person: Salvaña, Mary Lai O.
kaust.person: Abdulah, Sameh
kaust.person: Ltaief, Hatem
kaust.person: Sun, Ying
kaust.person: Genton, Marc G.
kaust.person: Keyes, David E.
refterms.dateFOA: 2022-04-16T00:00:00Z
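
Illustrative sketch (not taken from the paper): the outer level described in the abstract, where PSO particles are distributed over groups of nodes and each group evaluates one likelihood at a time, might be organized roughly as below. This is a serial, standard-library simulation under stated assumptions. The function names (`split_ranks`, `assign_particles`, `neg_loglik`, `pso`), the round-robin particle schedule, the PSO coefficients, and the quadratic stand-in objective are all hypothetical; a real run would use MPI sub-communicators and a dense linear algebra likelihood evaluation instead.

```python
import random

def split_ranks(world_size, num_groups):
    """Mimic an MPI_Comm_split-style grouping: rank -> (color, local rank).
    Assumes world_size is divisible by num_groups."""
    assert world_size % num_groups == 0
    group_size = world_size // num_groups
    return {r: (r // group_size, r % group_size) for r in range(world_size)}

def assign_particles(num_particles, num_groups):
    """Round-robin schedule (an assumption): group g handles particles
    g, g + num_groups, ... and evaluates them one at a time."""
    return {g: list(range(g, num_particles, num_groups)) for g in range(num_groups)}

def neg_loglik(theta):
    """Hypothetical stand-in for the negative log-likelihood:
    a quadratic bowl with its minimum at (1.0, 0.5)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] - 0.5) ** 2

def pso(objective, bounds, num_particles=16, num_groups=4, iters=60, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [float("inf")] * num_particles
    g_pos, g_val = pos[0][:], float("inf")
    schedule = assign_particles(num_particles, num_groups)
    for _ in range(iters):
        # Outer level: each group would evaluate its particles concurrently;
        # here the groups are simply visited one after another.
        vals = [0.0] * num_particles
        for particles in schedule.values():
            for i in particles:
                vals[i] = objective(pos[i])
        # Exchange results: update personal bests and the global best.
        for i, val in enumerate(vals):
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
            if val < g_val:
                g_val, g_pos = val, pos[i][:]
        # Textbook PSO velocity/position update, clamped to the bounds.
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
    return g_pos, g_val

if __name__ == "__main__":
    theta_hat, value = pso(neg_loglik, bounds=[(0.0, 2.0), (0.0, 2.0)])
    print(theta_hat, value)
```

In this sketch the number of groups controls how many likelihood evaluations could proceed at once, while the size of each group would determine how many nodes cooperate on a single evaluation.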


Files in this item

Name: salvana_paper (1).pdf
Size: 2.186Mb
Format: PDF
Description: Accepted Manuscript

