
dc.contributor.author: Toye, Habib
dc.contributor.author: Kortas, Samuel
dc.contributor.author: Zhan, Peng
dc.contributor.author: Hoteit, Ibrahim
dc.date.accessioned: 2018-04-30T06:58:22Z
dc.date.available: 2018-04-30T06:58:22Z
dc.date.issued: 2018-04-26
dc.identifier.citation: Toye H, Kortas S, Zhan P, Hoteit I (2018) A Fault-Tolerant HPC Scheduler Extension for Large and Operational Ensemble Data Assimilation: Application to the Red Sea. Journal of Computational Science. Available: http://dx.doi.org/10.1016/j.jocs.2018.04.018.
dc.identifier.issn: 1877-7503
dc.identifier.doi: 10.1016/j.jocs.2018.04.018
dc.identifier.uri: http://hdl.handle.net/10754/627684
dc.description.abstract: A fully parallel ensemble data assimilation and forecasting system has been developed for the Red Sea, based on the MIT general circulation model (MITgcm) to simulate the Red Sea circulation and on the Data Assimilation Research Testbed (DART) ensemble assimilation software. An important limitation of operational ensemble assimilation systems is the risk of ensemble members’ collapse. This can happen when the filter update step imposes large corrections on one or more of the forecasted ensemble members that are not fully consistent with the model physics. Increasing the ensemble size is expected to improve the performance of the assimilation system, but it also increases the risk of members’ collapse. Hardware failures and slow numerical convergence for some members are also expected to occur more frequently. In this context, manually steering the whole process becomes a real challenge and makes implementing the ensemble assimilation procedure difficult and extremely time-consuming. This paper presents our efforts to build an efficient and fault-tolerant MITgcm-DART ensemble assimilation system capable of operationally running thousands of members. The system is built on top of Decimate, a scheduler extension developed to ease the submission, monitoring and dynamic steering of workflows of dependent jobs in a fault-tolerant environment. We describe the implementation of the assimilation system and discuss its coupling strategies in detail.
Within Decimate, only a few additional lines of Python are needed to define flexible convergence criteria and to apply any necessary actions to the forecast ensemble members, for instance (i) restarting a faulty job in case of job failure, (ii) changing the random seed in case of poor convergence or numerical instability, (iii) adjusting (reducing or increasing) the number of parallel forecasts on the fly, and (iv) replacing members on the fly to enrich the ensemble with new members. We demonstrate the efficiency of the system with numerical experiments assimilating real satellite sea surface height and temperature observations in the Red Sea.
dc.description.sponsorship: The research reported in this manuscript was supported by King Abdullah University of Science and Technology (KAUST) and Saudi ARAMCO, and made use of the resources of the Supercomputing Core Laboratory of KAUST.
dc.publisher: Elsevier BV
dc.relation.url: http://www.sciencedirect.com/science/article/pii/S1877750317312905
dc.rights: NOTICE: this is the author’s version of a work that was accepted for publication in Journal of Computational Science. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Computational Science (2018-04-26), DOI: 10.1016/j.jocs.2018.04.018. © 2018. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: High Performance Computing
dc.subject: Ensemble Data Assimilation
dc.subject: Bayesian Filtering
dc.subject: Operational Oceanography
dc.subject: Red Sea
dc.title: A Fault-Tolerant HPC Scheduler Extension for Large and Operational Ensemble Data Assimilation: Application to the Red Sea
dc.type: Article
dc.contributor.department: Applied Mathematics and Computational Science Program
dc.contributor.department: Beacon Development Company
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Earth Fluid Modeling and Prediction Group
dc.contributor.department: Earth Science and Engineering Program
dc.contributor.department: KAUST Supercomputing Laboratory (KSL)
dc.contributor.department: Physical Science and Engineering (PSE) Division
dc.contributor.department: Supercomputing, Computational Scientists
dc.identifier.journal: Journal of Computational Science
dc.eprint.version: Post-print
kaust.person: Toye, Habib
kaust.person: Kortas, Samuel
kaust.person: Zhan, Peng
kaust.person: Hoteit, Ibrahim
dc.date.published-online: 2018-04-26
dc.date.published-print: 2018-07


Files in this item

Name: 1-s2.0-S1877750317312905-main.pdf
Size: 649.2Kb
Format: PDF
Description: Accepted Manuscript
