Extreme Precipitation Return Levels for Multiple Durations on a Global Scale

Quantifying the magnitude and frequency of extreme precipitation events is key in translating climate observations to planning and engineering design. Past efforts have mostly focused on the estimation of daily extremes using gauge observations. Recent development of high-resolution global precipitation products, now allow estimation of global extremes. This research aims to quantitatively characterize the spatiotemporal behavior of precipitation extremes, by calculating extreme precipitation return levels for multiple durations on the global domain using the Multi-Source Weighted-Ensemble Precipitation (MSWEP) dataset. Both classical and novel extreme value distributions are used to provide an insight into the spatial patterns of precipitation extremes. Our results show that the traditional Generalized Extreme Value (GEV) distribution and Peak-Over-Threshold (POT) methods, which only use the largest events to estimate precipitation extremes, are not spatially coherent. The recently developed Metastatistical Extreme Value (MEV) distribution, that includes all precipitation events, leads to smoother spatial patterns of local extremes. While the GEV and POT methods predict a consistent shift from heavy to thin tails with increasing duration, the heaviness of the tail obtained with MEV was relatively unaffected by the precipitation duration. The generated extreme precipitation return levels and corresponding parameters are provided as the Global Precipitation EXtremes (GPEX) dataset. These data can be useful for studying the underlying physical processes causing the spatiotemporal variations of the heaviness of extreme precipitation distributions.

, and the US (e.g., Perica et al., 2015Perica et al., , 2018. However, many countries 11 and regions do not have sufficient local data available (Gründemann et al.,12 2018; Kidd et al., 2017;van de Giesen et al., 2014), such that spatially- focused on examining which type of distribution is most suitable to capture 24 the tail behavior of extreme precipitation (Cavanaugh and Gershunov, 2015; 25 Cavanaugh et al., 2015;Papalexiou et al., 2013). In addition, the spatial 26 patterns of the parameter that controls the tail decay have been studied for 27 the GEV distribution (Papalexiou and Koutsoyiannis, 2013;Ragulina and 28 Reitan, 2017), and the Generalized Pareto (GP) distribution (Serinaldi and 29 Kilsby, 2014). However, several issues remain to be addressed in order to 30 obtain global-domain extreme precipitation return levels: 1) the choice of 31 the dataset with associated uncertainties, 2) the focus on daily durations, 32 3) the choice of the time blocks over which block-maxima are determined, 33 and 4) the exploration of possible alternatives to the classical EVDs and the 34 associated uncertainty, especially with respect to the tail behavior. reanalyses, such as ERA-5 (Hersbach et al., 2020), JRA-55 (Kobayashi 53 et al., 2015), and MERRA-2 (Gelaro et al., 2017). However, reanaly-54 sis products tend to exhibit strong systematic biases in the magnitude 55 and frequency of precipitation (Decker et al., 2012;Liu et al., 2018;56 Ménégoz et al., 2013). 57 2. Global-scale analyses of precipitation extremes are generally based on 58 daily precipitation records (Cavanaugh et al., 2015;Koutsoyiannis, 59 2004a,b; Papalexiou and Koutsoyiannis, 2013;Papalexiou et al., 2013;60 Ragulina and Reitan, 2017; Serinaldi and Kilsby, 2014). In practice, 61 however, multiple durations are needed for the design of infrastruc-62 ture (e.g., Nissen and Ulbrich, 2017) or urban drainage networks (e.g.,   (MEV) distribution is fitted using all events with recorded precipita-95 tion instead of only the most severe. The inclusion of more events re-96 duces the uncertainty due to sampling effects, which is important when 97 dealing with short time series (Hu et al., 2020;Marani and Ignaccolo, 98 2015; Marra et al., 2018Marra et al., , 2019aMiniussi and Marani, 2020;Zorzetto 99 et al., 2016;Zorzetto and Marani, 2019). This is particularly advanta-100 geous when analyzing short remote sensing precipitation products, as 101 the commonly applied GEV requires many years of data to accurately 102 estimate the tail of the distribution (Papalexiou and Koutsoyiannis,103 2013). Additionally, GEV parameter estimation depends heavily on a 104 6 few large values, which makes it very sensitive to the possible presence 105 of outliers, a relatively common occurrence in remote sensing estimates 106 of precipitation amounts (Zorzetto and Marani, 2020). The GEV tail 107 behavior is mostly controlled by its shape parameter, which is very 108 sensitive to sampling effects and the choice of the method used for es-109 timation. To overcome these problems, some studies have suggested 110 to use one universal value of the shape parameter that is applicable to 111 the whole world Koutsoyiannis (2004a,b), or a shape parameter value 112 within a narrow range between exponential and heavy-tail behavior 113 (Papalexiou and Koutsoyiannis, 2013), or one shape parameter per re-114 gion, that is similar within climate types and elevation ranges (Ragulina 115 and Reitan, 2017). The estimation of the shape parameter is partic-116 ularly difficult with short data series, though crucial for the accurate 117 estimation of extremes.

118
In this study we contribute to overcome these issues by 1) using a dataset 119 that merges all three main sources of precipitation data, 2) estimating ex-120 tremes for several event durations, 3) using hydrological years in our analyses,  The global precipitation product used in this study is the Multi-Source 128 Weighted-Ensemble Precipitation (MSWEP-V2.2) dataset. MSWEP is par-129 ticularly suited for our purpose due to its global coverage, long temporal 130 span, high spatial and temporal resolution. We used data from 1 January dataset is one of the few precipitation products with daily (as opposed to 136 monthly) gauge corrections, applied using a scheme that accounts for gauge 137 reporting times (Beck et al., 2019b). MSWEP has shown robust performance 138 compared to other widely used precipitation datasets (e.g., Alijanian et al., 139 2017;Bai and Liu, 2018;Beck et al., 2017Beck et al., , 2019aCasson et al., 2018;Hu 140 et al., 2020;Sahlu et al., 2017;Satgé et al., 2019;Zhang et al., 2019) occasionally result in implausible precipitation values. Therefore, we imple-147 mented a three-step quality control procedure of the 3-hourly data prior to 148 the analysis. We first discarded negative values, which are physically impos-149 8 sible. The second step was to discard outliers, which we defined as values 150 deviating from the mean by more than 30 standard deviations. We also dis-151 carded data surrounding the outliers for the same time step using a 11 × 11 152 grid-cell window, as erroneous gauge observations may have influenced sur-153 rounding cells in the production of the MSWEP dataset. The third step was 154 to remove years with > 30 discarded days or < 5 'wet' 3-hourly periods, 155 identified using a threshold of 0.2 mm 3h −1 following Wasko et al. (2015).  The durations we selected for our analysis are 3, 6, 12 and 24 hours, and 162 2, 3, 5 and 10 days. In order to create statistically-independent precipitation 163 events for multiple durations, we first separated 3-hourly events following the 164 declustering method to limit the autocorrelation of the samples described in  hydrological year, given the regional variability in the climatological precip-171 itation seasonality. We therefore developed an uniform way to define the 172 hydrological year. To avoid splitting one rainy season over two different 173 9 years, we computed the median of the monthly precipitation for each grid-174 cell, and defined the start of the hydrological year to be the first day of the 175 driest month. Supplementary Material Figure S1a shows the starting month 176 of the hydrological year as determined by this method. These data are also 177 available in the GPEX dataset  distributions. Annual (hydrological year) maxima were used to estimate the 198 three parameters of the GEV using the L-moments approach, because of its 199 robust performance for small samples (Hosking, 1990). The GEV cumulative 200 distribution function (CDF) is given by: with location parameter µ ∈ (−∞, ∞), scale parameter σ > 0, and shape 202 parameter ξ ∈ (−∞, ∞). The annual extremes estimated by GEV are trans-203 lated into those of the parent distribution, following Koutsoyiannis (2004a, 204 equation 3).

205
As a second EV model we use a Peaks Over Threshold approach, de-206 scribing precipitation accumulations exceeding a high threshold using a GP 207 distribution, while modelling the frequency of threshold exceedances using a 208 Poisson point process (Coles, 2001;Davison and Smith, 1990). This frame-209 work also yields GEV as the resulting extreme value distribution, which is 210 then used to determine the quantile corresponding to a given return period.

211
The GP CDF is given by: where y > 0 are precipitation excesses over the threshold, with β > 0 and hydrological year separately, based on all wet events using the PWM method 235 (Greenwood et al., 1979) as done in Zorzetto et al. (2016). The MEV-Weibull 236 CDF is given by: where j is the hydrological year (j = 1, 2, . . . , M ), C j > 0 is the Weibull scale 238 parameter, w j > 0 is the Weibull shape parameter, and n j is the number of  It should be noted that the methods we applied in this study do not 241 use any parameter bounds. Although Papalexiou and Koutsoyiannis (2013) 242 argued that the GEV shape parameter of daily precipitation lies between ex-243 ponential and heavy-tail behavior, this is not used as additional information

275
The MEV distribution assumes that precipitation events are Weibull-276 distributed. The tail decay of this distribution is controlled by its shape 277 parameter: for w < 1 its tail behavior is "sub-exponential", i.e., heavier than 278 that of an exponential (recovered for w = 1), albeit with a characteristic 279 scale (Laherrere and Sornette, 1998;Wilson and Toumi, 2005).

288
In an effort to compare the heaviness between the distributions, we have 289 come up with a measure of heaviness that is based on the return levels them-  10-year return level can be described as follows: Where b is the difference between the 100-year and 10-year return level, a of thin tails. For pure exponential tails it holds that a = 0. The value for 296 a is highly dependent on the local precipitation systems, so we defined the 297 heaviness amplification factor h T10−T100−T1000 to be a normalization of a: In words, the meaning of h T10−T100−T1000 is the fractional additional in-299 crease between T1000 and T100 that is more than the increase that could be 300 expected from a pure exponentially tailed distribution. A distribution has a 301 heavy tail for h > 0 and a thin tail for h < 0. Here, we chose a range for the 302 heaviness metric over large return periods from 10 to 1000 years, since the 303 1000-year return levels are known to be influenced by the distribution choice 304 (e.g., Rajulapati et al., 2020) and that is precisely what we wanted to com-

305
pare. Yet, it should be noted that this metric may easily be adjusted to other 306 return periods and other factors between the return periods. For GEV and

307
POT the heaviness metric is independent of the return period range as long 308 as the return periods are a factor 10 apart. Although for MEV this heaviness 309 metric is only valid for the return period range over which it is computed, 310 using other ranges (T2-T20-T200 and T5-T50-T500) did not yield significant 311 differences ( Figure S6).  Figure S1c for MEV. We found that in the case 318 of GEV quantiles, the fraction of sites characterized by differences within 319 ±0.5 % is larger than that observed for MEV. When the hydrological year 320 starts in the winter months, the hydrological year is only shifted by a few 321 months. In such instances, the annual maxima mostly stay the same between 322 the calendar and hydrological years, though the included events could differ.

323
For GEV this means that for many cells there is almost no difference in the 324 T1000 estimates, whereas for MEV the difference is small.

325
On the other hand, when the offset with a calendar year is approximately as measured by the wider 5th to 95th percentile interval). This suggests that 336 regional sensitivity to the definition of block maxima can be quite significant 337 for the GEV approach.  , while MEV appears less sensitive to these local corrections.

367
In order to study the ability of the three distributions to capture the 368 spatial coherence of precipitation extremes, we selected several case study 369 areas. They collectively cover a wide range of climates and domain sizes, the 370 locations of which can be found in Figure 3a. Within a single case study 371 area, we expect the precipitation estimates to be statistically homogeneous

561
We also conclude that both GEV and POT generally underestimate the 562 observed extremes, whereas MEV overestimates them ( Figure 5). This occurs 563 particularly for long-duration extremes and large return periods. We do 564 consider it likely, however, that the results could be improved, for instance 565 by changing the event threshold or by fitting the Weibull distribution over 566 two or more years for dry areas (Miniussi and Marani, 2020), so as to reduce 567 inter-annual variability of the parameters due to samples of limited length.

568
Our results suggest that this issue is particularly relevant at the longest .

572
The data generated for this study are openly available as the GPEX  The GPEX dataset is available at the 4TU repository Gründemann et al.   year is different from the calendar year (i.e., the hydrological year does not start in January, see Figure S1a). A negative difference indicates that the return period estimate is larger using hydrological years, whereas a positive difference indicates that the return period estimate is larger using calendar years.

Threshold Analysis for POT
There are many different ways to select an appropriate threshold for the Peak Over Threshold (POT) analysis, such as a value over a specific threshold, a percentage, or an average number of events per year. We refer to Caeiro and Gomes (2016); Langousis, Mamalakis, Puliga, and Deidda (2016) for recent overviews of such threshold selection. Which method is the most effective remains to be an unanswered question. For our global-scale application, we have analyzed to take on average 1 to 5 events per year. This fixed number of events, as opposed to the events over a predefined threshold, ensures that the number of events per year remains constant for all durations, and allows for analyses on the global domain.
The 100-year return levels for all durations and thresholds are shown in Figure S3a. The boxplot shows minimal difference in the T 100 estimates for the different thresholds for the short durations, and a minor difference for the longer durations, namely that the estimated T 100 are slightly larger for inclusion of less events per year. Overall the figure shows only slight differences, so more information has to be considered for accurate threshold selection.
A comparison of the three parameters for all durations and thresholds is shown in Figure S3b-d. The location and scale parameters in S3b and c show very similar results for the different thresholds. The shape parameter in S3d, however, has more variation for the different POT-thresholds. For all durations, the variability of the shape parameter decreases for the inclusion of more events per year, so more events to fit the distribution to. For durations between 3 and 48 hours, the shape slightly increases for the inclusion of more events per year. In other words, the shape is lower for on average 1 event per year, and higher for on average 5 events per year for durations between 3 and 48 hours. For 72 hours, the median shape parameter of the different thresholds is constant, though the variability decreases for the inclusion of more events. For the 5 and 10-day durations (120 and 240 hours, respectively), the shape parameter slightly decreases for the inclusion of more (non-extreme) events. The general pattern of the GP shape parameter is similar to that of the GEV, as they both show a decreasing shape parameter for increasing durations.
Previous studies that have looked into the GP shape parameter on the global domain focused on daily durations. Papalexiou and Koutsoyiannis (2013) estimated the mode of the shape for daily precipitation as 0.134, but displaying a large spread. Serinaldi and Kilsby (2014) estimated the shape for four different seasons based on 1898 stations with more than 100 years of data. They found the shape depends on the season and varies between 0.061 and 0.097, also displaying a large spread. Furthermore, they found that if you lower the threshold and include more non-extreme events, the shape parameter is higher. Our results are similar to that of Serinaldi and Kilsby (2014), as the median for daily precipitation for the different thresholds varies between 0.711 and 0.918, and we also found that the inclusion of more non-extreme events leads to a higher shape parameter for daily precipitation.
In short, the above analysis shows that the threshold selection is of minor influence on the estimation of the 100-year return level. Of the three underlying parameters, the threshold selection has minimal influence on the location and scale parameter, though a much greater influence on the shape parameter. The variability of the shape parameter is the highest for the threshold of on average 1 event per year, and decreases when more non-extreme events are used to fit the distribution to. Furthermore, the shape parameter increases for the inclusion of more events for the shorter durations (3-hours to 2-days), remains constant for a 3-day duration, and decreases for the 5 and 10-day durations. Based on these analyses, we chose to show the threshold of on average 3 events per year in the main manuscript, as that threshold has the right balance of a low variability and not as much of an underestimation of the 100-year return levels for the longest durations. A shape parameter smaller than zero indicates a thin tail with an upper limit, a shape parameter larger than zero indicates a heavy power-law tail.

Shape Parameter
The GEV and MEV distributions are both flexible and able to describe different tail behaviors. The tail behavior the two distributions varies, see Section 2.2.2 in the main manuscript for an overview and Figure S4 for different combinations of scale and shape parameters. Maps of the shape parameter for the three distributions for daily duration are shown in Figure S5. The spatial patterns of the shape parameters are largely similar to those of the heaviness h T10−T100−T1000 , described in Section 3.3 and Figures 5  and 6 of the main manuscript. There are, however, some notable differences. For MEV, the mountainous areas are more defined, indicated by higher shape values (less heavy tail behavior). Looking at the shape parameter, we find, however, that the relationship of the Weibull shape parameter with elevation is more complicated. Lower shapes (heavier tails) are generally observed on the leeward side of large mountain ranges, and higher shapes (though still indicative of heavy-tail behavior) on the windward side that is dominated by orographically enhanced frontal precipitation. This is for example visible in the Rocky Mountains, Indonesia, and Norway, and corresponds to findings of Cavanaugh and Gershunov (2015, their Figure 5), who showed that exponential tails are observed in regions where extreme precipitation is predominantly generated by one type of system. MEV w=0.4, C=0.8, N=100 w=0.6, C=4, N=100 w=0.8, C=9, N=100 w=1, C=14.5, N=100 w=1.2, C=20, N=100 w=1.4, C=25, N=100 w=1.6, C=30, N=100 Figure S4: (a) Behavior of the shape (ξ ) and scale (σ ) parameters of the GEV distribution, with a constant location parameter (µ), and (b) behavior of the shape (w) and scale (C) parameters of the MEV-Weibull distribution, with a constant number of events (N). The results for MEV have been obtained with constant w, C and N parameters for each year. The values of the shape and scale parameter pairs have been chosen such that they all have a precipitation depth of approximately 100 mm for a 10-year return period. For MEV, the mean shape parameter of all yearly Weibull distributions is displayed. The colorbar min and max are based on the 1st and 99th percentile. For GEV and POT, red indicates a thin shape with an upper limit, white an exponential shape, and blue a heavy power-law shape. For MEV all median shapes are indicative of sub-exponential heavy shapes, though darker colors are heavier than lighter colors. T2-T20-T200   T5-T50-T500 T10-T100-T1000   Figure S13: The heaviness amplification factor h T10−T100−T1000 (Eq. 5) for 10-day precipitation calculated for different extreme value methods: (a) GEV, (b) POT, (c) MEV. Red indicates a thin tail, white an exponential tail, and blue a heavy tail. See section 2.2.2 for more information on the heaviness metric.

Dataset Usage Notes
The GPEX dataset created in this study is available at the 4TU repository . It provides openly accessible and readily available hydrologically relevant return levels of extreme precipitation estimates worldwide. It contains the precipitation estimates of the three extreme values distributions, the observed estimates, the parameters of the three distributions, as well as a few other variables used in this study (Table S1). Furthermore, the extreme precipitation estimates and parameters of the Gumbel distribution are included in the dataset. In this section we provide some possible uses of the dataset, and instructions and disclaimers for proper use, both for large or regional-scale usage as well as for a single cell or point-scale.

Large-Scale Applications
The GPEX dataset contains global scale extreme precipitation estimates and its parameters at a spatial resolution of 0.1 • , covering 3-hourly to 10-day durations. The dataset contains information about precipitation extremes for the entire Earth's land surface except Antarctica. The estimates of three distributions described in the main manuscript, as well as the return levels as observed, and estimated using the Gumbel distribution are included in the dataset. Among the three distributions, the traditional GEV and POT provide comparable large-scale average extremes, although differences can be substantial at smaller scales. When using the dataset at regional scales, we advise taking the average of the precipitation estimates, as neighboring cells could differ. Note that since only 38 years of data were available, the fitted model parameters and associated return values are subject to considerable uncertainty. Furthermore, we acknowledge that the use of just one dataset does not represent the true uncertainty in the generation of the dataset created. We do not think this affects our results for observed global spatial patterns significantly, but in a practical setting we recommend verifying the estimates with local observations if available, and to reproduce the precipitation return level estimates with a full uncertainty range estimation. The novel MEV distribution provides more spatially coherent patterns of the extremes. Its mean shape parameter for daily events captures the (heavy-)tail behavior, and follows orographic patterns. The extremes estimated by MEV are higher than those estimated by GEV and POT. However, for large return periods and long durations, MEV can overestimate the extremes, due to the small number of events available for the fitting. We, therefore, recommend analyzing the extremes of all distributions in this dataset to obtain an indication of the uncertainty.

Small-Scale Applications
The dataset is also suitable for small-scale applications either in comparative studies or for direct use in data sparse regions, but one should be aware of the different statistical characteristics of point-scale and grid-scale. Due to averaging effects in gridded datasets, precipitation extremes of point-scale observations are higher (Cavanaugh & Gershunov, 2015;De Michele, Kottegoda, & Rosso, 2001;Ensor & Robeson, 2008;Hu et al., 2020;Sivapalan & Blöschl, 1998;Zorzetto & Marani, 2019). Illustrative examples of two locations, Vienna and San Francisco, are included in Figure S14. Analysis of the return level plots shows the estimates of the three distributions discussed in the main manuscript, as well as the estimates using the Gumbel distribution compared to the observed ones. We converted the annual maximum precipitation to 'observed' return levels ( Figure S14e+j). It should be kept in mind though that these 'observed' return levels are also different from the 'true' return levels. For (sub-)daily durations and low return periods, there is generally a good agreement between the observed return levels and the estimates of the three EVDs. For longer durations and return periods, however, the estimated extremes deviate from the observed extremes. This is seen in San Fransisco ( Figure S14f  Furthermore, increasing event durations result in lower shape parameters (less heavy tails), which was seen for all three distributions discussed in the main manuscript. An implication of this is that for long-durations the shape parameter indicates a finite endpoint (GEV and POT), or a very thin tail (MEV), while heavier tails are generally observed for short-durations. When estimating very large return periods (e.g., T 500), it is therefore possible for shorter duration estimates to be more intense than the corresponding quantiles computed for longer durations, which is physically impossible (see also Figure S14a,b,f and g). Additionally, Papalexiou and Koutsoyiannis (2013) argued based on long time series that the true population GEV shape parameter of daily precipitation lies between exponential and heavy-tail behavior, and thin-tails lead to an underestimation of the extremes. For applications where underestimation is undesirable, we have included the extreme precipitation estimates using the Gumbel distribution as well. The Gumbel distribution is an exponential distribution, and equal to the GEV distribution where the shape parameter equals zero. Therefore, if the shape parameter of GEV or POT is negative, the Gumbel estimates could be used instead in order to avoid underestimation.
To get a better understanding of the range and uncertainty of a single cell location, we recommend to look at return level plots of the four distributions at the cell of interest in combination with its neighboring cells. This is particularly important for GEV and POT, due to the absence of coherent spatial patterns and the erratic manifestation of the tail behaviors. Previous results (Zorzetto, Botter, & Marani, 2016) show that the benefits of MEV over GEV are greater for large return periods relative to the sample size available for the fit. Hence, for the estimation of large quantiles, MEV may be presumed to be more accurate. Depending on the practical application one could then choose to use the most extreme value, use the MEV value, use the Gumbel value in case of a negative shape for GEV or POT, or use a spatial average of the GEV and POT estimates.