Practical strategies for generalized extreme value‐based regression models for extremes

The generalized extreme value (GEV) distribution is the only possible limiting distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. As such, it has been widely applied to approximate the distribution of maxima over blocks. In these applications, GEV properties such as the finite lower endpoint when the shape parameter ξ is positive, or the loss of moments due to the magnitude of ξ, are inherited by the finite-sample maxima distribution. The extent to which these properties are realistic for the data at hand has been widely ignored. Motivated by these overlooked consequences in a regression setting, we here make three contributions. First, we propose a blended GEV (bGEV) distribution, which smoothly combines the left tail of a Gumbel distribution (GEV with ξ = 0) with the right tail of a Fréchet distribution (GEV with ξ > 0). The resulting distribution therefore has unbounded support. Second, we propose a principled method called the property-preserving penalized complexity (P³C) prior approach to decide on the existence of the GEV distribution first and second moments a priori. Third, we propose a reparametrization of the GEV distribution that provides a more natural interpretation of the (possibly covariate-dependent) model parameters, which in turn helps define meaningful priors. We implement the bGEV distribution with the new parametrization and the P³C prior approach in the R-INLA package to make it readily available to users. We illustrate our methods with a simulation study that reveals that the GEV and bGEV distributions are comparable when estimating the right tail under large-sample settings. Moreover, some small-sample settings show that the bGEV fit slightly outperforms the GEV fit.
Finally, we conclude with an application to NO₂ pollution levels in California that illustrates the suitability of the new parametrization and the P³C prior approach in the Bayesian framework.


INTRODUCTION
The generalized extreme value distribution (GEVD) is a three-parameter family that describes the asymptotic behavior of properly renormalized maxima of a sequence of independent and identically distributed random variables. It is parameterized in terms of a location μ ∈ R, a scale σ > 0, and a shape parameter ξ ∈ R that controls the type of tail of the GEVD. Indeed, if ξ is zero, the GEVD has unbounded (parameter-free) support, whereas if ξ is positive, the limiting distribution is heavy-tailed with infinite upper endpoint but finite lower endpoint that depends on the parameters. If ξ < 0, then the short-tailed GEVD has a finite upper endpoint that also depends on the parameters. We can see then that when ξ ≠ 0, the GEVD does not obey the general regularity conditions for likelihood-based inference (Stuart et al., 2004); see Smith (1985) for an in-depth study of maximum likelihood inference for a class of irregular models that includes the GEVD. Moreover, by using the asymptotic GEVD as an approximation for the distribution of maxima over finite blocks, GEV properties, such as a finite lower bound in the case ξ > 0 and a finite upper bound in the case ξ < 0, are inherited by the original maxima distribution, which might not have bounded support.
Despite the above features, many authors have shown the practical usefulness of the GEV family as an approximation to the distribution of block maxima. For instance, Broussard and Booth (1998) use the GEV distribution to analyse the behavior of extreme losses in a German stock index, whereas Bruun and Tawn (1998) assume a GEVD to estimate the probability of coastal flooding. Extensions using regression models to include covariate effects have been used in El Adlouni et al. (2007), El Adlouni and Ouarda (2009), and Cannon (2010), among others. Flexible parameter smoothing using the class of vector generalized linear and additive models was proposed in Yee and Stephenson (2007), and this framework was recently exploited by Zhong et al. (2022) for an application to heatwave hazard assessment in Europe. Further extensions to combine spatial information across locations have been analyzed in Casson and Coles (1999), while Westra et al. (2013) studied the effect of climate change on global precipitation annual maxima, and Jóhannesson et al. (2021) embedded space-time covariates and spatially correlated random effects within GEV parameters to predict extreme river flows over the whole U.K. territory. Additionally, Huang et al. (2016) conducted a detailed study on the usefulness and potential limitations of the GEVD to model seasonal maximum and minimum temperatures from a millennial-scale climate simulation. From a theoretical perspective, Stein (2017) introduces two new results on the limiting distribution of block maxima, motivated by the study of annual temperature extremes. The need for such new results comes from the upper bound imposed by the GEV when it is used as an approximation to the distribution of properly renormalized annual maximum temperatures. 
Indeed, Stein (2017) argues that the existence of an upper bound means that for the GEVD to be a suitable approximation of the annual maximum temperature distribution, we would need a theoretical result that applies to the maximum of a large number of random variables with varying upper bounds. To solve this artefact created by the GEVD, Stein (2017) develops a framework based on triangular arrays, where the GEVD appears as a particular case.
As in Stein (2017), here we focus on an artefact created by the GEV distribution when it is used as an approximation to the distribution of finite block maxima. Our focus, though, is on the case where the shape parameter is positive and, therefore, the limiting GEV distribution has a lower bound. To illustrate this artefact, consider X 1 , … , X n independent and identically distributed (i.i.d.) according to a standard Cauchy distribution F, which has support in R. Then, if M n = max{X 1 , … , X n } and M ⋆ n = (M n − b n )∕a n with sequences a n > 0 and b n ∈ R, one has that Pr(M n ≤ z) = F n (z) and Pr(M ⋆ n ≤ z) = F n (a n z + b n ) (where a n and b n can be chosen as a n = nπ −1 and b n = 0). This implies that both M n and its standardized version M ⋆ n also have support in R. It can be shown (see, e.g., Schmiedt, 2016) that in this case, F n (a n z + b n ) → exp(−1∕z), z > 0, as n → ∞, that is, properly renormalized maxima of a sequence of i.i.d. standard Cauchy variables with support in R converge to a unit Fréchet random variable, which has support in (0, ∞). Therefore, the GEV limit imposes an artificial bounded support on the finite-sample maxima of Cauchy random variables, whose actual finite-n distribution may have non-negligible mass over (−∞, 0). Note that many other similar theoretical examples can be found to illustrate the same mismatch between the finite-n and asymptotic supports of block maxima (e.g., maxima of exponential, beta, or Weibull random variables, just to name a few examples).
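The Cauchy example above can be checked numerically. The following sketch (ours, not the authors' code) evaluates Pr(M ⋆ n ≤ z) = F n (a n z) exactly and compares it with the unit Fréchet limit exp(−1∕z); it also confirms that every finite-n distribution retains positive, though vanishing, mass below zero.

```python
# Numerical check: renormalized Cauchy maxima approach the unit Fréchet limit
# exp(-1/z), while every finite-n maxima distribution has support on all of R.
import math
from scipy.stats import cauchy

def max_cdf(z, n):
    """Pr(M*_n <= z) for the maximum of n iid standard Cauchy, a_n = n/pi, b_n = 0."""
    return cauchy.cdf(n * z / math.pi) ** n

for z in (0.5, 1.0, 2.0):
    frechet_limit = math.exp(-1.0 / z)
    assert abs(max_cdf(z, n=10_000) - frechet_limit) < 1e-3

# positive (but tiny) mass below the Fréchet lower endpoint 0 for finite n:
assert 0 < max_cdf(-0.1, n=50) < 1e-3
```

The norming constants a n = nπ⁻¹ follow from the Cauchy tail 1 − F(x) ≈ 1∕(πx); any asymptotically equivalent choice gives the same limit.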
The artificial bound imposed by the GEV approximation is particularly troublesome when the GEV parameters are covariate-dependent, that is, y(x) ∼ GEV(μ(x), σ(x), ξ(x)), for x ∈ R d , d ≥ 1. In practical applications, the components of y(x) are usually constructed as the block-maxima of a certain variable, say z(x), for a given block size. In this case, the GEV support, {z ∶ 1 + ξ(x){z − μ(x)}∕σ(x) > 0}, is also covariate-dependent. In particular, when ξ(x) > 0, the lower bound is μ(x) − σ(x)∕ξ(x). A GEVD fitted to data y implies that the maximum possible value of z should also have a lower bound for any configuration of x, even those that have not been observed. Moreover, this lower bound changes with the covariates that have been observed. Currently, there is no theoretical guarantee that a new configuration of covariates outside the sample support would not predict a value y ⋆ below the fitted lower bound. In practice, if we wish to predict with covariates outside the sample support, then the whole model is refitted to account for y ⋆ . Furthermore, note that if y ⋆ is observed, this single observation will control the estimates and consequently the lower bound, reducing the importance of the remaining observations. Therefore, following the reasoning of Stein (2017), we could argue that for the GEV to be a reasonable model for y, we would need, in principle, a result that applies to maxima of a large number of random variables with varying lower bound.
The examples presented here illustrate that when the GEVD with positive shape is scaled back to be used as an approximation for finite-sample maxima, it introduces a purely theoretical, unrealistic and inconvenient artefact. Furthermore, while this artefact may be considered negligible for the actual model being fitted, it can have major implications for prediction and computing in regression settings. We address this issue by modifying the lower tail of the GEVD with positive shape. Specifically, we construct a blended GEV (bGEV) distribution with support on the whole real line, based on a combination of the Gumbel (ξ = 0) and the Fréchet (ξ > 0) distributions. Our approach shares similarities with spliced models (see, e.g., Reynkens et al., 2017; Castro-Camilo et al., 2019; Opitz et al., 2018) and mixture models for extremes (Behrens et al., 2004; Carreau & Bengio, 2009; de Melo Mendes & Lopes, 2004; Do Nascimento et al., 2011; Frigessi et al., 2002; Leonelli & Gamerman, 2020) in the sense that all of these methods join two distributions. But besides this, our approach is fundamentally different from that of spliced or mixture models. First of all, mixture models blend two or more distributions at the density level. This means that when the mixture weights are not constant in the argument, the definition of the associated distribution function requires the computation of a normalizing constant, which may not have a closed-form expression. Our bGEV model blends the Fréchet and Gumbel distributions at the distribution level, so that the resulting density function can straightforwardly be computed and is automatically normalized. Second, the mixture and spliced models cited above are proposed for threshold exceedances. For instance, Castro-Camilo et al. (2019) and Opitz et al. (2018) propose a blend of a gamma distribution for the bulk and a generalized Pareto distribution (GPD) for the tail, and their approach incorporates the modeling of the exceedance probability.
Frigessi et al. (2002) provide a continuous transition between the bulk (described using a Weibull distribution) and GPD-tail models, avoiding the threshold selection associated with fitting a generalized Pareto distribution. As noted by Scarrott and MacDonald (2012), their model is appropriate when there is a lower bound on the support. In contrast, the goal of our bGEV model is to modify the left tail of the GEV distribution with positive shape parameter while keeping its upper tail behavior.
Note that since the GEVD is the only possible nondegenerate limit of properly renormalized maxima, we do not propose a different limiting model. Instead, we create a distribution that behaves very similarly to the GEVD while bypassing the lower bound problem. This means that we expect to get similar results from both distributions in terms of parameter estimates and return level estimation, which is confirmed by our first simulation study in Section 5. The benefit of our proposed bGEV model is that it is numerically more stable and reliable when we want to make predictions. It is also more statistically meaningful in the sense that its support does not depend on the parameters.
The bGEVD is expressed in terms of a new parametrization of the GEVD that provides a more meaningful interpretation of the model's parameters. Indeed, the usual location-scale GEV parametrization loses interpretability when the mean and standard deviation do not exist, which is the case when ξ ≥ 1 and ξ ≥ 0.5, respectively. Our proposed bGEV parametrization is based on a GEV quantile q α (location) and a quantile range (spread) s β = q 1−β∕2 − q β∕2 (0 < α, β < 1), whose existence, unlike that of the mean and standard deviation, is not influenced by the shape parameter. A consequence of this is that we have bGEV location and spread parameters over which we can meaningfully assign priors in the Bayesian setting. In particular, independent priors over the bGEV location and spread parameters induce a joint prior distribution for the original location and scale parameters.
The selection of a prior distribution for the bGEV shape parameter motivates an additional contribution of this paper. As mentioned above, the shape parameter ξ has a major role in determining the type of tail of the GEVD. It is also key in determining important features of the distribution, such as the existence of moments. Indeed, the kth moment exists if and only if ξ < 1∕k (Muraleedharan et al., 2011). In a regression context where ξ might depend on covariates, we might have that, for example, the first and second moments are not continuous as a function of the covariates, that is, the mean and variance exist only for some configurations of the covariates. The consequences of this issue have not received much attention in the literature. By focusing on these low-order moments, we propose a principled method to decide a priori on the existence of the first and second moments. Our method is called the property-preserving penalized complexity (P³C) prior approach, and it exploits the framework introduced by Simpson et al. (2017) to ensure desired features of the bGEVD.
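The moment condition ξ < 1∕k can be made concrete with the standard closed-form GEV moments. The following sketch (a hedged illustration, not part of the paper) checks the gamma-function formulas for the mean (ξ < 1) and variance (ξ < 0.5) against scipy, and shows the mean formula diverging as ξ approaches 1.

```python
# Closed-form GEV moments via the gamma function; scipy's genextreme uses the
# opposite sign convention for the shape (c = -xi).
import math
from scipy.stats import genextreme

def gev_mean(xi, mu=0.0, sigma=1.0):
    """E[X] for GEV(mu, sigma, xi); finite only for xi < 1."""
    return mu + sigma * (math.gamma(1 - xi) - 1) / xi

def gev_var(xi, sigma=1.0):
    """Var[X] for GEV(mu, sigma, xi); finite only for xi < 1/2."""
    g1, g2 = math.gamma(1 - xi), math.gamma(1 - 2 * xi)
    return sigma ** 2 * (g2 - g1 ** 2) / xi ** 2

xi = 0.2                                       # both first and second moments exist
assert abs(gev_mean(xi) - genextreme.mean(-xi)) < 1e-8
assert abs(gev_var(xi) - genextreme.var(-xi)) < 1e-8
assert gev_mean(0.999) > 100                   # Gamma(1 - xi) blows up as xi -> 1
```

For ξ ≥ 0.5 the variance formula is no longer valid because the defining integral diverges, which is precisely the discontinuity in covariate space that the P³C prior is designed to rule out.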
To summarize, the goal of this article is threefold: (1) to offer an alternative to the classical GEVD that corrects the left tail, providing a parameter-free support; (2) to propose a reparametrization of the GEVD in terms of a quantile and a quantile range that eases the interpretation of the model's parameters; and (3) to introduce a principled method to decide the existence of low-order moments a priori. Our proposed bGEV model is implemented adopting a hierarchical Bayesian framework in the context of latent Gaussian models, using the new parametrization and the P³C prior approach. Marginal posterior distributions of interest are computed using the integrated nested Laplace approximation (INLA; Rue et al., 2009). The bGEV model is freely available and efficiently implemented in the R-INLA package (Bivand et al., 2015; Lindgren & Rue, 2015).
The remainder of the article is organized as follows. In Section 2 we describe the new reparametrization of the GEVD, while in Section 3 we describe our modeling approach based on the bGEVD. The concept of property-preserving penalized complexity priors is introduced in Section 4. In Section 5 we present three simulation studies that compare the performance of the GEVD and the bGEVD at estimating return levels and provide insights on the influence of the bGEV model parameters. Section 6 presents the improvements implied by our model using NO₂ concentrations in California. Conclusions and a discussion of our methods are given in Section 7.

A REPARAMETRIZATION OF THE GEV DISTRIBUTION
The usual GEV parametrization in terms of location, scale, and shape parameters is somewhat convenient as it resembles well-known location-scale families. Within these families, we usually associate the location and scale parameters with the mean and standard deviation. Nonetheless, in skewed distributions such as the GEVD, the mean and standard deviation are no longer reasonable proxies for the location and scale of the distribution. Moreover, for large enough values of the shape parameter, the GEV mean and variance grow to infinity, which prevents us from interpreting the location-scale GEV parametrization as we usually do in other models. This is particularly troublesome in a regression context where the natural choice is to use a regression model in the location parameter, the scale parameter, or both. In such cases, covariate coefficients may not have a clear meaning if the GEV mean and variance are not defined. This is even more critical in a Bayesian context, as it might not be clear how to assign reasonable priors to GEV parameters. In this section, we propose a reparametrization of the GEV that has a broader interpretation for all values of ξ ∈ R, where prior information can be assigned to parameters that can still be widely interpreted even in the cases where the first and second moments are not finite. Specifically, following the spirit of Coles and Tawn (1996), we reparametrize the GEVD in terms of new quantile-based location and spread parameters. The new location parameter is the α-quantile (0 < α < 1), denoted q α ∈ R, while the new spread is the quantile range s β = q 1−β∕2 − q β∕2 ∈ R + (0 < β < 1). So, for instance, if α = β = 0.5, then the new location q 0.5 is the median, and the spread s 0.5 is the interquartile range. The GEVD reparametrized in terms of q α , s β , and ξ can be written as
F(x ∣ q α , s β , ξ) = exp[ −{ ℓ α,ξ + (x − q α )(ℓ 1−β∕2,ξ − ℓ β∕2,ξ )∕s β } + ^(−1∕ξ) ],
where a + = max(a, 0) and, for any a > 0, ℓ a,ξ = (− log a) −ξ . Note that the case ξ = 0 simplifies to
F(x ∣ q α , s β , 0) = exp( − exp[ −{ (x − q α )(ℓ β∕2 − ℓ 1−β∕2 )∕s β − ℓ α } ] ),
with ℓ a = log(− log a), for any a > 0.
There is a one-to-one mapping between (q α , s β , ξ) ⊤ and the parametrization in terms of (μ, σ, ξ) ⊤ . For the case ξ ≠ 0, the mapping is given by
μ = q α − s β (ℓ α,ξ − 1)∕(ℓ 1−β∕2,ξ − ℓ β∕2,ξ ), σ = ξ s β ∕(ℓ 1−β∕2,ξ − ℓ β∕2,ξ ).
The case ξ = 0 is interpreted as the limit when ξ → 0, that is,
μ = q α + s β ℓ α ∕(ℓ β∕2 − ℓ 1−β∕2 ), σ = s β ∕(ℓ β∕2 − ℓ 1−β∕2 ).
Note that the location and spread parameters are still interpretable in this new parametrization even when the mean and variance are undefined. The probabilities α and β that define the quantiles q α , q β∕2 , and q 1−β∕2 are fixed hyperparameters and should be defined by the user.
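The mapping between the quantile-based parameters and the usual location-scale parameters can be verified numerically. The sketch below (our illustration, not the R-INLA code) computes q α and s β from a known GEV, maps them back to (μ, σ), and checks the round trip against scipy's GEV quantile function.

```python
# Round-trip check of the (q_alpha, s_beta, xi) <-> (mu, sigma, xi) mapping,
# for xi != 0. scipy's genextreme uses shape c = -xi.
import math
from scipy.stats import genextreme

def ell(a, xi):
    """l_{a,xi} = (-log a)^(-xi)."""
    return (-math.log(a)) ** (-xi)

def to_location_scale(q_alpha, s_beta, xi, alpha=0.5, beta=0.5):
    """Map the quantile-based parameters back to (mu, sigma), xi != 0."""
    denom = ell(1 - beta / 2, xi) - ell(beta / 2, xi)
    sigma = xi * s_beta / denom
    mu = q_alpha - s_beta * (ell(alpha, xi) - 1) / denom
    return mu, sigma

alpha, beta, xi = 0.5, 0.5, 0.2
mu0, sigma0 = 1.0, 2.0
q = lambda p: genextreme.ppf(p, -xi, loc=mu0, scale=sigma0)   # GEV quantiles
q_alpha = q(alpha)                         # median (alpha = 0.5)
s_beta = q(1 - beta / 2) - q(beta / 2)     # interquartile range (beta = 0.5)
mu, sigma = to_location_scale(q_alpha, s_beta, xi, alpha, beta)
assert abs(mu - mu0) < 1e-8 and abs(sigma - sigma0) < 1e-8
```

With α = β = 0.5 the new location is the median and the new spread the interquartile range, both of which exist for every ξ.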

THE BLENDED GEV MODEL
As mentioned in Section 1, we want to modify the left tail of the GEVD with positive shape parameter to avoid inheriting a left-bounded support for the finite-sample maxima. We do this by blending two distributions, F(x) and G(x), at the distribution level, so that the resulting distribution H(x) resembles the GEVD's right tail, has infinite support, and has a density function that can straightforwardly be computed and is automatically normalized. Our blended GEV (bGEV) distribution function is defined as
H(x) = F(x ∣ q α , s β , ξ) ^p(x) G(x ∣ q̃ α , s̃ β ) ^{1−p(x)} , (1)
where F is the Fréchet distribution with location q α , spread s β , and shape ξ, and G is the Gumbel distribution with location q̃ α ≡ q̃ α (q α , s β ) and spread s̃ β ≡ s̃ β (q α , s β ). The function p is a weight function that controls the way the distributions F and G are blended together.
As per the notation in (1), H is parameterized solely in terms of q α , s β , and ξ, and not in terms of q̃ α and s̃ β . Indeed, in the following we will see that q̃ α and s̃ β can be completely characterized via q α , s β , and a set of hyperparameters, of which the first two are the probabilities α and β. In principle, α and β can take any value, but as noted in Section 2, some choices might be easier to interpret than others (e.g., setting α = β = 0.5). The remaining hyperparameters are related to the weight function p. The function p is chosen as the cumulative distribution function of a beta distribution with shape parameters c 1 > 0 and c 2 > 0 (F Beta (⋅ ∣ c 1 , c 2 )) defined over the left-tail mixing area [a, b]. As the name implies, the left-tail mixing area is where F and G are merged; see the blue shaded rectangle in Figure 1. The weight function p is then defined as
p(x) = F Beta ( (x − a)∕(b − a) ∣ c 1 , c 2 ). (2)
The left-tail mixing area [a, b] is defined by setting a = F −1 (p a ) and b = F −1 (p b ) for relatively small probabilities p a and p b . The shape parameters c 1 and c 2 of the weight function p can take any positive value, but we restrict them to values greater than 3. In that case, the second derivative of the log-density of H is always continuous (see Section A.2 in the Appendix).
Some comments related to our above choices are in order. First, note that the weight function p in (2) is 0 for x ≤ a and 1 for x ≥ b, which means that the left tail of H is equal to the left tail of G (which is unbounded) and the right tail of H is equal to the right tail of F. In other words, the distribution H has the correct limiting right tail, yet without the inconvenient, left-bounded, parameter-dependent support imposed by the limit. Note also that lim ξ→0 H(x) = G(x) for all weight functions p, which means that our proposed distribution reduces to the classical Gumbel case when ξ → 0, for all p. This also ensures that the resulting distribution H is not too far from the classical GEVD when ξ ≈ 0. Second, note that by setting a = F −1 (p a ) and b = F −1 (p b ) we get that F(a) = G(a), F(b) = G(b), and the resulting distribution H is continuous. Moreover, we can express the Gumbel parameters q̃ α and s̃ β as
q̃ α = a + (b − a)(ℓ p a − ℓ α )∕(ℓ p a − ℓ p b ), s̃ β = (b − a)(ℓ β∕2 − ℓ 1−β∕2 )∕(ℓ p a − ℓ p b ),
with ℓ p = log(− log p), which are injective functions of q α and s β . Finally, note that the reparametrization proposed in Section 2 is based on the quantiles q α , q β∕2 , and q 1−β∕2 of the GEVD. Since the bGEVD is equal to the GEVD only for values above the left-tail mixing area (i.e., only for x ≥ b), the definition of q α and s β should be constrained by the value of b. Since b is also defined as a quantile of the GEVD, the above implies that we should have α ≥ p b and β∕2 ≥ p b . Otherwise, q α and s β would be defined as quantiles of the bGEVD, which are different from those of the GEVD below b. Summarizing, H is completely defined through the parameters (q α , s β , ξ) and the hyperparameters (α, β, p a , p b , c 1 , c 2 ). For our data application in Section 6, we set α = β = 0.5, in which case H is parameterized in terms of the median and the interquartile range of F. Additionally, we set p a = 0.05, p b = 0.2, and c 1 = c 2 = 5. The choice for c 1 and c 2 leads to a symmetric and computationally convenient weight function.
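The construction above can be sketched in a few lines. The code below (an illustration under our own parametrization choices, not the R-INLA implementation) builds H(x) = F(x)^p(x) G(x)^{1−p(x)} from a Fréchet-type GEV F, here parameterized for simplicity by its usual location-scale parameters, and a Gumbel G matched so that G(a) = F(a) = p a and G(b) = F(b) = p b on the mixing area [a, b].

```python
# Sketch of the bGEV distribution function: Fréchet-type right tail, Gumbel
# left tail, beta-CDF weight over the left-tail mixing area [a, b].
import math
from scipy.stats import genextreme, gumbel_r, beta as beta_dist

def bgev_cdf(x, mu, sigma, xi, p_a=0.05, p_b=0.2, c1=5.0, c2=5.0):
    F = genextreme(-xi, loc=mu, scale=sigma)      # scipy shape c = -xi
    a, b = F.ppf(p_a), F.ppf(p_b)                 # left-tail mixing area
    l = lambda p: math.log(-math.log(p))
    s_tilde = (b - a) / (l(p_a) - l(p_b))         # Gumbel scale from continuity
    G = gumbel_r(loc=a + s_tilde * l(p_a), scale=s_tilde)
    p = beta_dist(c1, c2).cdf((x - a) / (b - a))  # weight: 0 below a, 1 above b
    return F.cdf(x) ** p * G.cdf(x) ** (1.0 - p)

mu, sigma, xi = 0.0, 1.0, 0.2
F = genextreme(-xi, loc=mu, scale=sigma)
a, b = F.ppf(0.05), F.ppf(0.2)
assert abs(bgev_cdf(b + 1.0, mu, sigma, xi) - F.cdf(b + 1.0)) < 1e-12  # right tail = GEV
assert bgev_cdf(mu - sigma / xi, mu, sigma, xi) > 0   # mass below the GEV lower bound
assert abs(bgev_cdf(a, mu, sigma, xi) - 0.05) < 1e-12 # continuity at a
```

The second assertion is the key point: at the GEV lower endpoint μ − σ∕ξ, the bGEV still assigns positive probability, since its left tail is Gumbel.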

PROPERTY-PRESERVING PENALIZED COMPLEXITY PRIORS
Penalized complexity (PC) priors (Simpson et al., 2017) provide a principled and widely applicable method to specify prior information that is difficult to obtain from expert knowledge. The approach uses the natural nested structure inherent in many model components to define these model components as flexible extensions of some base model. The method penalizes deviations from this base model by placing an exponential prior on the Kullback-Leibler divergence from the base model. Base models should then be defined for every model component of interest. This task is simplified by noticing that, in practice, model components are completely defined through parameters. Take, for example, a Gaussian random walk of order 2 (a definition can be found in (7)), which is entirely defined by its precision (the inverse of its variance). This fact can be used to define a PC prior using as base model a random walk with infinite precision (see Section 6 for an application of this particular PC prior). PC priors can also be defined over data parameters, such as the positive shape parameter ξ of the GEV/bGEV distributions. The natural base model, in this case, is a GEV/bGEV with a shape equal to 0 (i.e., a Gumbel distribution). Although it is possible to derive a formula for this PC prior, we can take advantage of the relationship between the GEV and generalized Pareto (GP) distributions and approximate the PC prior of the GEV shape by that of the GP shape. Opitz et al. (2018) derived the PC prior for the GP positive shape parameter using a GP with shape equal to 0 as base model. This PC prior is defined for 0 ≤ ξ < 1, thus preventing infinite-mean models, and depends on a penalty parameter λ > 0. It can be expressed as
π(ξ) = √2 λ (1 − ξ∕2)(1 − ξ) ^{−3∕2} exp{ −√2 λ ξ (1 − ξ) ^{−1∕2} }, 0 ≤ ξ < 1. (4)
We can use the PC prior framework to define priors that help preserve specific properties of the model components or the data distribution, such as the existence of moments. This is what we call the property-preserving penalized complexity (P³C) prior approach.
In the case of the GEVD, the existence of the kth moment is guaranteed if ξ < 1∕k. Although, in practice, the existence of high-order moments for the GEVD with strictly positive ξ cannot be guaranteed, we can preserve low-order moments by shrinking the interval of possible values that ξ can take a priori. Specifically, we can preserve the first two moments of the GEVD by conditioning on ξ being less than 0.5. In practice, this means that we normalize (4) by the corresponding cumulative distribution function evaluated at 0.5, that is, using the alternative prior
π̃(ξ) = π(ξ)∕Π(0.5), 0 ≤ ξ < 0.5,
where Π is the cumulative distribution function associated with (4). A method such as the one presented here is particularly important in prediction problems. Indeed, if the prior for ξ has positive support in [0, ξ u ], then for finite sample sizes, the prediction of a new data point will inherit the moment properties of ξ ∈ [0, ξ u ]. Therefore, if, for instance, ξ u = 1, the posterior distribution will assign a positive density to ξ ∈ [0, 1], which means that all predictions for a new data point will have infinite mean and variance once the uncertainty is taken into account, even if the data suggest the opposite.
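The truncation step is easy to verify numerically. The following sketch implements the GP-shape PC prior of Opitz et al. (2018) in the distance form π(ξ) = λ e^{−λ d(ξ)} |d′(ξ)| with d(ξ) = √2 ξ(1 − ξ)^{−1∕2} (our reconstruction of the density, and an arbitrary illustrative penalty λ = 7), and checks that both the prior on [0, 1) and its moment-preserving truncation to [0, 0.5) integrate to one.

```python
# PC prior for the positive shape and its P3C truncation to xi < 0.5.
import math
from scipy.integrate import quad

def pc_prior(xi, lam=7.0):
    """PC prior density on [0, 1), distance d(xi) = sqrt(2)*xi/sqrt(1 - xi)."""
    return (math.sqrt(2) * lam * (1 - xi / 2) * (1 - xi) ** (-1.5)
            * math.exp(-math.sqrt(2) * lam * xi / math.sqrt(1 - xi)))

def p3c_prior(xi, lam=7.0):
    """P3C prior: pc_prior conditioned on xi < 0.5 (first two moments exist)."""
    if not 0 <= xi < 0.5:
        return 0.0
    mass = 1 - math.exp(-lam)   # Pr(xi < 0.5) in closed form, since d(0.5) = 1
    return pc_prior(xi, lam) / mass

total, _ = quad(pc_prior, 0, 1)
assert abs(total - 1) < 1e-6    # PC prior is a proper density on [0, 1)
trunc, _ = quad(p3c_prior, 0, 0.5)
assert abs(trunc - 1) < 1e-6    # truncated P3C prior is proper on [0, 0.5)
```

Because the distance d(0.5) equals 1 for this prior, the normalizing mass has the closed form 1 − e^{−λ}, so the truncation adds essentially no computational cost.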
The P³C prior concept that we show here for the shape parameter can be extended to other model parameters when important model properties are not continuous as a function of such parameters. We argue that distributional properties such as the existence of moments are too crucial to be determined by the randomness of the data-generating process. Therefore, the importance of the P³C prior approach lies in providing a framework where we can make rational decisions about important model properties through prior knowledge, avoiding any influence of the data or of the model error inherent to any statistical model fit.
Note that, in principle, standard bounded priors over ξ (such as a beta distribution on (0, 1) conditioned to be less than 1∕2) can also be used to preserve the first two moments. Here we choose the PC prior approach because it is a principled method flexible enough to represent different levels of prior knowledge, but alternative approaches could also be taken.

SIMULATION
We conduct simulation studies to compare the performance of the GEVD and the bGEVD at estimating return levels and to provide some insights on the effects of the hyperparameters α, β, p a , p b , and c 1 = c 2 on the bGEV model. Since the GEVD is the only possible non-degenerate limit of properly renormalized maxima, we expect the GEVD to outperform the bGEV for large sample sizes. However, it is very rare to find a sufficiently large number of block-maxima in practical applications.
In those cases, we expect both models to perform similarly. For our first simulation study, we draw n samples from a Fréchet distribution with location 0, scale 1, and shape α = 1∕0.1 and retain the largest value. We repeat this N times to get a sample of N independent block-maxima and fit the GEV and bGEV distributions using maximum likelihood. We then use the fitted parameters to compute return levels associated with a return period T and compare them with the return levels from the true maxima distribution, Fréchet(0, n 1∕α , α). We repeat the above M times to obtain Monte Carlo performance measures. Table 1 shows our results in terms of the difference RMSE GEV − RMSE bGEV , where RMSE GEV and RMSE bGEV are the root mean square errors associated with the GEV and bGEV fits, respectively. These measures were computed based on M = 500 Monte Carlo replicates with n = 30, 50, 100, 500, N = 30, 50, 100, 500, 1000, and T = 30, 50, 100. We can see that both models perform similarly for all block sizes (n), numbers of block-maxima (N), and return periods (T). As expected, the GEVD performs slightly better for a large number of block-maxima (see the last column, where N = 1000). We can also see that for N = 30, the bGEVD performs slightly better than the GEVD for most block sizes. Note that both models were fitted using the same optimization method (Nelder & Mead, 1965) and the same starting values. Our second simulation study aims to assess the effect of the hyperparameters p a , p b , and c 1 = c 2 on the bGEV model. To this end, we generate N = 100 samples from the GEV distribution with μ = 0, σ = 1, and ξ = 0.1. We then fit the bGEV using maximum likelihood and different values of the hyperparameters p a , p b , and c 1 = c 2 . Specifically, the values are p a = 0.05, 0.1, 0.15, p b = 0.2, 0.25, 0.3, and c 1 = c 2 = 3, 5.
We compare the different models based on their ability to estimate a theoretical 50-year return level and summarize our results by computing the RMSE based on M = 500 Monte Carlo replicates. Our results in Table 2 show no major differences across all 18 configurations of p a , p b , and c 1 = c 2 , although for fixed values of p b , there seems to be a slight increase in the RMSE as p a increases, that is, as the left-tail mixing area becomes smaller. Note that the left-tail mixing region, defined by the p a - and p b -quantiles of the Fréchet distribution F, conditions the choice of α and β. Indeed, as mentioned in Section 3, we must have p a < p b ≤ min{α, β∕2}. Therefore, it is advisable to choose p a and p b to be relatively small probabilities. In light of the results from Table 2, it is also advisable to choose them not too close to each other. Finally, note that all the bGEV RMSEs are better than the RMSE obtained with the GEV fit, which is equal to 2.53. Our third simulation study analyses the effect of the hyperparameters α and β on the bGEV model. To this end, we follow the same setup used in our second simulation study with p a = 0.05, p b = 0.2, c 1 = c 2 = 5, and α = (0.3, 0.5, 0.7, 0.9) and β = (0.5, 0.7, 0.9). The spreads s β associated with the different values of β are, respectively, the lengths of the central 50%, 30%, and 10% intervals. As before, we compare the different models based on their ability to estimate a theoretical 50-year return level and summarize our results by computing the RMSE based on M = 500 Monte Carlo replicates. From Table 3, we can see that combinations of α and β yield different levels of accuracy with no clear monotonic pattern. Overall, the bGEV fit with any combination of α and β leads to a better RMSE than the GEV fit, whose RMSE is 2.53.
To summarize, our first simulation study shows that the GEVD and bGEVD are comparable when estimating the right tail. Note that in this simulation study, maxima were taken over Fréchet random variables with location parameter equal to 0, which means that the generated values are bounded below by 0. This represents a setting where the original observations have a natural (physical) lower bound, which is the case for, for example, precipitation or pollutant concentrations. From our results in Table 1, we can see that the bGEV model misspecification (support on the whole real line) does not affect the resulting upper tail estimation. Finally, the results presented in Tables 2 and 3 show the effect of the hyperparameters associated with the new parametrization, the left-tail mixing region, and the weight function, and they can be used as guidelines when fitting the bGEVD.
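A miniature version of the first simulation study (a single replicate of our own, not the authors' code) can be run in a few lines: block maxima of Fréchet variables, a maximum likelihood GEV fit, and the estimated return level compared against its exact value, which is available because the Fréchet distribution is max-stable.

```python
# One replicate of the block-maxima experiment: Fréchet blocks, ML GEV fit,
# 50-block return level versus the true value from max-stability.
import numpy as np
from scipy.stats import invweibull, genextreme   # invweibull is the Fréchet

rng = np.random.default_rng(1)
alpha_f, n, N, T = 10.0, 50, 200, 50   # Fréchet shape 1/0.1, block size, #maxima, period

# N block maxima, each the maximum of n iid Fréchet(shape = alpha_f) draws
maxima = invweibull(alpha_f).rvs(size=(N, n), random_state=rng).max(axis=1)

c, loc, scale = genextreme.fit(maxima)           # ML fit; scipy shape c = -xi
rl_hat = genextreme.ppf(1 - 1 / T, c, loc, scale)

# Max-stability: the n-block maximum is exactly Fréchet(0, n^(1/alpha), alpha)
rl_true = invweibull(alpha_f, scale=n ** (1 / alpha_f)).ppf(1 - 1 / T)
assert abs(rl_hat - rl_true) / rl_true < 0.3     # rough agreement for modest N
assert abs(-c - 0.1) < 0.2                       # fitted shape near xi = 0.1
```

Wrapping this in two loops over (n, N, T) and M Monte Carlo replicates, and adding a bGEV fit with the same starting values, reproduces the structure of Table 1.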

APPLICATION
In this section, we illustrate the bGEV model using the new parametrization (Section 2) and the P³C prior approach (Section 4). To this end, we consider monthly maximum concentrations of nitrogen dioxide (NO₂), measured in micrograms per cubic meter (μg/m³), in Bakersfield, a city in Kern County, California. NO₂ is a chemical compound that primarily gets into the air from fuel burning (U.S. Environmental Protection Agency, https://www.epa.gov); see Vettori et al. (2019) and Vettori et al. (2020) for related studies of extreme air pollutant concentrations in California. Monthly maximum NO₂ measurements are computed from January 2000 to March 2016, giving rise to 136 complete measurements.
Available covariates include monthly mean and monthly maximum measurements of wind speed, temperature, pressure, and relative humidity (eight covariates in total). To model (possibly non-linear) relationships between the covariates and the NO₂ concentrations, we assume that monthly maximum concentrations of NO₂ follow a bGEVD where the location q α and spread s β vary with covariates. For simplicity, we assume the shape ξ to be covariate-free. Although likelihood-based inference assuming independence is straightforward (see the density associated with the model in (1) in Section A.1 of the Appendix), we here consider Bayesian inference in the context of latent Gaussian models using the integrated nested Laplace approximation (INLA; Rue et al., 2009). Within the INLA framework, we assume that observations are conditionally independent given a latent Gaussian field and a set of hyperparameters. Loosely speaking, the latent field describes the underlying trends and dependence structure of the observations and can be represented in terms of an additive linear predictor. For the bGEV model, we link the linear predictor to q α , which means that the linear predictor describes the effect of the covariates on the α-quantile. Additionally, our bGEV-INLA implementation allows the spread and the shape parameters to vary linearly with covariates. The goal of the Bayesian inference is to obtain posterior marginal distributions for the components of the linear predictor, the components of the linear models for the spread and shape parameters, and the set of hyperparameters. This translates into the computation of high-dimensional integrals, which INLA tackles conveniently by using the Laplace approximation in a nested way and leveraging modern numerical techniques for sparse matrices.
It is important to notice that latent Gaussian models represent a vast class of statistical models; therefore, INLA can be used in a variety of applications (see, e.g., Wang et al., 2018, and Krainski et al., 2018). For a gentle review of the INLA methodology, see Rue et al. (2017).
Model selection was carried out using the Watanabe-Akaike information criterion (WAIC), the deviance information criterion (DIC), and graphical tools based on the probability integral transform, over all possible configurations of the eight covariates when they enter linearly and nonlinearly in the location q_α and linearly in the spread s_β. The best model obtained using a forward-selection procedure has a covariate-dependent location and covariate-free spread and shape, and is given by (6), for t = 1, …, 136, where 1{yo(t) = i} is an indicator for the year yo(t) of time t, β_0 is an intercept, β_1 and β_2 are linear regression coefficients, and f_1(⋅) and f_2(⋅) are nonlinear effects defined as cyclic Gaussian random walks of order two with a sum-to-zero constraint for identifiability purposes. A cyclic Gaussian random walk of second order f can be defined as follows: let w = (w_1, …, w_K)^T be a discretization of the covariate x into K bins. Then w is a cyclic Gaussian random walk of order two if the second differences w_k − 2w_{k+1} + w_{k+2} (with indices taken modulo K) are independent N(0, τ^{−1}) variables, where the precision parameter τ controls the level of smoothness of the random walk. Note that f_1 is defined over the original covariate (month), as it is naturally binned, while f_2 is defined over a binned version of monthly mean temperature. Prior distributions need to be specified for the three linear regression coefficients in (6), the precision parameters of the two random walks (τ_1 and τ_2 for f_1 and f_2, respectively), s_{β,0}, and ξ_0. We choose a vague Gaussian prior with zero mean and precision 3 × 10^{−3} for the linear regression coefficients (β_0, β_1, β_2). Preliminary exploratory analysis (not shown) suggests that the values of NO2 vary smoothly with months, exhibiting a concave relationship. Using a grid search, τ_1 was kept fixed at a value that reflects this relationship. For the precision parameter τ_2, we set a weak prior distribution under which the probability that the standard deviation 1/√τ_2 is greater than the empirical standard deviation of the response is 0.01.
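As a rough illustration of the cyclic second-order random walk used for f_1 and f_2, the sketch below builds its (scaled) precision matrix from cyclic second differences. The bin count K = 12 mimics a monthly effect; this is a generic construction, not R-INLA's internal implementation.

```python
import numpy as np

def cyclic_rw2_precision(K, tau=1.0):
    """Precision matrix of a cyclic second-order random walk on K bins.

    Built as tau * D'D, where each row of D takes a cyclic second
    difference w_k - 2 w_{k+1} + w_{k+2} (indices modulo K). The matrix
    is rank-deficient (constants lie in its null space), which is why a
    sum-to-zero constraint is needed for identifiability.
    """
    D = np.zeros((K, K))
    for k in range(K):
        D[k, k] += 1.0
        D[k, (k + 1) % K] += -2.0
        D[k, (k + 2) % K] += 1.0
    return tau * D.T @ D

Q = cyclic_rw2_precision(12)            # e.g., a monthly effect with K = 12
print(np.allclose(Q @ np.ones(12), 0))  # constants are in the null space
```

The larger τ is, the more the second differences are shrunk toward zero, that is, the smoother the estimated effect.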
For the spread, we assume s_{β,0} ∼ Gamma(3, 3) a priori. Finally, for ξ, we use the P³C prior approach in (5) to preserve the existence of the first two moments of the bGEVD, with λ = 7.
For comparison, we also implement the GEVD with the linear model in (6) linked to the GEV location μ(t). Table 4 shows posterior means and 95% credible intervals for the bGEV and GEV linear regression coefficients and hyperparameters. Figure 2 shows the posterior means and 95% credible bands of the nonlinear effects associated with both fits, as well as probability integral transform (PIT) values. PITs are predictive measures of fit commonly used to assess model calibration (Gneiting et al., 2007): if a model is well calibrated, the histogram of PIT values should be approximately uniform. As expected, both fits produce very similar results in terms of model estimates, although the GEV fit was less stable and required additional steps to guarantee convergence. Both models are also comparable in terms of WAIC (934 for the bGEVD versus 927.8 for the GEVD) and DIC (938.7 for the bGEVD versus 926.8 for the GEVD).
TABLE 4 Posterior means and 95% credible intervals for the linear regression coefficients and hyperparameters of the bGEV and GEV
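As a minimal illustration of how PIT values are computed, the sketch below evaluates a fitted CDF at each observation. The GEV parameters and simulated data are hypothetical stand-ins for the fitted model (scipy's `genextreme` uses shape c = −ξ).

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(1)

# Hypothetical "fitted" GEV parameters and data drawn from that model
mu, sigma, xi = 30.0, 5.0, 0.1
y = genextreme.rvs(c=-xi, loc=mu, scale=sigma, size=500, random_state=rng)

# PIT values: the fitted CDF evaluated at each observation.
# For a well-calibrated model these should look standard uniform.
pit = genextreme.cdf(y, c=-xi, loc=mu, scale=sigma)

# A histogram of `pit` should be roughly flat; here we only check the range
print(0.0 < pit.min() and pit.max() < 1.0)
```

In practice one would use posterior predictive CDFs for held-out observations rather than the in-sample plug-in evaluation shown here.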

DISCUSSION
Inspired by the usually overlooked restrictions inherited by finite-sample maxima distributions when they are approximated by the GEVD, we here make three main contributions. First, we propose the blended generalized extreme value (bGEV) distribution as an alternative to the classical GEVD, to remove the lower bound imposed by the GEV distribution when the shape parameter is positive. Specifically, when the GEV distribution is used as an approximation for the distribution of maxima over blocks, properties of the GEV distribution, such as a finite lower endpoint, are inherited by the finite-sample maxima distribution, which might not be lower-bounded. This issue is particularly troublesome in the presence of linear or nonlinear covariates and becomes crucial in high-dimensional covariate settings. The bGEVD smoothly combines the left tail of the Gumbel distribution (ξ = 0) and the right tail of the Fréchet distribution (ξ > 0) and depends on a weight function that controls the influence of the two distributions. It is important to notice that, since the GEVD is the only possible nondegenerate limit of properly renormalized maxima, we do not propose a different limiting model. Instead, we derive a distribution that resembles the GEVD but has unbounded support, thus avoiding any lower bound restriction while matching the right tail of the GEVD. The bGEV model is also numerically more stable in our experiments. Our simulations and application show that, as expected, the GEV and bGEV models produce similar results. Second, we propose a new parametrization of the GEVD in terms of a quantile and a spread parameter, to allow for a more straightforward and meaningful interpretation of the model parameters. The spread is here defined as the difference between two quantiles.
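A minimal sketch of the blending idea follows, assuming a Beta-CDF weight over a lower-quantile window and a Gumbel matched to the Fréchet-type GEV at the window endpoints; the window probabilities and Beta weights are illustrative tuning choices, not necessarily the paper's exact defaults.

```python
import numpy as np
from scipy.stats import genextreme, gumbel_r, beta

def bgev_cdf(x, mu, sigma, xi, p_a=0.05, p_b=0.2, c1=5.0, c2=5.0):
    """Sketch of a blended GEV CDF with unbounded support.

    H(x) = F(x)^p(x) * G(x)^(1 - p(x)), where F is a GEV with xi > 0
    (Frechet type), G is a Gumbel matched to F at the p_a- and
    p_b-quantiles, and p is a Beta(c1, c2) CDF rescaled to the
    blending window [a, b].
    """
    F = genextreme(c=-xi, loc=mu, scale=sigma)   # scipy's shape is -xi
    a, b = F.ppf(p_a), F.ppf(p_b)                # blending window
    # Gumbel G chosen so that G(a) = p_a and G(b) = p_b
    la, lb = -np.log(-np.log(p_a)), -np.log(-np.log(p_b))
    sg = (b - a) / (lb - la)
    G = gumbel_r(loc=a - sg * la, scale=sg)
    p = beta(c1, c2).cdf(np.clip((np.asarray(x, float) - a) / (b - a), 0.0, 1.0))
    return F.cdf(x) ** p * G.cdf(x) ** (1.0 - p)

# Above the blending window the bGEV right tail equals the GEV right tail
print(np.isclose(bgev_cdf(60.0, 30.0, 5.0, 0.2),
                 genextreme.cdf(60.0, c=-0.2, loc=30.0, scale=5.0)))  # True
```

Because the weight p(x) is 1 above the window and 0 below it, the right tail is exactly Fréchet while the left tail is exactly Gumbel, giving support on the whole real line.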
This parametrization has significant consequences in a regression setting where parameters are assumed to vary with covariates, as it ensures interpretability even when the first two moments of the distribution are not defined. The parametrization has natural connections with empirical quantiles, and it is particularly advantageous in the Bayesian framework as interpretable prior distributions can be easily assigned.
Third, we introduce the concept of P³C priors to retain important model properties when these properties are not continuous as functions of the model parameters. We use this concept to avoid inconsistencies related to the first two moments of the GEVD. Specifically, since the first two GEV moments are defined for ξ < 0.5, we restrict the values of ξ to the interval (0, 0.5) by normalizing the PC prior using the corresponding cumulative distribution function. We argue that the data at hand should not define critical distributional features such as the existence of moments; the P³C prior approach provides a framework to define those features through prior knowledge.
FIGURE 2 Nonlinear effects over months and mean temperature and PIT values for the bGEV (top) and GEV (bottom) models fitted to the NO2 pollution data
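The normalization step can be sketched as follows, using an exponential density with rate λ = 7 as a hypothetical stand-in for the PC prior of ξ (the actual PC prior has a different closed form): the base density is divided by its CDF mass on (0, 0.5), so all prior mass lies where the first two moments exist.

```python
import numpy as np
from scipy import integrate
from scipy.stats import expon

# Stand-in base prior on xi >= 0; rate lambda = 7 as in the application
lam = 7.0
base = expon(scale=1.0 / lam)

# Restrict to (0, 0.5), where the first two GEV moments exist, and
# renormalize by the base CDF mass on that interval
Z = base.cdf(0.5) - base.cdf(0.0)

def truncated_pdf(x):
    x = np.asarray(x, float)
    return np.where((x > 0.0) & (x < 0.5), base.pdf(x) / Z, 0.0)

mass, _ = integrate.quad(truncated_pdf, 0.0, 0.5)
print(round(mass, 6))  # the truncated prior integrates to 1 on (0, 0.5)
```

The same recipe applies to any base prior: the restriction encodes the analyst's decision about which moments must exist, independently of the data.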
The GEVD is closely connected to the generalized Pareto distribution (GPD) for threshold exceedances. Indeed, if the conditions for the extremal types theorem hold, then the GPD is the only possible limiting distribution for exceedances over a threshold u as u grows to infinity. But unlike the GEVD, the GPD does not have a parameter-dependent support when ξ > 0, so the methodology presented in Section 3 is neither suitable nor needed.
Note that the new parametrization introduced in Section 2 is needed when using the bGEV-INLA implementation, but it is not crucial for defining the bGEV model. Indeed, we can define the bGEV distribution in terms of the GEV parameters μ, σ, and ξ > 0. The purpose of the new parametrization is to provide a way to interpret model parameters even in the absence of the first and second moments, which in turn helps define meaningful priors.
The bGEV model with the new parametrization and the P³C prior approach is freely available and efficiently implemented in the R-INLA package. In general, INLA requires the model likelihood to be log-concave (Rue et al., 2009), which is not the case for the GEV and bGEV distributions. Although the lack of log-concavity does not necessarily mean that INLA will fail to converge, it is highly advisable to mitigate potential numerical instabilities by, for example, choosing informative priors for the likelihood parameters. Additional standardization of the response variable can also help to reduce convergence issues.

ACKNOWLEDGMENTS
This publication is partially based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-CRG2017-3434.

DATA AVAILABILITY STATEMENT
The INLA implementation and documentation for the bGEV model can be found in the R-INLA package via inla.doc("bgev"). A tutorial for fitting the bGEV model with R-INLA can be found at https://www.r-inla.org/documentation. Additionally, code to reproduce the simulation studies and the data application is freely available from https://www.github.com/dcastrocamilo/bGEV.

A.2 First and second derivatives of the bGEV density
Let
Then we can write h(x) = H(x)m(x) and