Postprocessing of Ensemble Weather Forecasts Using a Stochastic Weather Generator
By Li, Zhi
ProQuest LLC
ABSTRACT
This study proposes a new statistical method for postprocessing ensemble weather forecasts using a stochastic weather generator. Key parameters of the weather generator were linked to the ensemble forecast means for both precipitation and temperature, allowing the generation of an infinite number of daily time series that are fully coherent with the ensemble weather forecast. This method was verified through postprocessing reforecast datasets derived from the Global Forecast System (GFS) for forecast leads ranging between 1 and 7 days over two Canadian watersheds in the Province of
(ProQuest: ... denotes formulae omitted.)
1. Introduction
Ensemble weather forecasts offer great potential benefits for water resource management, as they provide useful information for analyzing the uncertainty of predicted variables (
During the last two decades, a number of postprocessing methods have been proposed and implemented to address the bias and underdispersion of ensemble weather forecasts. These include rank histogram techniques (Hamill and Colucci 1998; Eckel and Walters 1998; Wilks 2006), ensemble dressing (Roulston and Smith 2003; Wang and Bishop 2005; Wilks and Hamill 2007; Brocker and Smith 2008), Bayesian model averaging (BMA; Raftery et al. 2005; Vrugt et al. 2006; Wilson et al. 2007; Sloughter et al. 2007; Soltanzadeh et al. 2011), logistic regression (Hamill et al. 2006; Wilks and Hamill 2007; Hamill et al. 2008), analog techniques (Hamill et al. 2006; Hamill and Whitaker 2007), and nonhomogeneous Gaussian regression (NGR; Gneiting et al. 2005; Wilks and Hamill 2007; Hagedorn et al. 2008). Among these methods, the logistic regression method was most often used to calibrate both precipitation and temperature, and BMA and NGR were usually used to calibrate the temperature (Raftery et al. 2005; Hagedorn et al. 2008; Hamill et al. 2008). More recently, studies have also extended the BMA for the postprocessing of precipitation (Sloughter et al. 2007; Schmeits and Kok 2010).
Hamill et al. (2004) used a logistic regression method to improve the medium-range precipitation and temperature forecast skill using retrospective forecasts. The ensemble mean and ensemble mean anomaly were used as predictors for precipitation and temperature, respectively. The results showed that the logistic regression-based probability forecasts (using retrospective forecasts) were much more skillful and reliable than the operational NCEP forecast. Raftery et al. (2005) proposed using the BMA method to calibrate the ensemble forecasts of temperature and found that the calibrated predictive PDFs were much better than those of the raw forecast.
Wilks (2006) compared eight ensemble model output statistics (MOS) methods for the statistical postprocessing of ensemble forecasts using the idealized Lorenz '96 setting. The eight methods were classified into four categories: 1) early, ad hoc approaches (direct model output, rank-histogram recalibration, and multiple implementations of single-integration MOS equations), 2) the ensemble dressing approach, 3) regression methods (logistic regression and NGR), and 4) Bayesian methods (forecast assimilation and BMA). This is probably the most thorough study to date in terms of including the greatest number of MOS methods for the postprocessing of ensemble forecasts. The three best performing methods were found to be logistic regression, NGR, and ensemble dressing. Wilks and Hamill (2007) further compared these three methods for postprocessing daily temperature, and medium-range (6-10 and 8-14 days) temperature and precipitation forecasts. The results showed there was not a single best method for all of the applications of daily and medium-range forecasts. For example, the logistic regression method yielded the best Brier score (BS) for central forecast quantiles, while the NGR forecasts displayed slightly greater accuracy for probability forecasts of the more extreme daily precipitation quantiles. Hagedorn et al. (2008) and Hamill et al. (2008) did a parallel study that used NGR and logistic regression for postprocessing temperature and precipitation, respectively, using the ECMWF and Global Forecast System (GFS) ensemble reforecasts. The skill and reliability of ECMWF and GFS ensemble temperature and precipitation forecasts were largely improved when using the NGR and logistic regression methods, respectively. These studies also emphasized the benefits of using ensemble retrospective forecasts (reforecasts).
Other studies such as Wilson et al. (2007) and Soltanzadeh et al. (2011) showed that BMA is also able to improve the skill and reliability of ensemble forecasts. However, most studies were only focused on the calibration of temperature rather than precipitation using BMA, because the original BMA developed by Raftery et al. (2005) was only suitable for variables whose predictive PDFs are approximately normal. To use it for the calibration of precipitation, Sloughter et al. (2007) extended BMA by modeling the predictive PDF corresponding to an ensemble member as a mixture of a discrete event at zero and a gamma distribution. The extended BMA yielded calibrated and sharp predictive distributions for 48-h precipitation forecasts. It even outperformed the logistic regression at estimating the probability of high precipitation events, because it gives a full predictive PDF rather than separate forecast probability equations for different predictand thresholds. Similarly, Wilks (2009) also extended the logistic regression to provide full PDF forecasts. The main advantage of the extended logistic regression is that the forecasted probabilities are mutually consistent; thus, the cumulative probability for a small predictand threshold cannot be larger than the probability for a larger threshold (Wilks 2009). Based on the above-mentioned studies, Schmeits and Kok (2010) compared the raw ensemble output, modified BMA, and extended logistic regression for postprocessing ECMWF ensemble precipitation reforecasts. The results showed that, even though the raw ensemble precipitation forecasts were relatively well calibrated, their skill could be significantly improved by the modified BMA and extended logistic regression methods. However, the difference in skill between modified BMA and extended logistic regression was not significant.
Even though a number of methods have been proposed for postprocessing the ensemble weather forecasts, most of them are aimed at finding the underlying probabilistic distribution of forecasted variables. However, for some practical applications, such as ensemble streamflow predictions, several sets of discrete, autocorrelated time series over several days are needed for driving the impact models (e.g., hydrological models). Yet there is no simple way to go from the underlying distribution to the generation of a discrete, autocorrelated time series that is fully consistent with that distribution. This study presents a new method for postprocessing ensemble weather forecasts using a stochastic weather generator. The ensemble mean precipitation and temperature anomalies are used as predictors for the calibration of precipitation and temperature, respectively. A great number of ensemble members can be produced using the stochastic weather generator with a gamma distribution for generating precipitation amounts and a normal distribution for generating temperature. A simple bias correction (BC) method is used as a benchmark to demonstrate the performance of the proposed method [i.e., the generator-based postprocessing (GPP) method]. The GPP ensemble forecasts were compared with BC and raw GFS ensemble forecasts over two Canadian watersheds in
2. Study area and dataset
a. Study area
The ultimate goal of this project is to provide and evaluate ensemble streamflow forecasting. It is with this goal in mind that we chose to focus on watershed-averaged meteorological data rather than station data. Accordingly, this study is conducted over two Canadian catchments located in the Province of
1) CHUTE-DU-DIABLE (CDD)
The CDD watershed (48.5°-50.2°N, 70.5°-71.5°W) is located in central Quebec. With a mostly forested surface area of 9700 km², it is a subbasin of the
2)
The YAM watershed (45.1°-46.1°N, 72.2°-73.1°W) is composed of a number of tributaries draining a basin of approximately 4843 km² in southern
b. Dataset
The dataset consists of observed and ensemble-forecasted daily total precipitation and mean temperature. The observed daily precipitation and temperature over two watersheds were taken from the National Land and
Ensemble forecasts (daily total precipitation and mean temperature) on a global grid of 2.5° were taken from the GFS reforecast dataset (http://www.esrl.noaa.gov/psd/forecasts/reforecast/; Hamill et al. 2006). Several previous studies (e.g., Hamill et al. 2004, 2006; Hamill and Whitaker 2006, 2007; Whitaker et al. 2006; Wilks and Hamill 2007) have presented the benefit of calibrating probabilistic forecasts using ensemble reforecast datasets. Forecasts for each day since 1979 were made with GFS, composed of a 15-member run out to 15 days. Since little skill is retained for precipitation after 1 week, only 1-7 lead days are used in this study over the 1979-2003 time frame. Two grid boxes were selected and averaged for the CDD watershed and only one grid box was selected for the YAM watershed.
3. Methodology
a. Stochastic weather generator
A stochastic weather generator is a computer model that can produce climate time series of arbitrary length and with statistical properties similar to those of the observed data (
The precipitation occurrence is usually generated using a Markov chain of various orders based on transition probabilities. Alternatively, the precipitation occurrence can also be generated based on an unconditional precipitation probability if the precipitation model only considers the wet- and dry-day probabilities rather than the wet- and dry-spell structures. In this case, if a random number drawn from a uniform distribution for one day is less than the unconditional precipitation probability, a wet day is predicted. Since the weather generator is used in this study to synthesize the wet and dry states of ensemble members for a given day rather than to generate the continuous time series of precipitation occurrence, only the second method was used. For a predicted wet day, stochastic weather generators usually produce the precipitation amount by using a parametric probability distribution (e.g., exponential and gamma distributions). The two-parameter gamma distribution is the most widely used choice for simulating wet-day precipitation. Temperature is usually generated using a two-parameter (mean and standard deviation) normal distribution. In this study, the gamma and normal distributions are used to generate the ensemble members of precipitation and temperature, respectively, for a given day. Similarly to stochastic weather generators such as Weather Generator (WGEN;
b. Generator-based postprocessing (GPP) method
The GFS ensemble forecasts are postprocessed using the GPP method. The observed daily precipitation and temperature are used as predictands, and the forecasted ensemble mean precipitation and temperature anomalies are used as predictors, respectively. The evaluation of the GPP method is based on a cross-validation approach (Wilks 2005) to ensure the independence of the training and evaluation data. Given 25 years of available forecasts, when making forecasts for a particular year, the remaining 24 years were used as training data.
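The leave-one-year-out scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `leave_one_year_out` is hypothetical, and the year range is taken from the 1979-2003 period stated in the dataset section.

```python
def leave_one_year_out(years):
    """Yield (evaluation_year, training_years) pairs.

    Each year is held out once; all remaining years form the training set.
    """
    for held_out in years:
        training = [y for y in years if y != held_out]
        yield held_out, training


# 1979-2003 inclusive gives the 25 years used in the study.
years = list(range(1979, 2004))
splits = list(leave_one_year_out(years))
```

Each of the 25 splits trains on 24 years, so no evaluation day is ever part of its own training sample.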
1) POSTPROCESSING FOR PRECIPITATION
The calibration of precipitation is based on four seasons: winter [January-March (JFM)], spring [April-June (AMJ)], summer [July-September (JAS)], and autumn [October-December (OND)]. The methodology for the precipitation calibration is based on the hypothesis that a relationship must exist between the mean of the ensemble forecast and both the probability of precipitation occurrence and wet-day precipitation amounts. The larger the mean of the ensemble forecast, the more likely that rainfall will occur, and the more likely that a large precipitation amount will be registered. For each season and lead day, the ensemble precipitation is calibrated with the following three steps.
1) The ensemble mean precipitation is first calculated using the 15-member ensemble precipitation forecasts. The calculated ensemble mean precipitation for each lead day in the given season is then classified into several classes based on wet-day precipitation amounts. Depending on the training samples, the number of classes is different. A maximum of 10 classes with wet-day precipitation amounts between 0-1, 1-2, 2-3, 3-4, 4-5, 5-7, 7-10, 10-15, 15-25, and
2) The second step involves establishing relationships between the forecasted precipitation classes and the probabilities of observed precipitation occurrence and observed mean wet-day precipitation amounts. Figure 2 presents the probabilities of the observed precipitation occurrence and mean wet-day precipitation amounts as functions of the forecasted precipitation classes for summer precipitation at 1 and 3 lead days over the two selected watersheds (solid lines in Fig. 2). The results clearly show the relationship between the mean of the ensemble forecast and the observed probability of precipitation occurrence (left-hand side), and between that same mean and the observed mean precipitation amount (right-hand side). For a large ensemble mean, the observed precipitation occurrence is nearly 100% for the larger basin. For a 7-day lead time (not shown), both relationships are close to a horizontal line, indicating that the ensemble precipitation forecast has little relevance for that lead time. The variability observed in the graphs is due to sampling periods that are too short. Accordingly, the lines were smoothed using a second-order polynomial (dashed lines in Fig. 2).
3) In the third step, the relationships (smoothed functions) between the probability of the observed precipitation occurrence and the forecasted precipitation class are directly used to determine the probability of precipitation occurrence for a given day. For any given day in the evaluation period, a forecasted precipitation class is first determined according to the ensemble mean precipitation for that day. For example, if the ensemble mean precipitation is 0.5 mm for a given day, it is classified into the first class (between 0 and 1 mm). The corresponding probability of observed precipitation occurrence (e.g., 40% for the YAM basin) is then used as the precipitation probability for this day. Then 1000 random numbers drawn from a uniform distribution are generated to represent 1000 members for this day. If a random number is less than or equal to the corresponding probability of observed precipitation occurrence (e.g., 40%), the corresponding member is predicted to be wet; otherwise, it is predicted to be dry. Finally, if a member is deemed wet, the fitted gamma function in the corresponding class is used to generate the precipitation amount with uniform random numbers. Overall, 1000 members are generated for any given day. A large number of members is used so that the generated sample reflects the fitted distributions as closely as possible; a small number of samples could result in biases due to the random nature of the stochastic process. The proposed postprocessing approach does not directly take into account the autocorrelation of precipitation occurrence. During the period covered by the ensemble weather forecast, the probability of precipitation is directly given by the forecast for each lead day, and thus preserves the coherence of the ensemble forecast. As such, the autocorrelation of precipitation occurrence is directly governed by the forecast. If the forecast is wet for several days, all 1000 members will carry this information stochastically and all sequences will be dominated by wet days. As long as the forecasts have skill, using the probability of precipitation occurrence given by the forecasts is highly preferable to using the mean probabilities used to generate the occurrence series in a pure stochastic mode. Similarly to most stochastic weather generators, the proposed method does not account for the possible autocorrelation of precipitation amounts.
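The three steps above can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the open-ended tenth class (above 25 mm) is assumed because the class list is truncated in the source, and wet-day amounts are drawn directly from a gamma distribution rather than by inverting the fitted gamma CDF with uniform numbers as the text describes.

```python
import numpy as np

# Class edges in mm for the ensemble-mean precipitation classes of step 1.
# ASSUMPTION: the last class is open ended (> 25 mm).
CLASS_EDGES = np.array([0, 1, 2, 3, 4, 5, 7, 10, 15, 25, np.inf])


def classify(ens_mean_mm):
    """Return the 0-based class index of an ensemble-mean precipitation value."""
    idx = np.searchsorted(CLASS_EDGES, ens_mean_mm, side="right") - 1
    return int(np.clip(idx, 0, len(CLASS_EDGES) - 2))


def smooth_by_class(values, degree=2):
    """Step 2: smooth a per-class statistic with a second-order polynomial."""
    x = np.arange(len(values))
    return np.polyval(np.polyfit(x, values, degree), x)


def generate_members(ens_mean_mm, p_wet, gamma_shape, gamma_scale,
                     n_members=1000, rng=None):
    """Step 3: generate precipitation members (mm) for one day.

    p_wet, gamma_shape and gamma_scale hold one (hypothetical) fitted
    value per class.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = classify(ens_mean_mm)
    # A member is wet when its uniform draw does not exceed the class
    # probability of observed precipitation occurrence.
    wet = rng.uniform(size=n_members) <= p_wet[k]
    amounts = np.zeros(n_members)
    # Direct gamma sampling (swapped in for the inverse-CDF approach).
    amounts[wet] = rng.gamma(gamma_shape[k], gamma_scale[k], size=int(wet.sum()))
    return amounts
```

In practice the smoothed probabilities would also need to be clipped to the interval [0, 1] before use, since a fitted polynomial can overshoot at the end classes.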
2) POSTPROCESSING FOR TEMPERATURE
The postprocessing for temperature is performed on a daily basis. The calibration of ensemble temperature forecasts includes two stages. The first stage consists of the BC of the ensemble mean temperature using a linear regression method. The second stage adds the ensemble spread using a weather generator-based method. For each evaluation year and lead day, the ensemble temperature forecast BC follows three specific steps:
1) Similarly to precipitation, the ensemble mean temperature (24 yr × 365 days) is first calculated using the 15-member ensemble temperature forecasts (24 yr × 365 days × 15 members). The mean observed daily temperature (1 yr × 365 days) is also calculated using the 24-yr daily time series (24 yr × 365 days). The temperature anomalies (24 yr × 365 days) of both observed and forecasted data are then determined by subtracting the mean observed daily temperature (1 yr × 365 days) from the observed temperature (24 yr × 365 days) and from the ensemble mean temperature (24 yr × 365 days), respectively.
2) Linear equations are fitted between observed and forecasted temperature anomalies using a 31-day window centered on the day of interest. For example, when fitting the linear equation for 16 January, the temperature anomalies from 1 January to 31 January over 24 yr are pooled. The use of a 31-day window ensures there will be enough data points to fit a reliable equation. This process is conducted for each day to obtain 365 equations, which can be used to correct the bias of ensemble mean temperature anomaly for an entire year.
3) The fitted linear equations in step 2 are used to correct the daily ensemble mean temperature anomaly for each validation year. Finally, the bias-corrected ensemble mean temperature is obtained by adding the mean observed temperature to the bias-corrected temperature anomalies.
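The three BC steps can be sketched as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, the input arrays are assumed to have shape (training years, 365), and the 31-day window is assumed to wrap around the ends of the year, a detail the text does not specify.

```python
import numpy as np


def fit_daily_regressions(obs, fc_mean, half_window=15):
    """Fit one linear equation per calendar day on pooled anomalies.

    obs, fc_mean: arrays of shape (n_train_years, 365) holding observed
    temperature and forecast ensemble-mean temperature.
    Returns the observed daily climatology and a (365, 2) array of
    (slope, intercept) pairs.
    """
    clim = obs.mean(axis=0)      # step 1: mean observed daily temperature
    obs_anom = obs - clim        # anomalies relative to the OBSERVED mean
    fc_anom = fc_mean - clim
    n_days = obs.shape[1]
    coefs = np.empty((n_days, 2))
    for d in range(n_days):
        # step 2: pool a 31-day window centered on day d.
        # ASSUMPTION: the window wraps around the year boundary.
        idx = np.arange(d - half_window, d + half_window + 1) % n_days
        x = fc_anom[:, idx].ravel()
        y = obs_anom[:, idx].ravel()
        coefs[d] = np.polyfit(x, y, 1)
    return clim, coefs


def apply_correction(fc_mean_year, clim, coefs):
    """Step 3: correct one year (365 values) of ensemble-mean temperature."""
    anom = fc_mean_year - clim
    return coefs[:, 0] * anom + coefs[:, 1] + clim
```

With 24 training years, each regression pools 24 × 31 = 744 anomaly pairs, which is the sample-size argument the text gives for the 31-day window.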
A scatterplot of the ensemble mean temperature before and after BC is plotted against the corresponding observed temperature for the 1 lead day over the two selected watersheds (Fig. 3). In this case, all 25 years of raw and corrected mean forecasts are pooled together rather than separated by 31-day windows. Only a slight bias is observed for the raw GFS ensemble mean temperature for both watersheds, as displayed in Figs. 3a and 3c, where the linear regression line slightly deviates from the 1:1 line. However, this bias is removed by using the linear regression method, as shown in Figs. 3b and 3d, where the linear regression and 1:1 lines overlap each other.
After the BC of the ensemble mean temperature, the ensemble spread is added using a stochastic weather generator-based method. The ensemble temperature of any given day is assumed to follow a two-parameter (mean and standard deviation) normal distribution. The bias-corrected ensemble mean temperature is used as the mean of the normal distribution. The standard deviation for each season (the same standard deviation is used for each day in a specific season) is obtained using an optimization algorithm to minimize the root-mean-square error (RMSE) of rank histogram bins. Specifically, the optimization algorithm involves four steps. 1) A number of standard deviation values (ensemble spreads) are preset for each season. For example, they are set between 0.5° and 5°C with an interval of 0.05°C in this study. 2) The ensemble temperature for every day in this season is calculated by multiplying each standard deviation by a normally distributed random number and adding the bias-corrected ensemble mean temperature. This step is repeated for all preset standard deviation values to obtain a number of temperature ensembles. 3) Rank histograms are constructed for all temperature ensembles. 4) The RMSEs of rank histogram bins are calculated for all histograms. The standard deviation corresponding to the lowest RMSE is selected as the optimized one for this season. These four steps are repeated for all four seasons to obtain four optimized standard deviations for entire year postprocessing. To ensure there are enough samples to construct reliable rank histograms, the standard deviation is optimized at the seasonal scale.
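The four-step spread optimization can be sketched as a grid search. The sketch below is illustrative only: the function names are hypothetical, the "RMSE of rank histogram bins" is assumed to mean the root-mean-square deviation of the bin relative frequencies from the uniform value 1/(n + 1), and the same random draws are reused for every candidate spread so that candidates differ only in their scaling (a design choice of this sketch, not something the text states).

```python
import numpy as np


def rank_histogram(obs, ens):
    """Count the rank of each observation within its ensemble.

    obs: (n_days,), ens: (n_days, n_members). Returns n_members + 1 counts.
    """
    ranks = (ens < obs[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=ens.shape[1] + 1)


def flatness_rmse(counts):
    """RMSE of bin relative frequencies against a uniform histogram.

    ASSUMPTION: this is how the 'RMSE of rank histogram bins' is measured.
    """
    freq = counts / counts.sum()
    return float(np.sqrt(np.mean((freq - 1.0 / len(counts)) ** 2)))


def optimize_spread(obs, bc_mean, candidates, n_members=15, rng=None):
    """Return the candidate standard deviation with the flattest histogram."""
    rng = np.random.default_rng() if rng is None else rng
    # Common random numbers across all candidate spreads.
    z = rng.standard_normal((len(obs), n_members))
    scores = [
        flatness_rmse(rank_histogram(obs, bc_mean[:, None] + s * z))
        for s in candidates
    ]
    return float(candidates[int(np.argmin(scores))])
```

A call such as `optimize_spread(obs, bc_mean, np.arange(0.5, 5.0 + 1e-9, 0.05))` would be run once per season on that season's days, mirroring the candidate grid quoted in the text.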
For any given day, the postprocessed ensemble temperature is found by multiplying the optimized standard deviation in the specific season by normally distributed random numbers (1000 in this study, representing 1000 members) and adding the bias-corrected ensemble mean temperature for that day. However, the ensemble temperature generated this way lacks an autocorrelation structure. For hydrological studies, autocorrelated time series of Tmax and Tmin are usually needed to run hydrological models. Applying a technique similar to that used in weather generators, the observed auto- and cross correlation for and between Tmax and Tmin can be preserved using a first-order linear autoregressive model. With this model, the Tmax and Tmin ensembles over several lead days are generated at the same time, rather than generated day after day and variable after variable.
The ensemble mean Tmax and Tmin are first obtained by adding the mean observed Tmax and Tmin to the bias-corrected temperature anomalies (obtained from step 2), respectively, for all lead days. The residual series of Tmax and Tmin with desired auto- and cross correlation are then generated using a first-order linear autoregressive model:
... (1)
where x_i(j) is a (2 × 1) vector for lead day i whose elements are the residuals of the generated Tmax (j = 1) and Tmin (j = 2); ε_i(j) is a (2 × 1) vector of independent random components that are normally distributed with a mean of zero and a variance of unity. Here A and B are (2 × 2) matrices whose elements are defined such that the new sequences have the desired auto- and cross-correlation coefficients. The A and B matrices are determined by
... (2)
... (3)
where the superscripts −1 and T denote the inverse and transpose of the matrix, respectively, and M0 and M1 are the lag 0 and lag 1 covariance matrices, calculated using the observed time series for each season. With Eq. (1), a number of the residual series over all lead days are generated to represent the ensemble members. Finally, the ensemble Tmax and Tmin over several days can be found by multiplying the optimized standard deviation in the specific season by the generated residual series and adding the bias-corrected ensemble mean Tmax and Tmin.
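Because Eqs. (1)-(3) are omitted in this copy, the sketch below assumes the form commonly used in WGEN-type generators: x_i = A x_(i-1) + B ε_i, with A = M1 M0^(-1) and B B^T = M0 − M1 M0^(-1) M1^T. That form is consistent with the definitions given in the text but is an assumption here, as are the function names, the Cholesky factor used to obtain one valid B, and the choice to start each member from the stationary distribution N(0, M0).

```python
import numpy as np


def ar1_matrices(M0, M1):
    """Return A and B for x_i = A x_{i-1} + B eps_i (assumed WGEN-type form)."""
    A = M1 @ np.linalg.inv(M0)
    BBt = M0 - A @ M1.T
    # Any B with B @ B.T == BBt works; the Cholesky factor is one choice.
    B = np.linalg.cholesky(BBt)
    return A, B


def generate_residuals(M0, M1, n_leads, n_members, rng=None):
    """Generate residual series of shape (n_members, n_leads, 2).

    The last axis holds the Tmax (index 0) and Tmin (index 1) residuals.
    """
    rng = np.random.default_rng() if rng is None else rng
    A, B = ar1_matrices(M0, M1)
    x = np.empty((n_members, n_leads, 2))
    # ASSUMPTION: the first lead day is drawn from N(0, M0).
    x[:, 0] = rng.standard_normal((n_members, 2)) @ np.linalg.cholesky(M0).T
    for i in range(1, n_leads):
        eps = rng.standard_normal((n_members, 2))
        x[:, i] = x[:, i - 1] @ A.T + eps @ B.T
    return x
```

Following the text, the final Tmax and Tmin members would then be obtained by multiplying these residuals by the optimized seasonal standard deviation and adding the bias-corrected ensemble mean for each lead day.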
The first-order linear autoregressive model has been tested extensively in several studies (e.g.,
c. Bias correction (BC) method
A simple BC method is used as a benchmark to demonstrate the advantages of the proposed GPP method. The BC step for temperature is similar to that of the GPP method. Linear equations of the form y = ax + b (where a and b are two estimated coefficients) are fitted between observed and forecasted temperature anomalies using a 31-day window centered on the day of interest. The fitted linear equations are then used to correct the daily ensemble temperature anomaly for all 15 members. This step assumes that all ensemble members have the same bias. The variance optimization stage of the GPP method was not applied to the BC method. As such, the comparison can be expected to highlight the advantages of the GPP method over the simpler direct bias correction method.
A bias correction procedure is also applied to the ensemble precipitation forecast. Linear equations of the form y = ax (where a is the estimated coefficient) are fitted between the observed and forecasted mean precipitation using a 31-day window centered on the day of interest. It differs from the temperature correction in that the linear equation for precipitation is fitted using mean precipitation values and not the daily values. This results in a more reliable estimation of the linear dependence between observed and forecasted values. Moreover, since the distribution of the daily precipitation is highly skewed, a fourth root transformation was applied to precipitation values prior to fitting the linear equations. Similarly to the temperature, for a given day, the same linear equation is used for all ensemble members.
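One possible reading of the precipitation BC is sketched below. The text does not spell out how the fourth root transformation and the y = ax fit are combined, so this sketch makes several assumptions: both series are transformed before a least-squares fit through the origin, the corrected value is transformed back by raising it to the fourth power, and the function names are hypothetical.

```python
import numpy as np


def fit_precip_scale(obs_mean, fc_mean):
    """Least-squares slope a of y = a x through the origin.

    ASSUMPTION: x and y are the fourth roots of the forecast and observed
    mean precipitation pooled over the 31-day window.
    """
    x = np.asarray(fc_mean, dtype=float) ** 0.25
    y = np.asarray(obs_mean, dtype=float) ** 0.25
    return float(np.sum(x * y) / np.sum(x * x))


def correct_members(members, a):
    """Apply the same equation to every member, then back-transform (mm)."""
    return (a * np.asarray(members, dtype=float) ** 0.25) ** 4
```

Because the equation has no intercept, a forecast of zero precipitation remains zero after correction, which keeps dry members dry.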
d. Verification of the postprocessing method
Rank histograms permit a quick examination of the quality of ensemble weather forecasts. Consistent biases in an ensemble weather forecast result in a sloped rank histogram, and a lack of variability (underdispersion) is revealed as a U-shaped, concave, population of the ranks (Hamill 2001). Thus, the rank histogram is first used to evaluate ensemble precipitation and temperature forecasts. However, a uniform rank histogram is a necessary but not a sufficient criterion for determining the reliability of an ensemble forecast system (Hamill 2001). Besides, some other characteristics are not evaluated by rank histograms, such as the resolution. Other verification metrics are thus necessary for testing the predictive power of an ensemble weather forecast. In this study, the GFS, BC, and GPP ensemble precipitation and temperature forecasts are verified using the Ensemble Verification System (EVS) developed by Brown et al. (2010). The selected verification metrics include two deterministic metrics for verifying the ensemble mean, and two probabilistic metrics for verifying the distribution. The continuous ranked probability skill score (CRPSS) and the Brier skill score (BSS) are also used to verify the skill of the ensemble forecasts relative to the climatology.
The two deterministic metrics are the mean absolute error (MAE) and RMSE. The MAE measures the mean absolute difference between the ensemble mean forecast and the corresponding observation, and the RMSE measures the square root of the average squared error of the ensemble mean forecast. The two probabilistic metrics include the BS and the reliability diagram. The BS measures the average squared error of a probability forecast. It is analogous to the mean square error of a deterministic forecast. It can be decomposed into three components: reliability, resolution, and uncertainty. A reliability diagram measures the accuracy with which a discrete event is forecast by an ensemble forecast system. The BS and reliability diagram only verify discrete events in the continuous forecast distributions. Thus, one or more thresholds have to be defined to represent cutoff values from which discrete events are computed. Six thresholds corresponding to the probability of precipitation and temperature exceeding 10% (lower decile), 33% (lower tercile), 50% (median), 67% (upper tercile), 90% (upper decile), and 95% (95th percentile) are used in this study. Details of these metrics can be found in Brown et al. (2010), Demargne et al. (2010), and in the user manual of the EVS (Brown 2012) (http://amazon.nws.noaa.gov/ohd/evs/evs.html).
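The deterministic metrics and the BS can be written compactly. The functions below are a generic sketch and are not taken from the EVS; the function names are hypothetical, and the forecast probability of an event is assumed to be the fraction of ensemble members exceeding the threshold.

```python
import numpy as np


def mae(ens, obs):
    """Mean absolute error of the ensemble mean. ens: (n_days, n_members)."""
    return float(np.mean(np.abs(ens.mean(axis=1) - obs)))


def rmse(ens, obs):
    """Root-mean-square error of the ensemble mean."""
    return float(np.sqrt(np.mean((ens.mean(axis=1) - obs) ** 2)))


def brier_score(ens, obs, threshold):
    """Brier score for the event 'value exceeds threshold'.

    ASSUMPTION: forecast probability = fraction of members above threshold.
    """
    p = (ens > threshold).mean(axis=1)
    o = (obs > threshold).astype(float)
    return float(np.mean((p - o) ** 2))


# Tiny worked example: two days, two members.
ens = np.array([[1.0, 3.0], [2.0, 4.0]])
obs = np.array([2.0, 5.0])
```

For this example the ensemble means are 2 and 3, so the errors are 0 and −2, giving an MAE of 1 and an RMSE of √2; at a threshold of 2.5 both forecast probabilities are 0.5, giving a BS of 0.25.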
4. Results
Figure 4 presents the rank histograms of GFS and GPP ensemble precipitation forecasts for 1 lead day over two watersheds. Only wet-day precipitation is used to produce the rank histograms. To allow for a proper comparison with the raw ensemble forecasts, only 15 members are generated using the GPP method in this case. The results show that the distributions of the raw GFS ensemble forecasts are highly nonuniform; there is a marked tendency for the distribution to be most populated at the lowest and highest ranks to form U-shaped rank histograms (Figs. 4a,c). This indicates that the raw GFS forecasts are considerably underdispersive for both watersheds. Wet biases are observed for the CDD watershed and dry biases for the YAM watershed. However, after the calibration with the GPP method, the rank histograms are much flatter for both watersheds (Figs. 4b,d), even though only 15 members are generated in this case. Using more members would result in more uniform rank histograms.
Figure 5 shows the rank histograms of ensemble temperature forecasts before and after calibration for 1 lead day over the two watersheds. Similarly to precipitation, to allow for a fair comparison with the raw ensemble forecasts, only 15 members are generated for the GPP ensemble forecasts in this case. The results show that the distribution of raw GFS ensemble forecasts is highly nonuniform (U shaped) for temperature. There is a marked tendency for the distribution to be most populated at the extreme ranks, indicating the underdispersion and cold bias of the raw forecasts over the two watersheds. However, rank histograms of calibrated ensemble forecasts tend to be uniform for both watersheds.
Figures 6 and 7 show the quality of the ensemble mean forecast before and after postprocessing in terms of the MAE and RMSE, respectively, for both precipitation and temperature over both watersheds. Both statistics are computed using all forecast-observation pairs (25 yr × 365 days). Overall, the GFS ensemble mean forecasts display large errors for both precipitation and temperature at lead times from 1 to 7 days. However, the GPP method consistently improves the quality of the ensemble mean forecasts for all leads. In terms of the MAE, the BC method displays more benefits than the GPP method for precipitation over both watersheds. This is expected, since the BC method specifically accounts for the bias of the GFS forecast. However, in terms of the RMSE, the GPP method consistently performs better than the BC method for precipitation. Since both BC and GPP methods share the same step for removing the bias of the ensemble mean temperature, the MAE and RMSE of the forecast temperature are the same for both.
As displayed in Figs. 6a and 6c, the quality of raw ensemble mean forecasts decreases slightly with increasing lead time for precipitation in terms of the MAE. However, the RMSE of raw ensemble mean forecasts tends to decrease with an increase in lead time for precipitation (Figs. 7a,c). After postprocessing, the forecast quality slightly decreases with the increase in lead days. For the ensemble mean temperature, there is a progressive decline in forecast quality with increasing lead time in terms of both MAE and RMSE.
Moreover, the quality of the ensemble mean forecast at the CDD watershed is consistently better than that at the YAM watershed for precipitation, suggesting that watershed size plays an important role. This likely indicates that the numerical weather forecast system is better at representing precipitation events over a larger area, since the representation of convective events is very difficult considering the horizontal resolution of the computational grid. In this work, the observed precipitation is watershed averaged, and as such, convective precipitation extremes are smoothed over the larger basin. The same extremes would play a more important role in a smaller watershed.
The skill of ensemble forecasts relative to unskilled climatology is assessed using the mean continuous ranked probability skill score (MCRPSS; Fig. 8). The GFS ensemble precipitation forecasts show negative skill relative to the climatology over both watersheds. The forecast skill consistently increases with the forecast leads. This is caused primarily by the lack of spread (greater sharpness) in shorter lead ensemble forecasts and the larger spread in longer lead ensemble forecasts. Even though the BC method is able to improve the ensemble precipitation forecast to a certain extent, the skill is still negative for all 7 lead days. The GPP method considerably increases the skill of the ensemble forecast for both watersheds and is consistently better than the BC method. The skill of the GPP forecast decreases with increasing lead times, and is close to zero at the 7-day lead, indicating that the ensemble weather forecast has reached its predictability limit. Thus, the calibration of ensemble precipitation forecasts for a period of 7 lead days is appropriate in this study.
The GFS ensemble temperature forecasts are much more skillful than their precipitation forecasts for all leads over both watersheds. Even though the GFS ensemble temperature forecasts are skillful for the period up to 1 week, they can be further improved by both GPP and BC methods. The GPP ensemble temperature forecasts are consistently better than the BC ones for all 7 lead days and both watersheds, indicating that benefits of the GPP method not only come from the BC stage, but also from the variance optimization stage. The BC stage plays a slightly more important role in improving the raw forecasts. Moreover, the skill of ensemble temperature forecasts (before and after postprocessing) consistently decreases with the increase in lead time for both watersheds.
For probabilistic metrics computed for discrete events, such as the BS, BSS, and reliability diagrams used in this study, a number of thresholds have been defined. As mentioned earlier, six thresholds were used in this study. Since similar patterns are obtained, only the results with the threshold exceeding the median are presented for illustration for all four metrics.
Figures 9 and 10 show the BS of GFS, GPP, and BC ensemble precipitation and temperature forecasts for both watersheds, with leads ranging between 1 and 7 days. The reliability, resolution, and uncertainty components of the BS and the BSS, which measure the performance of an ensemble weather forecast relative to the climatology, are also presented. The reliability term of BS measures how close the forecast probabilities are to the true probabilities, with smaller values indicating a better forecast system. The resolution term measures how much the predicted probabilities differ from a climatological average and therefore contribute valuable information. Thus, a larger resolution value suggests a better forecast. According to its definition, the uncertainty term of the BS is always equal to 0.25 [0.5 × (1 − 0.5)] when using the median as the threshold.
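The three components described above can be illustrated with a short sketch. It assumes the standard decomposition BS = reliability − resolution + uncertainty and groups forecasts by their exact probability value, which is only appropriate when the forecasts take a small set of distinct values (for example, fractions of ensemble members exceeding the threshold). The function name and the sample data are hypothetical.

```python
from collections import defaultdict

def brier_decomposition(probs, outcomes):
    """Return (bs, reliability, resolution, uncertainty).

    probs: forecast probabilities of the event (floats in [0, 1])
    outcomes: 1 if the event occurred, 0 otherwise
    """
    n = len(probs)
    base_rate = sum(outcomes) / n

    # Group the outcomes by distinct forecast probability.
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)

    rel = sum(len(os) * (p - sum(os) / len(os)) ** 2
              for p, os in groups.items()) / n
    res = sum(len(os) * (sum(os) / len(os) - base_rate) ** 2
              for os in groups.values()) / n
    unc = base_rate * (1.0 - base_rate)
    bs = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    return bs, rel, res, unc

# Illustrative sample in which the event occurs half of the time,
# as it does by construction for a median threshold.
probs = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
outcomes = [0, 0, 0, 1, 1, 1]
bs, rel, res, unc = brier_decomposition(probs, outcomes)
```

With a base rate of 0.5 the uncertainty term evaluates to 0.25, matching the value quoted in the text.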
In terms of the BS, the ensemble forecasts are less accurate (in overall performance) for precipitation (Fig. 9), and reasonably accurate for temperature (Fig. 10) for both watersheds. In terms of the BS, the BC method performs slightly better for the temperature forecasts, but shows no improvement for the ensemble precipitation forecasts. Nevertheless, the GPP method consistently increases the accuracy for both precipitation and temperature for all 7 lead days for both watersheds, with a consistent increase in the resolution component of the BS. In addition, the reliability component of the BS is also improved for the ensemble precipitation forecasts for all lead times. In contrast, the reliability component of the BS is slightly degraded for the ensemble temperature forecasts at all lead times. This is because the raw ensemble forecasts are very reliable for temperature to begin with (mean reliability component of 0.005 for the CDD watershed and 0.003 for the YAM watershed). The moderate decrease in the BS is due to a relatively large increase in the resolution component and a slight deterioration of the reliability component.
In terms of the BSS, the skill of the GFS ensemble precipitation forecast is negative for all 7 lead days. The BC method results in small improvements for precipitation forecasts, but only for the first few lead days. It then progressively becomes worse than the GFS forecasts for the other lead days. The GPP method considerably improves the skill of the ensemble forecast for both watersheds and is consistently better than the BC method. The skill of the GPP ensemble forecast decreases with increasing lead times, with the BSS being close to zero at 7 lead days, further indicating that the ensemble weather forecast retains some skill for a period of up to 1 week for precipitation. The BSS shows high skill in GFS ensemble temperature forecasts, for all lead times and for both watersheds. The BC method slightly improves the skill of the ensemble temperature forecast, but at the expense of the resolution. However, the GPP ensemble forecast consistently exceeds the skill of GFS and BC ensemble forecasts. In particular, the BSS of the GPP ensemble forecast at 7 lead days is greater than that of the GFS forecast at 1 lead day for the CDD watershed.
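For reference, the relation between the BS, its components, and the BSS can be written compactly. This is the standard decomposition, under the assumption that the reference forecast is the sample climatology, so that its Brier score equals the uncertainty term; the symbols REL, RES, and UNC are shorthand introduced here for the reliability, resolution, and uncertainty components.

```latex
\mathrm{BS} = \mathrm{REL} - \mathrm{RES} + \mathrm{UNC},
\qquad
\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{clim}}}
             = \frac{\mathrm{RES} - \mathrm{REL}}{\mathrm{UNC}}
```

Under this assumption, and with UNC = 0.25 for the median threshold, a negative BSS corresponds to a reliability term that exceeds the resolution term, and any gain in resolution that outweighs a loss in reliability raises the BSS.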
The reliability diagram (
The cold biases of the raw temperature forecast (Fig. 5) are also reflected in the reliability diagrams (Figs. 8b,d) for both 1 and 3 lead days for both watersheds, as displayed by the underforecasting. The ensemble temperature forecasts calibrated using the weather generator-based method are much more reliable than the GFS and BC forecasts for both 1 and 3 lead days over both watersheds, as indicated by the reliability curves, which are very close to the 1:1 reference line. More importantly, the improvement in reliability results in a slight decline in the sharpness. The better performance of the GPP method over the BC method is a clear indication that a significant part of the performance is derived from the variance optimization stage.
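The points of a reliability diagram can be computed as sketched below. This is a generic illustration, assuming equal-width probability bins; the bin count, the function name, and the sample data are hypothetical. Each returned tuple holds the mean forecast probability in a bin, the observed relative frequency in that bin, and the number of forecasts in the bin (the latter describing sharpness).

```python
def reliability_curve(probs, outcomes, n_bins=5):
    """Return a list of (mean_forecast_prob, observed_freq, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))

    curve = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        obs_freq = sum(o for _, o in members) / len(members)
        curve.append((mean_p, obs_freq, len(members)))
    return curve

# Illustrative underforecasting case: events occur more often than forecast,
# so every point lies above the 1:1 line.
probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 1, 1, 1, 1, 1, 1]
curve = reliability_curve(probs, outcomes)
```

A perfectly reliable system would return points with the observed frequency equal to the mean forecast probability in every bin.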
5. Discussion and conclusions
Ensemble weather forecasts generally suffer from bias and tend to be underdispersive, which limits their predictive power. Several methods, such as logistic regression, BMA, and ensemble dressing, have been proposed for postprocessing these ensemble forecasts. These methods are relatively complex to set up and generally aim at estimating the underlying predictive PDFs. However, a series of point values are often more convenient for practical applications such as ensemble streamflow forecasts, which need discrete, autocorrelated time series over several days in order to run hydrological models. These discrete, autocorrelated time series of precipitation and temperature need to be physically constrained. For example, temperature changes from one day to the next and the probability of precipitation occurrence are not random. Even if a method is adequate at reconstructing the underlying PDF, there is no simple way to go from the underlying distributions to generating several time series fully consistent with the underlying distributions. The GPP method presented in this study is significantly simpler to implement than most existing methods, and it can readily generate an infinite number of discrete, autocorrelated time series over the forecasting horizon. The auto- and cross correlation of and between Tmax and Tmin were specifically taken into account with this method. Moreover, the GPP method specifically takes into account the precipitation occurrence biases of ensemble forecasts. Precipitation amounts are modeled using a parametric distribution. This underlying assumption allows extreme values outside the range of the observed data to be simulated. A gamma distribution was used in this study. Other distributions with a heavy tail (e.g., mixed exponential distribution and hybrid exponential and generalized Pareto distribution) can also be used to better represent the ensemble spread (
A simple BC method was used as a benchmark to demonstrate the performance of the GPP method over two Canadian watersheds located in the Province of
The GFS and GPP ensemble weather forecasts were preliminarily tested using rank histograms. Similarly to previous studies (Hamill and Colucci 1997; Hagedorn et al. 2008), the GFS forecasts were found to be biased and underdispersed, as illustrated by the excess populations of the extreme ranks. This underdispersion was more pronounced at the shorter forecast leads than for longer forecast leads (results not shown). Uniform rank histograms could be achieved for both precipitation and temperature when postprocessed using the GPP method.
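A rank histogram of the kind referred to here can be tallied as in the following sketch. It is a simplified illustration: the rank of each observation is the number of members strictly below it, and ties (common for zero precipitation) are not randomized, which a full implementation would normally handle. The function name and sample values are hypothetical.

```python
def rank_histogram(forecasts, observations):
    """Count how often the observation falls at each rank within the ensemble.

    forecasts: list of ensembles, each with the same number of members m
    observations: list of observed values
    Returns a list of m + 1 counts (rank 0 = below all members).
    """
    m = len(forecasts[0])
    counts = [0] * (m + 1)
    for members, obs in zip(forecasts, observations):
        rank = sum(1 for x in members if x < obs)
        counts[rank] += 1
    return counts

# Illustrative underdispersed ensemble: members cluster tightly around 5,
# while the observations vary widely.
fcst = [[4.9, 5.0, 5.1]] * 6
obs = [1.0, 2.0, 9.0, 8.0, 5.05, 0.0]
counts = rank_histogram(fcst, obs)
```

In this toy case most observations fall outside the ensemble range, so the extreme ranks are overpopulated, which is the signature of underdispersion described in the text.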
The performance of GFS, GPP, and BC ensemble weather forecasts was further verified using both deterministic and probabilistic metrics. The deterministic metrics (MAE and RMSE) showed large errors in GFS ensemble mean forecasts for both precipitation and temperature at all 7 lead days over both watersheds. The GPP method was able to consistently improve the quality of the ensemble mean forecasts. The skill of ensemble weather forecasts relative to the climatology was measured using MCRPSS. The raw forecast had negative to near-zero skill at all forecast leads for precipitation. The GPP method substantially improved the skill of the ensemble forecasts for precipitation, with MCRPSS being positive for all 7 lead days. Even though relatively good skill was observed for the raw ensemble temperature forecasts, they could be further improved by applying the postprocessing method. The performance of the GPP method was consistently better than that of the BC method.
Probabilistic metrics computed for discrete events, including the BS, BSS, and reliability diagrams, were further used to verify the overall performance (accuracy, skill, reliability, resolution, and sharpness) of ensemble weather forecasts. Overall, the GPP method was able to consistently improve the accuracy of ensemble forecasts for both precipitation and temperature over both watersheds. It also consistently outperformed the BC method. The GFS ensemble forecasts showed negative skill for precipitation for all 7 lead days. This indicated that the underdispersed GFS forecasts were even worse than the climatology for precipitation. However, with the GPP method, a positive skill was achieved for a period of up to 7 lead days. With the GPP method, the skill of the ensemble temperature forecasts was also improved, even though they had usually shown reasonable skill before postprocessing. Underdispersion of the GFS ensemble precipitation forecasts was reflected in the reliability diagrams, indicating that the GFS precipitation forecasts were poorly calibrated and showed little skill and resolution. The GPP method markedly improved their reliability and resolution for all leads over both watersheds. However, the sharpness was somewhat diminished. This is consistent with other studies (e.g., Hamill et al. 2008) in that the reliability was improved at the expense of sharpness. The reliability diagrams showed cold biases for GFS ensemble temperature. However, the reliability curves were very close to the 1:1 perfect line after postprocessing.
Overall, even though GFS ensemble forecasts are biased and tend to be underdispersed, their overall performance was considerably improved using the proposed GPP method. Predictably, the performance of the GPP method decreased with increasing lead days. For the GFS ensemble reforecasts and selected basins, 7 days was the maximum lead day for precipitation. For temperature, postprocessing over a longer period may be possible. The use of the BC method for temperature provided an opportunity to separate the advantage of the GPP method into that from the bias correction and variance optimization stages. The better performance of the GPP method clearly demonstrates the importance of the variance optimization stage, even though the bias correction carries the largest part of the performance gain.
Owing to restrictions in paper length, the only results that were shown for the probabilistic metrics (BS, BSS, and reliability diagrams) are those with a threshold exceeding the median value. The use of higher thresholds is also interesting for many real-world applications. The ensemble weather forecasts were also tested using higher thresholds (67%, 90%, and 95%). While the results for the higher thresholds slightly differ from those obtained using the median values, the patterns were very similar. Specifically, the skill of the GPP forecasts decreased slightly with the increasing threshold, because the GFS forecast performance gets worse for the higher thresholds. However, the degree of improvement obtained from the GPP method increased with the larger thresholds.
The excellent performance of the postprocessing scheme may partly be due to the choice of basin-averaged meteorological time series. While still much smaller than the numerical model scale, the basin scale (9700 and 3330 km²) is nevertheless slightly more comparable and may result in a better representation of the precipitation than at the station scale, as is more common. The method performed very well with only one predictor (ensemble mean for precipitation and ensemble mean anomaly for temperature). No attempt was made at using additional predictors. In particular, the use of the ensemble standard deviation may yield additional improvements.
To obtain the true expectancy of a weather generator, a large number of ensembles need to be generated with the proposed method. Short time series could result in biases due to the random nature of the stochastic process. Thus, a 1000-member ensemble was generated in this study. For hydrological studies, however, running such a large ensemble through a fully distributed model may be time consuming. As indicated by the rank histograms in Figs. 4 and 5, an ensemble with only 15 members was nevertheless better than the GFS forecast. Therefore, depending on research purposes and hydrology model complexity, an ensemble with fewer members may also be acceptable.
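The trade-off between ensemble size and sampling noise can be illustrated with a small Monte Carlo sketch. It draws wet-day amounts from a gamma distribution, as used in the study, but the shape and scale values, the function name, and the number of repetitions are hypothetical choices made only for illustration.

```python
import random
import statistics

def sample_mean_spread(n_members, n_repeats=200, shape=0.8, scale=5.0, seed=42):
    """Standard deviation of the ensemble mean across repeated random ensembles.

    Each ensemble holds n_members gamma-distributed amounts; shape and scale
    are illustrative values, not parameters fitted in the study.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        draws = [rng.gammavariate(shape, scale) for _ in range(n_members)]
        means.append(statistics.fmean(draws))
    return statistics.stdev(means)

spread_15 = sample_mean_spread(15)
spread_1000 = sample_mean_spread(1000)
```

The spread of the ensemble mean shrinks roughly with the square root of the number of members, so a 15-member ensemble is noticeably noisier than a 1000-member one, although it may still be adequate for some applications.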
For the real climate system, a correlation exists between precipitation and temperature. Mean temperature is generally cooler on wet days. However, the proposed GPP method generated precipitation and temperature independently, which may affect this correlation to a certain extent. To investigate this, the precipitation-temperature correlations were calculated for observed and forecasted datasets using all 25-yr daily time series. The correlation coefficients for GFS, GPP, and BC forecasts were obtained by averaging all coefficients over all 7 lead days and all ensemble members. The correlation coefficients are 0.189, 0.363, 0.161, and 0.227 for observed data, GFS, GPP, and BC forecasts, respectively, over the CDD watershed. They are equal to 0.119, 0.239, 0.088, and 0.069, respectively, over the YAM watershed. These results indicate that the GFS forecasts overestimated the precipitation-temperature correlation. However, the precipitation-temperature correlation was slightly underestimated when using the GPP method. This is as expected, since the ensemble precipitation and temperature were generated independently. However, it should be noted that any bias correction or postprocessing method may be expected to alter the precipitation-temperature correlation, unless specifically taken into account.
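The averaging of correlation coefficients over ensemble members described above can be sketched as follows. The sketch assumes a Pearson correlation between daily precipitation and temperature series, which the text does not state explicitly; the function names and the toy series are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_member_correlation(precip_members, temp_members):
    """Average the per-member precipitation-temperature correlations."""
    rs = [pearson(p, t) for p, t in zip(precip_members, temp_members)]
    return sum(rs) / len(rs)

# Toy example with two members whose correlations cancel out.
precip = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
temp = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
mean_r = mean_member_correlation(precip, temp)
```

The same averaging would be repeated over lead days to obtain a single coefficient per forecast dataset.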
The goal of this work was to provide a postprocessing method to improve the ensemble weather forecasts for ensemble streamflow forecasts in the Province of
Acknowledgments. This work is part of a project funded by the Projet d'Adaptation aux Changements Climatiques (PACC26) of the Province of
REFERENCES
Bertotti, L., J. R. Bidlot,
Brocker, J., and
Brown, D. J., 2012: Ensemble Verification System (EVS), version 4.0. User's manual, 107 pp.
_____, J. Demargne,
Buizza, R., 1997: Potential forecast skill of ensemble prediction and spread and skill distributions of the ECMWF ensemble prediction system. Mon. Wea. Rev., 125, 99-119.
_____,
Chen, J.,
_____, _____, and _____, 2011: Assessment and improvement of stochastic weather generators in simulating maximum and minimum temperatures. Trans. ASABE, 54 (5), 1627-1637.
_____, _____, _____, and A. Caron, 2012: A versatile weather generator for daily precipitation and temperature. Trans. ASABE, 55 (3), 895-906.
Coulibaly, P., 2003: Impact of meteorological predictions on real-time spring flow forecasting. Hydrol. Processes, 17 (18), 3791-3801.
Cui, B., Z. Toth, Y. Zhu, and
Demargne, J.,
Eckel, F. A., and
Gneiting, T.,
Hagedorn, R., T. Hamill, and
Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550-560.
_____, and
_____, and _____, 1998: Evaluation of Eta-RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126, 711-724.
_____, and
_____, and _____, 2007: Ensemble calibration of 500-hPa geopotential height and 850-hPa and 2-m temperatures using reforecasts. Mon. Wea. Rev., 135, 3273-3280.
_____, _____, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434-1447.
_____, _____, and
_____,
_____,
Hutchinson, M. F.,
Li, C.,
Li, Z., F. Brissette, and
Nicks, A. D., and
Pellerin, G.,
Raftery, A. E., T. Gneiting, F. Balabdaout, and
Roulston, M. S., and
Schmeits, M. J., and
Semenov, M. A., and
Sloughter, J. M.,
Soltanzadeh, I.,
Toth, Z., Y. Zhu, and T. Marchok, 2001: The use of ensembles to identify forecasts with small and large uncertainty. Wea. Forecasting, 16, 463-477.
Vrugt, J. A.,
Wang, X., and
Whitaker, J. S., X. Wei, and F. Vitart, 2006: Improving week-2 forecasts with multimodel reforecast ensembles. Mon. Wea. Rev., 134, 2279-2284.
Wilks, D. S., 2005: Statistical Methods in the Atmospheric Sciences. 3rd ed.
_____, 2006: Comparison of ensemble-MOS methods in the Lorenz '96 setting. Meteor. Appl., 13, 243-256.
_____, 2009: Extending logistic regression to provide full-probability-distribution MOS forecasts. Meteor. Appl., 16, 361-368.
_____, and
Wilson, L. J., S. Beauregard,
JIE CHEN AND FRANÇOIS P. BRISSETTE
(Manuscript received
Corresponding author address:
E-mail: [email protected]
DOI: 10.1175/MWR-D-13-00180.1
| Copyright: | (c) 2014 American Meteorological Society |
| Wordcount: | 10378 |


