This study investigates how different methods of generating ensemble initial condition (IC) perturbations affect convection-permitting ensemble forecast performance when the ensemble members also have physics diversity. A total of 10 convectively active cases are selected for a systematic comparison of different IC perturbation methods in 10-member convection-permitting ensembles, both with and without physics diversity. These IC perturbation methods include simple downscaling of coarse perturbations from a global model (LARGE), perturbations generated with ensemble data assimilation directly on the multiscale domain (MULTI), and, as a control, perturbations generated using each method with small scales filtered out. MULTI was found to be significantly more skillful than LARGE at early lead times in all ensemble physics configurations, with the advantage of MULTI gradually decreasing with increasing forecast lead time. The advantage of MULTI, relative to LARGE, was reduced but not eliminated by the presence of physics diversity because of the extra ensemble spread that the physics diversity provided. The advantage of MULTI, relative to LARGE, was also reduced by filtering the IC perturbations to a commonly resolved spatial scale in both ensembles, which highlights the importance of flow-dependent small-scale (<~10 km) IC perturbations in the ensemble design. The importance of the physics diversity, relative to the IC perturbation method, depended on the spatial scale of interest, forecast lead time, and the meteorological characteristics of the forecast case. Such meteorological characteristics include the strength of synoptic-scale forcing, the role of cold pool interactions, and the occurrence of convective initiation or dissipation.
1. Introduction
Convection-allowing model (CAM; i.e., grid spacing < ~4 km) forecasts of convective precipitation are sensitive to errors on a broad range of spatial scales. For example, as theorized by Lorenz (1969) and more recently demonstrated in the context of CAM forecasts (e.g., Zhang et al. 2007; Hohenegger and Schär 2007), very small initial condition (IC) errors can rapidly grow in both amplitude and spatial scale. This upscale error growth is a source of forecast uncertainty not only for the evolution of convective systems (e.g., Flora et al. 2018), but also for the mesoscale and synoptic-scale environments in which they occur (e.g., Perkey and Maddox 1985; Zhang et al. 2007). Uncertainty in the ambient environment then feeds back to convective-scale uncertainty on time scales as short as a few hours (e.g., Hohenegger and Schär 2007; Cintineo and Stensrud 2013; Kerr et al. 2019). IC uncertainty on both convective scales and mesoscales contributes substantial forecast uncertainty even at lead times of a day or more (Johnson et al. 2014).
The motivation for ensemble forecasting, simply stated, is to account for the predictability limitations resulting from the nonlinear error growth described by Lorenz (1969) through Monte Carlo sampling of a large number of equally plausible forecasts (Ehrendorfer 1997). Ensemble forecasting aims to predict not the exact future state of the atmosphere, but rather the distribution of possible future states of the atmosphere given the best guess of the IC, the model dynamics and physics, and the uncertainty in each. Thus, a key consideration for optimally designing a CAM ensemble is how to choose a finite number of equally plausible model configurations and IC states to adequately sample both the analysis uncertainties that dominate relevant forecast uncertainties, and the uncertainties in the forward integration of the model (e.g., Anderson 1996; Stensrud et al. 2000; Hamill et al. 2000; Romine et al. 2014; Johnson et al. 2014; Johnson and Wang 2016).
A common method of generating IC perturbations for CAM ensembles has been to interpolate them from a coarser resolution (e.g., global) ensemble (e.g., Hohenegger et al. 2008; Zhang et al. 2010; Xue et al. 2010; Schwartz and Liu 2014). Studies by Durran and Gingrich (2014) and Durran and Weyn (2016) have shown that convective-scale predictability at lead times beyond a few hours is more strongly limited by errors on spatial scales of ~100 km or larger than by errors on smaller scales. From an ensemble design perspective, this raises the question of whether much attention needs to be paid to ensemble IC perturbations on spatial scales smaller than ~100 km. As a counterpoint, Johnson et al. (2014) showed that convective-scale IC perturbations contribute about as much uncertainty to 1-day lead-time CAM forecasts of mesoscale precipitation as mesoscale IC perturbations do. However, both Durran and Gingrich (2014) and Johnson et al. (2014) used random homogeneous IC perturbations rather than flow-dependent IC perturbations that sample the fast-growing modes of IC uncertainty, which are expected to contribute the most to forecast uncertainty.
Johnson and Wang (2016) went a step further, using observing system simulation experiments (OSSEs) with a “perfect-model” assumption to compare CAM ensemble forecasts initialized with IC perturbations generated by a CAM ensemble-based data assimilation (DA) system against forecasts initialized with IC perturbations downscaled from a coarser resolution, convection-parameterizing (i.e., 12-km grid spacing) ensemble. Mesoscale precipitation forecasts out to at least the 9-h lead time were improved by the flow-dependent multiscale IC perturbations from the CAM ensemble compared to the flow-dependent, coarser resolution IC perturbations from the convection-parameterizing ensemble. Two factors contributed to the difference. First, the multiscale IC perturbations contained convective-scale structure while the coarser resolution IC perturbations did not. Second, the multiscale IC perturbations were more consistent with the analysis errors than the coarser resolution IC perturbations, even on the commonly resolved scales.
Past studies aimed at understanding optimal methods of sampling the analysis uncertainty in CAM ensembles through the IC perturbations have generally omitted interactions with model and physics diversity within the ensemble design (e.g., Wang et al. 2014; Johnson and Wang 2016; Keresturi et al. 2019). However, it is well established that representing model and physics uncertainty during ensemble forecast integration is necessary for achieving optimal forecast performance (e.g., Romine et al. 2014; Johnson et al. 2017; Gasperoni et al. 2020). It is unclear whether IC perturbation methods optimized for a fixed-model, fixed-physics ensemble would remain optimal in the presence of other sources of forecast diversity, such as in a multiphysics ensemble. In contrast to the perfect-model OSSE framework of Johnson and Wang (2016), the present study uses real-data cases and a multiphysics forecast ensemble. This allows a more realistic assessment of how the different IC perturbation methods affect ensemble forecast performance in the operationally relevant situation where model and physics errors also contribute to the forecast uncertainty and an existing global ensemble system supplies the downscaled IC perturbations. This study uses the operational Global Ensemble Forecast System (GEFS) from the National Centers for Environmental Prediction (NCEP) to provide downscaled IC perturbations that more closely replicate what would be available to initialize an operational CAM ensemble in the absence of a CAM ensemble-based DA system. Specifically, in the presence of model error and ensemble physics diversity, we compare ensemble forecast performance when IC perturbations are downscaled from the coarser resolution GEFS perturbations with performance when multiscale IC perturbations are generated directly on the convection-permitting grid using the GSI-based EnVar DA system (Johnson et al. 2015; Wang and Wang 2017). Any advantage of the multiscale IC perturbations for forecast performance would represent an additional benefit that helps justify the cost of operationally running ensemble-based DA directly on the convection-permitting grid rather than using other DA approaches.
The remainder of this paper is organized as follows. Section 2 describes the experiment design, including an overview of the 10 cases, the data assimilation and forecast system configuration, and details of the different IC perturbation methods. Results are presented in section 3, and a summary and discussion are given in section 4.
2. Experiment design
a. Overview of cases
A total of 10 cases are selected for these experiments (Table 1). The cases were selected because of the presence of widespread, potentially hazardous, deep convection and were also used in Gasperoni et al. (2020). The cases have been subjectively categorized as strongly forced or weakly forced based on the apparent strength of the synoptic-scale forcing, as implied by the strength of the jet stream and the presence or absence of synoptic features such as surface fronts or jet streaks near the convective systems of interest. The mean 250-hPa heights and winds at the analysis time for the strongly forced and weakly forced cases are shown in Figs. 1a and 2a, respectively. Also shown in Figs. 1b–f and 2b–f are the observed reflectivity at the 6-h forecast time in the strongly forced and weakly forced cases, respectively. While the strongly forced cases are generally characterized by diffluent southwesterly flow of >50 kt (~26 m s−1) at 250 hPa (Fig. 1a), the weakly forced cases are characterized by generally westerly flow of <50 kt over most of the forecast domain (Fig. 2a). The cases are chosen such that a variety of convective systems occur during the 18-h forecast period, although much of the convection dissipates or moves out of the forecast domain by the 18-h forecast lead time.
b. Ensemble analysis and forecast system configuration
This study leverages the ensemble analyses produced by Gasperoni et al. (2020). The DA system is the two-way coupled GSI-based ensemble-variational (EnVar) hybrid system, which has been used for U.S. operational global numerical weather prediction (e.g., Wang et al. 2013; Wang and Lei 2014), applied here with the Advanced Research version of the Weather Research and Forecasting (WRF) Model (ARW; version 3.9; Skamarock et al. 2008). This system has been extended for mesoscale and convective-scale DA, including the capability to directly assimilate ground-based radar observations (Johnson et al. 2015; Wang and Wang 2017). Conventional surface and upper-air observations from the North American Mesoscale model DA system were assimilated every hour for a 6-h period, followed by a 1-h period of assimilating NEXRAD reflectivity observations every 20 min, following the system configuration detailed further in Gasperoni et al. (2020) and Johnson et al. (2020). In short, the system consists of a 40-member ensemble Kalman filter (EnKF) coupled to a 3D EnVar that provides a control analysis; each EnKF ensemble member is recentered around this control analysis after each analysis cycle to prevent divergence of the two ensembles. The DA domain covers the CONUS [see Fig. 1 of Gasperoni et al. (2020)]. The horizontal grid spacing is 3 km, and there are 50 vertical levels using the WRF stretched terrain-following coordinate. The relaxation to prior spread (RTPS) method of posterior covariance inflation (Whitaker and Hamill 2012) was employed with a coefficient of 0.95 to maintain ensemble spread during the DA cycles, consistent with the coefficient found to be optimal in several past studies (e.g., Whitaker and Hamill 2012; Harnisch and Keil 2015; Maldonado and Ruiz 2020). The large RTPS coefficient needed to maintain ensemble consistency likely compensates for undersampled model and physics uncertainty in the DA ensemble design.
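A minimal NumPy sketch of RTPS inflation as described by Whitaker and Hamill (2012) may help make the role of the 0.95 coefficient concrete. Each posterior perturbation is rescaled so that the posterior spread relaxes a fraction α of the way back toward the prior spread, i.e., σ_new = ασ_b + (1 − α)σ_a. The function name `rtps` and the (members × state) array layout are assumptions for illustration, not the GSI implementation.

```python
import numpy as np


def rtps(prior: np.ndarray, posterior: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    """Relaxation to prior spread (Whitaker and Hamill 2012), illustrative sketch.

    prior, posterior: ensemble arrays of shape (n_members, n_state).
    Returns the inflated posterior ensemble with the same mean as `posterior`.
    """
    post_mean = posterior.mean(axis=0)
    post_pert = posterior - post_mean
    sigma_b = prior.std(axis=0, ddof=1)      # prior (background) spread
    sigma_a = posterior.std(axis=0, ddof=1)  # posterior (analysis) spread
    # Per-variable multiplicative factor; zero-spread variables are left untouched.
    with np.errstate(divide="ignore", invalid="ignore"):
        factor = np.where(sigma_a > 0, alpha * (sigma_b - sigma_a) / sigma_a + 1.0, 1.0)
    return post_mean + post_pert * factor


rng = np.random.default_rng(0)
prior = rng.normal(0.0, 2.0, size=(40, 5))      # synthetic 40-member background
posterior = rng.normal(0.5, 1.0, size=(40, 5))  # synthetic, underdispersive analysis
inflated = rtps(prior, posterior, alpha=0.95)
```

With α = 0.95 the posterior spread is restored almost entirely to the prior spread, which illustrates why such a large coefficient can partly stand in for an unrepresented model-error contribution.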
To reduce the impact of potentially spurious ensemble covariances, horizontal covariance localization was applied with a cutoff length of 300 km for all observations except the more densely spaced radar reflectivity observations, which used a zero-correlation cutoff length of 15 km. Vertical covariance localization was applied with cutoff scales of 1.1 and 0.55 for radar and nonradar observations, respectively (Gasperoni et al. 2020). The vertical distance is measured as a scale height, calculated as the difference in the natural logarithm of pressure (hPa) between the observation and the model grid point being updated.
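The localization settings above can be sketched with the Gaspari and Cohn (1999) fifth-order piecewise rational function, a common choice for this kind of compactly supported taper. The paragraph does not name the taper, so using Gaspari–Cohn here, treating each quoted cutoff as the distance at which the weight reaches zero, and combining horizontal and vertical weights as a product are all assumptions; `localization_weight` is a hypothetical helper.

```python
import numpy as np


def gaspari_cohn(dist, cutoff):
    """Gaspari-Cohn (1999) taper: 1 at dist = 0, exactly 0 for dist >= cutoff."""
    r = 2.0 * np.abs(np.asarray(dist, dtype=float)) / cutoff  # r = 2 at the cutoff
    r1 = np.minimum(r, 1.0)    # argument for the inner branch (0 <= r <= 1)
    r2 = np.clip(r, 1.0, 2.0)  # argument for the outer branch (1 < r <= 2)
    inner = -0.25 * r1**5 + 0.5 * r1**4 + 0.625 * r1**3 - (5.0 / 3.0) * r1**2 + 1.0
    outer = (r2**5 / 12.0 - 0.5 * r2**4 + 0.625 * r2**3 + (5.0 / 3.0) * r2**2
             - 5.0 * r2 + 4.0 - 2.0 / (3.0 * r2))
    return np.where(r <= 1.0, inner, np.where(r <= 2.0, outer, 0.0))


def localization_weight(horiz_km, p_obs_hpa, p_grid_hpa, is_radar):
    """Hypothetical helper: horizontal taper times vertical taper in ln(p) units."""
    h_cutoff = 15.0 if is_radar else 300.0  # km, values from the text
    v_cutoff = 1.1 if is_radar else 0.55    # scale heights (ln p), values from the text
    v_dist = np.log(p_obs_hpa) - np.log(p_grid_hpa)
    return gaspari_cohn(horiz_km, h_cutoff) * gaspari_cohn(v_dist, v_cutoff)
```

The compact support is the point of the design: beyond the cutoff, the ensemble covariance is ignored entirely rather than merely down-weighted.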
A fixed physics configuration, similar to that of the operational High-Resolution Rapid Refresh Ensemble (HRRRE; Benjamin et al. 2016; Jankov et al. 2019), is used during DA. The physics schemes include the Mellor–Yamada–Nakanishi–Niino (Nakanishi and Niino 2009) boundary layer parameterization, the Thompson et al. (2008) microphysics parameterization, the Rapid Update Cycle (Smirnova et al. 2016) land surface model, and the Rapid Radiative Transfer Model (Mlawer et al. 1997) radiation parameterization. During DA, the lateral boundary conditions (LBCs) for the first 20 members are provided by GEFS forecasts from NCEP, initialized at the most recent GEFS cycle time before the first DA cycle of each case. For example, for the ensemble forecast initialized at 2300 UTC 16 May 2015 with DA beginning at 1700 UTC 16 May 2015, the GEFS forecasts initialized at 1200 UTC 16 May 2015 are used as boundary conditions for the first 20 background forecasts in the DA ensemble. Similarly, the most recent Short-Range Ensemble Forecast (SREF) cycle from NCEP is used to drive members 21–40 of the DA ensemble background forecasts. Two ensemble systems were combined to drive the lateral boundaries of the 40-member DA ensemble because neither GEFS (21 members) nor SREF (22 members) alone provides 40 members.
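The member-to-boundary-condition bookkeeping described above can be expressed as a small helper. This is a hypothetical sketch: the function names are illustrative, the 6-hourly GEFS cycle times (0000, 0600, 1200, 1800 UTC) are standard, and the 0300/0900/1500/2100 UTC SREF cycle times are an assumption not stated in the text. It ignores data latency and simply takes the latest cycle strictly before the DA start, which matches the 1700 to 1200 UTC example.

```python
from datetime import datetime, timedelta

GEFS_CYCLE_HOURS = (0, 6, 12, 18)  # standard 6-hourly GEFS cycles
SREF_CYCLE_HOURS = (3, 9, 15, 21)  # assumed SREF cycle hours (not stated in the text)


def most_recent_cycle(t: datetime, cycle_hours) -> datetime:
    """Latest cycle initialization time strictly before t."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    today = [midnight + timedelta(hours=h) for h in cycle_hours]
    yesterday = [c - timedelta(days=1) for c in today]
    return max(c for c in today + yesterday if c < t)


def lbc_source(member: int, da_start: datetime):
    """Return (ensemble name, cycle time) driving 1-based DA member `member` of 40."""
    if 1 <= member <= 20:
        return "GEFS", most_recent_cycle(da_start, GEFS_CYCLE_HOURS)
    if 21 <= member <= 40:
        return "SREF", most_recent_cycle(da_start, SREF_CYCLE_HOURS)
    raise ValueError(f"member must be in 1..40, got {member}")


da_start = datetime(2015, 5, 16, 17)
source, cycle = lbc_source(1, da_start)  # GEFS, 1200 UTC 16 May 2015 cycle
```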
In this study, a smaller domain of 400 × 400 grid points (i.e., 1200 km × 1200 km; e.g., Figs. 1a and 2a), hereafter the “forecast domain,” is used to investigate the interaction between IC and physics perturbation methods. The forecast domain is centered differently for each case to focus on the convective systems of interest (Figs. 1b–f and 2b–f). It is smaller than the CONUS DA domain to make the large number of experiments computationally tractable. Following the OU MAP real-time forecast ensembles from the 2017–19 Hazardous Weather Testbed (HWT) Spring Forecasting Experiments (SFEs), a 10-member forecast ensemble is adopted, using the EnVar control analysis and 9 perturbed members as defined in section 2c for the different experiments. The lateral boundary conditions for the forecast domains are specified using forecasts from the operational GEFS ensemble members. The LBC tendency is updated using the analysis over the forecast domain before launching the free forecast (Barker et al. 2012).
The forecast ensembles are configured with two different multiphysics configurations (Tables 2 and 3, referred to as “phys1” and “phys2,” respectively), as well as two fixed-physics configurations using each of the members in bold font in Table 3 (members 001 and 002, referred to as “fixed1” and “fixed2,” respectively). These configurations are chosen to emphasize the impact of physics diversity on the differences among IC perturbation experiments, by comparison with a fixed-physics ensemble, while also testing the robustness of the results to other plausible configurations of a multi- or fixed-physics ensemble. The forecast ensembles replicate a currently practical operational ensemble size of 10 members: one EnVar deterministic analysis and nine perturbed analyses from the coupled EnKF component of the hybrid EnVar system.
c. Definition of IC perturbation methods
All forecast experiments in this study are initialized with the same multiscale ensemble-mean IC, obtained as the mean of the EnVar control analysis and the EnKF analyses of the first 9 DA ensemble members, as in the real-time OU MAP forecast ensembles of the 2017–19 HWT SFEs. Centering the perturbations around the ensemble mean, rather than the control analysis, ensures that the perturbations have zero mean and that only the perturbations of each member from the shared ensemble mean differ among the experiments. The experiment that directly uses the analysis perturbations from the DA system described in section 2b is referred to as “MULTI.” The experiment referred to as “LARGE” uses IC perturbations obtained as the deviations of the GEFS members from their own ensemble mean, which are then added to the MULTI ensemble mean. The GEFS perturbations are obtained from short-term (0–6 h) forecasts from the operational global ensemble at NCEP.
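The construction of the MULTI and LARGE ICs can be sketched as recentering two different sets of perturbations on one shared ensemble mean. In this sketch the GEFS fields are bilinearly interpolated from a regular latitude–longitude grid to the fine-grid points with SciPy's `RegularGridInterpolator`; the text says only that the perturbations are downscaled, so the interpolation method, the array layout, and the helper names are assumptions.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def recenter(members: np.ndarray, new_mean: np.ndarray) -> np.ndarray:
    """Replace the ensemble mean of `members` (axis 0) with `new_mean`."""
    return new_mean + (members - members.mean(axis=0))


def downscale(coarse: np.ndarray, lat_c, lon_c, lat_f, lon_f) -> np.ndarray:
    """Bilinearly interpolate each coarse member (n, nlat, nlon) to fine points."""
    points = np.stack([lat_f, lon_f], axis=-1)  # shape (..., 2)
    return np.stack([
        RegularGridInterpolator((lat_c, lon_c), field, method="linear")(points)
        for field in coarse
    ])


def build_ics(da_members, gefs_members, lat_c, lon_c, lat_f, lon_f):
    """Return (MULTI, LARGE) IC ensembles sharing the same multiscale mean."""
    shared_mean = da_members.mean(axis=0)       # 10-member multiscale mean
    multi = recenter(da_members, shared_mean)   # equals da_members by construction
    gefs_fine = downscale(gefs_members, lat_c, lon_c, lat_f, lon_f)
    large = recenter(gefs_fine, shared_mean)    # GEFS perturbations, MULTI mean
    return multi, large
```

Because both ensembles share `shared_mean`, any forecast difference between them can be traced to the perturbations alone, which is the purpose of the centering choice described above.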
The GEFS spectral model forecasts for these 2015–16 cases were run at T574 (~34-km) resolution (Zhou et al. 2017) and were obtained from the National Centers for Environmental Information archive on a 0.5° (~50-km) grid. Although moist convection on scales of ~1–10 km is an important contributor to forecast uncertainty (e.g., Hohenegger and Schär 2007), such processes are not resolved by the GEFS and can be represented in the LARGE perturbations only indirectly, through the cumulus parameterization scheme and the stochastic physics schemes applied during the GEFS DA cycles (Zhou et al. 2017). Therefore, the LARGE perturbations would not be expected to accurately reflect the forecast uncertainty in convectively active scenarios (Johnson and Wang 2016).
Finally, experiments referred to as “MULTI_FILTER” and “LARGE_FILTER” are similar to MULTI and LARGE, except that a low-pass filter is applied to each perturbation before adding it to the full-resolution ensemble mean, which is the same for every experiment. MULTI_FILTER and LARGE_FILTER control for the difference in resolvable spatial scales between MULTI and LARGE. Rather than a sharp spectral truncation, we choose a simple filter whose response function decreases linearly from 1.0 to 0.0 over a range of wavelengths, to minimize unphysical artifacts introduced by the filtering itself. There are three MULTI_FILTER experiments (MULTI_FILTER1, MULTI_FILTER2, and MULTI_FILTER3) and three LARGE_FILTER experiments (LARGE_FILTER1, LARGE_FILTER2, and LARGE_FILTER3), corresponding to low-pass filters with the response functions shown in Fig. 3. The filtering was performed with the 2D discrete cosine transform (DCT2D; Denis et al. 2002): the perturbations were converted to spectral space using the DCT2D, the amplitude of each spectral coefficient was multiplied by a factor between 0.0 and 1.0 depending on its wavelength, and the modified coefficients were converted back to physical space using the inverse DCT2D. Figure 4 qualitatively illustrates the differences among several experiments for a single perturbation of the u component of wind at model level 5 (~850 hPa). The presence of convective-scale detail in MULTI but not in LARGE is clear from comparing Figs. 4a and 4b, respectively. However, while the MULTI_FILTER3 and LARGE_FILTER3 perturbations have similar spatial scales, they also have different spatial patterns that sample different modes of IC uncertainty, as seen by comparing Figs. 4c and 4d, respectively.
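The DCT2D filtering step can be sketched with SciPy's `dctn` and `idctn`. Following Denis et al. (2002), the coefficient with indices (m, n) on an Ny × Nx grid with spacing Δ is assigned the wavelength λ = 2Δ/α, where α = sqrt((m/Ny)² + (n/Nx)²). The ramp bounds `lam_min` and `lam_max` are placeholders, since the actual response functions are only shown in Fig. 3, and making the ramp linear in wavelength (rather than in wavenumber) is an assumption.

```python
import numpy as np
from scipy.fft import dctn, idctn


def dct_wavelengths(ny: int, nx: int, dx_km: float) -> np.ndarray:
    """Wavelength (km) of each 2D DCT-II coefficient, after Denis et al. (2002)."""
    m = np.arange(ny)[:, None] / ny
    n = np.arange(nx)[None, :] / nx
    alpha = np.sqrt(m**2 + n**2)
    with np.errstate(divide="ignore"):
        return 2.0 * dx_km / alpha  # the (0, 0) domain-mean coefficient gets inf


def ramp_response(lam: np.ndarray, lam_min: float, lam_max: float) -> np.ndarray:
    """1.0 for lam >= lam_max, 0.0 for lam <= lam_min, linear in between."""
    return np.clip((lam - lam_min) / (lam_max - lam_min), 0.0, 1.0)


def lowpass_dct(field: np.ndarray, dx_km: float, lam_min: float, lam_max: float) -> np.ndarray:
    """Low-pass filter one 2D perturbation field in DCT space."""
    coeffs = dctn(field, type=2, norm="ortho")
    coeffs *= ramp_response(dct_wavelengths(*field.shape, dx_km), lam_min, lam_max)
    return idctn(coeffs, type=2, norm="ortho")
```

The DCT is used instead of an FFT because it does not assume the limited-area field is periodic, which avoids spurious small-scale power from the jump between opposite domain edges.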
3. Results
a. Nonprecipitation variables
The distribution of perturbation energy across spatial scales in the different experiments is first evaluated using domain-wide perturbation energy spectra of nonprecipitation variables. The variables considered include the u and υ wind components and potential temperature at model levels 5 (~850 hPa), 18 (~500 hPa), and 29 (~250 hPa). Results were generally consistent among the different variables and levels, so Fig. 5 shows u wind at model level 5 as a representative example. The spectra in Fig. 5 are calculated with the discrete cosine transform as the average of all 1D latitudinal spectra (Skamarock 2004), and are then averaged over all 10 cases and all 10 ensemble members.
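A simplified version of this spectral calculation is sketched below for one variable, level, and case. Interpreting the "latitudinal" spectra as 1D DCTs taken along each grid row (the x direction), using the orthonormal DCT-II so that the summed spectral power equals the perturbation variance, and reporting plain variance rather than kinetic energy are assumptions; averaging over cases would be done by the caller.

```python
import numpy as np
from scipy.fft import dct


def perturbation_spectrum(members: np.ndarray, dx_km: float):
    """Mean 1D DCT power of ensemble perturbations along grid rows.

    members: array (n_members, ny, nx) for one variable at one level.
    Returns (wavelengths_km, power), excluding the k = 0 (row-mean) coefficient.
    """
    perts = members - members.mean(axis=0)              # perturbations from ensemble mean
    coeffs = dct(perts, type=2, norm="ortho", axis=-1)  # 1D DCT of every row
    power = (coeffs**2).mean(axis=(0, 1))               # average over members and rows
    nx = members.shape[-1]
    k = np.arange(1, nx)
    wavelengths = 2.0 * nx * dx_km / k                  # DCT wavelength for index k
    return wavelengths, power[1:]
```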
While the initial differences in small-scale perturbation energy between MULTI and LARGE (Fig. 5a) are rapidly reduced, a pronounced difference remains on meso-β scales of ~30 to ~300 km at the 3-h forecast time (Fig. 5d). On larger scales, the impact of physics diversity dominates over the impact of the IC perturbation method, as indicated by the clustering of the spectra primarily according to physics configuration on scales greater than ~400 km (Fig. 5). On the mesoscales, however, the impact of IC uncertainty dominates at these lead times. In general, as lead time increases, the physics configuration dominates on increasingly smaller scales. At the 1-h lead time the spectra cluster by IC method on scales up to ~400 km (Fig. 5b), while at the 2-h (3-h) lead time this clustering is seen only on scales up to about 200 km (100 km) (Figs. 5c,d).
In terms of the total spread of nonprecipitation variables, the differences between IC perturbation methods generally dominate during the first ~3 h while the differences among physics configurations generally dominate after the first ~6 h (Fig. 6). The relative impact of the IC perturbation method lasts a little longer at upper levels (Figs. 6b,d) than at lower levels (Figs. 6a,c). For each physics configuration, MULTI tends to maintain slightly more spread than the corresponding LARGE experiment even after the time when the physics configuration dominates the clustering of the lines.
b. Hourly accumulated precipitation
The experiments are next evaluated in terms of the skill of probabilistic forecasts of hourly accumulated precipitation in mesoscale neighborhoods. Precipitation thresholds of 2.54, 6.35, and 12.7 mm h−1 were all evaluated and found to provide similar conclusions. Therefore, the representative threshold of 6.35 mm h−1 is used in the following figures. The verification is based on the neighborhood maximum ensemble probability (NMEP; Schwartz and Sobash 2017) with a neighborhood radius of 15 km. Other neighborhood radii revealed similar qualitative conclusions (not shown). To avoid arbitrarily defining a “reference” forecast, the Brier skill score is used to evaluate the skill of forecasts from experiment A, with respect to the forecasts from experiment B, as follows:
$$\mathrm{BSS}_{A,B} = 1 - \frac{\mathrm{BS}_A}{\mathrm{BS}_B},$$
where BS is the Brier score (Brier 1950) of the NMEP forecast from the corresponding experiment. In this context, a positive BSS indicates that experiment A is more skillful than experiment B, while a negative BSS indicates that experiment B is more skillful than experiment A. When skill is evaluated over all 10 cases, statistical significance is determined using paired-sample permutation resampling (Hamill 1999; Johnson and Wang 2012).
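The verification chain described above (NMEP, Brier score, BSS of experiment A relative to experiment B computed as BSS = 1 − BS_A/BS_B, and a paired permutation test over cases) can be sketched as follows. Converting the 15-km radius to 5 grid points assumes the 3-km grid; the circular footprint and the "greater than or equal to threshold" convention are assumptions; and using the mean per-case Brier score difference as the permutation test statistic is one reasonable choice, not necessarily the exact statistic of Hamill (1999) or Johnson and Wang (2012). All function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import maximum_filter


def nmep(precip: np.ndarray, threshold: float, radius_pts: int) -> np.ndarray:
    """Neighborhood maximum ensemble probability for precip of shape (n_members, ny, nx)."""
    yy, xx = np.mgrid[-radius_pts:radius_pts + 1, -radius_pts:radius_pts + 1]
    footprint = xx**2 + yy**2 <= radius_pts**2  # circular neighborhood
    exceed = (precip >= threshold).astype(float)
    nbhd_max = np.stack([
        maximum_filter(member, footprint=footprint, mode="constant", cval=0.0)
        for member in exceed
    ])
    return nbhd_max.mean(axis=0)  # fraction of members exceeding within the neighborhood


def brier_score(prob: np.ndarray, obs: np.ndarray) -> float:
    """Mean squared difference between probabilities and binary observations."""
    return float(np.mean((prob - obs) ** 2))


def bss(bs_a: float, bs_b: float) -> float:
    """Brier skill score of experiment A with respect to experiment B."""
    return 1.0 - bs_a / bs_b


def paired_permutation_pvalue(bs_a_cases, bs_b_cases, n_perm=5000, seed=0) -> float:
    """Two-sided test of mean(BS_B - BS_A) by randomly swapping A and B within cases."""
    diffs = np.asarray(bs_b_cases) - np.asarray(bs_a_cases)
    observed = abs(diffs.mean())
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))  # swap = sign flip
    null = np.abs((signs * diffs).mean(axis=1))
    extreme = (null >= observed) | np.isclose(null, observed)   # count ties as extreme
    return (1 + np.count_nonzero(extreme)) / (n_perm + 1)
```

Pairing by case matters here: case-to-case variability in Brier score is large, and swapping within cases keeps that variability out of the null distribution.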
MULTI shows a skill advantage over LARGE in all physics configurations (Fig. 7a). The advantage generally decreases with lead time, although it is not clear how much of this decrease is simply a result of convection dissipating or moving out of the model domain at later times in these cases. The MULTI advantage generally corresponds to improved reliability of the probabilistic forecasts (e.g., at the 1–3-h lead times in Fig. 8). In both the fixed-physics and multiphysics MULTI ensembles, there is less underforecasting at low (below about 30%) forecast probabilities than in any of the LARGE ensembles (Fig. 8). For each physics configuration, the MULTI ensemble also shows less overforecasting at higher (above about 60%) forecast probabilities than the corresponding LARGE ensemble. The reduced overconfidence of precipitation forecasts in MULTI compared to LARGE at early lead times is consistent with the pronounced increase in spread of nonprecipitation variables at these times (Fig. 6).
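A reliability curve of the kind summarized in Fig. 8 bins the forecast probabilities and compares the mean forecast probability in each bin with the observed relative frequency; an observed frequency above (below) the mean forecast probability indicates underforecasting (overforecasting). The sketch below assumes 10 equal-width bins, which the text does not specify, and uses a hypothetical function name.

```python
import numpy as np


def reliability_curve(prob, obs, n_bins=10):
    """Mean forecast probability, observed frequency, and count per probability bin."""
    prob = np.ravel(prob)
    obs = np.ravel(obs).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index for each forecast; a probability of exactly 1.0 goes in the top bin.
    idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sum_p = np.bincount(idx, weights=prob, minlength=n_bins)
    sum_o = np.bincount(idx, weights=obs, minlength=n_bins)
    with np.errstate(divide="ignore", invalid="ignore"):
        mean_p = sum_p / counts    # NaN for empty bins
        obs_freq = sum_o / counts  # NaN for empty bins
    return mean_p, obs_freq, counts
```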
Including ensemble physics diversity generally reduces the skill advantage of MULTI over LARGE (red and orange lines; Fig. 7a). However, even with physics diversity, a clear advantage of MULTI over LARGE remains. The relative sensitivity of the NMEP forecasts to IC and physics perturbations is further evaluated by calculating, at each grid point, the absolute difference between the MULTI_FIXED1 and LARGE_FIXED1 NMEP forecasts minus the absolute difference between the LARGE_FIXED1 and LARGE_PHYS2 NMEP forecasts. Positive (negative) sensitivity values indicate that the LARGE_FIXED1 forecast is more sensitive to changing the IC (physics) configuration than to changing the physics (IC) configuration (Fig. 9). On average, the dominance of the IC configuration over the physics configuration is greatest at early lead times and lasts longer for the strongly forced cases than for the weakly forced cases (Fig. 9). At later lead times, the greater relative impact of the physics configuration for strongly forced cases (blue line; Fig. 9) than for weakly forced cases (green line; Fig. 9) is likely due to the greater overall precipitation in the strongly forced cases (not shown).
For ease of qualitative interpretation, the sensitivity values are smoothed with a Gaussian convolution with a radius of 90 km before plotting an illustrative example from the (weakly forced) case initialized at 0400 UTC 26 June 2015 (Fig. 10). Figure 10 shows the overall transition from IC-dominated probabilistic forecast sensitivity (positive values) to physics-dominated sensitivity (negative values), consistent with Fig. 9. However, the transition is not uniform across the domain, and there are exceptions to the overall trend. For example, a particularly large area of physics-dominated sensitivity develops at forecast hours 4–6 near the maturing mesoscale convective system (MCS) in northern Missouri, and at forecast hours 9–12 (and beyond) near another maturing MCS in southeast Kansas and southwest Missouri (Fig. 10). Areas of IC-dominated sensitivity also temporarily develop after the first few forecast hours in areas of newly initiating convection, such as in northeast Kansas at forecast hours 7–8 and in northeast to central Iowa at forecast hours 5–12 (Fig. 10). Similar qualitative behavior was observed in other cases, along with temporary areas of IC-dominated sensitivity around the time of MCS dissipation in a few cases (not shown). Thus, while improved physics diversity can yield skill improvements similar to those from using MULTI instead of LARGE IC perturbations, the uncertain processes sampled by the IC and physics perturbations can be quite different. An optimal ensemble configuration should therefore include both multiphysics and multiscale IC perturbations that are optimized together.
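The sensitivity field defined above, and its optional smoothing for display, can be sketched as follows. The text gives a 90-km smoothing "radius" without defining it, so treating it as the Gaussian standard deviation (30 grid points on the 3-km grid) is an assumption, as are the function and argument names.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def ic_vs_physics_sensitivity(p_multi_fixed1, p_large_fixed1, p_large_phys2,
                              sigma_pts=None):
    """|NMEP change from IC method| minus |NMEP change from physics configuration|.

    Positive values: LARGE_FIXED1 is more sensitive to the IC perturbation method.
    Negative values: LARGE_FIXED1 is more sensitive to the physics configuration.
    """
    s = np.abs(p_multi_fixed1 - p_large_fixed1) - np.abs(p_large_fixed1 - p_large_phys2)
    if sigma_pts is not None:
        s = gaussian_filter(s, sigma=sigma_pts)  # optional smoothing for display only
    return s


sigma_pts = 90.0 / 3.0  # assumed: 90-km "radius" read as a Gaussian sigma on a 3-km grid
```

Using absolute differences makes the measure indifferent to the sign of each change, so it reflects only which configuration change moves the probabilities more at each point.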
c. Skill contributions from different spatial scales
We now consider the contributions to the MULTI skill advantage from different spatial scales, using the best-performing phys2 physics configuration. At lead times of ~3–8 h, the skill of MULTI_FILTER1, MULTI_FILTER2, and MULTI_FILTER3 relative to LARGE_FILTER1, LARGE_FILTER2, and LARGE_FILTER3, respectively (blue, cyan, and green lines; Fig. 7b), is generally lower than the skill of MULTI relative to LARGE (black line; Fig. 7b). Most of the advantage of MULTI over LARGE at these times therefore comes from scales that are not well resolved by the LARGE perturbations. At later lead times, the skill differences are generally not statistically significant. Although consistent differences among the lines in Fig. 7b are difficult to discern after the 10-h lead time, the skill of MULTI_FILTER1 relative to LARGE_FILTER1 (blue line) is reduced compared to that of MULTI relative to LARGE (black line) mainly during forecast hours ~5–9, whereas the MULTI_FILTER2 (cyan line) and MULTI_FILTER3 (green line) skills are reduced even during the first few forecast hours.
For the nonprecipitation variables, upscale growth of the small-scale, flow-dependent IC perturbations is evident at even longer lead times. Figure 11 shows the ensemble spread computed after smoothing the perturbations with a Gaussian convolution with a radius of 90 km. Comparing the smoothed spread in MULTI and LARGE (Fig. 11a) with that in MULTI_FILTER3 and LARGE_FILTER3 (Fig. 11b) reveals the impact of upscale growth from the filtered small-scale perturbations on the larger-scale spread. Although the relatively large-scale spread is greater in MULTI than in LARGE throughout the forecast period (Fig. 11a), the large-scale spread in LARGE_FILTER3 is generally greater than in MULTI_FILTER3 throughout most of the forecast period (Fig. 11b). Because removing the small scales reverses this ranking, the extra large-scale spread in MULTI can be attributed to upscale growth of the small-scale IC perturbations that the filter removes.
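The smoothed-spread diagnostic can be sketched in the same way: each member's perturbation is smoothed before the ensemble variance is computed, so only the larger-scale part of the spread remains. Reading the 90-km radius as a Gaussian standard deviation is again an assumption, and the unbiased (n − 1) variance and the square root of the domain-mean variance are illustrative choices; `smoothed_spread` is a hypothetical name.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smoothed_spread(members: np.ndarray, sigma_pts: float) -> float:
    """Domain-averaged ensemble spread after Gaussian smoothing of each perturbation.

    members: array (n_members, ny, nx) for one variable at one level.
    Smoothing is linear, so the smoothed perturbations still have zero ensemble mean.
    """
    perts = members - members.mean(axis=0)
    smoothed = np.stack([gaussian_filter(p, sigma=sigma_pts) for p in perts])
    variance = (smoothed**2).sum(axis=0) / (members.shape[0] - 1)
    return float(np.sqrt(variance.mean()))
```

Comparing this quantity for MULTI and LARGE with that for MULTI_FILTER3 and LARGE_FILTER3 is what isolates the upscale contribution of the removed small scales.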
The impact of this upscale growth in the spread of nonprecipitation variables on the skill of precipitation forecasts was investigated subjectively for several cases and is illustrated here with a representative case. In the 26 June 2015 case, MULTI remains more skillful than LARGE throughout most of the forecast period (Fig. 12). The skill advantage of MULTI over LARGE at later lead times is still present in this case when the IC perturbations are filtered to a commonly resolved spatial scale (green line; Fig. 12). This advantage results from enhanced spread in the forecasts of convection in Iowa that developed several hours after the IC time (Fig. 13). The enhanced spread can be traced back to the IC perturbations in the vicinity of an upstream upper-level shortwave disturbance (Fig. 14). Although LARGE is able to resolve this feature, the structure of the spread, in terms of the variance of the υ component of wind at jet stream level, differs between MULTI and LARGE. While LARGE generally has larger spread amplitude here in the first few hours (e.g., Figs. 14b,e), the MULTI perturbations grow faster, so that by forecast hour 5, when the shortwave disturbance is supporting the development of new convection in Iowa, there is more spread in its location in MULTI than in LARGE (Figs. 14c,f). The spread in the shortwave location is implied by the variance of υ wind just ahead of and behind the shortwave trough axis (Figs. 14c,f). This case thus illustrates how faster upscale growth of the MULTI perturbations to nonprecipitation variables surrounding the upper-level shortwave disturbance, compared with the slower-growing LARGE perturbations, can improve ensemble forecasts of the uncertainty in the initiation and evolution of precipitating convection at later lead times.
4. Summary and conclusions
While previous studies have demonstrated advantages of multiscale ensemble IC perturbations from DA on a convection-permitting grid, to the authors’ knowledge such comparisons have not yet been made in an ensemble that also has physics diversity. The present study compares ensemble forecasts with IC perturbations generated by multiscale ensemble DA on the convection-permitting grid (MULTI) to forecasts with IC perturbations interpolated from a global ensemble (LARGE) for 10 retrospective cases, using two different fixed-physics and two different multiphysics ensembles.
While the small-scale variance in the LARGE ensemble quickly spins up to a perturbation energy spectrum similar to that of MULTI during the first few forecast hours, significant advantages of MULTI for probabilistic forecasts of hourly accumulated precipitation are found well beyond that time. This advantage is somewhat reduced, but still present, when the ensemble members also have different physics configurations, compared to the fixed-physics ensembles. The variance of nonprecipitation variables is mostly determined by the physics configuration on spatial scales larger than ~400, ~200, and ~100 km at the 1-, 2-, and 3-h lead times, respectively, and mostly by the IC configuration on the ~10–400-, ~30–200-, and ~30–100-km spatial scales at those same lead times. For any given physics configuration, the total variance of nonprecipitation variables is generally larger in MULTI than in LARGE throughout the 18-h forecast period.
The variance of smoothed forecast perturbations was greater in MULTI than in LARGE, but greater in LARGE_FILTER1 than in MULTI_FILTER1. This contrast shows the importance of upscale growth of the flow-dependent small-scale IC perturbations for maintaining forecast spread out to the 18-h forecast lead time. Filtering the small scales out of the IC perturbations significantly reduced the advantage of MULTI over LARGE for precipitation forecasts, further emphasizing the importance of the initial small-scale perturbations even after the nonprecipitation variables have spun up similar spread on the small scales. These results suggest that the small-scale IC perturbations could potentially be neglected without loss of precipitation forecast skill only after ~10 h of forecast lead time. However, they also indicate substantial upscale growth of flow-dependent IC perturbations from the convective scale and meso-β scale that affects the quality of the ensemble spread throughout the 18-h forecast period. Thus, the benefit for precipitation forecasting of sampling meso-β-scale uncertainty with IC perturbations from a multiscale DA system may last even longer in cases with more active convection at later lead times than in the cases considered in this study. Such cases may also reveal more pronounced advantages of MULTI over LARGE even on the commonly resolved scales.
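The FILTER1 experiments remove small scales from the IC perturbations so that both ensembles are perturbed only on a commonly resolved scale. The filter used in the study is not specified in this section; the sketch below assumes a sharp spectral low-pass cutoff on a doubly periodic grid, and the function name `lowpass_perturbation` and the 100-km cutoff are chosen purely for illustration.

```python
import numpy as np

def lowpass_perturbation(pert, dx_km, cutoff_km):
    """Remove wavelengths shorter than cutoff_km from a 2D perturbation field.

    Sharp spectral cutoff on a doubly periodic grid (an assumption; the
    study's FILTER1 filter is not specified here).
    """
    ny, nx = pert.shape
    ky = np.fft.fftfreq(ny, d=dx_km)  # cycles per km
    kx = np.fft.fftfreq(nx, d=dx_km)
    k = np.hypot(*np.meshgrid(ky, kx, indexing="ij"))
    keep = k <= 1.0 / cutoff_km  # retain wavelengths >= cutoff_km
    return np.real(np.fft.ifft2(np.fft.fft2(pert) * keep))

# Synthetic IC perturbation: a 400-km wave plus a 40-km wave on a 4-km grid
dx = 4.0
x = np.arange(100) * dx
long_wave = 2.0 * np.sin(2 * np.pi * x / 400.0)
short_wave = np.sin(2 * np.pi * x / 40.0)
ic_pert = np.tile(long_wave + short_wave, (100, 1))
filtered = lowpass_perturbation(ic_pert, dx, cutoff_km=100.0)
# A filtered member IC would then be the ensemble mean plus `filtered`
```

A sharp cutoff is the simplest choice; a smoother spectral taper would reduce ringing near the cutoff wavelength.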
Qualitative evaluations of the relative impact of the physics and IC perturbation configurations revealed that the potential benefit of optimizing the multiscale IC configuration in a system that also uses physics diversity to maintain ensemble spread can depend strongly on the physical processes controlling convective evolution. In particular, cold pool–driven MCS evolution was more strongly determined by the ensemble physics diversity, while the IC perturbation configuration was particularly important for newly initiating convection and for the timing of MCS decay. Consistent with previous studies such as Stensrud et al. (2000), the dominance of the IC perturbation configuration for precipitation forecasts persists longer in strongly forced cases than in weakly forced cases.
Previous studies have suggested that, compared with limited-area regional models, global models may be better suited for initializing forecasts at longer lead times than those considered in the present study (e.g., Merkova et al. 2011). For IC perturbation design, it seems plausible that global forecasts cycled at 6-h intervals would be better suited to sampling the growing modes of IC uncertainty than the more frequently cycled multiscale DA system (Pena and Kalnay 2004). Therefore, further forecast skill improvements may result from blending the subsynoptic-scale perturbations from the multiscale DA system with the synoptic- and larger-scale perturbations from the operational global ensembles. Additional cases and larger forecast domains with longer forecast lead times would be needed to test this hypothesis conclusively. Such experiments are ongoing and will be reported in a future study.
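The blending hypothesis can be made concrete with a scale-selective combination that keeps the large-scale part of a global-ensemble perturbation and the small-scale part of a multiscale-DA perturbation. This is a hypothetical sketch, not a method evaluated in the study; it assumes both perturbations are already on the same doubly periodic grid and uses a sharp spectral cutoff whose wavelength (`cutoff_km`) is a free parameter. The function name `blend_perturbations` is illustrative.

```python
import numpy as np

def blend_perturbations(global_pert, regional_pert, dx_km, cutoff_km):
    """Hypothetical scale-selective blend of two perturbation fields on one grid.

    Wavelengths >= cutoff_km come from global_pert; shorter wavelengths
    come from regional_pert. Assumes a doubly periodic domain.
    """
    ny, nx = global_pert.shape
    ky = np.fft.fftfreq(ny, d=dx_km)  # cycles per km
    kx = np.fft.fftfreq(nx, d=dx_km)
    k = np.hypot(*np.meshgrid(ky, kx, indexing="ij"))
    large_scale = k <= 1.0 / cutoff_km
    blended = np.where(large_scale, np.fft.fft2(global_pert), np.fft.fft2(regional_pert))
    return np.real(np.fft.ifft2(blended))

# Synthetic check: 400-km and 40-km waves on a 4-km, 100 x 100 periodic grid
dx = 4.0
x = np.arange(100) * dx
long_wave = np.sin(2 * np.pi * x / 400.0)
short_wave = np.sin(2 * np.pi * x / 40.0)
global_pert = np.tile(2.0 * long_wave + 5.0 * short_wave, (100, 1))
regional_pert = np.tile(1.0 * long_wave + 1.0 * short_wave, (100, 1))
blended = blend_perturbations(global_pert, regional_pert, dx, cutoff_km=100.0)
# blended keeps the global 400-km wave (amplitude 2) and the regional 40-km wave (amplitude 1)
```

Because the mask depends only on the wavenumber magnitude, the blended spectrum stays Hermitian-symmetric and the inverse transform is real; in practice, a gradual transition between the two sources around the cutoff would likely be preferable to a sharp switch.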
Data availability statement
All WRF forecast data produced during this study have been archived locally and are available from the authors upon request.
This work was primarily supported by Grant NA17OAR4590116. The authors appreciate the discussions and model datasets contributed to this work by Nicholas Gasperoni and Yongming Wang of the OU MAP lab. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant ACI-1053575. Some of the computing for this project was also performed at the OU Supercomputing Center for Education and Research (OSCER) at the University of Oklahoma (OU).