Ensemble Forecasting: Spaghetti Plots
Computer forecasts can go awry for several reasons. Right off the bat, numerical weather predictions can "bust" pretty quickly if the model is poorly initialized (in other words, if the initialization does not accurately represent the current state of the atmosphere). Meteorologists compensate by implementing a technique that gauges the sensitivity of computer model forecasts to the way they're initialized. Specifically, forecasters tweak (make minor changes to) the initialization of a model, and then run the model again using this slightly different initial state. Then they tweak the initialization another time, in a slightly different way, and run the model using this new initial state. This process of "tweaking" the initial conditions is repeated a number of times (for some models, dozens of times). If all or most of the tweaked model runs come up with basically the same numerical prediction for a specific forecast day, meteorologists have a relatively high degree of confidence in that day's forecast. If, however, the tweaked model runs predict several noticeably different scenarios for the day in question, then forecasters have a fairly low degree of confidence in the numerical prediction. The collection of different runs from the series of initialization "tweaks" is called an ensemble, and each individual run is an ensemble member.
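To see why agreement among tweaked runs signals confidence, here is a minimal sketch that swaps in the Lorenz-63 equations (a classic three-variable toy model of chaotic convection) for a real weather model. That substitution is an assumption for illustration only: operational ensembles perturb vastly larger models, and the function names below are hypothetical. Every member starts from the same "analysis" plus a tiny random tweak, and the spread of the members (the standard deviation of the x variable) is reported at several lead times.

```python
import random
import statistics


def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time tendencies of the Lorenz-63 toy model (stand-in for a weather model)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the toy model one step with the classic fourth-order Runge-Kutta scheme."""
    k1 = lorenz63(state)
    k2 = lorenz63(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = lorenz63(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = lorenz63(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, duration, dt=0.01):
    """Run the toy model forward for `duration` time units."""
    for _ in range(round(duration / dt)):
        state = rk4_step(state, dt)
    return state


def run_ensemble(analysis, n_members, tweak_size, lead_times, seed=0):
    """Tweak the analysis n_members times, run each member, and return the spread of x per lead time."""
    rng = random.Random(seed)
    members = [tuple(v + rng.gauss(0.0, tweak_size) for v in analysis)
               for _ in range(n_members)]
    spreads, t_now = {}, 0.0
    for lead in sorted(lead_times):
        members = [integrate(m, lead - t_now) for m in members]
        t_now = lead
        spreads[lead] = statistics.pstdev([m[0] for m in members])
    return spreads


# Spin up onto the model's attractor, then treat that state as the "analysis".
analysis = integrate((1.0, 1.0, 1.0), 10.0)
spreads = run_ensemble(analysis, n_members=21, tweak_size=1e-3, lead_times=[0.5, 2.0, 10.0])
for lead, spread in spreads.items():
    print(f"lead {lead:4.1f}: spread of x = {spread:.4f}")
```

With tweaks of about 0.001, the members typically stay tightly clustered at the short lead time and fan out across the model's whole range of states by the long one, the toy-model analogue of high confidence early and low confidence later.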
How do meteorologists use ensemble forecasts? Let's take a look at an example from the field of tropical weather forecasting. The image below shows the results of the GFS ensemble forecast (from the 06 UTC run on August 24, 2005), which likely helped to determine the cone of uncertainty for the future track of Hurricane Katrina (August 23-30, 2005). Here's the color-coded Saffir-Simpson Scale that you can use to interpret Katrina's intensity during the period.
At the time, ensemble members diverged wildly by the 96-hour forecast time in their predictions of Katrina's track, casting great uncertainty on the eventual landfall location along the Gulf Coast (there was noticeably greater certainty about where the storm would make landfall in Florida). Hopefully, it's becoming clear to you that the GFS ensemble helped shape the forecast of possible future tracks of the developing depression (we're sure that forecasters at the National Hurricane Center also used other models to predict the system's track and get a firmer handle on the cone of uncertainty).
After Katrina crossed southern Florida, it intensified into a major hurricane over the Gulf of Mexico. By the 18 UTC run of the GFS ensemble on August 27, 2005 (see below), there was much greater agreement among ensemble members about the eventual landfall location of Hurricane Katrina along the central Gulf Coast.
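The shift from widely diverging tracks on August 24 to close agreement on August 27 can be put into numbers as a track spread. Here is a minimal sketch, assuming each member's forecast storm center at one lead time is given as a (latitude, longitude) pair. It averages the coordinates to get an ensemble-mean center (adequate over a region like the Gulf of Mexico, but wrong across the dateline), then averages each member's great-circle distance from that center. The function names and the positions in the usage lines are hypothetical, not values read from the GFS plots.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_spread_km(positions):
    """Mean center and mean distance (km) of member storm centers from that center.

    `positions` is a list of (lat, lon) pairs, one per member, at one lead time.
    Longitudes are averaged directly, so this is unsuitable near the dateline.
    """
    mean_lat = sum(lat for lat, _ in positions) / len(positions)
    mean_lon = sum(lon for _, lon in positions) / len(positions)
    dists = [great_circle_km(lat, lon, mean_lat, mean_lon) for lat, lon in positions]
    return mean_lat, mean_lon, sum(dists) / len(dists)


# Hypothetical member positions: a tight cluster versus widely scattered landfalls.
tight = [(29.5, -89.5), (29.8, -89.2), (29.3, -89.8)]
wide = [(29.5, -94.0), (30.2, -88.0), (29.0, -84.5)]
_, _, tight_spread = track_spread_km(tight)
_, _, wide_spread = track_spread_km(wide)
print(f"tight: {tight_spread:.0f} km, wide: {wide_spread:.0f} km")
```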
Of course, ensemble forecasting has more routine applications than pinning down the cone of uncertainty for major hurricanes. Indeed, weather forecasters use ensemble forecasting on a day-to-day basis to gauge the uncertainty of weather patterns for medium-range (and short-range) prediction. For example, forecasters will look at the ensemble forecasts for specific 500-mb heights in order to assess the uncertainty of regional medium-range forecasts.
To illustrate our point, check out the 24-hour GFS ensemble forecast (below) for the 5760-meter 500-mb height line (21 different members in blue) and the 5940-meter 500-mb height line (21 different members in red) over Europe from the 00 UTC run on August 6, 2009. During summer, a 500-mb height of 5760 meters typically marks the southern edge of the relatively "strong" 500-mb westerlies, while heights above 5940 meters typically correspond to hot weather. On the ensemble forecast below, the more northern green contour represents the average position of the 5760-meter height at that time of year, while the more southern green contour marks the average position of the 5940-meter height line. Plotting the climatological positions of these heights helps meteorologists gauge whether heights are forecast to be above or below average at this time of year.
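Comparing a member's 5760-meter line with its climatological position comes down to finding where a height field crosses a chosen value. Here is a minimal sketch, assuming heights are sampled along a single north-south transect at increasing latitudes and that they decrease poleward; this is a one-dimensional simplification of the two-dimensional contouring a plotting package performs. The function name and all sample heights are hypothetical.

```python
def contour_latitude(lats, heights_m, level_m):
    """Latitude where heights first fall below level_m, scanning south to north.

    Uses linear interpolation between neighboring grid points and returns
    None if the transect never crosses the level.
    """
    points = list(zip(lats, heights_m))
    for (lat0, h0), (lat1, h1) in zip(points, points[1:]):
        if h0 >= level_m > h1:
            return lat0 + (h0 - level_m) / (h0 - h1) * (lat1 - lat0)
    return None


lats = [40.0, 45.0, 50.0, 55.0, 60.0]
member = [5900.0, 5840.0, 5780.0, 5720.0, 5660.0]  # hypothetical ensemble member (m)
climo = [5880.0, 5820.0, 5760.0, 5700.0, 5640.0]   # hypothetical climatology (m)

member_lat = contour_latitude(lats, member, 5760.0)
climo_lat = contour_latitude(lats, climo, 5760.0)
# A 5760-m line north of its climatological position implies above-average heights.
print(member_lat > climo_lat)  # True
```

Repeating the comparison for each of the 21 members tells you how many blue lines sit north or south of the green climatological contour at that longitude.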
Given that this is only a 24-hour forecast, it's not surprising that there was a high degree of confidence in the predicted position of the 5760-meter height line (and the general 500-mb pattern) over Europe (the blue lines are pretty much in agreement). With a ridge over Scandinavia and much of central Europe, for example, a pattern of generally warmer-than-average, dry weather seemed likely over the region at the forecast time. There was a bit more uncertainty associated with the closed 5940-meter contour over northern Africa, but any slight shift in the position of this rather stagnant 500-mb high would not have much impact on the weather pattern over northern Africa; it was going to be hot no matter what!
Okay, now let's look at the 168-hour GFS ensemble forecast from the same run (see below). Look at all those lines! Is it any wonder that some meteorologists call such a graphical depiction of an ensemble forecast a "spaghetti plot"? Looks like they would have to use their "noodles" to figure out the forecast. Indeed, there was clearly more uncertainty in the forecast, as you would expect from a 168-hour forecast. There were even a few blue lines south of the climatological position of the 5760-meter height, suggesting cooler-than-average weather over central Europe, but these were the exception rather than the rule. Eyeballing the mean ensemble forecast (the average position of the 21 members), we'd expect warmer-than-average weather over northern Europe around the forecast time (a lot of blue lines lie north of the climatological position of the 5760-meter height). Still, you just never know (hence the uncertainty). Meanwhile, the 168-hour GFS ensemble forecast suggested that it would get warmer over Spain (compared to the 24-hour ensemble forecast). But how much warmer? Those red lines (representing ensemble forecasts for the 5940-meter height) are all over the place (lots of spaghetti), with a few lines never really making it to Spain. To complete our story, here's the animation of this GFS ensemble forecast out to 360 hours. Note how the uncertainty associated with the predicted position of the 5760-meter height grows with time.
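Eyeballing a spaghetti plot can be backed up with a few numbers. Here is a minimal sketch, assuming you already have each member's latitude for the 5760-meter line at one longitude. The mean of those latitudes is the "average position" of the members, their standard deviation measures the spread, and counting members north or south of the climatological latitude summarizes how lopsided the ensemble is. The function name and every latitude in the usage lines are made up for illustration.

```python
import statistics


def summarize_contour_positions(member_lats, climo_lat):
    """Mean position, spread, and north/south counts for one contour at one longitude."""
    return {
        "mean_lat": statistics.fmean(member_lats),
        "spread_deg": statistics.pstdev(member_lats),
        "north_of_climo": sum(lat > climo_lat for lat in member_lats),
        "south_of_climo": sum(lat < climo_lat for lat in member_lats),
    }


climo_lat = 53.0                         # hypothetical climatological latitude
day1 = [55.0, 55.5, 54.8, 55.2, 55.1]    # hypothetical 24-hour member latitudes
day7 = [58.0, 51.0, 60.5, 53.5, 57.0]    # hypothetical 168-hour member latitudes

day1_summary = summarize_contour_positions(day1, climo_lat)
day7_summary = summarize_contour_positions(day7, climo_lat)
print(day1_summary)
print(day7_summary)
```

In this made-up case, both sets of lines mostly sit north of climatology (above-average heights), but the 168-hour lines are far more scattered, which is exactly the growth in uncertainty the plots show.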
We note that the European Centre for Medium-Range Weather Forecasts (ECMWF) runs an ensemble version of its medium-range model, known as the Ensemble Prediction System (EPS). The EPS has 51 ensemble members and produces 15-day forecasts twice a day, initialized at 00 UTC and 12 UTC. Unlike the suite of ensemble forecasts produced by NCEP, the entire set of EPS forecasts is generally not available for free to the public.
As computational power has increased, the ensemble approach has also been applied to short-range forecasting. For example, NCEP runs a 26-member ensemble forecast consisting of several short-range models four times a day (at 03 UTC, 09 UTC, 15 UTC and 21 UTC). This Short-Range Ensemble Forecast (SREF, for short) provides weather forecasters with a way to gauge the sensitivity of short-range forecasts to small changes in initial conditions. Most of the time, the SREF mean forecast (the average forecast from the 26 members) provides a reasonable starting point for meteorologists to hone their predictions. The bottom line here is that ensemble forecasting is revolutionizing the way meteorologists use the computer models, both in the short range and medium range.
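The SREF mean forecast mentioned above is the point-by-point average of the members' forecasts. Here is a minimal sketch, assuming each member's forecast is a small two-dimensional grid of the same shape stored as nested Python lists (operational output is far larger and is usually handled with array libraries). The function name and the numbers are hypothetical.

```python
def ensemble_mean_field(members):
    """Point-by-point average of equally shaped 2-D fields (lists of rows)."""
    if not members:
        raise ValueError("need at least one ensemble member")
    rows, cols = len(members[0]), len(members[0][0])
    for field in members:
        if len(field) != rows or any(len(row) != cols for row in field):
            raise ValueError("all members must share the same grid shape")
    n = len(members)
    return [[sum(field[i][j] for field in members) / n for j in range(cols)]
            for i in range(rows)]


# Three hypothetical members of a 2 x 2 temperature grid (degrees F).
members = [
    [[70.0, 72.0], [68.0, 66.0]],
    [[72.0, 74.0], [70.0, 64.0]],
    [[71.0, 73.0], [69.0, 68.0]],
]
print(ensemble_mean_field(members))  # [[71.0, 73.0], [69.0, 66.0]]
```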
What about forecasts beyond the medium range, at monthly and even seasonal time scales? Research has shown that the medium-range forecasting models lose any hint of skill after approximately two weeks; in other words, the line of people becomes so long that the message of the original whisper is too twisted and mangled to be trusted. Yet the National Weather Service and many private forecasting companies routinely issue monthly and even seasonal forecasts. Let's see how they do it.