METEO 810
Weather and Climate Data Sets

The NOMADS Ensembles

Prioritize...

After completing this section, you will be able to retrieve data from the ensemble runs of a particular model. Ensemble data gives you an indication of the variability (and to some degree uncertainty) regarding a particular model solution. Creating model "spaghetti plots" helps visualize how the solutions to the ensembles diverge over time. 

Read...

To retrieve model ensemble members from NOMADS, we start with code that looks very similar to previous retrievals. There are two exceptions... First, we need to specify a model that contains the ensemble members. In this case, we chose the model name "gens". Remember that you can query all of the model names and other information with NOMADSRealTimeList("dods"). Second, we need to define the number of ensembles to retrieve. This variable is passed to DODSGrab along with the other parameters.

# check to see if rnoaa has been installed
# if not, install it
if (!require("rNOMADS")) {
  install.packages("rNOMADS") }

library(rNOMADS)

# get the available dates and urls for this model (14 day archive)
# "gens" is the GFS ensemble run
model.urls <- GetDODSDates("gens")
latest.model <- tail(model.urls$url, 1)

# Get the model runs for a date
model.runs <- GetDODSModelRuns(latest.model)
latest.model.run <- tail(model.runs$model.run, 1)

# set the variable, location, times, levels (if needed)
# and most importantly the ensemble members
variable <- c("tmp2m")
time <- c(0, 40) 
lon <- c(280, 290) 
lat <- c(130, 135) 
levels    <- NULL
ensemble <- c(0, 20)  #All available ensembles

# retrieve the data. Notice the included the "ensemble" keyword
model.data <- DODSGrab(latest.model, latest.model.run, variable, time, lon, lat, levels = levels, ensemble = ensemble)

# reorient the lat/lon (makes it easier to navigate)
model.data$lon<-ifelse(model.data$lon>180,model.data$lon-360,model.data$lon)

Now the ensemble members exist as one of the dimensions of the retrieved data (just as time and lat/lon). There are many ways to look at this data. They way I do it here is to create a function called assemble_data which subsets each ensemble, looks for the point I need and then adds those values to a data frame called model.timeseries. The sapply command loops through a sequence of all the ensembles.

# This will be our placeholder for the specific point data we want
model.timeseries <- data.frame()

assemble_data <- function (ens_member) {    
  
  model.data.sub <- SubsetNOMADS(model.data, ensemble = ens_member)
  
  model.value<-model.data.sub$value[which(model.data.sub$lon==-78 & model.data.sub$lat==41)]
  model.fdate<-model.data.sub$forecast.date[which(model.data.sub$lon==-78 & model.data.sub$lat==41)]
  model.value<-(model.value-273.15)*9/5+32
  
  # add the row to the data frame
  row <- data.frame(Ensemble=ens_member, Date=model.fdate, Value=model.value)
  model.timeseries <<- rbind(model.timeseries,row)
  }

sapply(seq(ensemble[1]+1,ensemble[2]+1), assemble_data)

Now we have all of the ensemble predictions stored in model.timeseries. To plot all of these solutions, I do something a bit different. I first create a series of arrays of the x and y values. Next, I create the empty plot (without an x-axis... xaxt="n"). Then I add a custom x-axis that has specific times, rather than letting the plot draw them in. Finally, I use the command mapply to plot a line for each ensemble member, using the arrays of x and y values and a rainbow color scheme.

attr(model.timeseries$Date, "tzone") <- "GMT"

xvals <- tapply(model.timeseries$Date,model.timeseries$Ensemble,
                function(x) return(x))
yvals <- tapply(model.timeseries$Value,model.timeseries$Ensemble,
                function(x) return(x))

# plot out all of the ensemble solutions (called "plumes")
plot(model.timeseries$Date,model.timeseries$Value,type="n",
     xlab = "Date", ylab = "Temperature (F)", xaxt="n",
     main = paste0("GFS Ensembele 2-m Temperature\n",
                   head(model.timeseries$Date,1),"Z to ",
                   tail(model.timeseries$Date,1),"Z")
)

times=seq(xvals[[1]][1],xvals[[1]][length(xvals[[1]])], by="24 hours")
axis(1, at=times, labels=format(times, "%HZ %b %d"))

mapply(lines, xvals, yvals,lwd=2,lty=1,col=rainbow(21))

Here is the "spaghetti" plot that is produced by the above code. Notice that the ensemble members start out following nearly the same solution, but as time goes on, the ensemble members began to diverge. The timing of the divergence along with the amount can tell you something about how unstable a given model forecast is. Be suspect of the forecast when the predicted variable ensembles diverge rapidly.

As an alternative, let's instead consider a box plot of the suite of solutions. The code below produces this box plot. Notice here we can see clearly where the ensemble solutions begin to diverge substantially. However, as discussed in the previous section, bulk statistics computed on ensemble members (such as the median value) can be misleading if the solutions are, for example, bi-modal.

# Make a boxplot of the same data
boxplot(model.timeseries$Value~model.timeseries$Date,
        xlab = "Date", ylab = "Temperature (F)",
        main = paste0("GFS Ensembele 2-m Temperature\n",
                      head(model.timeseries$Date,1),"Z to ",
                      tail(model.timeseries$Date,1),"Z"),
        las=1,
        xaxt="n",
        pars=list(medcol="red", whisklty="solid", 
                  whiskcol="blue", staplecol="blue")
        )
times=seq(xvals[[1]][1],xvals[[1]][length(xvals[[1]])], by="24 hours")
axis(1, at=seq(1,length(xvals[[1]]),4), labels=format(times, "%HZ %b %d"))

Before we leave the topic of model data, let me give you a warning and some advice on using numerical weather prediction. Read on.