Prioritize...
After completing this section, you will be able to retrieve data from the ensemble runs of a particular model. Ensemble data gives you an indication of the variability (and to some degree uncertainty) regarding a particular model solution. Creating model "spaghetti plots" helps visualize how the solutions to the ensembles diverge over time.
Read...
To retrieve model ensemble members from NOMADS, we start with code that looks very similar to previous retrievals. There are two exceptions... First, we need to specify a model that contains the ensemble members. In this case, we chose the model name "gens". Remember that you can query all of the model names and other information with NOMADSRealTimeList("dods")
. Second, we need to define the number of ensembles to retrieve. This variable is passed to DODSGrab
along with the other parameters.
# check to see if rnoaa has been installed # if not, install it if (!require("rNOMADS")) { install.packages("rNOMADS") } library(rNOMADS) # get the available dates and urls for this model (14 day archive) # "gens" is the GFS ensemble run model.urls <- GetDODSDates("gens") latest.model <- tail(model.urls$url, 1) # Get the model runs for a date model.runs <- GetDODSModelRuns(latest.model) latest.model.run <- tail(model.runs$model.run, 1) # set the variable, location, times, levels (if needed) # and most importantly the ensemble members variable <- c("tmp2m") time <- c(0, 40) lon <- c(280, 290) lat <- c(130, 135) levels <- NULL ensemble <- c(0, 20) #All available ensembles # retrieve the data. Notice the included the "ensemble" keyword model.data <- DODSGrab(latest.model, latest.model.run, variable, time, lon, lat, levels = levels, ensemble = ensemble) # reorient the lat/lon (makes it easier to navigate) model.data$lon<-ifelse(model.data$lon>180,model.data$lon-360,model.data$lon)
Now the ensemble members exist as one of the dimensions of the retrieved data (just as time and lat/lon). There are many ways to look at this data. They way I do it here is to create a function called assemble_data
which subsets each ensemble, looks for the point I need and then adds those values to a data frame called model.timeseries
. The sapply
command loops through a sequence of all the ensembles.
# This will be our placeholder for the specific point data we want model.timeseries <- data.frame() assemble_data <- function (ens_member) { model.data.sub <- SubsetNOMADS(model.data, ensemble = ens_member) model.value<-model.data.sub$value[which(model.data.sub$lon==-78 & model.data.sub$lat==41)] model.fdate<-model.data.sub$forecast.date[which(model.data.sub$lon==-78 & model.data.sub$lat==41)] model.value<-(model.value-273.15)*9/5+32 # add the row to the data frame row <- data.frame(Ensemble=ens_member, Date=model.fdate, Value=model.value) model.timeseries <<- rbind(model.timeseries,row) } sapply(seq(ensemble[1]+1,ensemble[2]+1), assemble_data)
Now we have all of the ensemble predictions stored in model.timeseries
. To plot all of these solutions, I do something a bit different. I first create a series of arrays of the x and y values. Next, I create the empty plot (without an x-axis... xaxt="n"
). Then I add a custom x-axis that has specific times, rather than letting the plot draw them in. Finally, I use the command mapply
to plot a line for each ensemble member, using the arrays of x and y values and a rainbow color scheme.
attr(model.timeseries$Date, "tzone") <- "GMT" xvals <- tapply(model.timeseries$Date,model.timeseries$Ensemble, function(x) return(x)) yvals <- tapply(model.timeseries$Value,model.timeseries$Ensemble, function(x) return(x)) # plot out all of the ensemble solutions (called "plumes") plot(model.timeseries$Date,model.timeseries$Value,type="n", xlab = "Date", ylab = "Temperature (F)", xaxt="n", main = paste0("GFS Ensembele 2-m Temperature\n", head(model.timeseries$Date,1),"Z to ", tail(model.timeseries$Date,1),"Z") ) times=seq(xvals[[1]][1],xvals[[1]][length(xvals[[1]])], by="24 hours") axis(1, at=times, labels=format(times, "%HZ %b %d")) mapply(lines, xvals, yvals,lwd=2,lty=1,col=rainbow(21))
Here is the "spaghetti" plot that is produced by the above code. Notice that the ensemble members start out following nearly the same solution, but as time goes on, the ensemble members began to diverge. The timing of the divergence along with the amount can tell you something about how unstable a given model forecast is. Be suspect of the forecast when the predicted variable ensembles diverge rapidly.
As an alternative, let's instead consider a box plot of the suite of solutions. The code below produces this box plot. Notice here we can see clearly where the ensemble solutions begin to diverge substantially. However, as discussed in the previous section, bulk statistics computed on ensemble members (such as the median value) can be misleading if the solutions are, for example, bi-modal.
# Make a boxplot of the same data boxplot(model.timeseries$Value~model.timeseries$Date, xlab = "Date", ylab = "Temperature (F)", main = paste0("GFS Ensembele 2-m Temperature\n", head(model.timeseries$Date,1),"Z to ", tail(model.timeseries$Date,1),"Z"), las=1, xaxt="n", pars=list(medcol="red", whisklty="solid", whiskcol="blue", staplecol="blue") ) times=seq(xvals[[1]][1],xvals[[1]][length(xvals[[1]])], by="24 hours") axis(1, at=seq(1,length(xvals[[1]]),4), labels=format(times, "%HZ %b %d"))
Before we leave the topic of model data, let me give you a warning and some advice on using numerical weather prediction. Read on.