After you have read this section, you should be able to assess the skill of a deterministic forecast using both graphical methods and statistical metrics.
So far in this lesson, we have highlighted when we need to assess a forecast. A natural follow-on question is how to assess a forecast. There are many ways to test forecast skill, both qualitatively and quantitatively, and the choice of technique depends on the type of predictand and on how the forecast will be used. In this section, we will discuss techniques used specifically for deterministic forecasts: forecasts that state what is going to happen, where, and when (such as tomorrow’s temperature forecast for New York City).
As always, plotting the data is an essential and enlightening first step. In this case, the goal is to compare each forecast to the corresponding observation. There are two main types of plots we will focus on for deterministic forecasts: scatterplots and error histograms.
Scatterplots, which you should remember from Meteo 815 and 820, plot the forecast on the horizontal axis and the corresponding observation on the vertical axis. Scatterplots allow us to ask questions such as:
Let’s work through an example: a scatterplot of forecasted versus observed daily maximum temperature in Ames, IA. Begin by downloading these predictions [1] of daily maximum temperature (degrees F) from GFS (Global Forecast System) MOS (Model Output Statistics) guidance for Ames, IA. You can read more about the forecast here [2]. In addition, download this set of observations [3] of daily maximum temperature (degrees F) for Ames, IA. You can read more about the observational dataset here [4].
Now, let’s start by loading in the data using the code below:
Your script should look something like this:
### Example of MOS Max Temp (https://mesonet.agron.iastate.edu/mos/fe.phtml)
# load in forecast data
load("GFSMOS_KAMW_MaxTemp.RData")
# load in observed data (https://mesonet.agron.iastate.edu/agclimate/hist/dailyRequest.php)
load("ISUAgClimate_KAMW_MaxTemp.RData")
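These forecast and observation files are matched up by date. Purely as an illustration of how such pairing could be done (hypothetical toy data, not the course files), two data frames can be joined on their shared date column with merge(), which by default performs an inner join:

```r
# Illustrative only: the course data come already paired by date.
# merge() keeps only the dates present in both data frames.
forecastData <- data.frame(Date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
                           maxTemp = c(35, 38, 41))
obsData <- data.frame(Date = as.Date(c("2020-01-02", "2020-01-03", "2020-01-04")),
                      maxTemp = c(37, 40, 44))
# suffixes distinguish the two maxTemp columns after the join
paired <- merge(forecastData, obsData, by = "Date",
                suffixes = c(".fcst", ".obs"))
print(paired)  # two rows: 2020-01-02 and 2020-01-03
```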
I have already paired the data (and all data in future examples) so you can begin with the analysis. Let’s overlay a linear model fit on the scatterplot. To do this, we must create a linear model between the observations and the predictions. Use the code below:
# create linear model (observations regressed on forecasts);
# na.exclude keeps fitted values and residuals aligned with the data
linearMod <- lm(obsData$maxTemp ~ forecastData$maxTemp, na.action = na.exclude)
Now, create the scatterplot with the forecasted maximum temperatures on the x-axis and the observed on the y-axis by running the code below.
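A minimal sketch of such plotting code is below. The axis labels and styling are my assumptions, and the stand-in data lines exist only so the sketch runs on its own; with the course data, use forecastData$maxTemp, obsData$maxTemp, and linearMod instead.

```r
# stand-in data so the sketch runs on its own; replace with the
# course variables forecastData$maxTemp and obsData$maxTemp
set.seed(1)
fcst <- runif(200, 20, 90)
obs  <- fcst + rnorm(200, 0, 3)
linearFit <- lm(obs ~ fcst, na.action = na.exclude)

# scatterplot: forecast on the x-axis, observation on the y-axis
plot(fcst, obs,
     xlab = "Forecasted Max Temp (F)", ylab = "Observed Max Temp (F)",
     pch = 16, col = rgb(0, 0, 1, 0.3))
abline(linearFit, col = "red", lwd = 2)  # overlay the linear fit
abline(a = 0, b = 1, lty = 2)            # one-to-one reference line
```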
You should get the following figure:
The linear fit is quite good, and you can see we do not need to apply any correction.
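Beyond eyeballing the fit, you can check it numerically: a slope near 1 and an intercept near 0 mean the forecast needs essentially no linear (bias) correction. A sketch with stand-in data follows; with the course data, simply call coef(linearMod) or summary(linearMod).

```r
# stand-in data; with the course data, inspect linearMod directly
set.seed(1)
fcst <- runif(300, 20, 90)
obs  <- fcst + rnorm(300, 0, 3)
fit  <- lm(obs ~ fcst)

# a slope near 1 and an intercept near 0 indicate a forecast that
# needs no linear correction
intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])
```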
The second type of visual used in assessing deterministic forecasts is the error histogram. The error for each forecast–observation pair is defined as:

error = forecast − observation
An error histogram, as the name suggests, is a histogram plot of the errors. You can ask the following questions when using an error histogram:
Now, let’s work on an example using the data from above. To start, calculate the error for each forecast and observation.
# calculate error
error <- forecastData$maxTemp - obsData$maxTemp
Now, estimate the 1st, 5th, 10th, 90th, 95th, and 99th percentiles which we will include on the error histogram.
# estimate the percentiles (1st, 5th, 10th, 90th, 95th, and 99th)
errorPercentiles <- quantile(error,
probs=c(0.01,0.05,0.1,0.9,0.95,0.99),
na.rm=TRUE)
Now, plot the error histogram using the ‘hist’ function. Run the code below:
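A minimal sketch of such a call is below, with the percentiles drawn as vertical reference lines. The stand-in error vector exists only so the sketch runs on its own; with the course data, use the error and errorPercentiles objects computed above.

```r
# stand-in errors so the sketch runs on its own; with the course
# data, use the error vector computed earlier
set.seed(1)
error <- rnorm(500, mean = 0, sd = 3)
errorPercentiles <- quantile(error,
                             probs = c(0.01, 0.05, 0.1, 0.9, 0.95, 0.99),
                             na.rm = TRUE)

# frequency histogram of the errors
hist(error, breaks = 30,
     xlab = "Forecast Error (F)", main = "Error Histogram")
# mark the percentiles as vertical reference lines
abline(v = errorPercentiles, col = "red", lty = 2)
```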
You should get the following figure:
The figure above shows the frequency histogram of the errors, with vertical reference lines marking the 1st/99th, 5th/95th, and 10th/90th percentile pairs. The errors appear to be approximately normally distributed.
Now that we have visually assessed the forecast, we need to perform some quantitative assessments. As a note, most of the statistical metrics that we will discuss should be a review from Meteo 815. If you need a refresher, I suggest you go back to that course and review the material.
There are four main metrics that we will discuss in this lesson. The code below prepares one ingredient: a reference forecast based on monthly climatology. For each calendar month, it computes the mean observed maximum temperature and treats each observation’s departure from that mean (the anomaly) as the reference error, from which a reference mean squared error is computed.
# initialize errorRef variable
errorRef <- array(NA,length(obsData$Date))
# create climatology of observations and reference error (anomaly)
for(i in 1:12){
index <- which(as.numeric(format(obsData$Date,"%m"))==i)
# compute mean
monthMean <- mean(obsData$maxTemp[index],na.rm=TRUE)
# estimate anomaly
errorRef[index] <- obsData$maxTemp[index]-monthMean
}
# reference MSE; using mean() with na.rm=TRUE excludes missing values
# from both the sum and the count (sum()/length() would inflate the
# denominator with NA entries)
mseRef <- mean(errorRef^2, na.rm=TRUE)
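The snippet above stops at the reference MSE. One common use of it, sketched here under the assumption that a vector of forecast errors like the error variable from earlier is available, is the MSE-based skill score, SS = 1 − MSE_forecast / MSE_reference. The stand-in vectors exist only so the sketch runs on its own.

```r
# stand-in vectors so the sketch runs on its own; with the course
# data, use error (forecast minus observation) and errorRef (anomaly
# from monthly climatology) as computed above
set.seed(1)
error    <- rnorm(365, 0, 3)   # forecast errors (hypothetical)
errorRef <- rnorm(365, 0, 10)  # reference (climatology) errors (hypothetical)

# mean squared error of the forecast and of the reference
mseForecast <- mean(error^2,    na.rm = TRUE)
mseRef      <- mean(errorRef^2, na.rm = TRUE)

# skill score: 1 = perfect, 0 = no better than climatology,
# negative = worse than climatology
skillScore <- 1 - mseForecast / mseRef
```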
Links
[1] https://pennstateoffice365-my.sharepoint.com/:u:/g/personal/sas405_psu_edu/EUQLlzJEpPJJv37X9Fz7gr4BdJaTJ78NLH2r80XJ5M7i3A?download=1
[2] https://mesonet.agron.iastate.edu/mos/fe.phtml
[3] https://pennstateoffice365-my.sharepoint.com/:u:/g/personal/sas405_psu_edu/Eehp7Qx3rn9AhBAreH6QMVMBrmKJ6vzb-FHuhSr6e5UU-g?download=1
[4] https://mesonet.agron.iastate.edu/agclimate/hist/dailyRequest.php