METEO 825
Predictive Analytic Techniques for Meteorological Data

Probabilistic Forecasts

Prioritize...

Once you have completed this section, you should be able to quantitatively and qualitatively assess a probabilistic forecast.

Read...

Now we are on to the final forecast type that we will discuss in this lesson. A probabilistic forecast is much like a categorical forecast, except that the forecast system is allowed to express its uncertainty by giving the probability of an event instead of just a yes or a no. While this provides more useful information for the decision maker, it requires different verification procedures, more closely related to those we used for continuous variables. Read on to learn how to assess the skill of a probabilistic forecast.

Brier Score

The basic metric for a probabilistic forecast is a variant of the MSE called the Brier Score (BS). It measures the mean squared difference between the predicted probability of an event and the actual outcome. Mathematically:

$$BS = \frac{1}{N}\sum_{i=1}^{N}(f_i - o_i)^2$$

where $f_i$ is the forecasted probability of the event occurring (e.g., a 25% chance) and $o_i$ is the actual outcome (0 if the event does not occur and 1 if it does). The BS ranges from 0 to 1, with a perfect score of 0; the closer to 0, the better.

Let’s try estimating the BS for a Probability of Precipitation (PoP) forecast. Again, we will use GFS MOS output and observations from Ames, IA, using the same data sources as before. The model predictions can be downloaded here and the observations can be downloaded here. Begin by loading in the data:

Show me the code...
# load in forecast data
load("GFSMOS_KAMW_PoP.RData")

# load in observed data (https://mesonet.agron.iastate.edu/agclimate/hist/dailyRequest.php)
load("ISUAgClimate_KAMW_PoP.RData")

Now, turn the predicted probability of precipitation into a decimal value (instead of a percentage).

Show me the code...
# turn pop into decimal value
f <- forecastData$pop/100

Now, save the observed value (1 if it rained and 0 if it did not).

Show me the code...
# save observed value
o <- obsData$rainfall

Next, estimate the difference squared for each data point.

Show me the code...
# estimate difference squared
errorSquared <- (f-o)^2

Finally, sum the squared errors and divide by the number of forecasts to estimate the Brier Score.
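
A minimal sketch of this last step (assuming the errorSquared vector computed above) might look like:

# estimate the Brier Score: sum the squared errors and
# divide by the number of forecasts
BS <- sum(errorSquared)/length(errorSquared)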

In this example, the BS is 0.15, which is relatively close to 0, suggesting good forecast skill.

Graphical Methods

The BS does not, however, capture the value of a probabilistic forecast to a particular user. The user picks some threshold probability above which to assume the event will occur, and must decide which threshold is best for their particular decision.

Generally, this depends on how expensive false alarms are relative to missed events. For example, consider a hurricane shut-down decision for an oil refinery. If you shut the refinery down when you didn’t have to (a false alarm), you lose several days of production. On the flip side, if you were flooded out while operating (a missed event), you might face considerable repair costs.

So, we need a way to measure the tradeoff between false alarms and missed events as we change our threshold probability. If you set the threshold near 0%, you will always forecast the event, giving fewer missed events but more false alarms. If you set the threshold high, you will increase the number of missed events but reduce the number of false alarms.

How do we visualize this? One way is with a Receiver Operating Characteristic (ROC) curve. A ROC curve plots the fraction of events correctly forecasted (the True Positive Rate, or TPR) on the vertical axis versus the fraction of non-events that were forecasted to be events (the False Positive Rate, or FPR) on the horizontal axis. The TPR is given by the formula below:

$$TPR = \frac{TP}{TP+FN} = \frac{a}{a+b}$$

The FPR (also called the false alarm rate) is given by the formula below:

$$FPR = \frac{FP}{FP+TN} = \frac{c}{c+d}$$

Let’s give this a try with our PoP example. To begin, initialize variables TPR and FPR using the code below:

Show me the code...
# initialize variables
truePositiveRate <- numeric()
falsePositiveRate <- numeric()

Now, create an index that contains the instances when rain was observed and when it wasn’t.

Show me the code...
# create rain index 
rainIndex <- which(o==1)
noRainIndex <- which(o==0)

Now, create a list of threshold probabilities to use for creating the curve.

Show me the code...
# select the probabilities
probs <- seq(0,1,0.1)

Now, loop through the probabilities and estimate the TP and FP rates.

Show me the code...
# loop through probabilities and estimate the True Positive and False Positive rate
for(i in 1:length(probs)){
  popIndex <- which(f >= probs[i])
  noPopIndex <- which(f < probs[i])
  TP <- length(intersect(popIndex,rainIndex))
  FP <- length(intersect(popIndex,noRainIndex))
  FN <- length(intersect(noPopIndex,rainIndex))
  TN <- length(intersect(noPopIndex,noRainIndex))
  truePositiveRate[i] <- TP/(TP+FN)
  falsePositiveRate[i] <- FP/(FP+TN)
}

So, for each probability threshold, you found the cases when that probability or higher was forecasted (popIndex). Then, you counted the number of TP, FP, FN, and TN occurrences based on that probability threshold (you essentially created the confusion matrix). Finally, you estimated the rates.

Now, we plot the false positive rate on the x-axis and the true positive rate on the y-axis by running the code below.
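
A minimal sketch of such a plot (the specific plotting options here are assumptions, not the original code) might look like:

# plot the ROC curve: false positive rate on the x-axis,
# true positive rate on the y-axis
plot(falsePositiveRate, truePositiveRate, type = "b",
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False Positive Rate", ylab = "True Positive Rate")
# add a one-to-one reference line in blue
abline(0, 1, col = "blue")
# label each point with its threshold probability
text(falsePositiveRate, truePositiveRate, labels = probs, pos = 4)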

I’ve included a one-to-one line in blue and displayed the probability thresholds as text labels. Below is a larger version of the figure:

TPR v. FPR for Ames, IA
True Positive Rate vs False Positive Rate for various threshold probabilities for PoP in IA
Credit: J. Roman

As we set our threshold probability high, we end up at the lower left of the curve with few cases forecasted as events, so low rates of both true and false positive. In contrast, if we set our threshold probability low, we forecast the event most of the time, so we move to the upper right end of the ROC curve with high rates of true positive (good) and false positive (bad). The trick here is finding the right threshold for our particular decision cost structure.

Cost-Based Decision Making

So, how do we pick the right threshold? Well, by applying a threshold to a probabilistic forecast, we turn the probabilistic forecast into a categorical forecast. This means we can compute the confusion matrix and all the associated error metrics.
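
For example, here is a minimal sketch of that conversion (the 30% threshold is an arbitrary value chosen for illustration):

# convert the probabilistic forecast into a categorical (yes/no) forecast
# using an assumed 30% threshold
categoricalForecast <- as.integer(f >= 0.3)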

The goal is to find the threshold probability that guides us to the most cost-effective decision. To do this, we need the decision’s cost matrix. This has the same columns (forecast category) and rows (observed outcome category) as the confusion matrix, but it contains the loss (or profit) for each combination of forecast and observed category.

For example, below is the cost matrix for a hypothetical refinery in a potential hurricane-prone position:

Hypothetical Decision Cost Matrix for a Refinery
Profit or loss (millions of $ per day)   Fcst Yes   Fcst No
Obs Yes                                  -1.5       -5.0
Obs No                                   -1.0        0.25

Let’s say that the corresponding confusion matrix for a threshold value of p is as follows:

Corresponding Confusion Matrix
Total Count = N   Fcst Yes   Fcst No
Obs Yes           a          b
Obs No            c          d

If we multiply each cell of the cost matrix by the corresponding cell of the confusion matrix, sum the products, and divide by the total number of forecasts, we get the expected profit or loss of using the forecast system with this threshold (p). In this example, the formula would be:

$$\text{Expected Profit or Loss} = \frac{-1.5a - 5.0b - 1.0c + 0.25d}{N}$$
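
For example, with hypothetical counts of a = 10, b = 5, c = 20, and d = 330 out of N = 365 forecasts, the expected value would be (-1.5(10) - 5.0(5) - 1.0(20) + 0.25(330))/365 = 22.5/365, or roughly +$0.06 million per day.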

You then try a range of thresholds from 0 to 100% and see which yields the best financial outcome. The optimal threshold probability depends strongly on the cost matrix, so it will generally differ from one decision to another. This is why probabilistic forecasts are more valuable than categorical forecasts: they can be tuned to the decision problem at hand by picking the best threshold probability.

So, let’s give it a try. Below is a hypothetical cost matrix for a bike share company in Ames, IA, for days when precipitation occurs. When precipitation occurs, the company loses out on people renting bikes. In addition, if precipitation occurs and they don’t expect it, the bikes can be ruined if not stored correctly. On the flip side, if precipitation is forecasted and they remove the bikes, but rain does not occur, they miss out on sales. (Please note, this is all hypothetical.)

Hypothetical Decision Cost Matrix for a Bike Share Company
Profit or loss ($ per day)   Fcst Yes   Fcst No
Obs Yes                      -200       -500
Obs No                       -150        50

Now, let’s use our data and apply several different threshold probabilities to estimate the expected profit or loss. To begin, select the thresholds we will test:

Show me the code...
# select the probabilities
probs <- seq(0,1,0.1)

Now, initialize the variable that will save the expected profit or loss for each threshold.

Show me the code...
# initialize variables
expectedProfitLoss <- numeric()

Now, determine when it rained and when it did not rain.

Show me the code...
# create rain index 
rainIndex <- which(obsData$rainfall==1)
noRainIndex <- which(obsData$rainfall==0)

Now, enter the cost matrix values.

Show me the code...
# enter cost matrix values
costTP <- -200
costFN <- -500
costFP <- -150
costTN <- 50

Now, we can loop through the probabilities and estimate the expected gain/loss.

Show me the code...
# loop through probabilities and estimate the expectedProfitLoss
for(i in 1:length(probs)){
  popIndex <- which(f >= probs[i])
  noPopIndex <- which(f < probs[i])
  TP <- length(intersect(popIndex,rainIndex))
  FP <- length(intersect(popIndex,noRainIndex))
  FN <- length(intersect(noPopIndex,rainIndex))
  TN <- length(intersect(noPopIndex,noRainIndex))
  sumProfitLoss <- (costTP*TP)+(costFN*FN)+(costFP*FP)+(costTN*TN)
  expectedProfitLoss[i] <- sumProfitLoss/length(obsData$Date)
}

You could do a number of things to visualize or quantify the best threshold. I will simply make a plot of the threshold probabilities and the expected profit or loss. Run the code below:
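
A minimal sketch of such a plot (the plotting options here are assumptions, not the original code) might look like:

# plot the expected profit/loss against the threshold probabilities
plot(probs, expectedProfitLoss, type = "b",
     xlab = "Threshold Probability", ylab = "Expected Profit/Loss ($/day)")
# report the threshold with the highest expected value
probs[which.max(expectedProfitLoss)]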

Below is a larger version of the figure:

Plot of threshold probabilities and the expected profit or loss
Expected profit/loss for hypothetical bike share decision cost matrix for various probability thresholds in IA
Credit: J. Roman

The optimal threshold is 30%, as this minimizes their loss. So, when there is a 30% or greater chance of precipitation, they should store the bikes and expect a loss of roughly $60/day.