METEO 825
Predictive Analytic Techniques for Meteorological Data

CART Part 1: Defining Rules

Prioritize...

After you have read this section, you should be able to interpret a tree diagram and understand how rules are built.

Read...

As I mentioned earlier, CART, or Classification and Regression Trees, is a rule-based machine learning method. It’s a common AI technique used in the field of weather and climate analytics. We actually used regression trees in Meteo 815 when we discussed data mining, so this should not be the first time you have heard of the method. In this lesson, however, we will dive deeper into the details of CART and explore exactly how the tree is formed and how the rules that define it are built. Read on to learn more.

A Tree of Rules

In the end, we will have a set of rules that are used to predict the categorical outcome. So, how does the tree of rules work? Each case goes to the first rule and gets sent down one branch of a split (Yes or No). Each of the two branches leads to another rule. The case is evaluated against that next rule and is again sent down one of two branches. This process continues until the case reaches the tip of a branch (a leaf).

The goal is for all the cases that end up at a given branch tip to have the same categorical outcome. Below is an example of a decision tree for flight rules.

Example of a decision tree for flight rules
Credit: J. Roman 

You start at the top with all of the observations (or whatever other inputs you are using). You follow Rule 1, which sends you down one of the two branches. Whichever branch you follow, you then go to Rule 2, which sends you down one of two more branches, each ending at a branch tip with the final outcome. The branch tip is formally called a leaf.
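
To make the branching concrete, below is a minimal Python sketch of a hand-coded two-rule tree. The ceiling and visibility predictors and their thresholds are hypothetical stand-ins for the rules in the figure, chosen only to illustrate how a single case is sent down Yes/No branches until it reaches a leaf.

# A minimal sketch of walking a two-rule tree by hand. The predictors and
# thresholds (ceiling and visibility) are hypothetical, not the actual rules
# in the figure; only the Yes/No branching structure matters here.

def classify_flight_rules(ceiling_ft, visibility_mi):
    """Send one case down the tree and return the leaf (the final category)."""
    # Rule 1: first split, on ceiling height
    if ceiling_ft > 3000:
        # Rule 2 (one branch): split on visibility
        if visibility_mi > 5:
            return "VFR"    # leaf
        else:
            return "MVFR"   # leaf
    else:
        # Rule 2 (other branch): split on visibility
        if visibility_mi > 3:
            return "MVFR"   # leaf
        else:
            return "IFR"    # leaf

# Each observation starts at the top and follows the rules to a single leaf.
print(classify_flight_rules(ceiling_ft=4500, visibility_mi=10))  # VFR
print(classify_flight_rules(ceiling_ft=1200, visibility_mi=2))   # IFR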

Building a Rule

Before we can build our tree, we need to build the rules. Each rule takes the form:

If predictor > threshold, then Yes, else No

Thus, we can build different rules using different predictors. And we can create different rules by changing the threshold applied to any one predictor.

We build rules by testing all the predictors to see which one gives us the biggest increase in purity (i.e., the biggest decrease in H). For each predictor, we have to test multiple threshold values to see which works best. Usually, about 20 thresholds spread across the range of a predictor are sufficient. We test each of these predictor-threshold combinations on our entire set of training cases (cases where we know both the predictors and the actual categorical outcome).
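
Here is what this threshold search could look like in code. This is a minimal sketch, assuming H is the Shannon entropy of the categorical outcome, weighted across the two branches of the split; the function names (entropy, split_entropy, best_threshold) are illustrative and not part of any particular CART library.

import numpy as np

def entropy(labels):
    """Shannon entropy H = -sum(p * log2(p)) of a set of categorical labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def split_entropy(predictor, outcome, threshold):
    """H after applying 'if predictor > threshold, then Yes, else No':
    the entropy of each branch, weighted by the fraction of cases it receives."""
    predictor = np.asarray(predictor)
    outcome = np.asarray(outcome)
    yes = outcome[predictor > threshold]
    no = outcome[predictor <= threshold]
    n = len(outcome)
    return (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)

def best_threshold(predictor, outcome, n_thresholds=20):
    """Scan thresholds spread across the predictor's range; return the one with lowest H."""
    predictor = np.asarray(predictor)
    candidates = np.linspace(predictor.min(), predictor.max(), n_thresholds)
    h_values = [split_entropy(predictor, outcome, t) for t in candidates]
    best = int(np.argmin(h_values))
    return candidates[best], h_values[best]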

Below is an example of a purity calculation for different thresholds.

Purity for different maximum temperature thresholds in Minneapolis, MN to predict if the minimum temperature will be less than freezing
Credit: J. Roman

The goal was to predict whether the minimum temperature would drop below 32°F in Minneapolis, MN based on the maximum temperature from the day before. I created 12 thresholds starting at 35°F and going up to 60°F. For each threshold, I estimated the probability of observing a Tmin below 32°F given which side of the threshold Tmax fell on, and computed the resulting H. There is an optimal threshold with worse (bigger) values of H on either side (the optimum lies between 43°F and 45°F). The H value at the optimal threshold is 0.15, so the split does not reach an H of 0, that is, perfect purity. This just means that our one forecast rule wasn’t enough to give perfect forecasts on the training data. Hence the need for multiple rules, which is exactly what CART provides.
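
Continuing the best_threshold sketch from earlier, here is how that scan could be applied to this kind of problem. The data below are synthetic and made up purely for illustration; they are not the Minneapolis record behind the figure, and with real station data you would read the previous-day Tmax values and the Tmin < 32°F flags from your training set instead.

import numpy as np

# Synthetic previous-day Tmax values (°F) and a made-up Tmin < 32°F outcome;
# illustration only, not the actual Minneapolis data used in the figure.
rng = np.random.default_rng(0)
tmax = rng.uniform(30.0, 70.0, size=500)
tmin_below_32 = (tmax + rng.normal(0.0, 5.0, size=500)) < 45.0

# best_threshold is the function defined in the earlier sketch.
thr, h = best_threshold(tmax, tmin_below_32, n_thresholds=20)
print(f"Best Tmax threshold: {thr:.1f} F, with H = {h:.2f}")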