How to Extract Information from the Decision Rules in rpart Package?

The rpart package in R is widely used for creating decision tree models. Decision trees are valuable because they provide a clear and interpretable set of rules for making predictions. Extracting and understanding these rules can offer insights into how the model makes decisions and which features are most important. This article will guide you through extracting information from the decision rules created by the rpart package in R.

Creating a Decision Tree

Before we can extract information from decision rules, we need to create a decision tree. For this example, we will use the iris dataset.

Install and Load Necessary Libraries

Ensure you have the rpart and rpart.plot packages installed and loaded.

R
install.packages("rpart")
install.packages("rpart.plot")
library(rpart)
library(rpart.plot)

Load the Dataset

Load the built-in iris dataset, which ships with base R.

R
data(iris)

Create a Decision Tree Model

Create a decision tree model using the rpart function.

R
set.seed(123)  # Set seed for reproducibility
tree_model <- rpart(Species ~ ., data = iris, method = "class")

Plot the Decision Tree

Visualize the decision tree using the rpart.plot function.

R
rpart.plot(tree_model)

Output:

A plot of the fitted tree: a root split on Petal.Length, followed by a split on Petal.Width.
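rpart.plot accepts arguments that control what each node displays. As a variant of the call above (argument values taken from the rpart.plot documentation), extra = 104 adds per-class probabilities and the percentage of observations in each node of a classification tree:

```r
# Richer node labels: predicted class, per-class probabilities,
# and the percentage of observations falling in each node
rpart.plot(tree_model, extra = 104, type = 2, box.palette = "auto")
```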

Extracting Information from Decision Rules

To understand and extract the decision rules from the tree model, we can use various functions and methods.

Print the Detailed Summary of the Tree

The printcp function provides a detailed summary of the decision tree, including the complexity parameter and error rates.

R
printcp(tree_model)

Output:

Classification tree:
rpart(formula = Species ~ ., data = iris, method = "class")

Variables actually used in tree construction:
[1] Petal.Length Petal.Width

Root node error: 100/150 = 0.66667

n= 150

    CP nsplit rel error xerror     xstd
1 0.50      0      1.00   1.20 0.048990
2 0.44      1      0.50   0.76 0.061232
3 0.01      2      0.06   0.07 0.025833
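The same table is stored in tree_model$cptable, so it can also be used programmatically. A common heuristic (a sketch, not something rpart applies automatically) is to prune the tree at the CP value with the lowest cross-validated error:

```r
# Pick the CP value with the smallest cross-validated error (xerror)
best_cp <- tree_model$cptable[which.min(tree_model$cptable[, "xerror"]), "CP"]

# Prune the tree at that value and reprint the summary
pruned_model <- prune(tree_model, cp = best_cp)
printcp(pruned_model)
```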

Extract the Rules

The rpart package allows you to extract decision rules using the path.rpart function or by directly parsing the model.

R
# Extract the rule path for every node; the row names of tree_model$frame
# are the rpart node numbers
node_numbers <- as.numeric(rownames(tree_model$frame))
rules <- path.rpart(tree_model, nodes = node_numbers, print.it = FALSE)

# Print the extracted rules, labelled by their node numbers
for (node in names(rules)) {
  cat(paste("Rule for Node", node, ":\n"))
  cat(paste(rules[[node]], collapse = "\n"), "\n\n")
}

Output:

Rule for Node 1 :
root

Rule for Node 2 :
root
Petal.Length< 2.45

Rule for Node 3 :
root
Petal.Length>=2.45

Rule for Node 6 :
root
Petal.Length>=2.45
Petal.Width< 1.75

Rule for Node 7 :
root
Petal.Length>=2.45
Petal.Width>=1.75

Detailed Node Information

You can also extract detailed information about each node, including the split condition, number of observations, and predicted class.

R
# Extract detailed node information
tree_details <- as.data.frame(tree_model$frame)

# Display node details
print(tree_details)

Output:

           var   n  wt dev yval complexity ncompete nsurrogate yval2.V1
1 Petal.Length 150 150 100    1       0.50        3          3        1
2       <leaf>  50  50   0    1       0.01        0          0        1
3  Petal.Width 100 100  50    2       0.44        3          3        2
6       <leaf>  54  54   5    2       0.00        0          0        2
7       <leaf>  46  46   1    3       0.01        0          0        3

  yval2.V2 yval2.V3 yval2.V4   yval2.V5   yval2.V6   yval2.V7 yval2.nodeprob
1       50       50       50 0.33333333 0.33333333 0.33333333     1.00000000
2       50        0        0 1.00000000 0.00000000 0.00000000     0.33333333
3        0       50       50 0.00000000 0.50000000 0.50000000     0.66666667
6        0       49        5 0.00000000 0.90740741 0.09259259     0.36000000
7        0        1       45 0.00000000 0.02173913 0.97826087     0.30666667

tree_model$frame contains detailed information about each node in the decision tree, including variables used for splitting, number of observations, and more.
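One practical use of this frame: the yval column stores each node's predicted class as an integer code, and the factor levels it indexes are kept in attr(tree_model, "ylevels"), so the two can be combined into a readable per-node summary (a sketch using the tree_model fitted earlier):

```r
# Translate the integer class codes in yval back to species names
node_summary <- data.frame(
  node      = rownames(tree_model$frame),
  variable  = tree_model$frame$var,
  n         = tree_model$frame$n,
  predicted = attr(tree_model, "ylevels")[tree_model$frame$yval]
)
print(node_summary)
```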

Visualize Important Splits

Plotting the variable importance can help you understand which variables are most influential in the decision-making process.

R
# Extract and plot variable importance
importance <- tree_model$variable.importance
barplot(importance, main = "Variable Importance", col = "lightblue", las = 2)

Output:

A bar plot of the variable importance scores for the four predictors.
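The importance scores returned by rpart are on an arbitrary scale. Rescaling the importance vector computed above to percentages of the total makes the variables easier to compare:

```r
# Rescale importance scores so they sum to 100
importance_pct <- round(100 * importance / sum(importance), 1)
print(importance_pct)
```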

Convert the Tree to Rules

The rattle package can convert the decision tree into readable rules.

R
install.packages("rattle")
library(rattle)

# Convert the decision tree to rules
asRules(tree_model)

Output:

 Rule number: 2 [Species=setosa cover=50 (33%) prob=1.00]
Petal.Length< 2.45

Rule number: 7 [Species=virginica cover=46 (31%) prob=0.00]
Petal.Length>=2.45
Petal.Width>=1.75

Rule number: 6 [Species=versicolor cover=54 (36%) prob=0.00]
Petal.Length>=2.45
Petal.Width< 1.75

The rattle package simplifies the decision tree into readable rules, facilitating easier interpretation.
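If rattle is not available, the rpart.plot package loaded earlier provides rpart.rules(), which returns the rules as a data frame with one row per leaf rather than printed text:

```r
# Rules as a data frame: one row per leaf, with the fitted class,
# its probability, and the split conditions leading to that leaf
rpart.rules(tree_model)
```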

Conclusion

Extracting and understanding decision rules from the rpart package in R is a valuable skill for interpreting decision tree models. By following the steps outlined in this article, you can create a decision tree, extract detailed decision rules, and gain insights into the model’s decision-making process. This enhances the transparency and interpretability of your machine learning models, providing clearer insights for decision-making.


