How to Draw Decision Boundaries in R

Decision boundaries are essential concepts in machine learning, especially for classification tasks. They define the regions in feature space where the model predicts different classes. Visualizing decision boundaries helps us understand how a classifier separates different classes. In this article, we’ll explore how to draw decision boundaries in R with practical examples.

Understanding Decision Boundaries

Decision boundaries are the lines, surfaces, or hyperplanes that separate different classes in the feature space. For binary classification in two dimensions, the boundary is a single curve: a straight line for a linear classifier such as logistic regression, or a more complex curve for nonlinear models, with its shape depending on the underlying data distribution and the chosen classification algorithm.

In the R Programming Language, we can draw decision boundaries using various packages, such as ggplot2, caret, and e1071. Let’s explore two examples: one using a built-in dataset (iris) and one using a custom, synthetic dataset.
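If any of these packages are missing from your system, a one-time setup along the following lines is enough (the nnet package used in the first example ships with R, so it does not need to be installed separately):

R
# One-time installation of the packages mentioned above (skip any you already have)
install.packages(c("ggplot2", "caret", "e1071"))

# Load the plotting package used throughout this article
library(ggplot2)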

Draw Decision Boundaries in R Using a Built-in Dataset (iris)

We’ll use the classic iris dataset, which contains measurements of iris flowers, to visualize decision boundaries for a simple three-class classification task. The idea is to fit a classifier on two features, predict the class at every point of a fine grid covering the feature space, and shade the grid by predicted class; the borders between the shaded regions are the model’s decision boundaries. Here we fit a multinomial logistic regression model with multinom from the nnet package (which ships with R).

R
# Load required libraries
library(ggplot2)
library(nnet)  # provides multinom() for multinomial logistic regression

# Load iris dataset
data(iris)

# Fit a multinomial logistic regression model on the two petal measurements
model <- multinom(Species ~ Petal.Length + Petal.Width, data = iris, trace = FALSE)

# Predict the species at every point of a fine grid covering the feature space
grid <- expand.grid(
  Petal.Length = seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 200),
  Petal.Width  = seq(min(iris$Petal.Width),  max(iris$Petal.Width),  length.out = 200))
grid$Species <- predict(model, newdata = grid)

# Plot the predicted regions (shaded tiles) with the observations on top
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_tile(data = grid, aes(fill = Species), alpha = 0.3, color = NA) +
  geom_point() +
  labs(title = "Decision Boundaries for Iris Dataset",
       x = "Petal Length", y = "Petal Width")

Output:

Draw Decision Boundaries in R

The provided R code generates a plot depicting decision boundaries for the Iris dataset using ggplot2, a popular data visualization package in R. Here’s a detailed explanation of the output:

  • Data Visualization: The plot visualizes the Iris dataset, specifically focusing on the relationship between petal length (Petal.Length) and petal width (Petal.Width).
  • Scatter Plot: Each data point in the plot represents an observation from the Iris dataset. The x-axis represents petal length, while the y-axis represents petal width. Data points are colored according to the species of iris they belong to (Species), facilitating easy identification of different species.
  • Decision Regions: On top of the scatter plot, geom_tile shades every cell of a fine grid according to the species predicted at that location by the multinomial logistic regression model (multinom from the nnet package). The borders between differently shaded regions are the decision boundaries: the curves along which the model switches from predicting one species to another.
  • Title and Axis Labels: The plot is titled “Decision Boundaries for Iris Dataset,” providing context for what is being visualized. The x-axis is labeled as “Petal Length,” and the y-axis is labeled as “Petal Width,” clearly indicating the variables being plotted.

This visualization is useful for understanding how well a simple model, such as multinomial logistic regression, can separate the three iris species based on petal measurements alone. It provides a visual representation of the decision boundaries, which can aid in model interpretation and evaluation.
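If you want to quantify that separation rather than just judge it visually, a quick check is to compare the model’s predictions on the training data with the true labels. The snippet below is a minimal sketch, assuming the model object fitted in the code above:

R
# Predicted species for the training observations (uses the multinom model from above)
pred <- predict(model, newdata = iris)

# Confusion matrix of predicted vs. actual species, plus overall training accuracy
table(Predicted = pred, Actual = iris$Species)
mean(pred == iris$Species)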

Draw Decision Boundaries in R with Logistic Regression

We’ll generate synthetic data for a binary classification task, fit a logistic regression model with glm(), and visualize the decision boundary by predicting class probabilities over a grid and drawing the contour where the probability equals 0.5.

R
# Generate synthetic data for two classes
set.seed(123)
num_samples <- 200
class1 <- data.frame(x = rnorm(num_samples, mean = 2), y = rnorm(num_samples, mean = 2))
class2 <- data.frame(x = rnorm(num_samples, mean = -2), y = rnorm(num_samples, mean = -2))
data <- rbind(class1, class2)

# Assign binary labels: 0 for one class and 1 for the other
data$labels <- c(rep(0, num_samples), rep(1, num_samples))

# Train a logistic regression model on the two features
logit_model <- glm(labels ~ x + y, data = data, family = binomial(link = "logit"))

# Create a grid covering the feature space
x_vals <- seq(min(data$x), max(data$x), length.out = 100)
y_vals <- seq(min(data$y), max(data$y), length.out = 100)
grid <- expand.grid(x = x_vals, y = y_vals)

# Predict the probability of class 1 at every grid point
grid$prob <- predict(logit_model, newdata = grid, type = "response")

# Plot the points and the 0.5-probability contour, i.e. the decision boundary
library(ggplot2)
ggplot(data, aes(x = x, y = y, color = factor(labels))) +
  geom_point() +
  geom_contour(data = grid, aes(x = x, y = y, z = prob),
               breaks = 0.5, color = "black", inherit.aes = FALSE) +
  scale_color_manual(values = c("blue", "red"), breaks = c(0, 1)) +
  labs(title = "Decision Boundaries with Logistic Regression",
       x = "X", y = "Y", color = "Class") +
  theme_minimal()

Output:

Draw Decision Boundaries in R

This code produces a plot of the two synthetic classes together with the decision boundary found by the logistic regression model. The black contour marks the points where the predicted probability of class 1 equals 0.5: on one side of it the model predicts class 0 (blue points), on the other class 1 (red points). Because logistic regression is a linear classifier, this boundary is a straight line in the (x, y) plane.
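Since the boundary is exactly the set of points where the linear predictor b0 + b1*x + b2*y equals zero, you can also draw it directly from the fitted coefficients instead of contouring a grid. The snippet below is a minimal sketch, assuming the logit_model and data objects from the code above:

R
# Extract the fitted coefficients: intercept and the slopes for x and y
coefs <- coef(logit_model)

# The boundary satisfies b0 + b1*x + b2*y = 0, i.e. y = -(b0 + b1*x) / b2
boundary_slope     <- -coefs["x"] / coefs["y"]
boundary_intercept <- -coefs["(Intercept)"] / coefs["y"]

# Overlay the boundary as a straight line on the scatter plot
ggplot(data, aes(x = x, y = y, color = factor(labels))) +
  geom_point() +
  geom_abline(slope = boundary_slope, intercept = boundary_intercept, color = "black") +
  labs(title = "Decision Boundary from Model Coefficients",
       x = "X", y = "Y", color = "Class") +
  theme_minimal()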

Conclusion

Visualizing decision boundaries in R is a valuable way to understand how classifiers make predictions and how they generalize to new data. By drawing decision boundaries for built-in or custom datasets, we gain insight into a model’s behavior and performance. Whether the boundary is a simple straight line or a complex nonlinear curve, R provides powerful tools and libraries for creating informative visualizations for machine learning tasks. With the techniques demonstrated in this guide, you’re equipped to explore decision boundaries and gain a deeper understanding of the classifiers in your own R projects.


