Create and Train the k-NN Model

Now, it’s time to create and train our k-NN model. We’ll use the `train()` function from the `caret` package with `method = "kknn"`, which fits a weighted (kernel) k-nearest neighbors classifier and tunes it by cross-validation.

R
# Load the required package and data
library(caret)
data(iris)

# Set a seed so the cross-validation folds are reproducible (any seed works)
set.seed(123)

# Create and train a k-NN model with 5-fold cross-validation
knn_spec <- train(
  Species ~ .,
  data = iris,
  method = "kknn",
  trControl = trainControl(method = "cv", number = 5, verboseIter = TRUE),
  tuneLength = 5
)


Output:

+ Fold1: kmax= 5, distance=2, kernel=optimal 
- Fold1: kmax= 5, distance=2, kernel=optimal 
+ Fold1: kmax= 7, distance=2, kernel=optimal 
- Fold1: kmax= 7, distance=2, kernel=optimal 
+ Fold1: kmax= 9, distance=2, kernel=optimal 
- Fold1: kmax= 9, distance=2, kernel=optimal 
+ Fold1: kmax=11, distance=2, kernel=optimal 
- Fold1: kmax=11, distance=2, kernel=optimal 
+ Fold1: kmax=13, distance=2, kernel=optimal 
- Fold1: kmax=13, distance=2, kernel=optimal 
+ Fold2: kmax= 5, distance=2, kernel=optimal 
- Fold2: kmax= 5, distance=2, kernel=optimal 
+ Fold2: kmax= 7, distance=2, kernel=optimal 
- Fold2: kmax= 7, distance=2, kernel=optimal 
+ Fold2: kmax= 9, distance=2, kernel=optimal 
- Fold2: kmax= 9, distance=2, kernel=optimal 
+ Fold2: kmax=11, distance=2, kernel=optimal 
- Fold2: kmax=11, distance=2, kernel=optimal 
+ Fold2: kmax=13, distance=2, kernel=optimal 
- Fold2: kmax=13, distance=2, kernel=optimal 
+ Fold3: kmax= 5, distance=2, kernel=optimal 
- Fold3: kmax= 5, distance=2, kernel=optimal 
+ Fold3: kmax= 7, distance=2, kernel=optimal 
- Fold3: kmax= 7, distance=2, kernel=optimal 
+ Fold3: kmax= 9, distance=2, kernel=optimal 
- Fold3: kmax= 9, distance=2, kernel=optimal 
+ Fold3: kmax=11, distance=2, kernel=optimal 
- Fold3: kmax=11, distance=2, kernel=optimal 
+ Fold3: kmax=13, distance=2, kernel=optimal 
- Fold3: kmax=13, distance=2, kernel=optimal 
+ Fold4: kmax= 5, distance=2, kernel=optimal 
- Fold4: kmax= 5, distance=2, kernel=optimal 
+ Fold4: kmax= 7, distance=2, kernel=optimal 
- Fold4: kmax= 7, distance=2, kernel=optimal 
+ Fold4: kmax= 9, distance=2, kernel=optimal 
- Fold4: kmax= 9, distance=2, kernel=optimal 
+ Fold4: kmax=11, distance=2, kernel=optimal 
- Fold4: kmax=11, distance=2, kernel=optimal 
+ Fold4: kmax=13, distance=2, kernel=optimal 
- Fold4: kmax=13, distance=2, kernel=optimal 
+ Fold5: kmax= 5, distance=2, kernel=optimal 
- Fold5: kmax= 5, distance=2, kernel=optimal 
+ Fold5: kmax= 7, distance=2, kernel=optimal 
- Fold5: kmax= 7, distance=2, kernel=optimal 
+ Fold5: kmax= 9, distance=2, kernel=optimal 
- Fold5: kmax= 9, distance=2, kernel=optimal 
+ Fold5: kmax=11, distance=2, kernel=optimal 
- Fold5: kmax=11, distance=2, kernel=optimal 
+ Fold5: kmax=13, distance=2, kernel=optimal 
- Fold5: kmax=13, distance=2, kernel=optimal 
Aggregating results
Selecting tuning parameters
Fitting kmax = 9, distance = 2, kernel = optimal on full training set

R
# Print the model
print(knn_spec)


Output:

k-Nearest Neighbors

150 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica'

No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 120, 120, 120, 120, 120
Resampling results across tuning parameters:

  kmax  Accuracy   Kappa
   5    0.9466667  0.92
   7    0.9533333  0.93
   9    0.9533333  0.93
  11    0.9466667  0.92
  13    0.9466667  0.92

Tuning parameter 'distance' was held constant at a value of 2
Tuning parameter 'kernel' was held constant at a value of optimal
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were kmax = 9, distance = 2 and kernel = optimal.
  • knn_spec <- train(…): This line creates a k-NN (k-nearest neighbors) model specification using the `train` function from the `caret` package. The `train` function is used for training various machine learning models.
  • Species ~ .: This formula specifies the target variable (Species) and the predictors (all other columns denoted by `.`) to be used in the model.
  • data = iris: This specifies the dataset to be used, in this case, the Iris dataset loaded using `data(iris)`.
  • method = “kknn”: Here, we specify the “kknn” method, which stands for kernel (weighted) k-nearest neighbors. This is a variation of the k-NN algorithm that uses a kernel function to weight each neighbor’s vote by its distance, so closer neighbors have more influence on the prediction.
  • trControl = trainControl(…): This part sets the control parameters for the training process. It specifies that we want to perform cross-validation (`method = “cv”`) with 5 folds (`number = 5`) and requests verbose output during the training process (`verboseIter = TRUE`).
  • tuneLength = 5: This tells `train` to evaluate five candidate values of the model’s main tuning parameter, `kmax` (the maximum number of neighbors). As the output above shows, the values tried were 5, 7, 9, 11, and 13, and the one giving the best cross-validated performance is selected.
  • print(knn_spec): Finally, we print the k-NN model specification to the console. This provides information about the model, including the method used, the tuning parameters, and other details.

This code sets up a k-NN classification model using the “kknn” method, performs cross-validation across different values of `kmax`, and prints information about the model. It’s common practice in machine learning to explore different hyperparameter values (like `k` in k-NN) to find the best model for a given problem. The resulting `knn_spec` object contains the trained and tuned k-NN model.
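As a quick, illustrative check (not part of the original output), you can call `caret`’s standard `predict()` on the fitted object. Rows 1, 51, and 101 are one flower of each species and are chosen here purely for illustration; in practice you would predict on a held-out test set rather than the training data.

R

# Hard class labels for one flower of each species
predict(knn_spec, newdata = iris[c(1, 51, 101), ])

# Class probabilities instead of labels
predict(knn_spec, newdata = iris[c(1, 51, 101), ], type = "prob")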

Predicting Multiple Outcomes with a k-NN Model Using tidymodels

When dealing with classification problems that involve multiple classes or outcomes, it’s essential to have a reliable method for making predictions. One popular algorithm for such tasks is k-Nearest Neighbors (k-NN). In this tutorial, we will walk you through the process of making predictions with multiple outcomes using a k-NN model in R, specifically with the tidymodels framework.

K-Nearest Neighbors (KNN) is a simple yet effective supervised machine learning algorithm used for classification and regression tasks. Here’s an explanation of KNN and some of its benefits:

K-Nearest Neighbors (KNN):

KNN is a non-parametric algorithm, meaning it doesn’t make any underlying assumptions about the distribution of data. It’s an instance-based or memory-based learning algorithm, which means it memorizes the entire training dataset and uses it to make predictions. The fundamental idea behind KNN is to classify a new data point by considering the majority class among its K nearest neighbors.
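To make the majority-vote idea concrete, here is a minimal base-R sketch (an illustration added for clarity; the `new_point` measurements are made up) that classifies a single new flower by hand:

R

# Classify one new observation by majority vote among its k nearest
# training points, using Euclidean distance
data(iris)
k <- 5
new_point <- c(5.0, 3.5, 1.5, 0.3)  # hypothetical sepal/petal measurements

# Distance from the new point to every row of the training data
dists <- apply(iris[, 1:4], 1, function(row) sqrt(sum((row - new_point)^2)))

# Take the k closest rows and vote on their Species labels
neighbors <- iris$Species[order(dists)[1:k]]
names(which.max(table(neighbors)))  # the predicted class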

Tidymodels

Tidymodels is a powerful and user-friendly ecosystem for modeling and machine learning in R. It provides a structured workflow for creating, tuning, and evaluating models. Before we proceed, make sure you have tidymodels and the necessary packages installed. You can install them using:
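R

# Standard CRAN installation commands (run once)
install.packages("tidymodels")
install.packages("kknn")  # the engine used by parsnip's nearest_neighbor()

Once installed, a minimal tidymodels version of the model above might look like the following sketch. The object names (`knn_tm_spec`, `knn_tm_fit`) are illustrative, and `neighbors = 9` simply reuses the value that cross-validation selected earlier; this is a sketch of the `parsnip` interface, not a tuned replacement for the `caret` workflow above.

R

library(tidymodels)

# Specify a k-NN classifier that uses the kknn engine
knn_tm_spec <- nearest_neighbor(mode = "classification", neighbors = 9) %>%
  set_engine("kknn")

# Fit on the iris data and predict one flower of each species
knn_tm_fit <- fit(knn_tm_spec, Species ~ ., data = iris)
predict(knn_tm_fit, new_data = iris[c(1, 51, 101), ])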
