Create and Train the k-NN Model
Now, it’s time to create and train our k-NN model. We’ll use the `train()` function from the `caret` package with the `kknn` method.
R
# Create and train a k-NN model with 5-fold cross-validation
knn_spec <- train(
  Species ~ .,
  data = iris,
  method = "kknn",
  trControl = trainControl(method = "cv", number = 5, verboseIter = TRUE),
  tuneLength = 5
)
Output:
+ Fold1: kmax= 5, distance=2, kernel=optimal
- Fold1: kmax= 5, distance=2, kernel=optimal
+ Fold1: kmax= 7, distance=2, kernel=optimal
- Fold1: kmax= 7, distance=2, kernel=optimal
+ Fold1: kmax= 9, distance=2, kernel=optimal
- Fold1: kmax= 9, distance=2, kernel=optimal
+ Fold1: kmax=11, distance=2, kernel=optimal
- Fold1: kmax=11, distance=2, kernel=optimal
+ Fold1: kmax=13, distance=2, kernel=optimal
- Fold1: kmax=13, distance=2, kernel=optimal
+ Fold2: kmax= 5, distance=2, kernel=optimal
- Fold2: kmax= 5, distance=2, kernel=optimal
+ Fold2: kmax= 7, distance=2, kernel=optimal
- Fold2: kmax= 7, distance=2, kernel=optimal
+ Fold2: kmax= 9, distance=2, kernel=optimal
- Fold2: kmax= 9, distance=2, kernel=optimal
+ Fold2: kmax=11, distance=2, kernel=optimal
- Fold2: kmax=11, distance=2, kernel=optimal
+ Fold2: kmax=13, distance=2, kernel=optimal
- Fold2: kmax=13, distance=2, kernel=optimal
+ Fold3: kmax= 5, distance=2, kernel=optimal
- Fold3: kmax= 5, distance=2, kernel=optimal
+ Fold3: kmax= 7, distance=2, kernel=optimal
- Fold3: kmax= 7, distance=2, kernel=optimal
+ Fold3: kmax= 9, distance=2, kernel=optimal
- Fold3: kmax= 9, distance=2, kernel=optimal
+ Fold3: kmax=11, distance=2, kernel=optimal
- Fold3: kmax=11, distance=2, kernel=optimal
+ Fold3: kmax=13, distance=2, kernel=optimal
- Fold3: kmax=13, distance=2, kernel=optimal
+ Fold4: kmax= 5, distance=2, kernel=optimal
- Fold4: kmax= 5, distance=2, kernel=optimal
+ Fold4: kmax= 7, distance=2, kernel=optimal
- Fold4: kmax= 7, distance=2, kernel=optimal
+ Fold4: kmax= 9, distance=2, kernel=optimal
- Fold4: kmax= 9, distance=2, kernel=optimal
+ Fold4: kmax=11, distance=2, kernel=optimal
- Fold4: kmax=11, distance=2, kernel=optimal
+ Fold4: kmax=13, distance=2, kernel=optimal
- Fold4: kmax=13, distance=2, kernel=optimal
+ Fold5: kmax= 5, distance=2, kernel=optimal
- Fold5: kmax= 5, distance=2, kernel=optimal
+ Fold5: kmax= 7, distance=2, kernel=optimal
- Fold5: kmax= 7, distance=2, kernel=optimal
+ Fold5: kmax= 9, distance=2, kernel=optimal
- Fold5: kmax= 9, distance=2, kernel=optimal
+ Fold5: kmax=11, distance=2, kernel=optimal
- Fold5: kmax=11, distance=2, kernel=optimal
+ Fold5: kmax=13, distance=2, kernel=optimal
- Fold5: kmax=13, distance=2, kernel=optimal
Aggregating results
Selecting tuning parameters
Fitting kmax = 9, distance = 2, kernel = optimal on full training set
R
# Print the model
print(knn_spec)
Output:
k-Nearest Neighbors
150 samples
4 predictor
3 classes: 'setosa', 'versicolor', 'virginica'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 120, 120, 120, 120, 120
Resampling results across tuning parameters:
kmax Accuracy Kappa
5 0.9466667 0.92
7 0.9533333 0.93
9 0.9533333 0.93
11 0.9466667 0.92
13 0.9466667 0.92
Tuning parameter 'distance' was held constant at a value of 2
Tuning parameter 'kernel' was held constant at a value of optimal
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were kmax = 9, distance = 2 and kernel
= optimal.
- knn_spec <- train(…): This line creates and trains a k-NN (k-nearest neighbors) model using the `train` function from the `caret` package. The `train` function is a unified interface for fitting and tuning many machine learning models.
- Species ~ .: This formula specifies the target variable (Species) and the predictors (all other columns denoted by `.`) to be used in the model.
- data = iris: This specifies the dataset to be used, in this case, the Iris dataset loaded using `data(iris)`.
- method = “kknn”: Here, we specify that we want to use the “kknn” model, which stands for kernel (weighted) k-nearest neighbors. This variation of the k-NN algorithm weights each neighbor’s vote by its distance to the query point using a kernel function, rather than giving all neighbors equal weight.
- trControl = trainControl(…): This part sets the control parameters for the training process. It specifies that we want to perform cross-validation (`method = “cv”`) with 5 folds (`number = 5`) and requests verbose output during the training process (`verboseIter = TRUE`).
- tuneLength = 5: This parameter tells `caret` how many candidate values of the tuning parameter `kmax` (the maximum number of neighbors) to evaluate during cross-validation. Here, five values (5, 7, 9, 11, and 13) are tried to determine which one gives the best model performance.
- print(knn_spec): Finally, we print the fitted model to the console. This shows the method used, the cross-validation results for each tuning parameter value, and the final values selected.
This code sets up a k-NN classification model using the “kknn” method, performs cross-validation with different values of `kmax`, and prints information about the fitted model. It’s common practice in machine learning to explore different hyperparameters (like `k` in k-NN) to find the best model for a given problem. The resulting `knn_spec` object contains the trained and tuned k-NN model.
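As a quick sketch of how the fitted object can be used (assuming the `caret` and `kknn` packages are installed), class labels can be generated with `predict()` and compared against the true species. The confusion-matrix step below is an illustrative addition, not part of the original tutorial:

```r
library(caret)

# Fit the model as above (verbose output omitted)
data(iris)
knn_spec <- train(
  Species ~ .,
  data = iris,
  method = "kknn",
  trControl = trainControl(method = "cv", number = 5),
  tuneLength = 5
)

# Predict class labels for the training data
preds <- predict(knn_spec, newdata = iris)

# Cross-tabulate predictions against the true species
table(Predicted = preds, Actual = iris$Species)
```

Note that evaluating on the training data is only a sanity check; held-out data gives an honest estimate of performance.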
Predicting Multiple Outcomes with a k-NN Model Using tidymodels
When dealing with classification problems that involve multiple classes or outcomes, it’s essential to have a reliable method for making predictions. One popular algorithm for such tasks is k-Nearest Neighbors (k-NN). In this tutorial, we will walk you through the process of making predictions with multiple outcomes using a k-NN model in R, specifically with the tidymodels framework.
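A minimal sketch of the tidymodels workflow (assuming the `tidymodels` and `kknn` packages are installed) might look like the following; setting `neighbors = 9` here simply mirrors the value selected by cross-validation earlier, and is an illustrative choice rather than a prescribed one:

```r
library(tidymodels)

# Specify a k-NN classifier using the kknn engine
knn_mod <- nearest_neighbor(neighbors = 9) %>%
  set_engine("kknn") %>%
  set_mode("classification")

# Fit on the iris data and predict class labels for all three species
knn_fit <- fit(knn_mod, Species ~ ., data = iris)
preds <- predict(knn_fit, new_data = iris)
head(preds)
```

The `predict()` call returns a tibble with a `.pred_class` column containing one of the three species labels for each row.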
K-Nearest Neighbors (KNN) is a simple yet effective supervised machine learning algorithm used for classification and regression tasks. Here’s an explanation of KNN and some of its benefits: