Model Development and Evaluation
Using the sklearn implementation of the KNN algorithm, we will now train a model on the scaled features. After training, we will evaluate the model using a confusion matrix and a classification report.
Python3
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, df['Target'], test_size=0.30)

# Remember that we are trying to come up with a model
# to predict the Target class.
# We'll start with k = 1.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

# Predictions and evaluations
# Let's evaluate our KNN model!
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
Output:
[[19  3]
 [ 2  6]]
              precision    recall  f1-score   support

           0       0.90      0.86      0.88        22
           1       0.67      0.75      0.71         8

    accuracy                           0.83        30
   macro avg       0.79      0.81      0.79        30
weighted avg       0.84      0.83      0.84        30
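As a sanity check, the numbers in the classification report can be recomputed by hand from the printed confusion matrix, where rows are the true classes and columns the predicted classes. A minimal sketch (the matrix values are copied from the output above):

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, cols = predicted
cm = np.array([[19, 3],
               [2, 6]])

# Metrics for class 1 (treated as the positive class)
tp = cm[1, 1]   # true positives
fn = cm[1, 0]   # false negatives
fp = cm[0, 1]   # false positives

precision = tp / (tp + fp)   # 6 / 9  ~ 0.67
recall = tp / (tp + fn)      # 6 / 8  = 0.75

# Overall accuracy: correct predictions (diagonal) over all samples
accuracy = cm.trace() / cm.sum()   # 25 / 30 ~ 0.83

print(round(precision, 2), round(recall, 2), round(accuracy, 2))
```

These match the precision, recall, and accuracy rows that classification_report prints for class 1.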
K Nearest Neighbors with Python | ML
K-Nearest Neighbors is one of the most basic yet essential classification algorithms in machine learning. It belongs to the supervised learning domain and finds extensive application in pattern recognition, data mining, and intrusion detection. The K-Nearest Neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things are near to each other. KNN captures this idea of similarity (sometimes called distance, proximity, or closeness) with some mathematics we might have learned in our childhood: calculating the distance between points on a graph. There are other ways of calculating distance, which might be preferable depending on the problem we are solving, but the straight-line distance (also called the Euclidean distance) is a popular and familiar choice.

KNN is widely applicable in real-life scenarios since it is non-parametric, meaning it does not make any underlying assumptions about the distribution of the data (as opposed to other algorithms such as GMM, which assume a Gaussian distribution of the given data). This article illustrates K-nearest neighbors on sample random data using the sklearn library.
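To make the distance-based idea concrete, here is a minimal, self-contained sketch of KNN classification using Euclidean distance. The points, labels, and the knn_predict helper are illustrative assumptions, not part of the article's dataset:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=1):
    # Euclidean (straight-line) distance from x to every training point
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: class 0 clusters near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([0.5, 0.5]), k=3))  # -> 0
print(knn_predict(X, y, np.array([5.5, 5.5]), k=3))  # -> 1
```

This is exactly what sklearn's KNeighborsClassifier does under the hood for the default Euclidean metric, with far more efficient neighbour search.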