Python Implementation of Projected Clustering

Steps Required in Projected Clustering

To implement projected clustering in Python on a dataset with 20 dimensions (columns), we first apply PCA (principal component analysis) to reduce the data from 20 dimensions to 2. We then apply the k-means clustering algorithm to the reduced data to cluster the data points.

Python3

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
 
# Generate example high-dimensional data
np.random.seed(0)
num_samples = 1000
num_dimensions = 20
data = np.random.randn(num_samples, num_dimensions)
 
# Dimensionality reduction using PCA
num_selected_dimensions = 2
pca = PCA(n_components=num_selected_dimensions)
projected_data = pca.fit_transform(data)
 
# Perform k-means clustering on the projected data
num_clusters = 3
# n_init is set explicitly because its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
kmeans.fit(projected_data)
cluster_labels = kmeans.labels_
 
# Plot the clusters
plt.scatter(projected_data[:, 0], projected_data[:, 1], c=cluster_labels)
plt.title("Projected Clustering using K-means")
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.show()

Output:

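[Figure: "Projected Clustering using K-means", a scatter plot of Dimension 1 against Dimension 2, with points colored by cluster label]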
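Before trusting the plot, it can help to check how much information the two retained dimensions carry and how well separated the resulting clusters are. The sketch below is one possible way to do that, reusing the same synthetic setup as the example above. The choice of `explained_variance_ratio_` and `silhouette_score` as diagnostics is an assumption made for illustration, not part of the original walkthrough.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Same synthetic setup as the example above: 1000 samples, 20 dimensions
rng = np.random.RandomState(0)
data = rng.randn(1000, 20)

# Project onto the first two principal components
pca = PCA(n_components=2)
projected_data = pca.fit_transform(data)

# Fraction of the total variance kept by the two components
retained_variance = pca.explained_variance_ratio_.sum()
print(f"Variance retained by 2 components: {retained_variance:.2%}")

# Cluster the projected data, as in the example above
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(projected_data)

# Silhouette score ranges from -1 to 1; higher means better-separated clusters
score = silhouette_score(projected_data, cluster_labels)
print(f"Silhouette score: {score:.3f}")
```

Because the example data is pure Gaussian noise, the two components keep only a small share of the variance, and k-means is partitioning a single blob rather than recovering real groups. On real data, a low retained variance would suggest trying more components before clustering.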

Projected clustering in data analytics

Traditional clustering algorithms such as k-means, DBSCAN, and hierarchical clustering operate on all dimensions of the data simultaneously. In high-dimensional data, however, clusters may be present in only a few of the dimensions, which makes these algorithms less effective. Projected clustering is used in this case: it looks for clusters in a lower-dimensional projection of the data.
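The effect described above can be illustrated with a small synthetic experiment. The sketch below assumes a made-up dataset in which three clusters exist only in the first two of 50 dimensions, and it compares k-means on all dimensions with k-means on just those two. Knowing the relevant dimensions in advance is a simplifying assumption: a real projected clustering method has to discover them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.RandomState(0)
points_per_cluster = 100

# Three well-separated clusters that exist only in the first two dimensions
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
relevant = np.vstack([
    center + 0.5 * rng.randn(points_per_cluster, 2) for center in centers
])
true_labels = np.repeat(np.arange(3), points_per_cluster)

# 48 additional dimensions of high-variance noise with no cluster structure
noise = 10.0 * rng.randn(relevant.shape[0], 48)
data = np.hstack([relevant, noise])


def cluster(features):
    """Run k-means with 3 clusters and return the predicted labels."""
    model = KMeans(n_clusters=3, n_init=10, random_state=0)
    return model.fit_predict(features)


# Adjusted Rand index: 1.0 means perfect agreement with the true labels
ari_full = adjusted_rand_score(true_labels, cluster(data))
ari_subspace = adjusted_rand_score(true_labels, cluster(data[:, :2]))

print(f"ARI using all 50 dimensions: {ari_full:.3f}")
print(f"ARI using the 2 relevant dimensions: {ari_subspace:.3f}")
```

In this setup the noise dimensions dominate the Euclidean distances, so k-means on the full data should agree with the true grouping far less than k-means restricted to the two informative dimensions.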
