Python Implementation of Projected Clustering

To implement projected clustering in Python, we use a dataset with 20 dimensions (columns). We first apply PCA (principal component analysis) to reduce the dataset from 20 dimensions to 2. We then apply the k-means clustering algorithm to the projected data to cluster the data points.

Python3




import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
 
# Generate example high-dimensional data
np.random.seed(0)
num_samples = 1000
num_dimensions = 20
data = np.random.randn(num_samples, num_dimensions)
 
# Dimensionality reduction using PCA
num_selected_dimensions = 2
pca = PCA(n_components=num_selected_dimensions)
projected_data = pca.fit_transform(data)
 
# Perform k-means clustering on the projected data
num_clusters = 3
# n_init is set explicitly because its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
kmeans.fit(projected_data)
cluster_labels = kmeans.labels_
 
# Plot the clusters
plt.scatter(projected_data[:, 0], projected_data[:, 1], c=cluster_labels)
plt.title("Projected Clustering using K-means")
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.show()


Output:

Figure: Projected clustering k-means (scatter plot of the two PCA dimensions, colored by k-means cluster label)
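
Because the example data is pure Gaussian noise, it can help to check how much of the original variance the two principal components keep before reading much into the plot. The following is a minimal check, assuming the same synthetic data and PCA settings as the example above; the variable name `retained` is introduced here for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same synthetic data and PCA settings as in the example above
np.random.seed(0)
data = np.random.randn(1000, 20)
pca = PCA(n_components=2)
projected_data = pca.fit_transform(data)

# Fraction of the total variance captured by each retained component
retained = pca.explained_variance_ratio_
print("Variance retained per component:", retained)
print("Total variance retained:", retained.sum())
```

For isotropic random data like this, the two components keep only a small share of the variance, so the three k-means groups are partitions of a single cloud of points rather than naturally separated clusters.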



Projected clustering in data analytics

Traditional clustering algorithms such as k-means, DBSCAN, and hierarchical clustering operate on all dimensions of the data simultaneously. In high-dimensional data, however, clusters may be present in only a few dimensions, which makes these algorithms less effective. Projected clustering is used in this case.
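
The effect described above can be illustrated with a small experiment. This is only a sketch under assumed conditions: the cluster centers, noise scales, and the helper `kmeans_ari` are made up for the illustration. Three groups are separated in dimensions 0 and 1 only, and the other 18 dimensions hold wide noise. K-means is run once on all dimensions and once on the two relevant ones, and each result is compared with the known grouping using the adjusted Rand index (ARI).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
n_per_cluster = 200
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])  # assumed cluster centers

# Dimensions 0 and 1 carry the cluster structure
true_labels = np.repeat(np.arange(3), n_per_cluster)
signal = centers[true_labels] + rng.normal(scale=1.0, size=(3 * n_per_cluster, 2))

# Dimensions 2..19 are wide noise unrelated to the clusters
noise = rng.normal(scale=10.0, size=(3 * n_per_cluster, 18))
data = np.hstack([signal, noise])


def kmeans_ari(features):
    """Cluster with k-means and score agreement with the true labels."""
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
    return adjusted_rand_score(true_labels, labels)


ari_full = kmeans_ari(data)
ari_subspace = kmeans_ari(data[:, :2])
print(f"ARI, all 20 dimensions: {ari_full:.2f}")
print(f"ARI, dimensions 0-1 only: {ari_subspace:.2f}")
```

In this setup, the run restricted to the relevant dimensions should recover the groups far better than the full-space run, because the noisy dimensions dominate the Euclidean distances that k-means relies on.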


What is Projected Clustering

Projected clustering, also known as subspace clustering, is a technique for identifying clusters in high-dimensional data by considering subsets of dimensions, or projections of the data into lower dimensions. The projected clustering algorithm presented by Aggarwal (1999) is based on the concept of k-medoid clustering....
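
The idea that each cluster can live in its own subset of dimensions can be sketched as follows. This is a simplified illustration, not the algorithm from Aggarwal (1999): cluster membership is assumed to be known already, the data is synthetic, and `relevant_dimensions` is a hypothetical helper that simply picks the dimensions along which a cluster's points vary least.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_dims = 200, 6

# Assumed synthetic setup: cluster 0 is tight in dimensions 0 and 1,
# cluster 1 is tight in dimensions 2 and 3; all other coordinates are spread widely.
cluster_0 = rng.normal(scale=5.0, size=(n_points, n_dims))
cluster_0[:, [0, 1]] = rng.normal(loc=2.0, scale=0.3, size=(n_points, 2))
cluster_1 = rng.normal(scale=5.0, size=(n_points, n_dims))
cluster_1[:, [2, 3]] = rng.normal(loc=-2.0, scale=0.3, size=(n_points, 2))


def relevant_dimensions(points, n_keep=2):
    """Return the n_keep dimensions along which the points vary least."""
    spread = points.std(axis=0)
    return sorted(np.argsort(spread)[:n_keep].tolist())


dims_0 = relevant_dimensions(cluster_0)
dims_1 = relevant_dimensions(cluster_1)
print("Cluster 0 subspace:", dims_0)
print("Cluster 1 subspace:", dims_1)
```

Each cluster ends up with a different pair of dimensions, which is the situation a single global projection cannot capture.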

Steps Required in Projected Clustering

Step 1: Data Reading and Preprocessing – First, we read the data from its different sources and preprocess it: filling missing values, standardizing the training data, and handling categorical columns.
Step 2: Dimension Reduction – We choose the subset of dimensions onto which to project the dataset, using a technique such as Principal Component Analysis (PCA) to project the high-dimensional data into a lower-dimensional space.
Step 3: Clustering in Subspace – We apply a traditional clustering algorithm such as k-means to the data in the lower-dimensional space.
Step 4: Evaluating the Clustering – Finally, we evaluate the cluster centroids and data points to check whether the data points are grouped into the correct clusters....
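
The four steps above can be sketched end to end with scikit-learn. This is a minimal sketch under several assumptions: the data is a synthetic stand-in with a few missing values injected by hand, categorical columns are left out, mean imputation is one possible way to fill missing values, and the silhouette score is one possible evaluation measure among several.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Step 1: read and preprocess (synthetic stand-in data, assumed for illustration)
rng = np.random.default_rng(0)
centers = rng.normal(scale=6.0, size=(3, 20))
true_group = rng.integers(0, 3, size=300)
data = centers[true_group] + rng.normal(size=(300, 20))
# Knock out a few entries to mimic missing values
data[rng.integers(0, 300, size=10), rng.integers(0, 20, size=10)] = np.nan

filled = SimpleImputer(strategy="mean").fit_transform(data)  # fill missing values
scaled = StandardScaler().fit_transform(filled)              # standardize columns

# Step 2: project onto a lower-dimensional space with PCA
projected = PCA(n_components=2).fit_transform(scaled)

# Step 3: cluster in the projected space
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(projected)

# Step 4: evaluate the clustering (silhouette ranges from -1 to 1, higher is better)
score = silhouette_score(projected, labels)
print("Cluster centroids:\n", kmeans.cluster_centers_)
print(f"Silhouette score: {score:.2f}")
```

When true labels are available, as they are for synthetic data, a measure such as the adjusted Rand index could be used in Step 4 instead of the silhouette score.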

