Python Implementation of Projected Clustering
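To implement projected clustering in Python, we use a synthetic dataset with 1000 samples and 20 dimensions (features). We first apply PCA (principal component analysis) to reduce the dataset from 20 dimensions to 2, and then apply the k-means clustering algorithm to the projected data to group the data points into clusters. Note that this is a simplified form of projected clustering: PCA chooses one projection for the whole dataset, whereas dedicated projected clustering algorithms such as PROCLUS search for a separate subset of relevant dimensions for each cluster.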
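```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Generate example high-dimensional data: 1000 samples with 20 dimensions each.
# Note: this is pure Gaussian noise with no real cluster structure;
# k-means will still split it into the requested number of groups.
np.random.seed(0)
num_samples = 1000
num_dimensions = 20
data = np.random.randn(num_samples, num_dimensions)

# Dimensionality reduction using PCA: project the 20 dimensions onto
# the 2 principal components with the largest variance
num_selected_dimensions = 2
pca = PCA(n_components=num_selected_dimensions)
projected_data = pca.fit_transform(data)

# Perform k-means clustering on the projected data
# (n_init is set explicitly because its default changed across scikit-learn versions)
num_clusters = 3
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
kmeans.fit(projected_data)
cluster_labels = kmeans.labels_

# Plot the points in the 2-D projected space, colored by cluster label
plt.scatter(projected_data[:, 0], projected_data[:, 1], c=cluster_labels)
plt.title("Projected Clustering using K-means")
plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.show()
```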
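Output: [Figure: scatter plot of the two principal components, with points colored by their k-means cluster label]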
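Because PCA keeps only the directions of largest overall variance, it is worth checking how much of the original data the 2-D projection actually retains before reading anything into the clusters. The following is a minimal sketch that regenerates the same random data as above (same seed and shape, so the block runs on its own) and reads scikit-learn's `explained_variance_ratio_` attribute from the fitted `PCA` object.

```python
import numpy as np
from sklearn.decomposition import PCA

# Regenerate the same example data as above: 1000 samples, 20 dimensions
np.random.seed(0)
data = np.random.randn(1000, 20)

pca = PCA(n_components=2)
pca.fit(data)

# Fraction of the total variance captured by each retained component
ratios = pca.explained_variance_ratio_
retained = ratios.sum()

print(f"Per-component variance ratio: {ratios}")
print(f"Variance retained by 2 of 20 dimensions: {retained:.1%}")
```

For isotropic Gaussian noise like this, every direction carries roughly the same share of the variance (about 1/20 each), so the two retained components keep only slightly more than 10% of it. A low retained fraction is a hint that the clusters in the plot reflect an arbitrary partition of the projection rather than structure in the original data.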
Projected clustering in data analytics
Traditional clustering algorithms such as k-means, DBSCAN, and hierarchical clustering operate on all dimensions of the data simultaneously. In high-dimensional data, however, a cluster may be visible only in a small subset of the dimensions, while the remaining dimensions act as noise that dilutes the distances between points and makes these algorithms less effective. Projected clustering addresses this by searching for each cluster together with the subset of dimensions in which that cluster exists.
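To make the claim above concrete, here is a minimal sketch on hypothetical synthetic data: three tight clusters exist only in dimensions 0 and 1, and 18 further dimensions contain wide, irrelevant noise. It compares k-means on all 20 dimensions, k-means after a global 2-D PCA (as in the implementation above), and k-means on the relevant subspace, scoring each against the true labels with scikit-learn's `adjusted_rand_score`. The cluster centers, noise scale, and helper name `kmeans_ari` are illustrative assumptions; here the relevant dimensions are known by construction, whereas a real projected clustering algorithm has to discover them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
n_per_cluster, n_noise_dims = 100, 18

# Hypothetical data: 3 tight clusters that exist only in dimensions 0 and 1
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
signal = np.vstack([c + rng.normal(0.0, 0.5, size=(n_per_cluster, 2)) for c in centers])
true_labels = np.repeat(np.arange(3), n_per_cluster)

# 18 irrelevant dimensions of wide noise that carry no cluster information
noise = rng.normal(0.0, 10.0, size=(3 * n_per_cluster, n_noise_dims))
X = np.hstack([signal, noise])  # shape (300, 20)


def kmeans_ari(features):
    """Hypothetical helper: run k-means and score it against the true labels."""
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
    return adjusted_rand_score(true_labels, labels)


ari_full = kmeans_ari(X)                                    # all 20 dimensions
ari_pca = kmeans_ari(PCA(n_components=2).fit_transform(X))  # global 2-D PCA
ari_sub = kmeans_ari(X[:, :2])                              # relevant subspace, known here by construction

print(f"ARI, all dimensions:    {ari_full:.2f}")
print(f"ARI, global PCA to 2-D: {ari_pca:.2f}")
print(f"ARI, relevant subspace: {ari_sub:.2f}")
```

The relevant-subspace score should come out close to 1, while the other two should be far lower: the wide noise dimensions dominate both the full-dimensional distances and the top PCA components, so neither approach sees the clusters. This is the situation projected clustering is designed for.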