Steps Required in Projected Clustering

Step 1 Data Reading and Preprocessing – First, we read the data from its sources and preprocess it: filling in missing values, standardizing the features, and encoding categorical columns.
Step 2 Dimension Reduction – We choose the lower-dimensional subspace onto which the dataset will be projected, using a technique such as Principal Component Analysis (PCA) to map the high-dimensional data into that space.
Step 3 Clustering in Subspace – We apply a traditional clustering algorithm such as k-means to the data in the lower-dimensional space.
Step 4 Evaluating the Clustering – Finally, we examine the cluster centroids and the assigned data points to check whether the points have been grouped into the correct clusters.
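
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes NumPy is available, uses synthetic data, and all function names (standardize, pca_project, kmeans, best_kmeans, within_cluster_ss) are hypothetical helpers written for this example. PCA is done through a singular value decomposition and k-means through plain Lloyd iterations so that each step stays visible; in practice a library such as scikit-learn would usually be used instead.

```python
import numpy as np

def standardize(X):
    """Step 1: scale every column to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - mu) / sigma

def pca_project(X, n_components):
    """Step 2: project the centered data onto its top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def nearest_centroid(X, centroids):
    """Index of the closest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Step 3: Lloyd's k-means, starting from k randomly chosen data points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(X, centroids)
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest_centroid(X, centroids), centroids

def within_cluster_ss(X, labels, centroids):
    """Step 4: total squared distance from each point to its centroid."""
    return float(((X - centroids[labels]) ** 2).sum())

def best_kmeans(X, k, n_restarts=5):
    """Run k-means from several seeds and keep the tightest clustering."""
    runs = [kmeans(X, k, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda run: within_cluster_ss(X, *run))

# Synthetic data (an assumption for this demo): two groups that differ only
# in the first two of six dimensions; the other four are pure noise.
rng = np.random.default_rng(42)
n = 100
signal = np.vstack([rng.normal(-3.0, 0.5, size=(n, 2)),
                    rng.normal(3.0, 0.5, size=(n, 2))])
noise = rng.normal(0.0, 1.0, size=(2 * n, 4))
X = np.hstack([signal, noise])
true_labels = np.array([0] * n + [1] * n)

X_std = standardize(X)                       # Step 1
X_proj = pca_project(X_std, 2)               # Step 2
labels, centroids = best_kmeans(X_proj, 2)   # Step 3
wcss = within_cluster_ss(X_proj, labels, centroids)  # Step 4

# Cluster ids are arbitrary, so score the better of the two possible matchings.
agreement = max(np.mean(labels == true_labels), np.mean(labels != true_labels))
print("within-cluster sum of squares:", wcss)
print("agreement with the known groups:", agreement)
```

The agreement score is only available here because the synthetic data comes with known groups. On real data, Step 4 would rely on internal measures such as the within-cluster sum of squares.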

Input and Output for Projected Clustering:

Input –

The set of data points.
The number of clusters, denoted by k.
The average number of dimensions per cluster, denoted by L.

Output –

The identified clusters, together with the dimensions associated with each cluster.
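
To make the roles of k and L concrete, here is a small sketch of how dimensions might be assigned to clusters once the points have been grouped. The text does not name a specific algorithm. The (k, L) inputs resemble those of PROCLUS, and the selection rule used here (pick k × L cluster-dimension pairs with the smallest standardized spread, at least two per cluster) is a simplified take in that spirit, not that algorithm's full procedure. The function select_dimensions and the toy data are hypothetical, and the cluster labels are assumed to be already known.

```python
from statistics import mean, pstdev

def select_dimensions(points, labels, k, L):
    """Return {cluster: [dimensions]} with k * L dimensions in total.

    For each cluster, a dimension scores well when the cluster's points lie
    close together along it. Scores are standardized per cluster so that
    clusters of different overall size can be compared.
    """
    n_dims = len(points[0])
    scored = []  # entries are (z_score, cluster, dimension)
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [mean(p[d] for p in members) for d in range(n_dims)]
        # Average absolute deviation from the centroid along each dimension.
        spread = [mean(abs(p[d] - centroid[d]) for p in members)
                  for d in range(n_dims)]
        mu = mean(spread)
        sigma = pstdev(spread) or 1.0  # guard against identical spreads
        scored.extend(((s - mu) / sigma, c, d) for d, s in enumerate(spread))
    scored.sort()

    # Every cluster first receives its two tightest dimensions.
    chosen = {c: [d for _, cc, d in scored if cc == c][:2] for c in range(k)}
    # The remaining budget goes to the best-scoring pairs overall.
    budget = round(k * L) - 2 * k
    for _, c, d in scored:
        if budget <= 0:
            break
        if d not in chosen[c]:
            chosen[c].append(d)
            budget -= 1
    return {c: sorted(dims) for c, dims in chosen.items()}

# Toy input: cluster 0 is tight in dimensions 0 and 1,
# cluster 1 is tight in dimensions 2 and 3.
points = [
    (1.0, 1.0, 0.0, 9.0), (1.1, 0.9, 5.0, 2.0),
    (0.9, 1.1, 9.0, 5.0), (1.0, 1.0, 3.0, 7.0),
    (0.0, 8.0, 4.0, 4.0), (6.0, 1.0, 4.1, 3.9),
    (9.0, 5.0, 3.9, 4.1), (2.0, 3.0, 4.0, 4.0),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

result = select_dimensions(points, labels, k=2, L=2)
print(result)  # {0: [0, 1], 1: [2, 3]}
```

Because L is an average, clusters do not all need the same number of dimensions: with L = 3 the same call hands out six dimensions in total, split according to the scores.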

Projected clustering in data analytics

Traditional clustering algorithms such as k-means, DBSCAN, and hierarchical clustering operate on all dimensions of the data simultaneously. In high-dimensional data, however, a cluster may be apparent in only a few dimensions, and the remaining dimensions add noise that makes these algorithms less effective. Projected clustering addresses this by looking for clusters within subsets of the dimensions.
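
The effect of irrelevant dimensions can be seen in a small experiment. The sketch below uses only the Python standard library and synthetic data, and every name and parameter in it is an assumption made for illustration. It builds two groups that differ in just 2 of 52 dimensions and compares the ratio of the average between-group distance to the average within-group distance, first in the 2 relevant dimensions and then in the full space.

```python
import math
import random
from itertools import combinations, product

random.seed(0)
N_PER_CLUSTER = 30
N_NOISE_DIMS = 50

def make_point(center):
    """Two informative coordinates near `center`, followed by uniform noise."""
    relevant = [random.gauss(center, 0.5) for _ in range(2)]
    noise = [random.uniform(0.0, 10.0) for _ in range(N_NOISE_DIMS)]
    return relevant + noise

cluster_a = [make_point(0.0) for _ in range(N_PER_CLUSTER)]
cluster_b = [make_point(5.0) for _ in range(N_PER_CLUSTER)]

def contrast(dims):
    """Mean between-group distance divided by mean within-group distance,
    measured using only the coordinates listed in `dims`."""
    def dist(p, q):
        return math.dist([p[d] for d in dims], [q[d] for d in dims])

    within = [dist(p, q)
              for cluster in (cluster_a, cluster_b)
              for p, q in combinations(cluster, 2)]
    between = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return (sum(between) / len(between)) / (sum(within) / len(within))

contrast_subspace = contrast(range(2))                 # relevant dimensions only
contrast_full = contrast(range(2 + N_NOISE_DIMS))      # all 52 dimensions

print("contrast in the 2 relevant dimensions:", round(contrast_subspace, 2))
print("contrast in all 52 dimensions:", round(contrast_full, 2))
```

A ratio close to 1 means that points in different groups are about as far apart as points in the same group, which is the situation in which a full-dimensional distance gives a clustering algorithm very little to work with.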
