Steps Required in Projected Clustering
- Step 1 Data Reading and Preprocessing – First, read the data from its sources and preprocess it: fill in missing values, standardize the data, and encode categorical columns.
- Step 2 Dimension Reduction – Choose the subspace onto which the dataset will be projected, for example by using Principal Component Analysis (PCA) to map the high-dimensional data into a lower-dimensional space.
- Step 3 Clustering in Subspace – Apply a traditional clustering algorithm such as k-means to the data in the lower-dimensional space.
- Step 4 Evaluating the Clustering – Finally, examine the cluster centroids and the assigned data points to check whether the points have been grouped into the correct clusters.
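The four steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the data is synthetic (two groups that differ only in dimensions 0 and 1, plus three noise dimensions), PCA is reduced to a single component found by power iteration, and the helper names `standardize`, `first_component`, and `kmeans_1d` are hypothetical.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)

# Synthetic data (an assumption for this sketch): two groups separated in
# dimensions 0 and 1, followed by three dimensions of pure noise.
truth = [i % 2 for i in range(200)]
data = [
    [rng.gauss(10 * g, 0.5), rng.gauss(10 * g, 0.5)]
    + [rng.uniform(0, 10) for _ in range(3)]
    for g in truth
]

def standardize(rows):
    """Step 1: rescale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def first_component(rows, iters=200):
    """Step 2: leading PCA direction via power iteration on the covariance.

    Assumes the rows are already centered (true after standardize).
    """
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def kmeans_1d(values, k, iters=50):
    """Step 3: Lloyd's k-means on scalars, seeded with the first k values."""
    centroids = list(values[:k])
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centroids[c]))
                  for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = fmean(members)
    return labels, centroids

scaled = standardize(data)                      # Step 1
direction = first_component(scaled)             # Step 2
scores = [sum(x * w for x, w in zip(row, direction)) for row in scaled]
labels, centroids = kmeans_1d(scores, k=2)      # Step 3

# Step 4: agreement with the known groups, allowing for swapped labels.
matches = sum(a == b for a, b in zip(labels, truth))
accuracy = max(matches, len(truth) - matches) / len(truth)
print("centroids:", centroids, "accuracy:", accuracy)
```

On real data there are usually no ground-truth labels, so Step 4 would rely on inspecting centroids or on an internal quality measure instead of the accuracy computed here. A library implementation of PCA and k-means would normally replace the hand-written helpers.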
Input and Output for Projected Clustering:
Input –
- The set of data points.
- The number of clusters, denoted by k.
- The average number of dimensions per cluster, denoted by L.
Output –
- The clusters identified, and the dimensions associated with each cluster.
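To make the input and output concrete, the following minimal sketch takes data points, k, and L, and returns the dimensions assigned to each cluster. It is an illustration only: the helper name `assign_dimensions` is hypothetical, cluster membership is assumed to be already known (a full projected clustering algorithm would have to find it), and a dimension is treated as relevant to a cluster when the cluster's variance along it is small.

```python
from statistics import pvariance

def assign_dimensions(points, labels, k, L):
    """Pick about k * L (cluster, dimension) pairs with the smallest spread.

    Every cluster receives at least one dimension; the remaining slots go
    to the tightest pairs overall, so the average per cluster is L.
    """
    d = len(points[0])
    spread = {}
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        for j in range(d):
            spread[(c, j)] = pvariance([p[j] for p in members])

    chosen = {c: [] for c in range(k)}
    for c in range(k):
        # Guarantee one dimension per cluster: its tightest one.
        best = min(range(d), key=lambda j: spread[(c, j)])
        chosen[c].append(best)
        del spread[(c, best)]

    remaining = round(k * L) - k
    for c, j in sorted(spread, key=spread.get)[:remaining]:
        chosen[c].append(j)
    return {c: sorted(dims) for c, dims in chosen.items()}

# Hand-made example: cluster 0 is tight in dimensions 0 and 1,
# cluster 1 is tight only in dimension 2.
points = [
    (1.0, 1.0, 0.0), (1.1, 0.9, 5.0), (0.9, 1.1, 9.0),
    (0.0, 9.0, 4.0), (5.0, 2.0, 4.1), (9.0, 5.0, 3.9),
]
labels = [0, 0, 0, 1, 1, 1]
result = assign_dimensions(points, labels, k=2, L=1.5)
print(result)  # {0: [0, 1], 1: [2]}
```

With k = 2 and L = 1.5, three dimensions are distributed in total: two go to the first cluster and one to the second, which is why L is an average and not a fixed count per cluster.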
Projected clustering in data analytics
Traditional clustering algorithms such as k-means, DBSCAN, and hierarchical clustering operate on all dimensions of the data simultaneously. In high-dimensional data, however, clusters may exist only in a small subset of the dimensions, which makes these algorithms less effective. Projected clustering is used in this case.
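A small numerical sketch shows why full-dimensional distances become less useful. The setup is an assumption made for illustration: points `a` and `b` belong to one cluster and `c` to another, the clusters differ only in dimensions 0 and 1, and 50 further dimensions hold uniform noise. The hypothetical helper `contrast` compares the cross-cluster distance with the same-cluster distance.

```python
import math
import random

rng = random.Random(42)

# Values in the two relevant dimensions; a and b share a cluster, c does not.
relevant = {"a": [1.0, 1.0], "b": [1.1, 0.9], "c": [4.0, 4.0]}

# Append 50 noise dimensions to every point.
points = {
    name: vals + [rng.uniform(0, 10) for _ in range(50)]
    for name, vals in relevant.items()
}

def contrast(dims):
    """Ratio d(a, c) / d(a, b) using only the given dimensions."""
    a, b, c = ([points[n][j] for j in dims] for n in "abc")
    return math.dist(a, c) / math.dist(a, b)

subspace_contrast = contrast([0, 1])    # relevant dimensions only
full_contrast = contrast(range(52))     # all dimensions, noise included

print("subspace:", round(subspace_contrast, 2))
print("full space:", round(full_contrast, 2))
```

In the two relevant dimensions the other cluster is about 30 times farther away than the cluster mate. Across all 52 dimensions the noise dominates both distances, so the ratio falls close to 1 and a distance-based algorithm can no longer tell the clusters apart reliably.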