Agglomerative clustering with and without structure
The AgglomerativeClustering class in Scikit-Learn supports several linkage criteria for hierarchical clustering, including ward and complete. The ward criterion merges clusters using Ward’s method, a variance-based approach that, at each step, merges the pair of clusters whose union produces the smallest increase in total within-cluster variance.
The complete criterion merges clusters using maximum (complete) linkage, a distance-based approach that defines the distance between two clusters as the distance between their farthest pair of points.
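As a small illustration of the complete-linkage definition above, the distance between two clusters can be computed directly with NumPy (the point sets here are made-up toy data, not from the article):

```python
import numpy as np

# Two tiny example clusters (hypothetical data)
cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[4.0, 0.0], [5.0, 0.0]])

# Pairwise Euclidean distances between every point in A and every point in B
diffs = cluster_a[:, None, :] - cluster_b[None, :, :]
pairwise = np.sqrt((diffs ** 2).sum(axis=-1))

# Complete (maximum) linkage: the distance between the farthest pair of points
complete_distance = pairwise.max()
print(complete_distance)  # 5.0 (from (0, 0) to (5, 0))
```

Single linkage would instead take `pairwise.min()`, and average linkage `pairwise.mean()`; complete linkage is the most conservative of the three, merging clusters only when all their points are close.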
The ward algorithm is useful when the data points have a clear structure, such as when the clusters are compact and roughly circular or spherical. The complete algorithm is useful when the data points do not have such a structure and the clusters are more elongated or irregular in shape. Here is an example illustrating the differences between the ward and complete algorithms.
```python
# Import the necessary modules
from sklearn.datasets import make_circles
from sklearn.cluster import AgglomerativeClustering

# Generate the data
data, _ = make_circles(n_samples=1000, noise=0.05, random_state=0)
```
Next, we can use the AgglomerativeClustering class to perform hierarchical clustering on the data using the ward and complete algorithms. We can create two instances of the AgglomerativeClustering class, one for the ward algorithm and the other for the complete algorithm. Here is the code to perform hierarchical clustering on the data using the ward and complete algorithms:
```python
# Create an instance of the AgglomerativeClustering
# class for the ward algorithm; Ward linkage always
# uses Euclidean distances, so no metric needs to be passed
ward = AgglomerativeClustering(n_clusters=2, linkage="ward")

# Fit the ward algorithm to the data
ward.fit(data)

# Create an instance of the AgglomerativeClustering
# class for the complete algorithm ("metric" replaces the
# older "affinity" parameter, which was removed in scikit-learn 1.4)
complete = AgglomerativeClustering(
    n_clusters=2, metric="euclidean", linkage="complete"
)

# Fit the complete algorithm to the data
complete.fit(data)
```
Once the ward and complete algorithms are fitted to the data, we can use the labels_ attribute of the ward and complete objects to get the cluster labels for each data point. Here is the code to get the cluster labels for each data point using the ward and complete algorithms:
```python
# Get the cluster labels for each
# data point using the ward algorithm
ward_labels = ward.labels_

# Get the cluster labels for each
# data point using the complete algorithm
complete_labels = complete.labels_
```
We can then plot the data points and the cluster labels to visualize the results of the ward and complete algorithms. Here is the code to plot the data points and the cluster labels using the ward and complete algorithms:
```python
# Import the pyplot module
import matplotlib.pyplot as plt

# Plot the data points coloured by the
# ward cluster labels
plt.scatter(data[:, 0], data[:, 1], c=ward_labels, cmap="Paired")
plt.title("Ward")
plt.show()

# Plot the data points coloured by the
# complete cluster labels
plt.scatter(data[:, 0], data[:, 1], c=complete_labels, cmap="Paired")
plt.title("Complete")
plt.show()
```
Output:
The plot generated by the ward algorithm shows that the algorithm has successfully identified the circular clusters in the data. The plot generated by the complete algorithm shows that the algorithm has not been able to identify the circular clusters and instead has grouped the data points into two elongated clusters.
This example shows the differences between the ward and complete algorithms for agglomerative clustering. The ward algorithm is better suited for data with a clear structure, such as circular or spherical clusters, whereas the complete algorithm is better suited for data without a clear structure, such as elongated or irregular clusters.
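The "structure" in this article's title can also be expressed explicitly in Scikit-Learn through a connectivity constraint: a sparse graph passed to AgglomerativeClustering that restricts merges to neighbouring points. A minimal sketch of this is below; the choice of `n_neighbors=10` is an illustrative assumption, not a value from the article:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_circles
from sklearn.neighbors import kneighbors_graph

data, _ = make_circles(n_samples=1000, noise=0.05, random_state=0)

# Build a k-nearest-neighbours connectivity graph so that only
# nearby points are allowed to be merged into the same cluster
connectivity = kneighbors_graph(data, n_neighbors=10, include_self=False)

# Ward clustering constrained by the connectivity graph
structured = AgglomerativeClustering(
    n_clusters=2, linkage="ward", connectivity=connectivity
)
structured_labels = structured.fit_predict(data)
```

With such a constraint, merges follow the local neighbourhood structure of the data, which is how the scikit-learn documentation distinguishes agglomerative clustering "with structure" from the unconstrained case shown earlier.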
In summary, hierarchical clustering is a type of clustering algorithm that is used to group similar data points into clusters. It is a bottom-up approach that starts by treating each data point as a single cluster and then merges the closest pair of clusters until all the data points are grouped into a single cluster or a pre-defined number of clusters.
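The bottom-up merge process described above can be inspected step by step through SciPy's linkage matrix, which records every pairwise merge. A small sketch on a toy array (the data values are made up for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy 1-D data: two visually obvious groups (illustrative values)
points = np.array([[1.0], [1.1], [1.2], [8.0], [8.1]])

# Ward linkage: each row of Z records one merge as
# (cluster i, cluster j, merge distance, new cluster size)
Z = linkage(points, method="ward")
print(Z.shape)  # (4, 4): n - 1 merges for n = 5 points

# Cut the merge tree into two flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # the first three points share one label, the last two the other
```

Reading `Z` from top to bottom traces exactly the bottom-up sequence: the closest pair of points merges first, and the final row joins the last two clusters into one.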
Scikit-Learn is a popular machine-learning library for Python that provides a wide range of clustering algorithms, including hierarchical clustering. Its AgglomerativeClustering class supports several linkage criteria, among them ward and complete: the ward criterion merges clusters using Ward’s variance-minimizing method and suits data with a clear structure, while the complete criterion merges clusters using maximum (complete) linkage and suits data without one.