Agglomerative clustering with and without structure

The AgglomerativeClustering class in Scikit-Learn supports several linkage criteria for hierarchical clustering; this article compares two of them: ward and complete. Ward linkage merges clusters using Ward's method, a variance-based criterion that, at each step, merges the pair of clusters whose union produces the smallest increase in total within-cluster variance.

Complete (maximum) linkage is a distance-based criterion that defines the distance between two clusters as the distance between their farthest pair of points, one from each cluster, and merges the pair of clusters for which this distance is smallest.
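The complete-linkage definition can be stated concretely: the distance between two clusters is the largest pairwise distance between their members. The short NumPy sketch below illustrates this on two made-up 2-D clusters (the function name and data are illustrative, not part of Scikit-Learn's API):

```python
import numpy as np

def complete_linkage_distance(cluster_a, cluster_b):
    """Distance between two clusters under complete (maximum) linkage:
    the largest Euclidean distance over all cross-cluster point pairs."""
    # Broadcasting gives pairwise differences of shape (len_a, len_b, n_features)
    diffs = cluster_a[:, None, :] - cluster_b[None, :, :]
    return np.linalg.norm(diffs, axis=-1).max()

# Two tiny illustrative clusters in 2-D
a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[4.0, 0.0], [5.0, 0.0]])
print(complete_linkage_distance(a, b))  # farthest pair: (0, 0) and (5, 0), distance 5.0
```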

Ward linkage tends to produce compact, evenly sized clusters and works well when the clusters are roughly spherical. Complete linkage, driven by the farthest pair of points, behaves differently and can be worth trying when clusters are elongated or irregular. Here is an example that illustrates how the two criteria differ on the same data.

Python3
# Import the necessary modules
from sklearn.datasets import make_circles
from sklearn.cluster import AgglomerativeClustering
  
# Generate the data
data, _ = make_circles(n_samples=1000,
                       noise=0.05,
                       random_state=0)


Next, we use the AgglomerativeClustering class to cluster the data with each linkage criterion, creating one instance per criterion:

Python3
# Create an instance of the AgglomerativeClustering
# class for ward linkage. Note: `metric` replaced the
# deprecated `affinity` parameter in scikit-learn 1.2,
# and ward linkage only supports the Euclidean metric.
ward = AgglomerativeClustering(n_clusters=2,
                               metric="euclidean",
                               linkage="ward")

# Fit the ward model to the data
ward.fit(data)

# Create an instance of the AgglomerativeClustering
# class for complete linkage
complete = AgglomerativeClustering(n_clusters=2,
                                   metric="euclidean",
                                   linkage="complete")

# Fit the complete-linkage model to the data
complete.fit(data)


Once both models are fitted, the labels_ attribute holds the cluster label assigned to each data point:

Python3
# Get the cluster labels for each 
# data point using the ward algorithm
ward_labels = ward.labels_
  
# Get the cluster labels for each 
# data point using the complete algorithm
complete_labels = complete.labels_
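To go beyond visual inspection, one way to compare the two labelings quantitatively (not part of the original walkthrough) is the silhouette score. The self-contained sketch below refits both models; note that the silhouette score rewards compact, convex clusters, so it should be read with care on ring-shaped data like this:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_circles
from sklearn.metrics import silhouette_score

# Same data as in the article
data, _ = make_circles(n_samples=1000, noise=0.05, random_state=0)

# The default metric is Euclidean, so only the linkage differs
ward_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(data)
complete_labels = AgglomerativeClustering(n_clusters=2, linkage="complete").fit_predict(data)

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
ward_sil = silhouette_score(data, ward_labels)
complete_sil = silhouette_score(data, complete_labels)
print(f"ward: {ward_sil:.3f}  complete: {complete_sil:.3f}")
```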


Finally, we plot the data points colored by their cluster labels to visualize the two results:

Python3
# Import the pyplot module
import matplotlib.pyplot as plt
  
# Plot the data points and the cluster
# labels using the ward algorithm
plt.scatter(data[:, 0], data[:, 1],
            c=ward_labels, cmap="Paired")
plt.title("Ward")
plt.show()
  
# Plot the data points and the cluster
# labels using the complete algorithm
plt.scatter(data[:, 0], data[:, 1],
            c=complete_labels, cmap="Paired")
plt.title("Complete")
plt.show()


Output:

Clusters formed by using Agglomerative Clustering

The two plots show that ward and complete linkage partition the same points differently: ward tends to produce two compact, similarly sized groups, while complete linkage, driven by the farthest-pair distance, can produce more uneven groupings. Neither criterion on its own recovers the two concentric rings, because both favor compact, convex clusters; separating non-convex shapes like these generally requires adding structure to the clustering, such as a connectivity constraint.
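The "structure" in this article's title refers to exactly such a connectivity constraint: passing a k-nearest-neighbors graph through the connectivity parameter restricts merges to neighboring points, which lets agglomerative clustering follow non-convex shapes such as these rings. A minimal sketch, with n_neighbors=10 chosen purely for illustration:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_circles
from sklearn.neighbors import kneighbors_graph

data, _ = make_circles(n_samples=1000, noise=0.05, random_state=0)

# Each point may only be merged along edges to its 10 nearest neighbors
connectivity = kneighbors_graph(data, n_neighbors=10, include_self=False)

structured = AgglomerativeClustering(n_clusters=2,
                                     linkage="ward",
                                     connectivity=connectivity)
structured_labels = structured.fit_predict(data)
```

With a suitable neighborhood size, the two rings typically land in separate clusters, which plain ward or complete linkage does not achieve on this data.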

In summary, hierarchical clustering groups similar data points using a bottom-up approach: each data point starts as its own cluster, and the closest pair of clusters is merged repeatedly until all points form a single cluster or a pre-defined number of clusters remains.

Scikit-Learn's AgglomerativeClustering class supports several linkage criteria for hierarchical clustering, including ward and complete. Ward linkage merges clusters using Ward's variance-minimizing method and favors compact, spherical clusters; complete linkage merges clusters by the farthest-pair distance. For non-convex cluster shapes, a connectivity constraint can supply the structure that neither linkage alone provides.

