Dendrograms: Visualizing Hierarchical Clustering
A dendrogram is a tree-like diagram that records the sequence of merges produced by hierarchical clustering. It visualizes the merging process and helps in choosing a suitable number of clusters.
How to Read a Dendrogram?
- Leaves: Represent individual data points.
- Branches: Represent clusters formed by merging data points or other clusters.
- Height: Represents the distance or dissimilarity between clusters. The higher the branch, the more dissimilar the clusters.
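The height at which you cut the tree determines how many clusters you recover: slicing below a merge separates the clusters that merge joins. As a minimal sketch (assuming a tiny 1-D dataset with two obvious groups), SciPy's fcluster can cut a linkage tree at a chosen distance:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six 1-D points forming two well-separated groups
X = np.array([[1.0], [1.1], [1.2], [8.0], [8.1], [8.2]])

# Build the merge tree bottom-up with Ward linkage
Z = linkage(X, method='ward')

# Cut the tree at height 2.0: the within-group merges sit far
# below this height, the between-group merge far above it
labels = fcluster(Z, t=2.0, criterion='distance')
print(labels)  # three points share one cluster id, three the other
```

Raising the threshold above the final merge height would collapse everything into a single cluster; lowering it toward zero leaves every point on its own.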
Implementing Hierarchical Clustering with Scikit-Learn
Scikit-Learn provides a straightforward implementation of hierarchical clustering through the AgglomerativeClustering class. Let's walk through the steps to implement hierarchical clustering using Scikit-Learn.
Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
Step 2: Generate Sample Data
For demonstration purposes, we will generate synthetic data using the make_blobs function.
# Generate sample data
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
Step 3: Perform Agglomerative Clustering
# Perform agglomerative clustering
agg_clustering = AgglomerativeClustering(n_clusters=4)
y_pred = agg_clustering.fit_predict(X)
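fit_predict returns one integer label per sample. A quick sanity check (a sketch reusing the same synthetic data as above) confirms the shape and label set, and counts the members of each cluster:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

# Same synthetic data as in the walkthrough
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
labels = AgglomerativeClustering(n_clusters=4).fit_predict(X)

print(labels.shape)       # (300,) -- one label per sample
print(np.unique(labels))  # [0 1 2 3] -- labels are 0-indexed
print(np.bincount(labels))  # samples per cluster; counts sum to 300
```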
Step 4: Plot the Clusters
# Plot the clusters
plt.scatter(X[:, 0], X[:, 1], c=y_pred, cmap='rainbow')
plt.title('Agglomerative Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
Output: a scatter plot of the samples colored by assigned cluster (four clusters in this example).
Step 5: Plot the Dendrogram
To plot the dendrogram, we use the linkage function from the scipy.cluster.hierarchy module. AgglomerativeClustering does not expose a linkage matrix directly, so linkage recomputes the merge hierarchy (here with the same Ward criterion).
# Generate the linkage matrix
Z = linkage(X, method='ward')
# Plot the dendrogram
plt.figure(figsize=(10, 7))
dendrogram(Z)
plt.title('Dendrogram')
plt.xlabel('Sample Index')
plt.ylabel('Distance')
plt.show()
Output: a dendrogram with sample indices on the x-axis and merge distances on the y-axis.
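The same linkage matrix can be cut into a flat clustering, which lets you check that cutting the tree at four clusters matches the AgglomerativeClustering result. A sketch using SciPy's fcluster with the maxclust criterion:

```python
import numpy as np
from sklearn.datasets import make_blobs
from scipy.cluster.hierarchy import linkage, fcluster

# Same data and Ward linkage as in the walkthrough
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
Z = linkage(X, method='ward')

# Cut the tree so that exactly 4 flat clusters remain
labels = fcluster(Z, t=4, criterion='maxclust')
print(np.unique(labels))  # [1 2 3 4] -- fcluster labels are 1-indexed
print(np.bincount(labels)[1:])  # samples per cluster
```

For large datasets, passing truncate_mode='lastp' and a small p to dendrogram keeps the plot readable by collapsing the lowest merges.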
Hierarchical Clustering with Scikit-Learn
Hierarchical clustering is a popular method in data science for grouping similar data points into clusters. Unlike other clustering techniques like K-means, hierarchical clustering does not require the number of clusters to be specified in advance. Instead, it builds a hierarchy of clusters that can be visualized as a dendrogram. In this article, we will explore hierarchical clustering using Scikit-Learn, a powerful Python library for machine learning.
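Because the hierarchy is built without fixing the cluster count in advance, scikit-learn also lets you cut it by distance instead of by count: with n_clusters=None and a distance_threshold, the number of clusters becomes an output rather than an input. A sketch (the threshold value here is an assumption chosen for this synthetic dataset; in practice you would read it off the dendrogram):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Stop merging once the linkage distance exceeds the threshold;
# the fitted model reports how many clusters survived the cut.
# The value 25.0 is illustrative, not a recommended default.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=25.0)
labels = model.fit_predict(X)
print(model.n_clusters_)  # number of clusters discovered from the data
```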