Removing Non-Numeric Labels

R

# Remove the non-numeric column (Gene names) temporarily
gene_names <- gene_data$Gene
gene_data <- gene_data[, -1]
print(gene_data)

Output:

  Sample1 Sample2 Sample3 Sample4 Sample5
1     2.3     2.1     2.2     2.4     2.0
2     1.8     1.7     1.9     1.6     1.5
3     3.2     3.0     3.1     3.3     3.4
4     0.9     1.0     0.8     0.7     0.6
5     2.5     2.4     2.6     2.3     2.7

In many datasets, the first column contains labels or identifiers, such as gene names or sample IDs. These non-numeric columns are essential for interpreting the data, but they interfere with mathematical operations such as distance calculations.
To calculate distances correctly, we need to exclude these non-numeric columns. In this code, we store the gene names in the variable ‘gene_names’ for later use and then drop the first column from the ‘gene_data’ data frame with ‘gene_data[, -1]’.
With the labels set aside, every remaining column is numeric, so distances can be computed directly, and the saved names can be reattached afterwards.
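Dropping the column by position works only when the labels are known to sit in the first column. As a minimal sketch of a more defensive variant, the code below keeps every column that is numeric, whatever its position, and reattaches the labels as row names before computing distances. The gene names (‘GeneA’ to ‘GeneE’) and the variable ‘numeric_data’ are assumptions made for illustration; only the sample values come from the output above.

```r
# Hypothetical data frame mirroring the output above; the gene names are assumed
gene_data <- data.frame(
  Gene    = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  Sample1 = c(2.3, 1.8, 3.2, 0.9, 2.5),
  Sample2 = c(2.1, 1.7, 3.0, 1.0, 2.4),
  Sample3 = c(2.2, 1.9, 3.1, 0.8, 2.6),
  Sample4 = c(2.4, 1.6, 3.3, 0.7, 2.3),
  Sample5 = c(2.0, 1.5, 3.4, 0.6, 2.7)
)

# Save the labels, then keep only the columns that are numeric
gene_names <- as.character(gene_data$Gene)
numeric_data <- gene_data[, sapply(gene_data, is.numeric), drop = FALSE]

# Reattach the labels as row names so later output stays readable
rownames(numeric_data) <- gene_names

# Euclidean distances between genes (rows); valid because every column is numeric
gene_dist <- dist(numeric_data)
print(gene_dist)
```

Storing the labels as row names keeps them out of the arithmetic while still letting functions such as dist() carry them through to their results.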

