Removing Non-Numeric Labels
```r
# Remove the non-numeric column (gene names) temporarily
gene_names <- gene_data$Gene   # keep the labels for later use
gene_data <- gene_data[, -1]   # drop the first (Gene) column
print(gene_data)
```
Output:
```
  Sample1 Sample2 Sample3 Sample4 Sample5
1     2.3     2.1     2.2     2.4     2.0
2     1.8     1.7     1.9     1.6     1.5
3     3.2     3.0     3.1     3.3     3.4
4     0.9     1.0     0.8     0.7     0.6
5     2.5     2.4     2.6     2.3     2.7
```
- In many datasets, the first column holds labels or identifiers, such as gene names or sample IDs. These labels are essential for interpreting the data, but distance functions such as dist() expect purely numeric input, so a character column interferes with the calculation.
- To compute distances correctly, we exclude this column. The code stores the gene names in the variable ‘gene_names’ for later use and then removes the first column from the ‘gene_data’ data frame.
- Because the labels are saved in ‘gene_names’, removing the column is only temporary: the names can be reattached later (for example, as row names) so the clustered rows remain identifiable.
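The steps above can be sketched end to end as follows. This is a minimal sketch, assuming gene_data has a character Gene column followed by numeric sample columns. The gene names GeneA to GeneE are hypothetical placeholders (the numeric values match the printed output), and reattaching the names as row names before calling dist() is one common option, not necessarily what later steps of this tutorial do.

```r
# Hypothetical sample data: gene names are placeholders; values match the output above
gene_data <- data.frame(
  Gene    = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  Sample1 = c(2.3, 1.8, 3.2, 0.9, 2.5),
  Sample2 = c(2.1, 1.7, 3.0, 1.0, 2.4),
  Sample3 = c(2.2, 1.9, 3.1, 0.8, 2.6),
  Sample4 = c(2.4, 1.6, 3.3, 0.7, 2.3),
  Sample5 = c(2.0, 1.5, 3.4, 0.6, 2.7),
  stringsAsFactors = FALSE
)

# Keep the labels, then drop the non-numeric first column
gene_names <- gene_data$Gene
expr <- as.matrix(gene_data[, -1])

# Reattach the labels as row names so the distance matrix stays labeled
rownames(expr) <- gene_names

# Confirm only numeric values remain before computing distances
stopifnot(is.numeric(expr))

# Euclidean distances between genes (rows of the matrix)
gene_dist <- dist(expr, method = "euclidean")
print(gene_dist, digits = 3)
```

Converting to a matrix with as.matrix() makes the numeric-only requirement explicit, and because dist() takes its labels from the row names, each pairwise distance stays tied to a gene name.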
Creating Heatmaps with Hierarchical Clustering
Before building the heatmap, let's briefly review what heatmaps and hierarchical clustering are.