Analyzing Data in Subsets Using R

In this article, we will explore various methods to analyze data in subsets using the R programming language.

How to Analyze Data in Subsets

Analyzing data means applying different techniques to gain insights, recognize patterns, and draw meaningful conclusions from a dataset, for example by computing summary statistics, visualizing the data, and identifying trends. Often a question concerns only part of the data, such as one category or one group, so the first step is to extract that subset. R offers several functions and indexing operators for this, which let each analysis focus on exactly the rows it needs. Some of the methods are:

Analyzing Data in Subsets Using the subset() Function

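subset(x, subset, select, ...)

Here x is the object to subset (in these examples, a data frame), subset is a logical expression that decides which rows to keep (rows where it evaluates to NA are dropped), and the optional select argument names the columns to keep. In the example below, we create a data frame and split it into two subsets based on the Category column.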

# Example data
data <- data.frame(
  ID = 1:10,
  Category = rep(c("A", "B"), each = 5),
  Value = rnorm(10)   # random numbers, so your values will differ
)
print(data)

# Subsetting using subset() function
subset_A <- subset(data, Category == "A")
subset_B <- subset(data, Category == "B")

print("Analyzing the data in subsets")
print(subset_A)   # rows where Category is "A"
print(subset_B)   # rows where Category is "B"


  ID Category      Value
1   1        A  1.5658719
2   2        A  0.3142731
3   3        A -1.4552153
4   4        A  0.9014216
5   5        A -0.2758858
6   6        B  1.3345081
7   7        B -1.0618629
8   8        B  1.1188082
9   9        B -1.3202145
10 10        B  1.2453632

[1] "Analyzing the data in subsets"
  ID Category      Value
1  1        A  1.5658719
2  2        A  0.3142731
3  3        A -1.4552153
4  4        A  0.9014216
5  5        A -0.2758858

   ID Category     Value
6   6        B  1.334508
7   7        B -1.061863
8   8        B  1.118808
9   9        B -1.320214
10 10        B  1.245363
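Beyond a single equality test, subset() accepts any logical expression, so conditions can be combined with & (and) or | (or), and its select argument keeps only the listed columns (a minus sign drops a column instead). The sketch below uses the same column layout as the example above, but the Value numbers are hypothetical fixed values rather than rnorm() output, so the result is reproducible.

```r
# Hypothetical fixed values instead of rnorm(), so the result is reproducible
data <- data.frame(
  ID = 1:10,
  Category = rep(c("A", "B"), each = 5),
  Value = c(1.2, -0.5, 0.8, -1.1, 0.3, 0.9, -0.2, 1.5, -0.7, 0.4)
)

# Both conditions must hold; keep only the ID and Value columns
positive_A <- subset(data, Category == "A" & Value > 0, select = c(ID, Value))
print(positive_A)   # keeps IDs 1, 3 and 5

# Either condition may hold; drop the Category column
extremes <- subset(data, Value > 1 | Value < -1, select = -Category)
print(extremes)     # keeps IDs 1, 4 and 8
```

Because the row condition and the column selection are evaluated inside the data frame, column names can be written without the data$ prefix.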

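The same approach works for any grouping column. In the next example, the data frame is split by the Name column.

# creating data frame
data <- data.frame(
  ID = 1:6,
  Name = rep(c("X", "Y"), each = 3),
  Value = rnorm(6)   # random numbers, so your values will differ
)
print(data)

# Subsetting using subset() function
subset_X <- subset(data, Name == "X")
subset_Y <- subset(data, Name == "Y")

print(" Analyzing the data in subsets")
print(subset_X)
print(subset_Y)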


 ID Name       Value
1  1    X -0.02737704
2  2    X  0.31270382
3  3    X -0.92980339
4  4    Y  0.43035869
5  5    Y  0.30612408
6  6    Y  0.89034199

[1] " Analyzing the data in subsets"
  ID Name       Value
1  1    X -0.02737704
2  2    X  0.31270382
3  3    X -0.92980339

  ID Name     Value
4  4    Y 0.4303587
5  5    Y 0.3061241
6  6    Y 0.8903420
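Printing the subsets only separates the rows; the analysis itself is usually a statistic computed on each subset. The sketch below computes the mean of Value for each Name, once from the subsets created with subset() and once with base R's tapply(), which groups and summarizes in a single call. The fixed Value numbers are hypothetical stand-ins for rnorm() so the results are reproducible.

```r
# Hypothetical fixed values instead of rnorm(), so the results are reproducible
data <- data.frame(
  ID = 1:6,
  Name = rep(c("X", "Y"), each = 3),
  Value = c(-0.6, 0.3, -0.9, 0.4, 0.2, 0.9)
)

subset_X <- subset(data, Name == "X")
subset_Y <- subset(data, Name == "Y")

# Analyze each subset separately
mean_X <- mean(subset_X$Value)   # about -0.4
mean_Y <- mean(subset_Y$Value)   # about 0.5

# tapply() splits Value by Name and applies mean() to each group
group_means <- tapply(data$Value, data$Name, mean)
print(group_means)
```

tapply() is convenient when every group needs the same statistic; building the subsets explicitly is still useful when each group needs a different follow-up analysis.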

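Subsetting a Data Frame with Square Brackets

A data frame can also be subset with square-bracket indexing, df[rows, columns]. A logical condition in the row position keeps only the rows where it is TRUE, and leaving the column position empty keeps all columns. In the example below, we create a data frame of test scores, extract the male students, and summarize their scores.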

# Sample data frame
df <- data.frame(
  student_id = 1:10,
  test_score = c(80, 85, 90, 75, 95, 82, 78, 88, 92, 70),
  gender = c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
)

# Subset of male students: rows where gender is "M", all columns
male_students <- df[df$gender == "M", ]
print(male_students)

print("Analyzing the data ")
# Summary statistics for male students' test scores
summary(male_students$test_score)


  student_id test_score gender
1          1         80      M
3          3         90      M
5          5         95      M
7          7         78      M
9          9         92      M

[1] "Analyzing the data "
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     78      80      90      87      92      95 
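Square-bracket indexing can select columns at the same time as rows by putting a vector of column names in the second position, and the same condition pattern makes it easy to compare groups. The sketch below reuses the sample df. It also shows one difference worth knowing: when the condition contains NA, df[cond, ] returns a row filled with NA, while subset() treats NA as FALSE. The missing gender value is hypothetical, introduced only to show this behavior.

```r
# Same sample data frame as above
df <- data.frame(
  student_id = 1:10,
  test_score = c(80, 85, 90, 75, 95, 82, 78, 88, 92, 70),
  gender = c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
)

# Rows and columns in one step: female students, ID and score only
female_scores <- df[df$gender == "F", c("student_id", "test_score")]

# Compare the two groups on the same statistic
mean_male   <- mean(df$test_score[df$gender == "M"])  # 87
mean_female <- mean(female_scores$test_score)         # 80
print(c(male = mean_male, female = mean_female))

# Caution: an NA in the row condition behaves differently in the two methods
df_na <- df
df_na$gender[2] <- NA                              # hypothetical missing value
n_bracket <- nrow(df_na[df_na$gender == "F", ])    # 5: includes one all-NA row
n_subset  <- nrow(subset(df_na, gender == "F"))    # 4: subset() drops the NA row
```

When the grouping column may contain missing values, wrapping the condition as which(df$gender == "F") or using subset() avoids the extra NA row.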

Bracket indexing works the same way on larger tables. The example below keeps only the Electronics transactions from a sales dataset.

# Sample sales data
sales_data <- data.frame(
  transaction_id = 1:24,
  product_category = rep(c("Electronics", "Clothing", "Books"), each = 8),
  sales_amount = c(150, 200, 100, 120, 180, 80, 70, 90, 110, 95, 250, 300, 280, 320,
                   270, 40, 60, 50, 55, 45, 65, 78, 89, 34)
)

# Subset of sales data for Electronics category
electronics_sales <- sales_data[sales_data$product_category == "Electronics", ]

# Displaying the subset
print(electronics_sales)


  transaction_id product_category sales_amount
1              1      Electronics          150
2              2      Electronics          200
3              3      Electronics          100
4              4      Electronics          120
5              5      Electronics          180
6              6      Electronics           80
7              7      Electronics           70
8              8      Electronics           90
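Once the Electronics subset exists, analyzing it is ordinary vector arithmetic on its columns. The sketch below computes the total and average Electronics sale, then uses base R's aggregate() with a formula to compute the same total for every category at once; the choice of sum() as the statistic is only an illustration.

```r
sales_data <- data.frame(
  transaction_id = 1:24,
  product_category = rep(c("Electronics", "Clothing", "Books"), each = 8),
  sales_amount = c(150, 200, 100, 120, 180, 80, 70, 90, 110, 95, 250, 300, 280, 320,
                   270, 40, 60, 50, 55, 45, 65, 78, 89, 34)
)
electronics_sales <- sales_data[sales_data$product_category == "Electronics", ]

# Analyze the subset: total and average sale for Electronics
electronics_total <- sum(electronics_sales$sales_amount)   # 990
electronics_mean  <- mean(electronics_sales$sales_amount)  # 123.75

# One row per category with the summed sales_amount
category_totals <- aggregate(sales_amount ~ product_category, data = sales_data, FUN = sum)
print(category_totals)
```

aggregate() returns a data frame, so the per-category results can be subset, sorted, or plotted like any other table.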


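In conclusion, we analyzed data in subsets with two base R tools: the subset() function and square-bracket indexing with a logical condition. Both return a smaller data frame that can then be printed, summarized, or analyzed further, so each analysis can focus on exactly the group of interest.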

