Feature scaling

Features in a dataset are often measured on very different scales. When a machine learning model is trained with gradient descent, it is advisable to bring the features onto a comparable scale so that training is stable and converges faster. Two common feature scaling methods are standardization and normalization, the latter also known as Min-Max scaling.

Standardization

Standardizing is like measuring everyone with a common ruler. Imagine you have a group of friends of different heights and weights. Some are tall, some are heavy, and because height and weight are recorded in different units, the numbers are hard to compare directly. So what do you do? You bring in a magic tailor with a standard of their own! The tailor measures each friend's height and weight, then converts each measurement to a new scale on which height and weight both have a mean of 0 and a standard deviation of 1. Now all your friends are "standardized", and their measurements are easy to compare and analyze.

Thus, standardization converts data to a common scale to facilitate comparison and analysis. Each value is expressed as the number of standard deviations it lies above or below the mean of its column.

  1. Create a new dataframe standardized_data as a copy of the original data dataframe.
  2. Standardize the MathScore column of standardized_data using the scale() function and store the result in standardized_math.
  3. Print the first few standardized values of MathScore using head().

R

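# Create a new dataframe for standardization (a copy of the original data)
standardized_data <- data.frame(data)

# Create a new variable for standardized MathScore
# scale() subtracts the column mean and divides by the standard deviation
standardized_math <- scale(standardized_data$MathScore)

# Print the first few values of the standardized MathScore
head(standardized_math)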


Output:

           [,1]
[1,] 0.2758342
[2,] 1.3395404
[3,] 0.6082424
[4,] 0.4087975
[5,] 1.2065771
[6,] -1.7186149
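
To make clear what scale() computes, the sketch below compares it with the formula written out by hand: subtract the mean, then divide by the sample standard deviation. The math_scores vector is made up for illustration and is not the dataset used in this article.

```r
# Hypothetical scores, used only for illustration (not the article's dataset)
math_scores <- c(72, 85, 90, 66, 78, 59)

# scale() returns a one-column matrix; as.numeric() drops it to a plain vector
scaled <- as.numeric(scale(math_scores))

# The same computation written out by hand
manual <- (math_scores - mean(math_scores)) / sd(math_scores)

# Both approaches agree
all.equal(scaled, manual)

# After standardization the mean is 0 (up to floating-point error)
# and the standard deviation is 1
mean(scaled)
sd(scaled)
```

Note that sd() uses the sample standard deviation (dividing by n - 1), which is also what scale() uses by default.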

Normalization

Normalization rescales your data so that every value fits inside a fixed range. Just like getting each person to put on the same-sized T-shirt, normalization adjusts the values so that they all fit within a chosen range, here 0 to 1: the smallest value maps to 0 and the largest maps to 1. This puts features on an equal footing and makes them easier to compare and analyze.

  • Create a new dataframe normalized_data as a copy of the original data.
  • Normalize the values in the MathScore column using the formula:

(normalized_data$MathScore - min(normalized_data$MathScore)) / (max(normalized_data$MathScore) - min(normalized_data$MathScore))

  • The normalized MathScore values are stored in the variable normalized_math and range from 0 to 1.

R

# Create a new dataframe for normalization
normalized_data <- data.frame(data)
  
# Create a new variable for normalized MathScore
normalized_math <- (normalized_data$MathScore -
                    min(normalized_data$MathScore)) /
  (max(normalized_data$MathScore) - min(normalized_data$MathScore))
  
# Print the first few values of the normalized MathScore
head(normalized_math)


Output:

0.71 0.87 0.76 0.73 0.85 0.41
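
If several columns need the same treatment, the formula can be wrapped in a small helper. The sketch below is one possible way to do this; the function name min_max_scale and the sample vector are hypothetical, and returning zeros for a constant column is an assumption made here to avoid dividing by zero.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1]
min_max_scale <- function(x) {
  rng <- range(x, na.rm = TRUE)  # rng[1] is the minimum, rng[2] the maximum

  # Assumption: a constant vector has no spread, so return zeros
  # instead of dividing by zero (x * 0 keeps any NA in place)
  if (rng[1] == rng[2]) {
    return(x * 0)
  }

  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical scores, used only for illustration (not the article's dataset)
math_scores <- c(72, 85, 90, 66, 78, 59)
normalized <- min_max_scale(math_scores)

# The smallest score maps to 0 and the largest to 1
range(normalized)
```

The same helper could be applied to a dataframe column, for example min_max_scale(normalized_data$MathScore), assuming that column is numeric.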

