Feature scaling
Features in a dataset are often measured on different scales. When a machine learning model is trained with the gradient descent algorithm, it is advisable to bring the features onto the same scale so that training is stable and converges quickly. Two common feature scaling methods are standardization and normalization (also known as Min-Max scaling).
Standardization
Standardizing is like giving your stats a makeover! Imagine you have a group of friends who all have different heights and weights. Some are tall, some are heavy, and they are difficult to compare directly. So what do you do? You bring in a magic tailor with their own standard! The tailor measures each friend's height and weight, then converts these measurements to a new scale: the group's average is subtracted from each measurement, and the result is divided by the group's standard deviation, so that height and weight each end up with a mean of 0 and a standard deviation of 1. Now all your friends are "standardized" in the same way, which makes them easy to compare and analyze.
Thus, standardization is all about converting data to a common scale to facilitate comparison and analysis. It’s like giving your stats a fashionable makeover to reveal their true beauty and power!
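Written as a formula, each value x in a column becomes a z-score, where μ is the column mean and σ its standard deviation (R's scale() uses the sample standard deviation). The numbers in the worked example are hypothetical, chosen only to show the arithmetic:

```latex
z = \frac{x - \mu}{\sigma},
\qquad \text{e.g. } \mu = 80,\ \sigma = 10,\ x = 95 \;\Rightarrow\; z = \frac{95 - 80}{10} = 1.5
```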
- Create a new dataframe standardized_data as a copy of the original data dataframe.
- Standardize the MathScore column of standardized_data using the scale() function and store the result in standardized_math.
- Print the first few standardized values of MathScore using head().
R
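# Create a copy of the original data frame (`data`) for standardization
standardized_data <- data.frame(data)

# Create a new variable for standardized MathScore
# (scale() subtracts the mean, divides by the standard deviation and returns a one-column matrix)
standardized_math <- scale(standardized_data$MathScore)

# Print the first few values of the standardized MathScore
head(standardized_math)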
Output:
[,1]
[1,] 0.2758342
[2,] 1.3395404
[3,] 0.6082424
[4,] 0.4087975
[5,] 1.2065771
[6,] -1.7186149
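To see what scale() computes, here is a minimal sketch that standardizes a small vector by hand and compares the result with scale(). The scores vector is hypothetical and is not the article's MathScore data:

```r
# Hypothetical scores for illustration only (not the article's dataset)
scores <- c(60, 75, 80, 90, 95)

# Manual z-score: subtract the mean, then divide by the sample standard deviation
z_manual <- (scores - mean(scores)) / sd(scores)

# scale() with its defaults (center = TRUE, scale = TRUE) performs the same steps,
# but returns a one-column matrix, so convert it to a plain vector before comparing
z_scale <- as.vector(scale(scores))

print(round(z_manual, 3))
print(all.equal(z_manual, z_scale))                 # TRUE
print(c(mean = mean(z_manual), sd = sd(z_manual)))  # mean ~ 0, sd ~ 1
```

The manual version makes the two steps explicit; in practice scale() is convenient because it also accepts a whole numeric matrix or data frame and standardizes each column separately.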
Normalization
Normalization is like giving your data a makeover to make it first-class! It's like dressing up your numbers so they feel confident and polished. Just as everyone might put on the same-sized t-shirt, normalization rescales the values so that they all fit within a chosen range, usually 0 to 1: the smallest value becomes 0 and the largest becomes 1. It brings harmony to your data, making it easier to examine and analyze. So, let's get your data ready to shine with some normalization magic!
- Create a new dataframe normalized_data as a copy of the original data dataframe.
- Normalize the values in the MathScore column using the formula: (normalized_data$MathScore - min(normalized_data$MathScore)) / (max(normalized_data$MathScore) - min(normalized_data$MathScore)).
- The normalized MathScore values are stored in the variable normalized_math and range from 0 to 1.
R
# Create a new dataframe for normalization
normalized_data <- data.frame(data)

# Create a new variable for normalized MathScore (Min-Max scaling to the 0-1 range)
normalized_math <- (normalized_data$MathScore - min(normalized_data$MathScore)) /
  (max(normalized_data$MathScore) - min(normalized_data$MathScore))

# Print the first few values of the normalized MathScore
head(normalized_math)
Output:
0.71 0.87 0.76 0.73 0.85 0.41
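If you normalize several columns, it can help to wrap the formula in a function. The sketch below uses a hypothetical helper, min_max_scale(), that is not part of base R or of the article; its optional new_min and new_max arguments (also assumptions) stretch the result to any chosen range, with 0 to 1 as the default:

```r
# Hypothetical helper (not from the article): rescale a numeric vector to [new_min, new_max]
min_max_scale <- function(x, new_min = 0, new_max = 1) {
  rng <- range(x, na.rm = TRUE)  # c(min, max), ignoring missing values
  if (rng[1] == rng[2]) {
    stop("Cannot min-max scale a constant vector (max equals min).")
  }
  scaled01 <- (x - rng[1]) / (rng[2] - rng[1])  # values now lie in [0, 1]
  new_min + scaled01 * (new_max - new_min)      # stretch to the requested range
}

# Hypothetical scores for illustration only
scores <- c(60, 75, 80, 90, 95)
print(min_max_scale(scores))         # lowest score -> 0, highest -> 1
print(min_max_scale(scores, -1, 1))  # same shape, rescaled to [-1, 1]
```

The explicit check for a constant vector matters because the plain formula divides by max - min, which would be zero and produce NaN values.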
Data Preprocessing in R
Welcome, adventurous data enthusiasts! Today we set off on an exciting journey full of twists, turns, and fun as we dive into the world of data cleaning and visualization with the R programming language. Grab your virtual backpacks, put on your data detective hats, and get ready to unravel the secrets of a dataset filled with test results and interesting features.