Handling Missing Values

You may be wondering why a dataset would contain missing values at all. Missing values occur when no information is provided for one or more items, or for a whole unit. For example, some users in a survey may choose not to share their income, and others may choose not to share their address, so those fields end up empty. Missing data is a common problem in real-world datasets.

Missing data is also referred to as NA (Not Available) values in pandas. Pandas provides several useful functions for detecting, removing, and replacing null values in a DataFrame, such as isnull(), notnull(), dropna(), fillna(), and replace().
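As a quick illustration of these functions, the sketch below applies isnull(), notnull(), dropna(), and fillna() to a small made-up DataFrame. The toy frame, its column names, and the fill values are assumptions for demonstration only and are not part of the employee dataset used in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the employee dataset from this article).
toy = pd.DataFrame({
    "name": ["Ann", None, "Cara"],
    "score": [90.0, np.nan, 75.0],
})

missing_mask = toy.isnull()    # True wherever a value is missing
present_mask = toy.notnull()   # the element-wise inverse of isnull()
rows_kept = toy.dropna()       # drops every row that has at least one missing value
filled = toy.fillna({"name": "Unknown", "score": 0.0})  # per-column replacement values

print(missing_mask.sum())      # number of missing values per column
print(rows_kept)
print(filled)
```

Each call returns a new object and leaves toy unchanged, so the results have to be assigned to a name if you want to keep them.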

Now let’s check whether our dataset contains any missing values.

Python3

df.isnull().sum()

Output:

First Name            67
Gender               145
Start Date             0
Last Login Time        0
Salary                 0
Bonus %                0
Senior Management     67
Team                  43
dtype: int64

We can see that the number of missing values varies from column to column: Gender has 145 missing values, First Name and Senior Management have 67 each, Team has 43, and columns such as Salary have none. There are several ways to handle these missing values, such as dropping the rows that contain NaN or replacing NaN with the mean, median, mode, or some other value.
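The mean and median options apply to numeric columns. Here is a minimal sketch of both, using a made-up numeric Series; the name age and its values are assumptions chosen for illustration, since none of the numeric columns in our dataset have missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with two gaps (illustrative values only).
age = pd.Series([22.0, np.nan, 30.0, 68.0, np.nan], name="age")

# mean() and median() skip NaN by default, so they are computed
# from the three observed values only.
mean_filled = age.fillna(age.mean())
median_filled = age.fillna(age.median())

print(mean_filled.tolist())
print(median_filled.tolist())
```

The median is usually the safer choice when a column contains extreme values, because a single large value (68 here) pulls the mean upward but does not move the median.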

First, let’s fill the missing values in the Gender column with the string “No Gender”.

Python3

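# Assign the result back: fillna(inplace=True) on a selected column may not update df in recent pandas versions
df["Gender"] = df["Gender"].fillna("No Gender")
df.isnull().sum()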

Output:

First Name           67
Gender                0
Start Date            0
Last Login Time       0
Salary                0
Bonus %               0
Senior Management    67
Team                 43
dtype: int64

We can see that the Gender column no longer has any null values. Next, let’s fill the Senior Management column with its mode, that is, its most frequent value.

Python3

import numpy as np

# mode() returns a Series; take its first entry as the most frequent value
mode = df['Senior Management'].mode().values[0]
df['Senior Management'] = df['Senior Management'].replace(np.nan, mode)
df.isnull().sum()

Output:

First Name           67
Gender                0
Start Date            0
Last Login Time       0
Salary                0
Bonus %               0
Senior Management     0
Team                 43
dtype: int64

For First Name and Team, there is no sensible value to fill in, and filling them with arbitrary data would be misleading. So let’s drop every row that still contains a missing value.

Python3

# axis=0 drops rows; how='any' drops a row if at least one of its values is missing
df = df.dropna(axis=0, how='any')
print(df.isnull().sum())
df.shape

Output:

First Name           0
Gender               0
Start Date           0
Last Login Time      0
Salary               0
Bonus %              0
Senior Management    0
Team                 0
dtype: int64
(899, 8)

We can see that the dataset is now free of missing values, and that dropping those rows reduced the number of rows from 1000 to 899.
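The dropna() call above removes a row if any column is missing. If other columns still had gaps that you wanted to keep, a narrower option is the subset parameter, which checks only the columns you name. The sketch below shows this on a small made-up DataFrame; the rows are assumptions for illustration and only borrow the column names from our dataset.

```python
import pandas as pd

# Hypothetical rows that reuse three column names from the article's dataset.
toy = pd.DataFrame({
    "First Name": ["Ann", None, "Cara", "Dev"],
    "Team": ["Sales", "Legal", None, "Finance"],
    "Gender": ["Female", "Male", "Female", None],
})

rows_before = len(toy)

# Only First Name and Team are checked; a missing Gender does not cause a drop.
cleaned = toy.dropna(subset=["First Name", "Team"])
rows_dropped = rows_before - len(cleaned)

print(f"Dropped {rows_dropped} of {rows_before} rows")
print(cleaned.isnull().sum())
```

Comparing the row count before and after, as done here, is a simple way to confirm how much data a drop actually costs.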

For more information, refer to Working with Missing Data in Pandas.

Steps for Mastering Exploratory Data Analysis | EDA Steps

Mastering exploratory data analysis (EDA) is crucial for understanding your data, identifying patterns, and generating insights that can inform further analysis or decision-making. Data is the lifeblood of modern organizations, and the ability to extract insights from it has become a crucial skill in today’s data-driven world. EDA is a powerful approach that lets analysts, scientists, and researchers gain a thorough understanding of their data before moving on to formal modeling or hypothesis testing.

It is an iterative process that involves summarizing, visualizing, and exploring data to find patterns, anomalies, and relationships that might not be apparent at first glance. In this article, we will go through the key steps for performing exploratory data analysis. Here are the steps to help you master EDA:

Steps for Mastering Exploratory Data Analysis

Step 1: Understand the Problem and the Data
Step 2: Import and Inspect the Data
Step 3: Handling Missing Values
Step 4: Explore Data Characteristics
Step 5: Perform Data Transformation
Step 6: Visualize Data Relationships
Step 7: Handling Outliers
Step 8: Communicate Findings and Insights
