Missing Data Handling
Find missing values in the dataset
The isnull() method detects missing values and returns a boolean DataFrame of the same shape that shows, for each cell, whether the value is NA. Missing values (None or NaN) are mapped to True and non-null values are mapped to False. An empty string is not treated as missing.
data_frame.isnull()
Output:
CustomerID Genre Age Annual Income (k$) Spending Score (1-100)
0 False False False False False
1 False False False False False
2 False False False False False
3 False False False False False
4 False False False False False
.. ... ... ... ... ...
195 False False False False False
196 False False False False False
197 False False False False False
198 False False False False False
199 False False False False False
[200 rows x 5 columns]
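Because the article's dataset has no missing values, the output above is all False. The minimal sketch below uses a small, made-up DataFrame (not the article's data) to show what isnull() reports when values really are missing, including the fact that an empty string is not counted as NA. isna() is an alias of isnull() and behaves the same way.

```python
import numpy as np
import pandas as pd

# Made-up illustration data: row 1 has None/NaN, row 2 has an empty string
df = pd.DataFrame({
    "Genre": ["Male", None, ""],
    "Age": [19, np.nan, 20],
})

mask = df.isnull()  # boolean DataFrame with the same shape as df
print(mask)

print(mask.loc[1].tolist())      # both values in row 1 are missing
print(mask.loc[2, "Genre"])      # "" is a real (empty) string, so this is False
```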
Find the number of missing values in the dataset
To count the missing values in each column, use data_frame.isnull().sum(). Summing a boolean DataFrame counts the True values column by column. In the example below, the dataset doesn't contain any null values, so every column's count is 0.
data_frame.isnull().sum()
Output:
CustomerID 0
Genre 0
Age 0
Annual Income (k$) 0
Spending Score (1-100) 0
dtype: int64
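With real gaps in the data, the same call returns non-zero counts. The sketch below uses a hypothetical four-row DataFrame (values invented for illustration) and also shows two common follow-ups: summing twice to get a single total, and taking the mean of the boolean mask to get the share of missing values per column.

```python
import numpy as np
import pandas as pd

# Made-up DataFrame with a few missing values (illustration only)
df = pd.DataFrame({
    "Age": [19, np.nan, 20, np.nan],
    "Annual Income (k$)": [15, 16, np.nan, 17],
})

per_column = df.isnull().sum()          # True counts as 1, so this counts NAs per column
total = int(df.isnull().sum().sum())    # summing the per-column counts gives the overall total
fraction = df.isnull().mean()           # fraction of missing values in each column

print(per_column)
print("total missing:", total)
print(fraction)
```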
Removing missing values
The data_frame.dropna() method removes the rows or columns that contain at least one missing value.
data_frame = data_frame.dropna()
By default, data_frame.dropna() drops the rows where at least one element is missing. data_frame.dropna(axis=1) drops the columns where at least one element is missing.
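To make the difference between the two axes concrete, here is a minimal sketch on a made-up three-row DataFrame (not the article's dataset) where only one cell is missing. Note that dropna() returns a new DataFrame; the original is unchanged unless you assign the result back, as in the line above.

```python
import numpy as np
import pandas as pd

# Made-up data: only row 1 / column "Age" is missing
df = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Age": [19, np.nan, 20],
    "Spending Score (1-100)": [39, 81, 6],
})

rows_dropped = df.dropna()         # default axis=0: drop every row containing a NaN
cols_dropped = df.dropna(axis=1)   # axis=1: drop every column containing a NaN

print(rows_dropped.index.tolist())    # row 1 is gone
print(cols_dropped.columns.tolist())  # the "Age" column is gone
```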
Fill in missing values
We can fill null values using the data_frame.fillna() method.
data_frame = data_frame.fillna(value)
With this form, every null value in the DataFrame is filled with the same value. To fill different columns with different values, fill each column separately:
data_frame[col] = data_frame[col].fillna(value)
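The sketch below applies the per-column pattern to a small, made-up DataFrame: the numeric column is filled with its mean (one common choice, not something the article prescribes), and the text column is filled with a placeholder label. It also shows that fillna() accepts a dictionary mapping column names to fill values, which fills several columns in one call. The placeholder "Unknown" is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in each column
df = pd.DataFrame({
    "Genre": ["Male", None, "Female"],
    "Age": [19.0, np.nan, 23.0],
})

# Column-by-column: fill missing ages with the mean age (mean() skips NaN, giving 21.0)
df["Age"] = df["Age"].fillna(df["Age"].mean())

# A dict maps each column name to its own fill value
df = df.fillna({"Genre": "Unknown"})

print(df)
```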
Row and column manipulations
Removing rows
The drop() method removes the row with a given index label. By default it returns a new DataFrame and leaves data_frame unchanged. To remove the row from data_frame itself, pass inplace=True to drop() (or assign the result back to data_frame).
# Remove the row with index label 4 from the DataFrame
data_frame.drop(4).head()
Output:
CustomerID Genre Age Annual Income (k$) Spending Score (1-100)
0 1 Male 19 15 39
1 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
5 6 Female 22 17 76
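In the output above, the row labelled 4 is missing, but data_frame itself still contains it. The minimal sketch below, on a made-up single-column DataFrame, contrasts the default behaviour (a new DataFrame is returned) with inplace=True (the DataFrame is modified and drop() returns None). The argument is an index label, not a position.

```python
import pandas as pd

# Made-up DataFrame with index labels 0..5
df = pd.DataFrame({"CustomerID": [1, 2, 3, 4, 5, 6]})

trimmed = df.drop(4)           # returns a copy without the row labelled 4
print(len(df), len(trimmed))   # the original still has all 6 rows

df.drop(4, inplace=True)       # modifies df itself and returns None
print(df.index.tolist())       # label 4 is gone; the other labels are not renumbered
```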
The same method can also remove columns: pass axis=1 together with the list of column names you want to remove.
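A minimal sketch of dropping columns, using a made-up two-row DataFrame with some of the article's column names. The columns= keyword is an equivalent way to say the same thing without axis=1.

```python
import pandas as pd

# Made-up data reusing some of the article's column names
df = pd.DataFrame({
    "CustomerID": [1, 2],
    "Genre": ["Male", "Female"],
    "Age": [19, 20],
})

# axis=1 tells drop() that the labels are column names
no_id = df.drop(["CustomerID", "Genre"], axis=1)

# Equivalent keyword form
no_id_kw = df.drop(columns=["CustomerID", "Genre"])

print(no_id.columns.tolist())
```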
Renaming rows
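The rename() method renames the rows or columns of a DataFrame. Passing a dictionary on its own, as below, maps existing row (index) labels to new ones, so rows 0 and 1 become "First" and "Second".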
data_frame.rename({0: "First", 1: "Second"})
Output:
CustomerID Genre Age Annual Income (k$) Spending Score (1-100)
First 1 Male 19 15 39
Second 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
4 5 Female 31 17 40
... ... ... ... ... ...
195 196 Female 35 120 79
196 197 Female 45 126 28
197 198 Male 32 126 74
198 199 Male 32 137 18
199 200 Male 30 137 83
[200 rows x 5 columns]
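To be explicit about which axis you are renaming, rename() also accepts index= and columns= keyword arguments. The sketch below uses a made-up two-row DataFrame; the shorter column names "Income" and "Score" are hypothetical choices for illustration. Like drop(), rename() returns a new DataFrame by default.

```python
import pandas as pd

# Made-up data reusing two of the article's column names
df = pd.DataFrame({
    "Annual Income (k$)": [15, 16],
    "Spending Score (1-100)": [39, 81],
})

# columns= maps old column names to new ones
renamed = df.rename(columns={"Annual Income (k$)": "Income",
                             "Spending Score (1-100)": "Score"})

# index= maps old row labels to new ones
relabelled = df.rename(index={0: "First", 1: "Second"})

print(renamed.columns.tolist())
print(relabelled.index.tolist())
```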
Adding new columns
# Create a new column with all values equal to 1
data_frame['NewColumn'] = 1
data_frame.head()
Output:
CustomerID Genre Age Annual Income (k$) Spending Score (1-100) \
0 1 Male 19 15 39
1 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
4 5 Female 31 17 40
NewColumn
0 1
1 1
2 1
3 1
4 1
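Assigning a scalar broadcasts it to every row, as above. New columns are more often computed from existing ones; the sketch below, on made-up data, shows an element-wise expression and the assign() method, which returns a new DataFrame instead of modifying the original. The column names "Income (dollars)" and "HighSpender" and the threshold of 50 are hypothetical.

```python
import pandas as pd

# Made-up data reusing two of the article's column names
df = pd.DataFrame({
    "Annual Income (k$)": [15, 16, 17],
    "Spending Score (1-100)": [39, 81, 6],
})

# A scalar is broadcast to every row
df["NewColumn"] = 1

# An expression over an existing column is computed element-wise
df["Income (dollars)"] = df["Annual Income (k$)"] * 1000

# assign() returns a new DataFrame with the extra column; df is left unchanged
with_flag = df.assign(HighSpender=df["Spending Score (1-100)"] > 50)

print(with_flag)
```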
Data Processing with Pandas
Data processing is an important part of any data-driven task: it turns raw data into meaningful insights. Python is a widely used programming language, and many libraries and tools are available for data processing in it.
In this article, we look at data processing in Python: loading data, printing rows and columns, summarizing a DataFrame, handling missing values, sorting and merging DataFrames, applying functions, and visualizing DataFrames.
Table of Contents
- What is Data Processing in Python?
- What is Pandas?
- Loading Data in Pandas DataFrame
- Printing rows of the Data
- Printing the column names of the DataFrame
- Summary of Data Frame
- Descriptive Statistical Measures of a DataFrame
- Missing Data Handling
- Sorting DataFrame values
- Merge Data Frames
- Apply Function
- By using the lambda operator
- Visualizing DataFrame
- Conclusion