Data Preprocessing and Visualization

Get the number of columns of object datatype.

Python3

obj = (data.dtypes == 'object') 
print("Categorical variables:",len(list(obj[obj].index)))

Output :

Categorical variables: 7
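The check above relies on text columns being stored with the object dtype. As a minimal sketch of an alternative, the snippet below treats every non-numeric column as categorical. The toy frame and its values are made up for illustration and are not the real loan data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy frame standing in for the loan dataset
toy = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003"],
    "Gender": ["Male", "Female", "Male"],
    "ApplicantIncome": [5849, 4583, 3000],
    "Credit_History": [1.0, 0.0, 1.0],
})

# Any column that is not numeric is counted as categorical
categorical_cols = [col for col in toy.columns if not is_numeric_dtype(toy[col])]
print("Categorical variables:", len(categorical_cols))
```

This version should give the same count as the dtype comparison whenever all text columns are stored as object.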

As Loan_ID is unique for every row and not correlated with any other column, we will drop it using the .drop() function.

Python3

# Dropping Loan_ID column 
data.drop(['Loan_ID'],axis=1,inplace=True)
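Before dropping an identifier column, it can help to confirm that it really is unique. The sketch below does this on a made-up toy frame (the values are assumptions, not the real data) and uses the non-inplace form of drop.

```python
import pandas as pd

# Hypothetical toy frame; the IDs and values are illustrative only
toy = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003"],
    "Gender": ["Male", "Female", "Male"],
})

# An identifier column has one distinct value per row
id_is_unique = toy["Loan_ID"].is_unique

# drop(columns=...) returns a new frame instead of modifying in place
if id_is_unique:
    toy = toy.drop(columns=["Loan_ID"])

print(list(toy.columns))
```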

Visualize the unique values in each categorical column using bar plots. This shows which value dominates each column in our dataset.

Python3

obj = (data.dtypes == 'object') 
object_cols = list(obj[obj].index) 
plt.figure(figsize=(18,36)) 
index = 1
  
for col in object_cols: 
  y = data[col].value_counts() 
  plt.subplot(11,4,index) 
  plt.xticks(rotation=90) 
  sns.barplot(x=list(y.index), y=y) 
  index +=1

Output: (bar plots of the value counts, one per categorical column)
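The same dominance can also be read off numerically. As a small sketch on made-up values, value_counts with normalize=True returns each value's share of the rows, sorted from most to least frequent.

```python
import pandas as pd

# Hypothetical toy column; the values are illustrative only
toy = pd.DataFrame(
    {"Education": ["Graduate", "Graduate", "Not Graduate", "Graduate"]}
)

# normalize=True gives each value's share of the rows, largest first
shares = toy["Education"].value_counts(normalize=True)
dominant_value = shares.index[0]

print(shares)
print("Dominant value:", dominant_value)
```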

As the categorical columns take only a small number of distinct values (most are binary), we can use LabelEncoder on all of them; each category is replaced by an integer code, so the values change to int datatype.

Python3

# Import label encoder 
from sklearn import preprocessing 
    
# label_encoder object knows how  
# to understand word labels. 
label_encoder = preprocessing.LabelEncoder() 
obj = (data.dtypes == 'object') 
for col in list(obj[obj].index): 
  data[col] = label_encoder.fit_transform(data[col])
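Because the loop reuses a single encoder, the mapping from labels to integers is not kept for each column. The sketch below shows how such a mapping can be inspected using pandas only. It assumes that codes are assigned in sorted label order, which is how LabelEncoder numbers its classes for columns without missing values. The series values are made up.

```python
import pandas as pd

# Hypothetical toy column; the values are illustrative only
areas = pd.Series(["Urban", "Rural", "Semiurban", "Urban"])

# sort=True numbers the labels in sorted order: Rural, Semiurban, Urban
codes, classes = pd.factorize(areas, sort=True)

# Keep the label-to-code mapping so the integers can be interpreted later
mapping = {label: code for code, label in enumerate(classes)}

print(list(codes))
print(mapping)
```

Keeping such a mapping per column makes it possible to translate model inputs and plots back to the original labels.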

Check the object datatype columns again to confirm that none are left.

Python3

# To find the number of columns with  
# datatype==object 
obj = (data.dtypes == 'object') 
print("Categorical variables:",len(list(obj[obj].index)))

Output :

Categorical variables: 0

Python3

plt.figure(figsize=(12,6)) 
  
sns.heatmap(data.corr(),cmap='BrBG',fmt='.2f', 
            linewidths=2,annot=True)

Output: (correlation heatmap)

The above heatmap shows the pairwise correlations between the columns, for example between LoanAmount and ApplicantIncome. It also shows that Credit_History is strongly correlated with Loan_Status.
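To read the correlations with the target without a plot, the matching column of the correlation matrix can be sorted. This is a minimal sketch on made-up numbers, chosen so that Credit_History tracks Loan_Status exactly; the real dataset will give different values.

```python
import pandas as pd

# Hypothetical toy frame with already-encoded values (illustrative only)
toy = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 3500],
    "Credit_History": [1, 1, 0, 1, 0, 1],
    "Loan_Status": [1, 1, 0, 1, 0, 1],
})

# Pearson correlation of every feature with the target, strongest first
target_corr = (
    toy.corr()["Loan_Status"]
    .drop("Loan_Status")
    .sort_values(ascending=False)
)

print(target_corr)
```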

Now we will use catplot to visualize the Gender and marital status of the applicants, split by Loan_Status.

Python3

sns.catplot(x="Gender", y="Married", 
            hue="Loan_Status",  
            kind="bar",  
            data=data)

Output: (bar plot of Married against Gender, coloured by Loan_Status)
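A seaborn bar plot shows, by default, the mean of the y variable for each x and hue group. The sketch below computes those bar heights directly with groupby, on made-up encoded values, so the plot can be cross-checked against numbers.

```python
import pandas as pd

# Hypothetical toy frame with label-encoded values (illustrative only)
toy = pd.DataFrame({
    "Gender": [1, 1, 0, 0, 1, 0],
    "Married": [1, 0, 0, 1, 1, 0],
    "Loan_Status": [1, 1, 0, 1, 0, 0],
})

# Mean of Married within each (Gender, Loan_Status) group
bar_heights = toy.groupby(["Gender", "Loan_Status"])["Married"].mean()

print(bar_heights)
```

Since Married is encoded as 0 or 1, each mean is the fraction of married applicants in that group.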

Now we will fill any missing values in the dataset with the mean of their column, and then check that none remain, using the code below.

Python3

for col in data.columns: 
  data[col] = data[col].fillna(data[col].mean())  
    
data.isna().sum()

Output:

Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
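The fill step can be sketched on a toy frame, counting the missing values before and after so that the effect of fillna is visible. The column values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two missing entries (illustrative only)
toy = pd.DataFrame({
    "LoanAmount": [100.0, np.nan, 200.0],
    "Credit_History": [1.0, 0.0, np.nan],
})

# Count the missing values per column before filling
missing_before = toy.isna().sum()

# Fill each column's gaps with that column's mean
toy = toy.fillna(toy.mean())

missing_after = toy.isna().sum()
print(missing_before)
print(missing_after)
```

Note that the mean of a 0/1 column such as Credit_History is a fraction, so for encoded categorical columns the most frequent value may be a more natural fill; that choice is a suggestion here, not part of the original steps.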

As there are no missing values left, we can proceed to model training.

Loan Approval Prediction using Machine Learning

Loans are a major requirement of the modern world, and banks earn a major part of their total profit from them. Loans help students manage their education and living expenses, and help people buy things such as houses and cars.

But when deciding whether an applicant's profile qualifies for a loan, banks have to look at many aspects.

So, here we will use Machine Learning with Python to ease their work and predict whether a candidate's profile is relevant, using key features like Marital Status, Education, Applicant Income, Credit History, etc.
