Exploratory Data Analysis of the Dataset

In exploratory data analysis (EDA), we try to extract essential information from the dataframe. EDA is often one of the most time-consuming parts of a data science project; it can account for about 75% of the work. However, as we will see next, the effort pays off in the end.

We will first check the dimensions of our dataset using the pandas shape attribute. It returns a tuple containing the number of rows followed by the number of columns.

Python3




# shape of the dataset
print(tinder_df.shape)


Output:

(2001, 22)
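
Since shape is an attribute rather than a method, it is read without parentheses, and the tuple can be unpacked directly. The snippet below is a minimal sketch of that idea. It uses a small made-up frame called toy_df (a hypothetical stand-in, not the article's tinder_df) so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for tinder_df; the values are made up for illustration.
toy_df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "age": [25, 31, 28],
})

# shape is an attribute (no parentheses) and returns (rows, columns)
n_rows, n_cols = toy_df.shape
print(f"{n_rows} rows, {n_cols} columns")
```

Applied to the output above, the same unpacking would give 2001 rows and 22 columns.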

Next, we will use the pandas info() method to see information about the dataset. It prints the dtype and non-null count of every column.

Python3




# information about the dataset
tinder_df.info()


Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2001 entries, 0 to 2000
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   user_id              2001 non-null   object 
 1   username             2001 non-null   object 
 2   age                  2001 non-null   int64  
 3   status               2001 non-null   object 
 4   sex                  2001 non-null   object 
 5   orientation          2001 non-null   object 
 6   drinks               2001 non-null   object 
 7   drugs                2001 non-null   object 
 8   height               2001 non-null   float64
 9   job                  2001 non-null   object 
 10  location             2001 non-null   object 
 11  pets                 2001 non-null   object 
 12  smokes               2001 non-null   object 
 13  language             2001 non-null   object 
 14  new_languages        2001 non-null   object 
 15  body_profile         2001 non-null   object 
 16  education_level      2001 non-null   float64
 17  dropped_out          2001 non-null   object 
 18  bio                  2001 non-null   object 
 19  interests            2001 non-null   object 
 20  other_interests      2001 non-null   object 
 21  location_preference  2001 non-null   object 
dtypes: float64(2), int64(1), object(19)
memory usage: 344.0+ KB
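
info() only prints its summary. When the same facts are needed as values, for example to count missing entries or to list the numeric columns, pandas offers isna() and select_dtypes(). The sketch below shows one way to do this on a small made-up frame; toy_df and its values are assumptions for illustration, not part of the original dataset.

```python
import pandas as pd

# Hypothetical stand-in for tinder_df, with one missing username on purpose.
toy_df = pd.DataFrame({
    "username": ["ann", "bob", None],
    "age": [25, 31, 28],
    "height": [1.62, 1.80, 1.75],
})

# Missing values per column (the complement of info()'s Non-Null Count)
missing_per_column = toy_df.isna().sum()

# Names of the numeric columns, selected by dtype
numeric_columns = toy_df.select_dtypes(include="number").columns.tolist()

print(missing_per_column)
print(numeric_columns)
```

For the dataset above, every column reports 2001 non-null values out of 2001 entries, so the missing count would be zero throughout.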

The output shows that the dataset has 2 float columns, 1 int column, and 19 object columns, and that no column contains missing values. To see the number of unique values in each column, we will use the pandas nunique() function.

Python3




# number of unique elements in each column
tinder_df.nunique()


Output:

user_id                2001
username               1995
age                      52
status                    4
sex                       2
orientation               3
drinks                    6
drugs                     3
height                   25
job                      21
location                 70
pets                     15
smokes                    5
language                575
new_languages             3
body_profile             12
education_level           5
dropped_out               2
bio                    2001
interests                31
other_interests          31
location_preference       3
dtype: int64
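
The unique counts can be put to work right away. A column whose count equals the number of rows (such as user_id and bio above) behaves like an identifier, while a column with only a few distinct values (such as sex or status) is a natural candidate for categorical handling. The sketch below illustrates this reading on a small made-up frame; toy_df, its values, and the max_categories threshold are all assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical stand-in for tinder_df; the values are made up for illustration.
toy_df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "username": ["ann", "bob", "ann", "cy"],
    "sex": ["f", "m", "f", "m"],
})

unique_counts = toy_df.nunique()

# Columns where every row is distinct behave like identifiers
id_like = unique_counts[unique_counts == len(toy_df)].index.tolist()

# Columns with few distinct values are candidates for categorical handling
max_categories = 2  # hypothetical threshold, tune it for the real data
low_cardinality = unique_counts[unique_counts <= max_categories].index.tolist()

# Rows whose username occurs more than once
repeated_usernames = toy_df[toy_df["username"].duplicated(keep=False)]

print(id_like)
print(low_cardinality)
print(repeated_usernames)
```

The duplicated() check mirrors what the real output hints at: username has 1995 unique values across 2001 rows, so a few usernames repeat.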
