Exploratory Data Analysis of the Dataset
In exploratory data analysis (EDA), we try to extract essential information from the dataframe. EDA is often considered one of the most time-consuming parts of a data science project; by some estimates, about 75% of the work goes into exploring the dataset. However, we will see that the effort pays off in the end.
We will first check the dimensions of our dataset using the pandas shape attribute. Note that shape is an attribute, not a function, so it is accessed without parentheses. It returns a tuple containing the number of rows and the number of columns, in that order.
Python3
# shape of the dataset
print(tinder_df.shape)
Output:
(2001, 22)
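Because shape is a plain tuple, it can be unpacked directly into a row count and a column count. The following is a minimal sketch on a small, made-up dataframe; toy_df and its values are hypothetical stand-ins for tinder_df, which is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for tinder_df (values are invented for illustration)
toy_df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "age": [25, 31, 28],
})

# shape is an attribute (no parentheses) holding (rows, columns)
n_rows, n_cols = toy_df.shape
print(f"{n_rows} rows, {n_cols} columns")
```

Applied to tinder_df, the same unpacking would give 2001 rows and 22 columns, matching the output above.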
Next, we will use the pandas info() function to see a summary of the dataset. It reports the dtype and the non-null count of every column.
Python3
# information about the dataset
tinder_df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2001 entries, 0 to 2000
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   user_id              2001 non-null   object
 1   username             2001 non-null   object
 2   age                  2001 non-null   int64
 3   status               2001 non-null   object
 4   sex                  2001 non-null   object
 5   orientation          2001 non-null   object
 6   drinks               2001 non-null   object
 7   drugs                2001 non-null   object
 8   height               2001 non-null   float64
 9   job                  2001 non-null   object
 10  location             2001 non-null   object
 11  pets                 2001 non-null   object
 12  smokes               2001 non-null   object
 13  language             2001 non-null   object
 14  new_languages        2001 non-null   object
 15  body_profile         2001 non-null   object
 16  education_level      2001 non-null   float64
 17  dropped_out          2001 non-null   object
 18  bio                  2001 non-null   object
 19  interests            2001 non-null   object
 20  other_interests      2001 non-null   object
 21  location_preference  2001 non-null   object
dtypes: float64(2), int64(1), object(19)
memory usage: 344.0+ KB
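The two facts that info() reports, dtypes and non-null counts, can also be obtained as regular pandas objects, which is handy when you want to filter or sort them. Here is a minimal sketch on a made-up dataframe; toy_df is a hypothetical stand-in for tinder_df, and one missing height is included on purpose so that the missing-value count is not all zeros.

```python
import pandas as pd

# Hypothetical stand-in for tinder_df, with one deliberately missing value
toy_df = pd.DataFrame({
    "username": ["ann", "bob", "cat"],
    "age": [25, 31, 28],
    "height": [165.0, 180.5, None],
})

# How many columns there are of each dtype
dtype_counts = toy_df.dtypes.value_counts()

# Number of missing values per column (the complement of info()'s non-null count)
missing_per_column = toy_df.isna().sum()

print(dtype_counts)
print(missing_per_column)
```

For tinder_df, every column shows 2001 non-null entries in the output above, so the per-column missing count would be zero throughout.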
The output shows that the dataset has 2 float64 columns, 1 int64 column, and 19 object columns, and that no column contains missing values. To see the number of unique values in each column, we will use the pandas nunique() function.
Python3
# number of unique elements in each column
tinder_df.nunique()
Output:
user_id                2001
username               1995
age                      52
status                    4
sex                       2
orientation               3
drinks                    6
drugs                     3
height                   25
job                      21
location                 70
pets                     15
smokes                    5
language                575
new_languages             3
body_profile             12
education_level           5
dropped_out               2
bio                    2001
interests                31
other_interests          31
location_preference       3
dtype: int64
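The unique counts hint at the role of each column: user_id and bio are distinct for every row, username has slightly fewer unique values than rows (so some usernames repeat), and columns such as sex or status take only a handful of values. The sketch below shows one way to act on such counts. It runs on a made-up dataframe; toy_df and the max_categories threshold are hypothetical choices for illustration, not part of the original project.

```python
import pandas as pd

# Hypothetical stand-in for tinder_df (values are invented for illustration)
toy_df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "username": ["ann", "bob", "ann", "cat"],
    "sex": ["f", "m", "f", "f"],
})

unique_counts = toy_df.nunique()

# Columns with a distinct value in every row behave like identifiers
id_like = unique_counts[unique_counts == len(toy_df)].index.tolist()

# Columns with few distinct values are candidates for categorical treatment;
# the threshold is an arbitrary assumption chosen for this toy example
max_categories = 2
low_cardinality = unique_counts[unique_counts <= max_categories].index.tolist()

# Rows whose username also appears in at least one other row
repeated_usernames = toy_df[toy_df["username"].duplicated(keep=False)]

print(id_like)
print(low_cardinality)
print(repeated_usernames)
```

On the real dataset a larger threshold would be needed, since several clearly categorical columns (for example pets with 15 values) exceed the toy cutoff used here.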
Predict Tinder Matches with Machine Learning
In this article, we build a Tinder-style match-making recommender system. Most social media platforms have their own recommendation algorithms. In our project, which works like Tinder, we build a recommender that suggests profiles to users based on shared interests. The aim is to predict the profiles that each user will find most interesting and will want to connect with. We build this project from the basics, and the steps we are going to follow are: