Data Wrangling

Exploratory Data Analysis of the Dataset

In data wrangling, we process and transform the data to give it a more useful structure. To split and summarize the dataset by one or more category columns, we will use the pandas groupby() method.

Python3

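# Count the rows in each (sex, drugs) group.
# Parentheses replace the backslash continuations, which
# fail if any whitespace follows the backslash.
(
    tinder_df.groupby(['sex', 'drugs'])['drugs']
    .count()
    .reset_index(name='unique_drug_count')
)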

Output:

   sex  drugs      unique_drug_count
0  f    never      711
1  f    often      5
2  f    sometimes  146
3  m    never      875
4  m    often      13
5  m    sometimes  251
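To see what this pattern does on a small scale, here is a minimal sketch on a made-up five-row DataFrame. The rows and the names toy_df and group_counts are illustrative assumptions, not part of the Tinder dataset.

```python
import pandas as pd

# Hypothetical toy data; not taken from the Tinder dataset
toy_df = pd.DataFrame({
    "sex":   ["f", "f", "m", "m", "m"],
    "drugs": ["never", "never", "often", "never", "never"],
})

# Group by both columns, count the rows in each group,
# and turn the grouped result back into a flat DataFrame
group_counts = (
    toy_df.groupby(["sex", "drugs"])["drugs"]
    .count()
    .reset_index(name="count")
)
print(group_counts)
```

Each distinct (sex, drugs) pair becomes one row, and reset_index(name=...) gives the count column its name.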

We can also group people by their interest in learning new languages and by whether they dropped out of college.

Python3

# Count the rows in each (new_languages, dropped_out) group
(
    tinder_df.groupby(['new_languages', 'dropped_out'])['dropped_out']
    .count()
    .reset_index(name='drop_out_people count')
)

Output:

   new_languages        dropped_out  drop_out_people count
0  interested           no           594
1  interested           yes          39
2  not interested       no           999
3  not interested       yes          51
4  somewhat interested  no           305
5  somewhat interested  yes          13
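Raw counts are hard to compare because the groups differ in size. As a rough sketch, the counts from the output above can be reshaped so that each interest level gets one row, and the share of dropouts computed per group. The names grouped, wide and dropout_share are chosen here for illustration.

```python
import pandas as pd

# Counts copied from the grouped output shown above
grouped = pd.DataFrame({
    "new_languages": ["interested", "interested",
                      "not interested", "not interested",
                      "somewhat interested", "somewhat interested"],
    "dropped_out": ["no", "yes", "no", "yes", "no", "yes"],
    "count": [594, 39, 999, 51, 305, 13],
})

# One row per interest level, one column per dropped_out value
wide = grouped.pivot(index="new_languages",
                     columns="dropped_out",
                     values="count")

# Fraction of each interest group that dropped out
wide["dropout_share"] = wide["yes"] / (wide["yes"] + wide["no"])
print(wide.round(3))
```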

Data Visualization

Data visualization is an important part of storytelling. Here we draw plots with Python libraries to show what each column has to tell us.

Python3

# distribution of age 
sns.histplot(tinder_df["age"], kde=True) 

Output:

Histplot of age using seaborn

The age distribution has a long tail, so it deviates from a normal distribution. Later we will apply a transformation to the age column to bring it closer to normal. Next, we will plot a histogram of the height column.
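Before moving on to height, note that a long tail can also be checked numerically through skewness. The sketch below is an assumption-laden illustration: the ages are synthetic, and the log transform is only one common choice for right-skewed positive data, not necessarily the transformation used later in this project.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed "ages" (hypothetical; not the Tinder data)
rng = np.random.default_rng(0)
ages = pd.Series(18 + rng.exponential(scale=10, size=1000), name="age")

# Positive skewness indicates a long right tail
print("skewness before:", round(ages.skew(), 2))

# log1p computes log(1 + x), which compresses large values
log_ages = np.log1p(ages)
print("skewness after log1p:", round(log_ages.skew(), 2))
```

The skewness after the transform is smaller, which is the numeric counterpart of a shorter tail in the histogram.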

Python3

# Distribution of height 
sns.histplot(tinder_df["height"], kde=True) 

Output:

Histplot of Height column using seaborn

We can also plot a pie chart for numerical data to see what percentage of the values falls in each range. For example, we may want to know the percentage of Tinder users in each age range. We will use the pandas cut() function to create bins for the numerical data.

Python3

# Set the size of the figure to 6 inches 
# wide by 6 inches tall 
plt.figure(figsize=(6, 6)) 
  
# Divide the data into categories 
bins = [18, 30, 40, 50, 60, 70] 
  
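# Use the `cut` function to assign each data point 
# to a category; include_lowest=True keeps age 18, 
# which the default right-closed bins would drop 
categories = pd.cut(tinder_df["age"], bins, 
                    labels=["18-30", "30-40", 
                            "40-50", "50-60", "60-70"], 
                    include_lowest=True) 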
  
# Count the number of data points in each category 
counts = categories.value_counts() 
  
# Plot the data as a pie chart 
plt.pie(counts, labels=counts.index, autopct='%1.1f%%') 
plt.show() 

Output:

Pie chart for the percentage of age distribution
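One detail of cut() is worth checking on a small example: its intervals are right-closed by default, so with bins that start at 18, an age of exactly 18 falls outside the first interval unless include_lowest=True is passed. The ages below are made up for illustration.

```python
import pandas as pd

# Hypothetical ages chosen to sit on and between bin edges
ages = pd.Series([18, 25, 30, 31, 45, 70])
bins = [18, 30, 40, 50, 60, 70]
labels = ["18-30", "30-40", "40-50", "50-60", "60-70"]

# Default: intervals are (18, 30], (30, 40], ... so 18 becomes NaN
default_cut = pd.cut(ages, bins, labels=labels)
print("unbinned with defaults:", default_cut.isna().sum())

# include_lowest=True closes the first interval on the left as well
inclusive_cut = pd.cut(ages, bins, labels=labels, include_lowest=True)

# Percentage of values per bin, kept in bin order
shares = inclusive_cut.value_counts(normalize=True, sort=False) * 100
print(shares.round(1))
```

These percentages are what autopct prints on the pie slices, since plt.pie() normalizes the counts it receives.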

We can use the histplot() function from Seaborn to create a graph that shows the number of people in each job category.

Python3

plt.figure(figsize=(6, 6)) 
sns.histplot(x="job", data=tinder_df, 
             color="coral") 
  
# rotate x-axis labels vertically 
plt.xticks(rotation=90) 
plt.title("Distribution of job of each candidate", 
          fontsize=14) 
  
plt.xlabel("Job id", fontsize=12) 
plt.ylabel("Count of people", fontsize=12) 
  
plt.show() 

Output:

Count of people in a particular job using Histplot
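When a column holds categories, the bar heights in a plot like this are simply per-category counts, so the same numbers can be read as a table with value_counts(). The job labels below are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical job labels; not taken from the Tinder dataset
toy_jobs = pd.Series(
    ["student", "tech", "tech", "artist", "tech", "student"],
    name="job",
)

# value_counts() returns one count per distinct value,
# sorted from most to least frequent
job_counts = toy_jobs.value_counts()
print(job_counts)
```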

Predict Tinder Matches with Machine Learning

In this article, we build a Tinder-style match-making recommender system. Most social media platforms have their own recommender algorithms. Our project, which works like Tinder, recommends profiles to a user based on shared interests: the aim is to predict which profiles the user will find most interesting and will try to connect with. We will build the project from scratch, working through it step by step.
