Data Wrangling

In data wrangling, we process and transform the data into a more useful structure. To split the dataset by category columns and summarize each group, we will use the pandas groupby() method.

Python3




tinder_df.groupby(['sex', 'drugs'])['drugs'] \
    .count() \
    .reset_index(name='unique_drug_count')


Output:

    sex    drugs    unique_drug_count
0    f    never        711
1    f    often        5
2    f    sometimes    146
3    m    never        875
4    m    often        13
5    m    sometimes    251
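To see what the grouping does in isolation, here is a minimal sketch on a hypothetical six-row frame (toy_df and the n_people column name are made up for illustration and are not part of the dataset). It uses size(), which counts rows per group, whereas count() on a selected column counts that column's non-null values.

```python
import pandas as pd

# Hypothetical toy data; the real tinder_df has many more rows and columns.
toy_df = pd.DataFrame({
    "sex":   ["f", "f", "m", "m", "m", "f"],
    "drugs": ["never", "never", "often", "never", "never", "sometimes"],
})

# Split by (sex, drugs), count the rows in each group, and turn the
# grouped result back into a flat DataFrame with a named count column.
group_sizes = (
    toy_df.groupby(["sex", "drugs"])
          .size()
          .reset_index(name="n_people")
)
print(group_sizes)
```

Groups are sorted by their keys, so the rows come out in the same (sex, drugs) order as in the output above.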

We can also group people by their interest in learning new languages and by whether they dropped out of college.

Python3




tinder_df.groupby(['new_languages', 'dropped_out'])['dropped_out'] \
    .count() \
    .reset_index(name='drop_out_people count')


Output:

     new_languages          dropped_out    drop_out_people count
0    interested             no             594
1    interested             yes            39
2    not interested         no             999
3    not interested         yes            51
4    somewhat interested    no             305
5    somewhat interested    yes            13
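A two-column grouping like this can also be read as a contingency table. The sketch below is one possible alternative, not the article's method: it uses pd.crosstab on a hypothetical toy frame (toy_df and its rows are invented for illustration) to get the same kind of counts in wide form, plus row-wise shares.

```python
import pandas as pd

# Hypothetical toy data with the same two column names as tinder_df.
toy_df = pd.DataFrame({
    "new_languages": ["interested", "interested", "not interested",
                      "not interested", "somewhat interested", "interested"],
    "dropped_out":   ["no", "yes", "no", "no", "no", "no"],
})

# Rows: interest level; columns: dropped_out; cells: number of people.
counts = pd.crosstab(toy_df["new_languages"], toy_df["dropped_out"])

# normalize="index" turns each row into fractions that sum to 1,
# i.e. the dropout share within each interest level.
share = pd.crosstab(toy_df["new_languages"], toy_df["dropped_out"],
                    normalize="index")

print(counts)
print(share.round(2))
```

The wide layout makes it easier to compare dropout shares across interest levels than the long table does.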

Data Visualization

Data visualization is an important part of storytelling. Here we use Python plotting libraries (Seaborn and Matplotlib) to draw plots that show what each column has to tell us.

Python3




# distribution of age
sns.histplot(tinder_df["age"], kde=True)


Output:

 Histplot of age using seaborn 

The age column has a long tail, which shows that it deviates from a normal distribution. Later we will apply a transformation to this column to bring it closer to a normal distribution. Next, we plot a histogram of the height column.

Python3




# Distribution of height
sns.histplot(tinder_df["height"], kde=True)


Output:

Histplot of Height column using seaborn 
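The long tail noted for the age column can also be checked numerically. The following is a minimal sketch under stated assumptions: the ages are hypothetical, right-skewed toy values, and the log transform is just one common choice, not necessarily the transformation applied later in this project.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with a long right tail (not taken from tinder_df).
ages = pd.Series([18, 19, 20, 21, 22, 23, 25, 27, 30, 35, 45, 60])

# Sample skewness: roughly 0 for symmetric data, positive for a right tail.
skew_before = ages.skew()

# log1p(x) = log(1 + x) compresses large values more than small ones,
# which pulls in a right tail.
log_ages = np.log1p(ages)
skew_after = log_ages.skew()

print(f"skewness before: {skew_before:.2f}, after log1p: {skew_after:.2f}")
```

On the real data, the same two calls to skew() would show whether a given transformation actually moves the column closer to symmetric.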

We can also plot a pie chart for numerical data to see what percentage of values falls in each range. For example, we may want to know the percentage of Tinder users in each age range. We will use the pandas cut() function to create bins for the numerical data.

Python3




# Set the size of the figure to 6 inches
# wide by 6 inches tall
plt.figure(figsize=(6, 6))
  
# Divide the data into categories
bins = [18, 30, 40, 50, 60, 70]
  
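# Use the `cut` function to assign
# each data point to a category.
# Bins are right-closed by default, e.g. (18, 30],
# so an age of exactly 18 would be left uncategorized (NaN)
# unless include_lowest=True is passed.
categories = pd.cut(tinder_df["age"], bins,
                    labels=["18-30", "30-40",
                            "40-50", "50-60", "60-70"])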
  
# Count the number of data points in each category
counts = categories.value_counts()
  
# Plot the data as a pie chart
plt.pie(counts, labels=counts.index, autopct='%1.1f%%')
plt.show()


Output:

Pie chart for the percentage of age distribution
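How cut() treats the bin edges is easy to miss, so here is a small sketch on hypothetical ages (the values are invented for illustration). It shows that the default intervals are right-closed, what include_lowest=True changes, and how the percentages a pie chart displays can be computed directly.

```python
import pandas as pd

# Hypothetical ages chosen to sit on and between the bin edges.
ages = pd.Series([18, 25, 30, 31, 40, 65])
bins = [18, 30, 40, 50, 60, 70]
labels = ["18-30", "30-40", "40-50", "50-60", "60-70"]

# Default: intervals are (18, 30], (30, 40], ... so 18 itself gets NaN.
default_cut = pd.cut(ages, bins, labels=labels)

# include_lowest=True makes the first interval include its left edge.
inclusive_cut = pd.cut(ages, bins, labels=labels, include_lowest=True)

# Percentage of people per bin; empty bins are kept with 0.0.
pct = inclusive_cut.value_counts(normalize=True).mul(100).round(1)

print(default_cut.tolist())
print(inclusive_cut.tolist())
print(pct)
```

Note that an age of 30 lands in "18-30" and an age of 40 in "30-40", because each bin includes its right edge.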

We can use the histplot() function from Seaborn to create a graph that shows the number of people in each job.

Python3




plt.figure(figsize=(6, 6))
sns.histplot(x="job", data=tinder_df,
             color="coral")
  
# rotate x-axis labels vertically
plt.xticks(rotation=90)
plt.title("Distribution of job of each candidate",
          fontsize=14)
  
plt.xlabel("Job id", fontsize=12)
plt.ylabel("Count of people", fontsize=12)
  
plt.show()


Output:

Count of people in a particular job using Histplot 
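The bar heights in such a plot are simply per-category counts, which can be inspected as numbers as well. This is a minimal sketch on a hypothetical job column (the job labels are made up; the real column may hold different values or ids).

```python
import pandas as pd

# Hypothetical job labels standing in for tinder_df["job"].
jobs = pd.Series(["student", "tech", "student", "sales", "tech", "student"],
                 name="job")

# value_counts() returns one count per distinct value,
# sorted from most to least frequent.
job_counts = jobs.value_counts()

# Keep only the most common categories, e.g. for a less crowded plot.
top_two = job_counts.head(2)

print(job_counts)
print(top_two)
```

Because the result is sorted by frequency, plotting it (for example with job_counts.plot(kind="bar")) gives bars in descending order, which can be easier to read than the default category order.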

Predict Tinder Matches with Machine Learning

In this article, we build a Tinder-style match-making recommender system. Most social media platforms have their own recommendation algorithms. Our project works like Tinder: the algorithm recommends profiles to a user based on shared interests, aiming to surface the profiles the user is most likely to find interesting and try to connect with. We build the project from scratch, step by step.
