Data Wrangling
In data wrangling, we process and transform the data to give it a more useful structure. To split the dataset by one or more categorical columns and summarize each group, we will use the pandas groupby() method.
Python3
tinder_df.groupby(['sex', 'drugs'])['drugs'] \
    .count() \
    .reset_index(name='unique_drug_count')
Output:
  sex      drugs  unique_drug_count
0   f      never                711
1   f      often                  5
2   f  sometimes                146
3   m      never                875
4   m      often                 13
5   m  sometimes                251
We can also group people by their interest in learning new languages and by whether they dropped out of college.
Python3
tinder_df.groupby(['new_languages', 'dropped_out'])['dropped_out'] \
    .count() \
    .reset_index(name='drop_out_people count')
Output:
         new_languages dropped_out  drop_out_people count
0           interested          no                    594
1           interested         yes                     39
2       not interested          no                    999
3       not interested         yes                     51
4  somewhat interested          no                    305
5  somewhat interested         yes                     13
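To make the grouping pattern easier to follow, here is a minimal, self-contained sketch on a made-up DataFrame. The name toy_df, its rows, and the column name people_count are assumptions for illustration only and are not part of the Tinder dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for tinder_df
toy_df = pd.DataFrame({
    "sex": ["f", "f", "m", "m", "m"],
    "drugs": ["never", "never", "sometimes", "never", "sometimes"],
})

# Count the rows in each (sex, drugs) group, then turn the
# grouped index back into ordinary columns
group_counts = (
    toy_df.groupby(["sex", "drugs"])["drugs"]
    .count()
    .reset_index(name="people_count")
)
print(group_counts)
```

Only combinations that actually occur in the data appear in the result, so this toy frame produces three rows rather than four.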
Data Visualization
Data visualization is an important part of storytelling. Here we draw plots with Python libraries (Seaborn and Matplotlib) to show what the individual columns have to tell.
Python3
# Distribution of age
sns.histplot(tinder_df["age"], kde=True)
Output:
The age column has a long tail, which shows that it deviates from a normal distribution. Later we will apply a transformation to this column to bring it closer to a normal distribution. Next, we will plot a histogram of the height column.
Python3
# Distribution of height
sns.histplot(tinder_df["height"], kde=True)
Output:
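The article does not say at this point which transformation will be applied to the long-tailed age column. As one common option, the sketch below uses a log transform (np.log1p) on made-up ages and compares the skewness before and after. Both the choice of transform and the sample values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical long-tailed ages (not taken from the dataset)
ages = pd.Series([18, 19, 20, 21, 22, 23, 25, 28, 35, 60], name="age")

# log1p computes log(1 + x); it compresses large values
# more strongly than small ones, which shortens the tail
log_ages = np.log1p(ages)

# Skewness near 0 indicates a roughly symmetric distribution
print("skewness before:", ages.skew())
print("skewness after: ", log_ages.skew())
```

On this sample the transform reduces the skewness but does not remove it, so it is worth checking the histogram again after any transformation.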
We can also plot a pie chart for numerical data to see what percentage of the values falls in each range. For example, we may be interested in the percentage of people in each age range who are using Tinder. We will use the pandas cut() function to create bins for the numerical data.
Python3
# Set the size of the figure to 6 inches
# wide by 6 inches tall
plt.figure(figsize=(6, 6))

# Divide the data into categories
bins = [18, 30, 40, 50, 60, 70]

# Use the `cut` function to assign
# each data point to a category.
# include_lowest=True keeps age 18 in the first bin;
# by default the lowest edge is excluded.
categories = pd.cut(tinder_df["age"], bins,
                    labels=["18-30", "30-40", "40-50", "50-60", "60-70"],
                    include_lowest=True)

# Count the number of data points in each category
counts = categories.value_counts()

# Plot the data as a pie chart
plt.pie(counts, labels=counts.index, autopct='%1.1f%%')
plt.show()
Output:
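One detail of cut() is worth checking before trusting the percentages: by default each bin excludes its left edge and includes its right edge, so a value equal to the lowest edge falls into no bin unless include_lowest=True is passed. The sketch below shows this on a few made-up ages. The sample values are assumptions chosen to sit on and between the bin edges.

```python
import pandas as pd

# Hypothetical ages chosen to sit on and between the bin edges
ages = pd.Series([18, 25, 30, 31, 45, 70])

bins = [18, 30, 40, 50, 60, 70]
labels = ["18-30", "30-40", "40-50", "50-60", "60-70"]

# Default behaviour: each bin is open on the left, closed on the right,
# so age 18 is not assigned to any bin and becomes NaN
default_bins = pd.cut(ages, bins, labels=labels)

# include_lowest=True also closes the first bin on the left
inclusive_bins = pd.cut(ages, bins, labels=labels, include_lowest=True)

print(default_bins.tolist())
print(inclusive_bins.tolist())
```

Note that age 30 lands in "18-30" rather than "30-40" in both cases, because the right edge of each bin is inclusive.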
We can use the histplot() function from Seaborn to create a graph that shows the number of people in each job category.
Python3
plt.figure(figsize=(6, 6))
sns.histplot(x="job", data=tinder_df, color="coral")

# Rotate x-axis labels vertically
plt.xticks(rotation=90)
plt.title("Distribution of job of each candidate", fontsize=14)
plt.xlabel("Job id", fontsize=12)
plt.ylabel("Count of people", fontsize=12)
plt.show()
Output:
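If the exact numbers behind the bars are needed, the same per-category counts can be read off directly with value_counts(). This is a minimal sketch on made-up job labels. The Series name jobs and its values are assumptions standing in for tinder_df["job"].

```python
import pandas as pd

# Hypothetical job labels standing in for tinder_df["job"]
jobs = pd.Series(
    ["student", "artist", "student", "engineer", "student", "artist"]
)

# value_counts() returns one entry per category,
# sorted from the most to the least frequent
job_counts = jobs.value_counts()
print(job_counts)
```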
Predict Tinder Matches with Machine Learning
In this article, we build a Tinder-style match-making recommender system. Most social media platforms have their own recommendation algorithms. In our project, which works like Tinder, we will build an algorithm that recommends profiles to users based on shared interests: the aim is to predict which profiles a user will find most interesting and will want to connect with. We will build the project from scratch, step by step.