Visualize Data Relationships
To visualize data relationships, we’ll explore univariate, bivariate, and multivariate analyses using the employees dataset. These visualizations will help uncover patterns, trends, and relationships within the data.
We will use the Matplotlib and Seaborn libraries for data visualization.
Univariate Analysis
This analysis focuses on a single variable. Here, we’ll look at the distributions of ‘Salary’ and ‘Bonus %’.
- Histogram of Salary
- Histogram of Bonus %
Histograms and density plots are typically used to visualize the distribution. These plots can show the spread, central tendency, and any skewness in the data.
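# Univariate Analysis: Histograms for 'Salary' and 'Bonus %'
import seaborn as sns
import matplotlib.pyplot as plt

# df is the employees DataFrame loaded earlier
fig, axes = plt.subplots(1, 2, figsize=(18, 6))
# Left panel: distribution of Salary with a KDE curve overlaid
sns.histplot(df['Salary'], bins=30, kde=True, ax=axes[0])
axes[0].set_title('Histogram of Salary')
# Right panel: distribution of Bonus % with a KDE curve overlaid
sns.histplot(df['Bonus %'], bins=30, kde=True, ax=axes[1])
axes[1].set_title('Histogram of Bonus %')
plt.show()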
Output:
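The spread, central tendency, and skewness that the histograms show visually can also be checked numerically. The snippet below is a minimal sketch under stated assumptions: `sample` is a small made-up stand-in for the employees DataFrame, and `summarize` is a hypothetical helper, not part of pandas or of the original walkthrough.

```python
import pandas as pd

# Hypothetical stand-in for the employees DataFrame (df); the values are
# invented purely so that the snippet runs on its own.
sample = pd.DataFrame({
    "Salary": [50000, 62000, 58000, 120000, 71000, 66000],
    "Bonus %": [5.0, 7.5, 6.0, 18.0, 9.0, 8.0],
})


def summarize(series):
    """Return basic distribution statistics for a numeric column."""
    return {
        "mean": series.mean(),      # central tendency
        "median": series.median(),  # central tendency, robust to outliers
        "std": series.std(),        # spread
        "skew": series.skew(),      # > 0 suggests a longer right tail
    }


for column in ["Salary", "Bonus %"]:
    stats = summarize(sample[column])
    print(column, {name: round(float(value), 2) for name, value in stats.items()})
```

With the real dataset, the same helper could be called as `summarize(df['Salary'])`. A mean noticeably above the median, together with a positive skew value, points to a right-skewed distribution.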
Bivariate Analysis
Bivariate analysis explores the relationship between two variables. Common visualizations include scatter plots and box plots.
Boxplot For Data Visualization
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# Box plot of Salary for each Team
sns.boxplot(x='Salary', y='Team', data=df)
plt.show()
Output:
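A box plot draws the median and quartiles of each group, so the same comparison can be read off as a table. The following is a rough sketch rather than part of the original walkthrough: `sample` is an invented stand-in for the employees DataFrame, and `salary_summary_by_team` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical stand-in for the employees DataFrame (df); values are made up.
sample = pd.DataFrame({
    "Team": ["Finance", "Marketing", "Finance", "Engineering",
             "Marketing", "Engineering", "Finance", "Engineering"],
    "Salary": [52000, 61000, 75000, 98000, 67000, 88000, 45000, 110000],
})


def salary_summary_by_team(frame):
    """Per-team count, mean, spread, and quartiles of Salary."""
    return frame.groupby("Team")["Salary"].describe()


summary = salary_summary_by_team(sample)
# The 25%, 50%, and 75% columns correspond to the box edges and the median line.
print(summary[["count", "25%", "50%", "75%"]])
```

On the real data, `salary_summary_by_team(df)` would give one row per team, which makes it easier to quote exact medians alongside the plot.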
Scatter Plot For Data Visualization
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# Salary per Team, colored by Gender and sized by Bonus %
sns.scatterplot(x='Salary', y='Team', data=df,
                hue='Gender', size='Bonus %')
# Placing the legend outside the figure
plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
plt.show()
Output:
Multivariate Analysis
Multivariate analysis involves examining the relationships among three or more variables. Some common methods include:
- Pair Plots: To visualize pairwise relationships across several variables at once.
- Heatmaps: Particularly useful for showing the correlation matrix between numerical variables.
- Faceted Grids: Allow you to explore data across many dimensions and are particularly useful for understanding the interaction effects among variables.
For now, we will use the pairplot() method of the seaborn module, which plots the pairwise bivariate distributions of the numerical columns in a dataset.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise plots of the numerical columns, colored by Gender
sns.pairplot(df, hue='Gender', height=2)
plt.show()
Output:
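The other two methods listed above, heatmaps and faceted grids, can be sketched in the same way. This is an illustrative sketch only: `sample` is an invented stand-in for the employees DataFrame, and the helper names `plot_correlation_heatmap` and `plot_salary_facets` are hypothetical, not functions from seaborn or the original article.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the employees DataFrame (df); values are made up.
sample = pd.DataFrame({
    "Salary": [52000, 61000, 75000, 98000, 67000, 88000, 45000, 110000],
    "Bonus %": [4.5, 6.0, 9.5, 15.0, 7.0, 12.5, 3.0, 17.5],
    "Team": ["Finance", "Marketing", "Finance", "Engineering",
             "Marketing", "Engineering", "Finance", "Engineering"],
    "Gender": ["Male", "Female", "Female", "Male",
               "Male", "Female", "Female", "Male"],
})


def plot_correlation_heatmap(frame):
    """Draw a heatmap of the correlation matrix of the numerical columns."""
    corr = frame.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(5, 4))
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation Between Numerical Columns")
    return corr, ax


def plot_salary_facets(frame):
    """Draw Salary vs. Bonus % in one panel per Gender, colored by Team."""
    grid = sns.FacetGrid(frame, col="Gender", hue="Team")
    grid.map_dataframe(sns.scatterplot, x="Salary", y="Bonus %")
    grid.add_legend()
    return grid


corr, ax = plot_correlation_heatmap(sample)
grid = plot_salary_facets(sample)
print(corr.round(2))
```

Calling `plt.show()` afterwards displays both figures. The heatmap condenses all numerical pairs into one matrix, while the faceted grid splits a single relationship across the levels of a categorical column, so the two complement the pair plot rather than replace it.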
Steps for Mastering Exploratory Data Analysis
Mastering exploratory data analysis (EDA) is crucial for understanding your data, identifying patterns, and generating insights that can inform further analysis or decision-making. Data is the lifeblood of modern organizations, and the ability to extract insights from it has become a crucial skill in today’s data-driven world. EDA is a powerful approach that allows analysts, scientists, and researchers to gain a thorough understanding of their data before undertaking formal modeling or hypothesis testing.
It is an iterative process that involves summarizing, visualizing, and exploring data to find patterns, anomalies, and relationships that might not be immediately apparent. In this article, we will understand and implement the key steps for performing exploratory data analysis. Here are the steps to help you master EDA:
Steps for Mastering Exploratory Data Analysis
- Step 1: Understand the Problem and the Data
- Step 2: Import and Inspect the Data
- Step 3: Handling Missing Values
- Step 4: Explore Data Characteristics
- Step 5: Perform Data Transformation
- Step 6: Visualize Data Relationships
- Step 7: Handling Outliers
- Step 8: Communicate Findings and Insights