Python Implementation of Violin Plot on Custom Dataset
Importing required libraries
```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot
from sklearn.datasets import load_iris
```
Loading Data
```python
# Load the Iris dataset
iris = load_iris()

# Create a DataFrame from the features (X) with column names
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Add the target variable (y) to the DataFrame
df["target"] = iris.target

# Display the first five rows of the DataFrame
print(df.head(5))
```
Output:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
0 5.1 3.5 1.4 0.2 0
1 4.9 3.0 1.4 0.2 0
2 4.7 3.2 1.3 0.2 0
3 4.6 3.1 1.5 0.2 0
4 5.0 3.6 1.4 0.2 0
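The `target` column stores each species as an integer code (0, 1 or 2). Since the bundled dataset also carries the species names in `iris.target_names`, a minimal sketch can add a readable categorical column for plot labels. The `species` column name is our own choice, not part of the original code:

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame used in this article
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Hypothetical extra column: map codes 0/1/2 to the names stored in
# iris.target_names ("setosa", "versicolor", "virginica")
df["species"] = pd.Categorical.from_codes(iris.target, categories=iris.target_names)

# Each of the three species contributes 50 rows
print(df["species"].value_counts())
```

With this column in place, passing `x="species"` (together with `data=df`) to `sns.violinplot` labels each violin with its species name instead of a number.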
Description of the dataset
```python
print(df.describe())
```
Output:
sepal length (cm) sepal width (cm) petal length (cm) \
count 150.000000 150.000000 150.000000
mean 5.843333 3.057333 3.758000
std 0.828066 0.435866 1.765298
min 4.300000 2.000000 1.000000
25% 5.100000 2.800000 1.600000
50% 5.800000 3.000000 4.350000
75% 6.400000 3.300000 5.100000
max 7.900000 4.400000 6.900000
petal width (cm) target
count 150.000000 150.000000
mean 1.199333 1.000000
std 0.762238 0.819232
min 0.100000 0.000000
25% 0.300000 0.000000
50% 1.300000 1.000000
75% 1.800000 2.000000
max 2.500000 2.000000
Information About the Dataset
```python
df.info()
```
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sepal length (cm) 150 non-null float64
1 sepal width (cm) 150 non-null float64
2 petal length (cm) 150 non-null float64
3 petal width (cm) 150 non-null float64
4 target 150 non-null int64
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
Describing the ‘sepal length (cm)’ feature of the Iris dataset.
```python
print(df["sepal length (cm)"].describe())
```
Output:
count 150.000000
mean 5.843333
std 0.828066
min 4.300000
25% 5.100000
50% 5.800000
75% 6.400000
max 7.900000
Name: sepal length (cm), dtype: float64
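The quartiles in this summary are the same numbers that seaborn's default `inner="box"` setting draws inside each violin: a thick bar from the 25% to the 75% quantile, a marker at the median, and thin whiskers. A minimal sketch of those values, assuming the common 1.5 × IQR whisker rule (the exact whisker convention may differ between seaborn versions):

```python
import numpy as np
from sklearn.datasets import load_iris

sepal_length = load_iris().data[:, 0]  # column 0 is 'sepal length (cm)'

# Quartiles, using the same linear interpolation as pandas' describe()
q1, median, q3 = np.percentile(sepal_length, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach the most extreme observations within 1.5 * IQR of the box
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
whisker_low = sepal_length[sepal_length >= lower_fence].min()
whisker_high = sepal_length[sepal_length <= upper_fence].max()

print(f"box: {q1:.2f} to {q3:.2f}, median {median:.2f}")  # box: 5.10 to 6.40, median 5.80
print(f"whiskers: {whisker_low:.2f} to {whisker_high:.2f}")  # whiskers: 4.30 to 7.90
```

Both fences (3.15 and 8.35 cm) lie outside the observed range of 4.3 to 7.9 cm, so under this rule no sepal length counts as an outlier and the whiskers span the full data range.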
Univariate Violin Plot for ‘sepal length (cm)’ Feature.
```python
fig, ax = pyplot.subplots(figsize=(9, 7))
# Single violin: density of sepal length with a mini box plot inside
sns.violinplot(ax=ax, y=df["sepal length (cm)"])
pyplot.show()
```
Output: [Figure: violin plot of ‘sepal length (cm)’]
As the plot shows, the density is highest between 5 and 6 cm. This agrees with the summary statistics of ‘sepal length (cm)’: the mean is 5.84 cm and the median is 5.80 cm, both inside that range.
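The outline of a violin is a kernel density estimate (KDE) mirrored around its center line. A rough NumPy-only sketch of that curve, assuming a Gaussian kernel with Scott's rule for the bandwidth (seaborn's default bandwidth method, although its evaluation grid and scaling may differ), locates where the density of ‘sepal length (cm)’ peaks. The helper name `gaussian_kde_curve` is hypothetical:

```python
import numpy as np
from sklearn.datasets import load_iris


def gaussian_kde_curve(values, grid):
    """Hypothetical helper: 1-D Gaussian KDE evaluated on `grid`,
    using Scott's rule (sample std * n ** (-1/5)) as the bandwidth."""
    values = np.asarray(values, dtype=float)
    n = values.size
    bandwidth = values.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every observation
    z = (grid[:, None] - values[None, :]) / bandwidth
    # Sum of Gaussian bumps, scaled so the curve integrates to 1
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


sepal_length = load_iris().data[:, 0]  # 'sepal length (cm)'
grid = np.linspace(sepal_length.min(), sepal_length.max(), 361)  # 0.01 cm steps
density = gaussian_kde_curve(sepal_length, grid)

peak = grid[density.argmax()]
print(f"Estimated density peaks at about {peak:.2f} cm")
```

Shrinking the bandwidth gives a bumpier outline and enlarging it a smoother one; recent seaborn versions expose this as the `bw_adjust` multiplier of `sns.violinplot`.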
Univariate Violin Plot for the ‘sepal width (cm)’ feature.
```python
fig, ax = pyplot.subplots(figsize=(9, 7))
# Single violin for sepal width
sns.violinplot(ax=ax, y=df["sepal width (cm)"])
pyplot.show()
```
Output: [Figure: violin plot of ‘sepal width (cm)’]
Here too, the density is highest around the mean of about 3.06 cm (the median is 3.0 cm).
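Two `sns.violinplot` parameters make it easier to read the density against the summary statistics: `inner="quartile"` replaces the mini box with lines at the three quartiles, and `cut=0` stops the density at the smallest and largest observed values instead of extending it past them. A sketch for ‘sepal width (cm)’, assuming the non-interactive Agg backend so it also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import pandas as pd
import seaborn as sns
from matplotlib import pyplot
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

fig, ax = pyplot.subplots(figsize=(9, 7))
# inner="quartile": lines at the 25%, 50% and 75% quantiles
# cut=0: the density is drawn only between the observed min and max
sns.violinplot(ax=ax, y=df["sepal width (cm)"], inner="quartile", cut=0)
```

In an interactive session, remove the `matplotlib.use("Agg")` line and call `pyplot.show()` to display the figure.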
Violin Plots comparing ‘sepal length (cm)’ and ‘sepal width (cm)’.
```python
fig, ax = pyplot.subplots(figsize=(9, 7))
# Wide-form input: each of the first two columns becomes its own violin
sns.violinplot(ax=ax, data=df.iloc[:, :2])
pyplot.show()
```
Output: [Figure: violin plots of ‘sepal length (cm)’ and ‘sepal width (cm)’]
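Passing `df.iloc[:, :2]` relies on seaborn's wide-form mode, in which each column becomes one violin. An equivalent long-form sketch reshapes the data with `pandas.DataFrame.melt`, which makes it easy to include all four measurements; the `feature` and `value (cm)` column names are our own:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import pandas as pd
import seaborn as sns
from matplotlib import pyplot
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Long form: one row per (feature, measurement) pair, 150 * 4 = 600 rows
long_df = df.melt(var_name="feature", value_name="value (cm)")

fig, ax = pyplot.subplots(figsize=(9, 7))
sns.violinplot(ax=ax, data=long_df, x="feature", y="value (cm)")
ax.tick_params(axis="x", labelrotation=15)  # keep long column names readable
```

Melting `df.iloc[:, :2]` instead of `df` reproduces the two-violin comparison.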
Bivariate Violin Plot comparing ‘sepal length (cm)’ species-wise.
```python
fig, ax = pyplot.subplots(figsize=(9, 7))
# One violin per target class (0, 1, 2), each in its own Set1 color
sns.violinplot(ax=ax, x=df["target"], y=df["sepal length (cm)"], palette="Set1")
pyplot.show()
```
Output: [Figure: violin plots of ‘sepal length (cm)’ for target classes 0, 1 and 2]
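The species-wise violins can be cross-checked against per-species summary statistics. A minimal sketch with `groupby`, assuming a hypothetical `species` column built from `iris.target_names`:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
# Hypothetical 'species' column holding the names behind target codes 0/1/2
df["species"] = pd.Categorical.from_codes(iris.target, categories=iris.target_names)

# Center and spread of sepal length within each species
summary = (
    df.groupby("species", observed=True)["sepal length (cm)"]
    .agg(["mean", "median", "std"])
)
print(summary.round(3))
```

Setosa (target 0) has the shortest sepals on average (5.006 cm) and virginica (target 2) the longest (6.588 cm), which matches the upward shift of the violins from class 0 to class 2.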
Violin Plot for Data Analysis
Data visualization is instrumental in understanding and interpreting data trends. Among the many chart types, the violin plot is a particularly useful tool for visualizing the distribution of a numeric variable: it combines the summary statistics of a box plot with a kernel density estimate that shows the shape of the data. This article explores the fundamentals, implementation, and interpretation of violin plots.
Before applying any transformations to the features of a dataset, it is often necessary to seek answers to questions like the following:
- Are the values primarily clustered around the median?
- Alternatively, do they exhibit clustering at the extremes with a dearth of values in the middle range?
These questions go beyond the mean and median alone, yet they are essential for a thorough understanding of a dataset. A violin plot can answer them, because its width at each value reflects how densely the data is concentrated there.