Data Analysis using Pandas AI
Step 1: Install pandasai and openai library in your Python environment by executing the following command in Jupyter Notebook
!pip install -q pandasai
Step 2: Import the necessary libraries
Python3
import pandas as pd from pandasai import PandasAI from pandasai.llm.openai import OpenAI |
Step 3: Initialize an instance of OpenAI LLM and pass it’s API key
Python3
# replace "your_api_key" with your generated key OPENAI_API_KEY = "your_api_key" llm = OpenAI(api_token = OPENAI_API_KEY) pandas_ai = PandasAI(llm) |
Step 4: Import the IPL 2023 Auction dataset using pandas
We are using the IPL 2023 Auction dataset here.
Python3
df = pd.read_csv( 'IPL_Squad_2023_Auction_Dataset.csv' ) print (df.shape) df.head() |
Output:
Step 5: Drop the “Unnamed: 0” column from the above dataset
Python3
df.drop([ 'Unnamed: 0' ], axis = 1 , inplace = True ) df.head() |
Output:
Step 6: Data Analysis using PandasAI
Now let’s begin our analysis:
Prompt 1:
Python3
pandas_ai.run(df, prompt = "Which players are the most costliest buys?" ) |
Output:
['Sam Curran', 'Cameron Green', 'Ben Stokes']
Prompt 2:
Python3
prompts = """ Which players were the cheapest buys this season and which team bought them? """ pandas_ai.run(df, prompt = prompts) |
Output:
Well, it looks like the cheapest buys this season were Glenn Phillips for Sunrisers Hyderabad,
Raj Angad Bawa and Rishi Dhawan for Punjab Super Kings, Dhruv Jurel and K.C Cariappa
for Rajasthan Royals, and many more. The full list includes 163 players and their respective teams.
Prompt 3:
Python3
prompts = """ Draw a Bargraph showing How much money was spent by each team this season overall. """ pandas_ai.run(df, prompt = prompts) |
Output:
Prompt 4:
Python3
pandas_ai.run(df, prompt = "How many bowler remained unsold and what was their base price?" ) |
Output:
There were 108 bowlers who remained unsold in the auction.
Their base price ranged from 2 million to 20 million.
Prompt 5:
Python3
pandas_ai.run(df, prompt = "How many players remained unsold this season?" ) |
Output:
('Number of players remained unsold this season:', 338)
Prompt 6:
Python3
pandas_ai.run(df, prompt = "Which type of players were majorly unsold?" ) |
Output:
Player's List Base Price TYPE COST IN ₹ (CR.) Cost IN $ (000) \
0 Shivam Mavi 4000000 BOWLER 6.0 720.0
2 Kane Williamson 20000000 BATSMAN 2.0 240.0
3 K.S. Bharat 2000000 WICKETKEEPER 1.2 144.0
5 Odean Smith 5000000 ALL-ROUNDER 0.5 60.0
7 Rahul Tewatia Retained ALL-ROUNDER 0.0 0.0
2022 Squad Team
0 KKR Gujarat Titans
2 SRH Gujarat Titans
3 DC Gujarat Titans
5 PBKS Gujarat Titans
7 GT Gujarat Titans
TYPE
ALL-ROUNDER 65
BOWLER 64
BATSMAN 35
WICKETKEEPER 21
Name: TYPE, dtype: int64
Number of unsold players: 0
Prompt 7:
Python3
pandas_ai.run(df, prompt = "Who are three new players Gujrat picked?" ) |
Output:
0 Shivam Mavi
1 Joshua Little
2 Kane Williamson
Name: Player's List, dtype: object
Prompt 8:
Python3
pandas_ai.run(df, prompt = "What is total money spent by all teams in dollars?" ) |
Output:
The total amount of money spent by all teams in the auction is $20,040,000.
Prompt 9:
Python3
prompts = """ draw a barplot showing how much money was spent by Mumbai Indians on all types of players? """ pandas_ai.run(df, prompt = prompts) |
Output:
Sure, I can help you with that! To answer your question, I can draw a barplot that shows how much money the Mumbai Indians spent on all types of players. Would you like me to do that for you now?
Prompt 10:
Python3
prompts = """ draw a barplot showing how much money was spent by Gujrat on all types of players? """ pandas_ai.run(df, prompt = prompts) |
Output:
Prompt 11:
Python3
pandas_ai.run(df, prompt = "Can you predict which team will buy Sam Curran in 2024?" ) |
Output:
Lucknow Super Giants
Prompt 12:
Python3
print (pandas_ai.run(df, prompt = "Can you predict which team will spend the highest money in 2024?" )) |
Output:
Player's List Base Price TYPE COST IN ₹ (CR.) Cost IN $ (000) \
0 Shivam Mavi 4000000 BOWLER 6.0 720.0
2 Kane Williamson 20000000 BATSMAN 2.0 240.0
3 K.S. Bharat 2000000 WICKETKEEPER 1.2 144.0
5 Odean Smith 5000000 ALL-ROUNDER 0.5 60.0
7 Rahul Tewatia Retained ALL-ROUNDER 0.0 0.0
2022 Squad Team
0 KKR Gujarat Titans
2 SRH Gujarat Titans
3 DC Gujarat Titans
5 PBKS Gujarat Titans
7 GT Gujarat Titans
The team that will spend the highest money in 2024 is: Lucknow Super Giants
Prompt 13:
Python3
pandas_ai.run(df, prompt = "Perform univariate analysis" ) |
Output:
Sure! Based on the data provided, the univariate analysis shows that we have six variables: Player's List, Base Price, TYPE, COST IN ₹ (CR.), Cost IN $ (000), 2022 Squad, and Team. The data types for these variables are object, object, object, float64, float64, and object respectively.
Prompt 14:
Python3
pandas_ai.run(df, prompt = "Perform multivariate analysis" ) |
Output:
Unfortunately, I was not able to answer your question. Please try again. If the problem persists, try rephrasing your question.
For this input, PandasAI seems to have failed as the complexity and ambiguity increased.
IPL 2023 Data Analysis using Pandas AI
We are already familiar with performing data analysis using Pandas, in this article, we will see how we can leverage the power of PandasAI to perform analysis on IPL 2023 Auction dataset.
Contact Us