Introduction
PandasAI is a Python library that extends pandas, the widely used data analysis and manipulation tool, with Generative AI capabilities. It makes pandas (as well as other popular data analysis libraries) conversational, so you can interact with your data in natural language. For instance, you can ask PandasAI to find all rows in a DataFrame where a specific column’s value exceeds 5, and it will return a DataFrame containing only those rows. PandasAI can also help with tasks such as creating charts, cleaning data, handling missing values, and generating new features.
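PandasAI answers requests like this by having the LLM generate Python code that runs against your DataFrame. For reference, a hand-written pandas equivalent of the “value exceeds 5” filter is a simple boolean mask. The column name value and the sample data below are hypothetical, and the code PandasAI generates for a given prompt may look different.

```python
import pandas as pd

# Hypothetical sample data; "value" stands in for "a specific column".
data = pd.DataFrame({"item": ["a", "b", "c", "d"], "value": [3, 7, 5, 9]})

# Boolean mask: True where the column's value is strictly greater than 5.
rows_above_5 = data[data["value"] > 5]
print(rows_above_5)
```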
Setup
First, install the latest version of PandasAI. Then import SmartDataframe, a specialized DataFrame that behaves like a standard pd.DataFrame, with its usual attributes and methods, while adding conversational capabilities.
# Run in a Jupyter or Colab cell; in a terminal, drop the leading "!"
!pip install pandasai
from pandasai import SmartDataframe
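PandasAI’s interface has changed across major releases (the SmartDataframe class used here, for example, did not exist in its earliest versions), so it can help to check which version pip actually installed before running the examples. The sketch below uses only the standard library; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("pandasai:", installed_version("pandasai") or "not installed")
```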
Importing from a Pandas DataFrame
To load data from a pandas DataFrame, import pandas and create a DataFrame instance. The example below builds a small dataset with the GDP and happiness index of ten countries:
import pandas as pd

df = pd.DataFrame({
    "country": [
        "United States",
        "United Kingdom",
        "France",
        "Germany",
        "Italy",
        "Spain",
        "Canada",
        "Australia",
        "Japan",
        "China",
    ],
    "gdp": [
        19294482071552,
        2891615567872,
        2411255037952,
        3435817336832,
        1745433788416,
        1181205135360,
        1607402389504,
        1490967855104,
        4380756541440,
        14631844184064,
    ],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})
Large Language Model (LLM)
PandasAI is powered by an LLM, so you need to choose and import the LLM you want to use. This article uses OpenAI, which means importing PandasAI’s OpenAI class and obtaining an OpenAI API key (token). To get an API key:
- Go to the OpenAI website: https://openai.com/api/
- Sign up using your email address or connect your Google Account.
- Once you’re logged in, navigate to “View API Keys” on the left side of your Personal Account Settings.
- Select “Create new Secret key” to generate a new API token.
Once you have the key, pass it to the OpenAI LLM and attach the LLM to a SmartDataframe through its config, so that PandasAI uses OpenAI’s language model to answer your questions:
# Import the OpenAI Module
from pandasai.llm import OpenAI
# Initialize the OpenAI Language Model (LLM):
llm = OpenAI(api_token="ENTER API TOKEN HERE")
# Create a SmartDataframe
sdf = SmartDataframe(df, config={"llm": llm})
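Hardcoding the key in a notebook, as in the snippet above, is convenient but makes it easy to leak when the notebook is shared. A common alternative is to keep the key in an environment variable. The sketch below assumes a variable named OPENAI_API_KEY (the name OpenAI’s own client libraries read by default); get_openai_token is a hypothetical helper, not part of PandasAI.

```python
import os


def get_openai_token(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the OpenAI API key from an environment variable instead of hardcoding it."""
    token = os.environ.get(env_var)
    if not token:
        # Fail early with a clear message rather than sending an empty key to the API.
        raise RuntimeError(f"Set the {env_var} environment variable to your OpenAI API key.")
    return token
```

You would then create the LLM with llm = OpenAI(api_token=get_openai_token()) and build the SmartDataframe exactly as before.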
Querying the Data
Example 1
Identify the top 5 countries based on their GDP.
print(sdf.chat("Return the top 5 countries by GDP"))
----------------- OUTPUT -----------------------
country          gdp             happiness_index
United States    19294482071552  6.94
China            14631844184064  5.12
Japan            4380756541440   5.87
Germany          3435817336832   7.07
United Kingdom   2891615567872   7.16
------------------------------------------------
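Because the LLM generates the answer, it is worth cross-checking results on small data. A plain pandas version of Example 1 uses DataFrame.nlargest; the data is repeated from the DataFrame created earlier so the snippet runs on its own.

```python
import pandas as pd

# Same data as the DataFrame created earlier.
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# nlargest orders rows by "gdp" in descending order and keeps the first 5.
top5 = df.nlargest(5, "gdp")
print(top5)
```

The countries come out in the same order as in the PandasAI output above.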
Example 2
Calculate the sum of the GDP of the two unhappiest countries (those with the lowest happiness index).
print(sdf.chat("What's the sum of the gdp of the 2 unhappiest countries?"))
----------------- OUTPUT -----------------------
19012600725504
------------------------------------------------
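The same figure can be reproduced with DataFrame.nsmallest on the happiness index followed by a sum over the gdp column. This is a hand-written check rather than the code PandasAI generated, and the data is again repeated so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# The two rows with the lowest happiness index (China and Japan).
unhappiest = df.nsmallest(2, "happiness_index")
total_gdp = unhappiest["gdp"].sum()
print(total_gdp)  # 19012600725504
```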
Example 3
Plot a chart of GDP by country.
sdf.chat("Plot a chart of the gdp by country")
----------------- OUTPUT -----------------------
[Chart: GDP by country]
------------------------------------------------
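PandasAI typically draws charts with matplotlib. If you want a deterministic version of this chart, a hand-written bar chart looks roughly like the sketch below; the output file name is an arbitrary choice, and the Agg backend is selected only so the script runs without a display (in a notebook you can drop those two lines and call plt.show() instead).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
})

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(df["country"], df["gdp"])  # one vertical bar per country
ax.set_xlabel("Country")
ax.set_ylabel("GDP")
ax.set_title("GDP by country")
ax.tick_params(axis="x", labelrotation=45)  # keep long country names readable
fig.tight_layout()
fig.savefig("gdp_by_country.png")
```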
Example 4
Plot a chart of the happiness index by country, with a different color for each country and the countries on the y-axis.
sdf.chat("Plot a chart of the happiness index by country. Each country must have different colors. The y-axis should be country")
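----------------- OUTPUT -----------------------
[Chart: happiness index by country, one color per country]
------------------------------------------------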
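Putting countries on the y-axis means a horizontal bar chart. A hand-written matplotlib sketch of this request could assign each bar a color from the qualitative tab10 colormap, which has exactly ten distinct colors, one per country in this dataset. The colormap and file name are arbitrary choices, not necessarily what PandasAI generates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# Integer inputs index directly into tab10's 10-color lookup table.
colors = [plt.cm.tab10(i) for i in range(len(df))]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.barh(df["country"], df["happiness_index"], color=colors)  # countries on the y-axis
ax.set_xlabel("Happiness index")
ax.set_ylabel("Country")
ax.set_title("Happiness index by country")
fig.tight_layout()
fig.savefig("happiness_by_country.png")
```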