How to Use SQL in Pandas using pandasql Queries?

Dec-06-2023
Techsolutionstuff
MySQL

Data manipulation might sound complicated, but it is not actually! If you're familiar with SQL and want to work with Pandas, there's a fantastic thing for you: pandasql. We are here to help you connect the dots between SQL and Pandas in a straightforward way.

Throughout this blog, we'll guide you through pandasql, showing you how to apply SQL-style commands to filter, merge, and manage your data in Pandas. It doesn't matter if you're a data enthusiast, analyst, or just someone keen to learn – we'll make using SQL in Pandas a breeze.

Get ready to explore the best of both worlds and supercharge your data skills!

How to Use pandasql?

The pandasql Python library helps you query pandas dataframes using SQL commands, without connecting to an SQL server. It operates using SQLite syntax, automatically identifying any pandas dataframe and treating it as a standard SQL table.

Set Up the Environment

First, install pandasql:

pip install pandasql

Then, import the required packages:

from pandasql import sqldf
import pandas as pd

We imported the 'sqldf' function directly from 'pandasql', and it's essentially the library's most meaningful function. As the name suggests, it allows you to query dataframes using SQL syntax. Besides this function, 'pandasql' includes two straightforward built-in datasets that can be loaded using the self-explanatory functions: 'load_births()' and 'load_meat'.

pandasql Syntax

sqldf(query, env=None)

Here, the "query" parameter is important, and it takes a SQL query as a string. Additionally, there is an optional parameter "env," which is rarely used and can take on the values of either "locals()" or "globals()." This parameter allows sqldf to access the corresponding set of variables in your Python environment.

The sqldf function responds to the query as a pandas dataframe.

Examples of Using pandasql

Now, let’s take a more detailed look at running SQL queries on pandas dataframes using the sqldf function. For practice, let's use a built-in dataset of the seaborn library—penguins:

import seaborn as sns
penguins = sns.load_dataset('penguins')
print(penguins.head())

Output:

No.	Species	Island	bill_length_mm	bill_depth_mm	flipper_length_mm
0	Adelie	Torgersen	39.1	18.7	181.0
1	Adelie	Torgersen	39.5	17.4	186.0
2	Adelie	Torgersen	40.3	18.0	195.0
3	Adelie	Torgersen	NA	NA	NA
4	Adelie	Torgersen	36.7	19.3	193.0

No.	body_mass_g	sex
0	3750.0	Male
1	3800.0	Female
2	3250.0	Female
3	NA	NA
4	3450.0	Female

Extracting Data With pandasql

print(sqldf('''SELECT species, island 
FROM penguins 
LIMIT 5'''))

Output:

No.	Species	Island
0	Adelie	Torgersen
1	Adelie	Torgersen
2	Adelie	Torgersen
3	Adelie	Torgersen
4	Adelie	Torgersen

Here, we have extracted information about the species and location of the first five penguins from the penguins dataframe. Note that running the sqldf function returns a pandas dataframe:

print(type(sqldf('''SELECT species, island 
                      FROM penguins 
                     LIMIT 5''')))

Output:

<class 'pandas.core.frame.DataFrame'>

In pure pandas, it would be:

print(penguins[['species', 'island']].head())

Sorting Data With pandasql

print(sqldf('''SELECT body_mass_g 
                 FROM penguins 
                ORDER BY body_mass_g DESC 
                LIMIT 5''')

Output:

No.	body_mass_g
0	6300.0
1	6050.0
2	6050.0
3	6000.0
4	5950.0

We sorted our penguins by body mass in descending order and displayed the top five values of body mass.

In pandas, it would be:

print(penguins['body_mass_g'].sort_values(ascending=False, 
ignore_index=True).head())

Output:

0	6300.0
1	6050.0
2	6000.0
3	6000.0
4	5950.0

Name: body_mass_g, dtype: float64

Filtering Data With pandasql

Let’s say we have a group of penguins who are male and have flippers longer than 210 mm:

print(sqldf('''SELECT DISTINCT species
                 FROM penguins 
                WHERE sex = 'Male' 
                  AND flipper_length_mm > 210'''))

Output:
species
0 Chinstrap
1 Gentoo

Here, we filtered the data with these two conditions: sex = 'Male' and flipper_length_mm > 210.

The same code in pandas would look a bit more overwhelming:

print(penguins[(penguins['sex'] == 'Male') & (penguins['flipper_length_mm'] > 210)]['species'].unique())

Output:

['Chinstrap' 'Gentoo']

Grouping and Aggregating Data With pandasql

Now, let's try data grouping and aggregation to find each species’ longest bill in the dataframe:

print(sqldf('''SELECT species, MAX(bill_length_mm)
                 FROM penguins 
                GROUP BY species'''))

Output:

No.	species	MAX(bill_length_mm)
0	Adelie	46.0
1	Chinstrap	58.0
2	Gentoo	59.6

Same code in pandas:

print(penguins[['species', 'bill_length_mm']].groupby('species', as_index=False).max())

Output:

No.	species	bill_length_mm
0	Adelie	46.0
1	Chinstrap	58.0
2	Gentoo	59.6

Conclusion

Using SQL within Pandas via the pandasql library is an efficient way to handle data manipulation. The seamless integration of SQL queries with Pandas dataframes empowers users to employ their SQL expertise within the Python environment.

This approach simplifies complex data operations, creating succinct, readable code that enhances data analysis workflows. With the power of SQL and pandasql, you can perform diverse operations with ease, from filtering and aggregating data to handling complex joins.

Ultimately, SQL and Pandas improve the analytical capabilities, creating a smoother and more powerful data processing experience.

You might also like:

Techsolutionstuff | The Complete Guide

I'm a software engineer and the founder of techsolutionstuff.com. Hailing from India, I craft articles, tutorials, tricks, and tips to aid developers. Explore Laravel, PHP, MySQL, jQuery, Bootstrap, Node.js, Vue.js, and AngularJS in our tech stack.

FEATURE POSTS

How To Integrate Stripe Payment Gateway In Laravel 8

How to Use SQL in Pandas using pandasql Queries?

Techsolutionstuff | The Complete Guide

RECOMMENDED POSTS

How to Install PHP 8 on Ubuntu

A Beginner's Guide to Using Date and Time in Node.js

How To Create AJAX CRUD Operation In Laravel 10

How To Install Tailwind CSS In React JS App

How To Generate PDF File In Node JS Using PDFKit

How to Create Migration in Laravel 10

How To Convert HTML To PDF In Node JS Using Puppeteer

How To Create PDF File In React JS

How To Validate Radio Button In jQuery

Laravel 10 Store JSON Data In Database Example

FEATURE POSTS

How To Integrate Stripe Payment Gateway In Laravel 8

How To Get Data Between Two Dates In Laravel 10

Laravel 11 Create Dynamic Event Calendar with FullCalendar

Laravel 8 Highcharts Example Tutorial

How To Check Email Already Exist Or Not In Laravel

Laravel 11 Real Time Chat Using Reverb and Vue

CATEGORIES

TOOLS

Online Json Formatter and Viewer

Online Password Generator

Online Character Counter

Online CSS Minifier / Compressor

Online CSS Beautifier / Formatter

POPULAR TAGS

RECENT POSTS

How to Create Dynamic Multi Line C...

How to Create Dynamic Donut Chart...

Laravel 11 Real Time Chat Using Re...

How to Display Unsaved Changes Bro...

Laravel 11 PayPal Payment Gateway...

How to Logout from Other Devices i...

Laravel 11 Find Nearest Location B...

Laravel 11 Eloquent Model Search Q...

How to Install Multiple PHP Versio...

How to Create Realtime Search in L...

How to Use SQL in Pandas using pandasql Queries?

Techsolutionstuff | The Complete Guide

RECOMMENDED POSTS

FEATURE POSTS

CATEGORIES

TOOLS

POPULAR TAGS

RECENT POSTS

SUBSCRIBE YOUR EMAIL ADDRESS