/plushcap/analysis/hex/how-to-write-sql-in-pandas

How to write SQL in Python with Pandas

What's this blog post about?

This post discusses three methods for querying data in a Pandas DataFrame using SQL, with the assumption that the reader already has a DataFrame ready to go. The first method involves using the pandasql package, which is a simple wrapper that converts DataFrames into SQLite and allows them to be queried with SQL. Results are automatically returned as a Pandas DataFrame. The second method uses DuckDB, an in-process OLAP system designed for analytics workloads. It works natively with Pandas DataFrames and is optimized for speed when compared to SQLite. The main difference between pandasql and DuckDB is that the latter requires explicit conversion of results to a DataFrame. The third method involves using the .query() function in Pandas, which is not exactly SQL but can make some basic queries easier. It's a simple WHERE or .filter() equivalent. The query syntax is modified Python without many specifics, and it can be used for more detailed information on Pandas parsers and numexpr.

Company
Hex

Date published
Sept. 7, 2022

Author(s)
Justin Gage

Word count
1079

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.