Company
Date Published
Sept. 7, 2022
Author
Justin Gage
Word count
1079
Language
English
Hacker News points
None

Summary

This post discusses three methods for querying data in a Pandas DataFrame using SQL, with the assumption that the reader already has a DataFrame ready to go. The first method involves using the pandasql package, which is a simple wrapper that converts DataFrames into SQLite and allows them to be queried with SQL. Results are automatically returned as a Pandas DataFrame. The second method uses DuckDB, an in-process OLAP system designed for analytics workloads. It works natively with Pandas DataFrames and is optimized for speed when compared to SQLite. The main difference between pandasql and DuckDB is that the latter requires explicit conversion of results to a DataFrame. The third method involves using the .query() function in Pandas, which is not exactly SQL but can make some basic queries easier. It's a simple WHERE or .filter() equivalent. The query syntax is modified Python without many specifics, and it can be used for more detailed information on Pandas parsers and numexpr.