How to convert a select query into a pandas DataFrame using PeeWee
Asked Answered
W

3

17

Using the PeeWee ORM I have the following query:

query = DataModel.select()where(DataModel.field == "value")

Is there any way to convert query into a pandas DataFrame without iterating over all the values? I'm looking for a more "Pythonic" way of doing this.

Whorled answered 4/3, 2017 at 12:47 Comment(0)
A
38

Assuming query is of type peewee.SelectQuery, you could do:

df = pd.DataFrame(list(query.dicts()))

EDIT: As Nicola points out below, you're now able to do pd.DataFrame(query.dicts()) directly.

Arianna answered 4/3, 2017 at 16:15 Comment(3)
Brilliantly simple, and works like a charm! I won't tell you how long I spent trying to figure that one out, it's embarrassing. lolWhorled
for some reason list(query.dicts()) is failing if there is a column that has the name of the table... has anybody experienced the same issue?Leucoma
Don't use list(), the correct code is: pd.DataFrame(query.dicts())Rafter
P
6

Just in case someone finds this useful, I was searching for the same conversion but in Python 3. Inspired by @toto_tico's previous answer, this is what I came up with:

import pandas
import peewee


def data_frame_from_peewee_query(query: peewee.Query) -> pandas.DataFrame:
    connection = query._database.connection()  # noqa
    sql, params = query.sql()
    return pandas.read_sql_query(sql, connection, params=params)

Checked with Python 3.9.6, pandas==1.3.2 and peewee==3.14.4, using peewee.SqliteDatabase.

Pastime answered 27/8, 2021 at 1:38 Comment(1)
Thanks a lot for writing this out — this is 100% the correct answer.Oblast
L
4

The following is a more efficient way, because it avoids creating the list and then pass it to the pandas dataframe. It also has the side benefit of preserving the order of the columns:

df = pd.read_sql(query.sql()[0], database.connection())

You need direct access to the peewee database, for example, in the quickstart tutorial corresponds to:

db = SqliteDatabase('people.db')

Of course, you can also create your own connection to the database.

Drawback: you should be careful if you have repeated columns in the two tables, e.g. id columns would appear twice. So make sure to correct those before continuing.


If you are using a peewee proxy

import peewee as pw; 
database_proxy = pw.Proxy()

then the connection is here:

database_proxy.obj.connection()
Leucoma answered 9/5, 2018 at 13:53 Comment(3)
How do you deal with the fact that query.sql()[0] returns a string with %s as placeholders?Breedlove
I would have to take a look at your exact query but in generally it could mean that there is a format-string python syntax, check thisLeucoma
@Breedlove It isn't a complete solution but getting a cursor and using mogrify works for postgres: cursor = db.cursor() then pd.read_sql(cursor.mogrify(*query.sql()), ...)Stafford

© 2022 - 2024 — McMap. All rights reserved.