pandas dataframe multiline query
Asked Answered
S

3

18

Say I have a dataframe

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(10,3)), columns=['a', 'b', 'c'])

if I now try to query it using the query method:

this works:

df.query('''a > 3 and b < 9''')

this throws an error:

df.query(
    '''
        a > 3 and
        b < 9
    '''
)

I tried many variations of multiline strings but the result is always the following error:

~/ven/lib/python3.6/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    306     if multi_line and target is None:
    307         raise ValueError(
--> 308             "multi-line expressions are only valid in the "
    309             "context of data, use DataFrame.eval"
    310         )

ValueError: multi-line expressions are only valid in the context of data, use DataFrame.eval

Does anyone know how to make it work? The problem is that in reality I have a very long query to do and it would be very inconvenient having to write all in one line. I know I could use boolean indexing instead but my question is only about how to use multiline with the query method.

Thank you

Scallop answered 26/8, 2020 at 12:16 Comment(0)
S
19

Use multi-line char backslash ( \ )

Ex:

df = pd.DataFrame(np.random.randint(10, size=(10,3)), columns=['a', 'b', 'c'])
print(df.query(
    '''
        a > 3 and \
        b < 9
    '''
))
Simeon answered 26/8, 2020 at 12:28 Comment(0)
P
4

You can remove the new line character \n to allow multiline query

query_multiline = '''
  a > 3 and
  b < 9
'''

query_multiline = query_multiline.replace('\n', '')

df.query(query_multiline)    
Petard answered 20/10, 2021 at 0:32 Comment(1)
For non-trivial, multi-line queries, this solution is superior (IMHO) to escaping the newlines with backslashes as shown in the other answer (with more up-votes currently). Two reasons: 1) it's tedious and error-prone to edit your existing query, escaping all the potentially many newlines, and 2) adding a slew of ` \ ` can obfuscate your nicely formatted multi-line query, possibly impacting maintenance. Save yourself the effort, keep your query nice-looking, and do it programmatically with query.replace('\n', '')!Cyclades
M
0

I was facing same error. I wanted to delete records which has id field common in both dataframes.

Suppose dataframe A is

    id  name
0   1   a
1   2   b

and dataframe B is

    id  name
0   1   a
2   3   c

I wanted to remove common id from both dataframe, so you will have to merge both dataframes and convert id of merged to list.

merged = pd.merge(A, B, how='inner', on='id')
final_res = A.query(f"id in {list(merged['id'])}")
Mireyamiriam answered 27/5 at 9:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.