list_of_values
doesn't have to be a list
; it can be set
, tuple
, dictionary
, numpy array, pandas Series, generator, range
etc. and isin()
and query()
will still work.
A note on query()
:
- You can also call
isin()
inside query()
:
list_of_values = [3, 6]
df.query("A.isin(@list_of_values)")
- You can pass a values to search over as a
local_dict
argument, which is useful if you don't want to create the filtering list beforehand in a chain of function calls:
df.query("A == @lst", local_dict={'lst': [3, 6]})
Some common problems with selecting rows
1. list_of_values
is a range
If you need to filter within a range, you can use between()
method or query()
.
list_of_values = [3, 4, 5, 6] # a range of values
df[df['A'].between(3, 6)] # or
df.query('3<=A<=6')
2. Return df
in the order of list_of_values
In the OP, the values in list_of_values
don't appear in that order in df
. If you want df
to return in the order they appear in list_of_values
, i.e. "sort" by list_of_values
, use loc
.
list_of_values = [3, 6]
df.set_index('A').loc[list_of_values].reset_index()
If you want to retain the old index, you can use the following.
list_of_values = [3, 6, 3]
df.reset_index().set_index('A').loc[list_of_values].reset_index().set_index('index').rename_axis(None)
3. Don't use apply
In general, isin()
and query()
are the best methods for this task; there's no need for apply()
. For example, for function f(A) = 2*A - 5
on column A
, both isin()
and query()
work much more efficiently:
df[(2*df['A']-5).isin(list_of_values)] # or
df[df['A'].mul(2).sub(5).isin(list_of_values)] # or
df.query("A.mul(2).sub(5) in @list_of_values")
4. Select rows not in list_of_values
To select rows not in list_of_values
, negate isin()
/in
:
df[~df['A'].isin(list_of_values)]
df.query("A not in @list_of_values") # df.query("A != @list_of_values")
5. Select rows where multiple columns are in list_of_values
If you want to filter using both (or multiple) columns, there's any()
and all()
to reduce columns (axis=1
) depending on the need.
- Select rows where at least one of
A
or B
is in list_of_values
:
df[df[['A','B']].isin(list_of_values).any(1)]
df.query("A in @list_of_values or B in @list_of_values")
- Select rows where both of
A
and B
are in list_of_values
:
df[df[['A','B']].isin(list_of_values).all(1)]
df.query("A in @list_of_values and B in @list_of_values")