Pandas best way to subset a dataframe inplace, using a mask

I have a pandas DataFrame that I want to downsize by removing all rows whose value in one column is under some threshold x.

The mask is df[my_column] > 50

I would typically just use df = df[mask], but I want to avoid making a copy every time, particularly because that gets error-prone inside functions: the reassignment only rebinds the name in the function's scope, so the caller's DataFrame is left unchanged.
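The scope problem can be seen in a minimal sketch. The frame, the column name `score`, and both helper functions are hypothetical and made up for illustration; the point is only that reassignment inside a function rebinds a local name, whereas `drop(..., inplace=True)` mutates the object the caller holds.

```python
import pandas as pd

def shrink_by_rebinding(df):
    # Rebinds the local name only; the caller's frame is untouched.
    df = df[df["score"] > 50]

def shrink_in_place(df):
    # Mutates the object that the caller also references.
    df.drop(df[df["score"] <= 50].index, inplace=True)

frame = pd.DataFrame({"score": [10, 60, 50, 90]})

shrink_by_rebinding(frame)
rows_after_rebinding = len(frame)   # the caller still sees every row

shrink_in_place(frame)
rows_after_in_place = len(frame)    # the caller now sees the reduced frame
```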

What is the best way to subset a DataFrame in place?

I was thinking of something along the lines of
df.drop(df.loc[mask].index, inplace = True)

Is there a better way to do this, or any situation where this won't work at all?
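One situation where a label-based drop can misbehave is an index with duplicate labels: `drop` removes every row carrying a listed label, so rows that should have been kept can disappear as well. A small sketch, assuming a made-up frame and the hypothetical column name `score`:

```python
import pandas as pd

# Label "a" appears twice: once on a row to drop, once on a row to keep.
df = pd.DataFrame({"score": [10, 60, 90]}, index=["a", "a", "b"])
keep = df["score"] > 50

# What boolean indexing would return: the rows with 60 and 90.
expected = df[keep]

# Dropping by label removes both "a" rows, including the one with 60.
df.drop(df.loc[~keep].index, inplace=True)
```

Giving the frame a unique index first, for example with `reset_index(drop=True)`, avoids this at the cost of discarding the original labels.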

Huffman answered 13/10, 2015 at 13:29 Comment(6)
You mean view = df.loc[df[my_column] > 50]? - Marrowbone
I'm always confused by the view vs. copy thing in pandas. Essentially I want to give it a condition to drop, and drop in place. The df.loc[mask].index will give me the indexes to drop, correct? - Huffman
Sorry, what's wrong with df = df[mask]? This will eventually recover the memory for the dropped rows. - Marrowbone
Well, mask itself is a boolean index. - Marrowbone
It is more error-prone, and when used in functions it makes a "local" copy, which then has to be returned. I want to do a few alterations in place, not just for memory purposes. df.drop(df.loc[mask].index, inplace = True) seems to work, but I expect there might be a better solution (as mine will probably fail on multi-level indexes etc.). - Huffman
Not sure what you mean by makes a 'local' copy. I'd define df as a global variable, OR make it a class instance. Passing df as an argument to a bunch of functions and then doing changes to df is indeed error-prone. - Hasa
19

You are missing the inplace parameter:

df.drop(df[df.my_column < 50].index, inplace = True)
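As a hedged variation on the line above, the rows to drop can be derived from the keep-mask by inverting it with `~`, which avoids choosing between `<` and `<=` by hand. The three-row frame below is made-up data for illustration.

```python
import pandas as pd

df = pd.DataFrame({"my_column": [49, 50, 51]})

# Rows to keep, exactly as stated in the question.
keep = df["my_column"] > 50

# Drop the complement of the keep-mask, so the boundary value 50 is removed too.
df.drop(df[~keep].index, inplace=True)
```

Note that inverting the mask also drops rows where the column is NaN, since NaN > 50 evaluates to False.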

Bergstein answered 7/6, 2019 at 14:54 Comment(2)
I think you want <= 50 in the mask to drop, since the OP wanted to keep values > 50. - Lounge
Is there a method that does the opposite of drop? filter? - Irvin
3

You can use df.query(), like this:

bool_series = df[my_column] > 50
df.query("@bool_series", inplace=True)
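A related sketch writes the condition directly in the query string instead of building the boolean Series first. The column name `score` and the variable `threshold` are hypothetical names chosen for illustration; `@` pulls a local variable into the expression.

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 60, 50, 90]})
same_object = df          # second reference to the same frame
threshold = 50

# Keep only rows whose score exceeds the threshold, modifying df itself.
df.query("score > @threshold", inplace=True)
```

Because the frame is modified rather than replaced, every other reference to it (here `same_object`) sees the reduced rows as well.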
Laryssa answered 6/9, 2021 at 7:46 Comment(0)
-1

I think this works. Maybe there are better ways?

df = df.drop(df[df.my_column < 50].index)

Bradlybradman answered 13/10, 2015 at 14:28 Comment(2)
It would still be a copy and replace, but I'm curious as to why you avoided iloc. - Huffman
pandas.pydata.org/pandas-docs/stable/generated/… drop has an inplace flag. - Plunger