Pandas best way to subset a dataframe inplace, using a mask

Asked 13/10, 2015 at 13:29 Answered 6/9, 2021 at 7:46

I have a pandas DataFrame that I want to downsize (remove all rows with values under x).

The mask is df[my_column] > 50

I would typically just use df = df[mask], but I want to avoid making a copy every time, particularly because it gets error-prone when used in functions (the reassignment only takes effect in the function scope).
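To illustrate the scoping issue, here is a minimal sketch. The function names and the toy data are made up for illustration, and my_column is assumed to be a string column label:

```python
import pandas as pd

def filter_rebind(df):
    # Rebinds the local name only; the caller's DataFrame is untouched
    # unless the caller uses the returned value.
    df = df[df["my_column"] > 50]
    return df

def filter_inplace(df):
    # Mutates the object the caller passed in. Note that the drop
    # condition is the inverse of the "keep" mask (> 50).
    df.drop(df[df["my_column"] <= 50].index, inplace=True)

data = pd.DataFrame({"my_column": [10, 60, 50, 90]})

filter_rebind(data)              # return value ignored on purpose
rows_after_rebind = len(data)    # still 4 rows

filter_inplace(data)
rows_after_inplace = len(data)   # 2 rows left (60 and 90)
```

The first call leaves data unchanged because only the local name inside the function was reassigned; the second call changes data itself.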

What is the best way to subset a DataFrame in place?

I was thinking of something along the lines of
df.drop(df.loc[mask].index, inplace = True)

Is there a better way to do this, or any situation where this won't work at all?

Huffman answered 13/10, 2015 at 13:29 Comment(6)

You mean view = df.loc[df[my_column] > 50]? – Marrowbone 13/10, 2015 at 13:37

I'm always confused by the view vs copy thing in pandas. Essentially I want to give it a condition to drop, and drop inplace. The df.loc[mask].index will give me the indexes to drop, correct? – Huffman 13/10, 2015 at 13:39

Sorry, what's wrong with df = df[mask]? This will eventually recover the memory for the dropped rows? – Marrowbone 13/10, 2015 at 13:40

Well, mask itself is a boolean index – Marrowbone 13/10, 2015 at 13:40

More error prone, and when used in functions makes a "local" copy, which then has to be returned. I want to do a few alterations in place, not just for memory purposes. df.drop(df.loc[mask].index, inplace = True) seems to work, but I expect there might be a better solution (as mine will probably fail on multi-level indexes etc) – Huffman 13/10, 2015 at 13:42

Not sure what you mean by makes a 'local' copy. I'd define df as a global variable, OR make it a class instance. Passing df as an argument to a bunch of functions and then doing changes to df is indeed error-prone. – Hasa 13/10, 2015 at 14:11

You are missing the inplace parameter:

df.drop(df[df.my_column < 50].index, inplace = True)
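A small sketch of this on toy data (the index labels and values are invented for illustration). It uses <= 50 rather than < 50 so that only values strictly greater than 50 remain, and it also shows one case where drop-by-index can surprise you: drop() works by label, so with duplicate index labels it removes every row sharing a dropped label.

```python
import pandas as pd

df = pd.DataFrame({"my_column": [10, 50, 51, 90]}, index=["a", "b", "c", "d"])

# Drop rows at or below 50 in place; rows strictly above 50 are kept.
df.drop(df[df.my_column <= 50].index, inplace=True)
kept = df.my_column.tolist()

# Caveat: with a non-unique index, both rows below carry the label "x",
# so dropping the label of the row with 10 also removes the row with 90.
dup = pd.DataFrame({"my_column": [10, 90]}, index=["x", "x"])
dup.drop(dup[dup.my_column <= 50].index, inplace=True)
dup_rows = len(dup)
```

If the index may contain duplicates, calling reset_index(drop=True) first, or filtering with a boolean mask instead of labels, avoids the problem.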

Bergstein answered 7/6, 2019 at 14:54 Comment(2)

I think you want <= 50 in the mask to drop, since the OP wanted to keep values > 50. – Lounge 7/5, 2022 at 18:46

Is there a method that does the opposite of drop? filter ? – Irvin 28/4, 2023 at 16:47

You can use df.query(), like this:

bool_series = df[my_column] > 50
df.query("@bool_series", inplace=True)
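As a variant, the condition can also be written directly in the query string. This is only a sketch on toy data: the threshold variable and the same_object alias are names introduced here for illustration, and the column is assumed to be literally named my_column.

```python
import pandas as pd

df = pd.DataFrame({"my_column": [10, 60, 50, 90]})
threshold = 50      # hypothetical cutoff; @threshold refers to this local variable
same_object = df    # second reference to the same DataFrame object

# Keep only rows where my_column is strictly greater than the threshold.
df.query("my_column > @threshold", inplace=True)

remaining = df.my_column.tolist()
```

Because the object is modified rather than replaced, same_object sees the filtered rows too, which is the behaviour you want when the DataFrame is passed into a function.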

Laryssa answered 6/9, 2021 at 7:46 Comment(0)


I think this works. Maybe there are better ways?

df = df.drop(df[df.my_column < 50].index)

Bradlybradman answered 13/10, 2015 at 14:28 Comment(2)

It would still be a copy and replace, but I'm curious as to why you avoided iloc – Huffman 17/10, 2015 at 13:21

pandas.pydata.org/pandas-docs/stable/generated/… drop has an inplace flag – Plunger 12/9, 2018 at 15:2
