Pandas - Delete Rows with only NaN values

I have a DataFrame containing many NaN values. I want to delete rows that contain too many NaN values; specifically: 7 or more.

I tried using the dropna function several ways but it seems clear that it greedily deletes columns or rows that contain any NaN values.

This question (Slice Pandas DataFrame by Row) shows me that if I can just compile a list of the rows that have too many NaN values, I can delete them all with a simple

df.drop(rows)

I know I can count non-null values using the count function, which I could then subtract from the total to get the NaN count (Is there a direct way to count NaN values in a row?). But even so, I am not sure how to write a loop that goes through a DataFrame row by row.

Here's some pseudo-code that I think is on the right track:

### LOOP FOR ADDRESSING EACH row:
    m = total - row.count()
    if (m >= 7):
        df.drop(row)
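
For reference, the pseudo-code above can be written without an explicit loop. This is only a minimal sketch: the 9-column frame and its values are invented for illustration, and the names nan_per_row, max_nan, and cleaned are placeholders rather than anything from pandas. It counts NaN values per row with isna().sum(axis=1), collects the index labels of the rows at or over the limit, and passes them to df.drop.

```python
import numpy as np
import pandas as pd

# Hypothetical 9-column frame; the values are made up for illustration.
df = pd.DataFrame(
    [
        [1, 2, 3, 4, 5, 6, 7, 8, 9],       # 0 NaN values
        [1] + [np.nan] * 6 + [8, 9],       # 6 NaN values
        [1, 2] + [np.nan] * 7,             # 7 NaN values
        [np.nan] * 9,                      # 9 NaN values
    ],
    columns=list("abcdefghi"),
)

max_nan = 7  # rows with this many NaN values or more are dropped

# Count the NaN values in each row (axis=1 sums across the columns).
nan_per_row = df.isna().sum(axis=1)

# Collect the index labels of the offending rows, then drop them in one call.
rows = df.index[nan_per_row >= max_nan]
cleaned = df.drop(rows)

print(cleaned)
```

Note that df.drop returns a new DataFrame here; the original df is left unchanged.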

I am still new to Pandas so I'm very open to other ways of solving this problem; whether they're simpler or more complex.

Toffic answered 5/8, 2014 at 18:56 Comment(4)
There is a thresh param that specifies the minimum number of non-NA values: pandas.pydata.org/pandas-docs/stable/generated/… Have you tried this? - Prehension
I had not noticed that, thank you. It suits my needs perfectly. - Toffic
df.dropna(thresh=3) was all I needed (there are 9 columns in the dataframe). - Toffic
I thought I'd put a dynamic method in my answer for the case where you don't know the number of columns. Glad I could help. - Prehension

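The way to do this is to determine the number of columns, set the minimum number of non-NaN values a row must have, and drop the rows that don't meet that threshold. A row with 7 or more NaN values has at most len(df.columns) - 7 non-NaN values, so requiring at least len(df.columns) - 6 drops exactly those rows:

df.dropna(thresh=len(df.columns) - 6)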

See the docs
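
As a rough sketch of the dynamic version, the threshold can be derived from the column count inside a small helper. The function name drop_rows_with_many_nans and the sample frame are hypothetical, not part of pandas; the only pandas call relied on is dropna(thresh=...), which keeps rows that have at least thresh non-NA values. The helper uses len(frame.columns) for the column count, as the comments below point out.

```python
import numpy as np
import pandas as pd


def drop_rows_with_many_nans(frame, max_nan):
    """Return a new frame without the rows that hold max_nan or more NaN values.

    Hypothetical helper for illustration; not part of pandas.
    """
    n_cols = len(frame.columns)
    # A row is kept if it has at most max_nan - 1 NaN values,
    # i.e. at least n_cols - (max_nan - 1) non-NaN values.
    min_non_nan = n_cols - (max_nan - 1)
    return frame.dropna(thresh=min_non_nan)


# Made-up 4-column frame: rows 0, 1, 2 contain 0, 2, and 3 NaN values.
df = pd.DataFrame(
    {
        "a": [1.0, 1.0, np.nan],
        "b": [2.0, np.nan, np.nan],
        "c": [3.0, np.nan, np.nan],
        "d": [4.0, 4.0, 4.0],
    }
)

# Drop rows with 3 or more NaN values; only row 2 is removed.
result = drop_rows_with_many_nans(df, 3)
print(result)
```

With 9 columns and max_nan=7 the helper computes thresh=3, which matches the value reported in the comments on the question.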

Prehension answered 5/8, 2014 at 19:15 Comment(5)
I had to use len(df.columns) instead of len(df). Worked like a charm. - Ogden
Doesn't axis=1 tell it to drop columns? At least in my case, columns get deleted when I choose axis=1. - Marinetti
@Marinetti it depends on the function; in this case it's the opposite. - Prehension
axis=1 will drop the columns, not the rows. "{0 or ‘index’, 1 or ‘columns’}" straight from the docs. - Manolete
@PaulEnglish You're correct. I'm not sure whether this was due to a historical error in the docs or whether I was confusing this with drop, which does flip the expected meaning of axis. I will update the answer; thanks for pointing this out. - Prehension

The optional thresh argument of df.dropna sets the minimum number of non-NA values a row needs in order to be kept. To drop rows with 7 or more NaN values, require at least df.shape[1] - 6 non-NA values:

df.dropna(thresh=df.shape[1] - 6)
Orelia answered 5/8, 2014 at 19:14 Comment(1)
df.dropna(thresh=2, inplace=True)  # drop extra lines w/o 2 valid values. This was a little simpler and worked perfectly for my application. - Mieshamiett
