Pandas - Delete Rows with only NaN values

Asked 5/8, 2014 at 18:56 Answered 5/8, 2014 at 19:15

I have a DataFrame containing many NaN values. I want to delete rows that contain too many NaN values; specifically: 7 or more.

I tried using the dropna function several ways but it seems clear that it greedily deletes columns or rows that contain any NaN values.

This question (Slice Pandas DataFrame by Row) shows me that if I can just compile a list of the rows that have too many NaN values, I can delete them all with a simple:

df.drop(rows)

I know I can count non-null values using the count function, which I could then subtract from the total to get the NaN count (Is there a direct way to count NaN values in a row?). But even so, I am not sure how to write a loop that goes through a DataFrame row by row.

Here's some pseudo-code that I think is on the right track:

### LOOP FOR ADDRESSING EACH row:
    m = total - row.count()
    if (m >= 7):
        df.drop(row)

I am still new to Pandas so I'm very open to other ways of solving this problem; whether they're simpler or more complex.

Toffic answered 5/8, 2014 at 18:56 Comment(4)

There is a thresh param to specify the minimum number of non-NA values: pandas.pydata.org/pandas-docs/stable/generated/… have you tried this? – Prehension 5/8, 2014 at 19:7

I had not noticed that, thank you. It suits my needs perfectly. – Toffic 5/8, 2014 at 19:12

df.dropna(thresh=3) was all I needed (there are 9 columns in the dataframe) – Toffic 5/8, 2014 at 19:25

I thought I'd put a dynamic method in my answer for the case where you don't know the number of columns, glad I could help – Prehension 5/8, 2014 at 19:26

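Basically, the way to do this is to determine the number of columns, set the minimum number of non-NaN values a row must have, and drop the rows that don't meet this criterion. To drop rows with 7 or more NaN values, a row must have at least len(df.columns) - 6 non-NaN values to be kept:

df.dropna(thresh=len(df.columns) - 6)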

See the docs
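A minimal sketch of the dynamic approach, assuming a made-up 9-column frame (the data, the index labels and the max_nan name are illustrative, not from the question). Since thresh is the minimum number of non-NaN values a row needs in order to be kept, allowing at most 6 NaN values per row works out to the column count minus 6:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical 9-column frame: row "a" has 7 NaN, row "b" has 6 NaN, row "c" has none.
df = pd.DataFrame(
    [
        [1, 2] + [nan] * 7,
        [1, 2, 3] + [nan] * 6,
        list(range(9)),
    ],
    index=["a", "b", "c"],
    columns=list("ABCDEFGHI"),
)

max_nan = 6  # assumption: a row may contain at most this many NaN values

# thresh = minimum number of non-NaN values required to keep a row
kept = df.dropna(thresh=len(df.columns) - max_nan)

print(kept.index.tolist())  # ['b', 'c']
```

With 9 columns this gives thresh=3, which matches the value the asker reported using.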

Prehension answered 5/8, 2014 at 19:15 Comment(5)

I had to use len(df.columns) instead of len(df). Worked like a charm. – Ogden 1/9, 2015 at 15:26

Doesn't axis=1 tell it to drop columns? At least in my case columns get deleted when I choose axis=1 – Marinetti 22/2, 2016 at 17:35

@Marinetti it depends on the function, in this case it's the opposite – Prehension 22/2, 2016 at 17:48

axis=1 will drop the columns, not the rows. "{0 or ‘index’, 1 or ‘columns’}" straight from the docs. – Manolete 14/7, 2016 at 19:7

@PaulEnglish You're correct, I'm not sure if this was due to an error in the docs historically or if I was confusing this with drop which does flip the expected meaning of axis, will update and thanks for pointing this out – Prehension 15/7, 2016 at 8:46
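To make the axis discussion concrete, here is a small sketch on a made-up two-column frame. As far as I know, both dropna and drop follow the same convention in current pandas: axis=0 (or 'index') removes rows and axis=1 (or 'columns') removes columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only column "x" contains a NaN, in row 1.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

rows_dropped = df.dropna(axis=0)  # default: drops row 1, which contains a NaN
cols_dropped = df.dropna(axis=1)  # drops column "x", which contains a NaN

print(rows_dropped.index.tolist())    # [0, 2]
print(cols_dropped.columns.tolist())  # ['y']

# DataFrame.drop takes explicit labels, with the same meaning of axis
print(df.drop(1, axis=0).index.tolist())      # [0, 2]
print(df.drop("x", axis=1).columns.tolist())  # ['y']
```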

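The optional thresh argument of df.dropna lets you specify the minimum number of non-NA values a row must have in order to be kept. To drop rows with 7 or more NaN values, keep only rows with at least df.shape[1] - 6 non-NA values:

df.dropna(thresh=df.shape[1] - 6)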
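If you would rather count the NaN values per row explicitly, as the question's pseudo-code does, a boolean mask avoids the row-by-row loop. This is only a sketch on made-up data; the names nan_per_row, same_count and filtered are illustrative:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical 9-column frame with 7, 6 and 0 NaN values per row.
df = pd.DataFrame(
    [
        [1, 2] + [nan] * 7,
        [1, 2, 3] + [nan] * 6,
        list(range(9)),
    ],
    columns=list("ABCDEFGHI"),
)

# Count NaN values in each row directly...
nan_per_row = df.isna().sum(axis=1)
# ...or, as in the question, subtract the non-null count from the total.
same_count = len(df.columns) - df.count(axis=1)

# Keep only rows with fewer than 7 NaN values.
filtered = df[nan_per_row < 7]

print(nan_per_row.tolist())     # [7, 6, 0]
print(filtered.index.tolist())  # [1, 2]
```

On this data the result is the same as df.dropna(thresh=df.shape[1] - 6); the mask form is mostly useful when the condition is more complicated than a single threshold.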

Orelia answered 5/8, 2014 at 19:14 Comment(1)

df.dropna(thresh=2, inplace=True) # drop extra lines w/o 2 valid values. This was a little simpler and worked perfectly for my application. – Mieshamiett 9/5, 2019 at 20:36
