Python Pandas: Check if all columns in rows value is NaN
G

3

13

Apologies if this has already been answered. I searched for a solution, but all I could find was how to use dropna to remove every NaN in a dataframe. I have a dataframe with 6 columns and 500 rows. I need to check if in any particular row all the values are NaN so that I can drop them from my dataset. In the example below, rows 2, 6 and 7 contain only NaN from Col1 to Col6:

    Col1    Col2    Col3    Col4    Col5    Col6
    12      25      02      78      88      90
    Nan     Nan     Nan     Nan     Nan     Nan
    Nan     35      03      11      65      53
    Nan     Nan     Nan     Nan     22      21
    Nan     15      93      111     165     153
    Nan     Nan     Nan     Nan     Nan     Nan
    Nan     Nan     Nan     Nan     Nan     Nan
    141     121     Nan     Nan     Nan     Nan

Please note that the top row is just headings; my data starts from the second row onwards. I would be grateful if anyone could point me in the right direction.

My second question: after deleting the rows where all columns are NaN, what is the best way to delete the rows where 4 or 5 columns are missing?

My last question: after deleting the rows with the most NaNs, how can I create a box plot of the remaining rows (for example, 450)?

Any response will be highly appreciated.

Regards,

Groundling answered 2/9, 2016 at 18:3 Comment(0)
S
11

I need to check if in any particular row all the values are NaN so that I can drop them from my dataset.

That's exactly what pd.DataFrame.dropna(how='all') does:

In [3]: df = pd.DataFrame({'a': [None, 1, None], 'b': [None, 1, 2]})

In [4]: df
Out[4]: 
     a    b
0  NaN  NaN
1  1.0  1.0
2  NaN  2.0

In [5]: df.dropna(how='all')
Out[5]: 
     a    b
1  1.0  1.0
2  NaN  2.0
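
As a rough sketch on the table from the question (rebuilt by hand here, so the literal values are an assumption about the real data), the same call removes the all-NaN rows, and the thresh parameter of dropna covers the follow-up about rows that are mostly empty. Using thresh=3 is only one reading of "4 or 5 columns missing" for a 6-column frame; adjust it to taste.

```python
import numpy as np
import pandas as pd

# The sample table from the question, typed in by hand (np.nan for "Nan").
nan = np.nan
df = pd.DataFrame(
    [
        [12, 25, 2, 78, 88, 90],
        [nan, nan, nan, nan, nan, nan],
        [nan, 35, 3, 11, 65, 53],
        [nan, nan, nan, nan, 22, 21],
        [nan, 15, 93, 111, 165, 153],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [141, 121, nan, nan, nan, nan],
    ],
    columns=["Col1", "Col2", "Col3", "Col4", "Col5", "Col6"],
)

# Question 1: drop rows in which every column is NaN.
no_empty_rows = df.dropna(how="all")

# Question 2: keep only rows with at least 3 non-NaN values, i.e. drop
# rows where 4, 5 or all 6 of the 6 columns are missing.
mostly_complete = df.dropna(thresh=3)

print(no_empty_rows.index.tolist())    # [0, 2, 3, 4, 7]
print(mostly_complete.index.tolist())  # [0, 2, 4]
```

Both calls return a new frame and leave df untouched, so assign the result back if you want the rows gone for good.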

Regarding your last question, pd.DataFrame.boxplot will do that. You can specify the columns you want (if needed) with the column parameter. See also the example in the docs.
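
A minimal sketch of the box plot step, assuming matplotlib is installed. The cleaned frame below is random stand-in data (450 rows, 6 columns), not the asker's dataset, and the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cleaned data: 450 rows, 6 numeric columns.
rng = np.random.default_rng(0)
cleaned = pd.DataFrame(
    rng.normal(size=(450, 6)),
    columns=[f"Col{i}" for i in range(1, 7)],
)
cleaned.iloc[::10, 2] = np.nan  # leave a few gaps in Col3

# One box per requested column; omit `column` to plot every numeric column.
ax = cleaned.boxplot(column=["Col1", "Col2", "Col3"])
ax.set_ylabel("value")

# In a script, ax.get_figure().savefig("boxplot.png") would write the image;
# in a notebook the plot is shown inline.
```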

Stylistic answered 2/9, 2016 at 18:11 Comment(5)
Hi Ami, thanks for the reply. Actually, at the moment I have 6 columns and 450 rows. - Groundling
@Groundling Oh, right you are - erased that comment (it was not that important anyway). - Stylistic
The second question was actually how to drop rows where 4 or 5 columns are missing data, so another way to tackle the first and second questions would be df.dropna(thresh=2), which gets rid of all rows that don't have at least 2 non-NaN values. - Herb
@Groundling If you write that part as an answer, I'll happily upvote it :-) - Stylistic
What about dropping them from another dataset? Then dropna() doesn't help... - Dedifferentiation
K
15

For those who arrived here searching for the question title:

Check if all columns in rows value is NaN

A simple approach would be:

df[list_of_cols_to_check].isnull().apply(lambda x: all(x), axis=1)

import pandas as pd
import numpy as np


df = pd.DataFrame({'movie': [np.nan, 'thg', 'mol', 'mol', 'lob', 'lob'],
                  'rating': [np.nan, 4., 5., np.nan, np.nan, np.nan],
                  'name':   ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan]}) 
df.head()



To check whether all columns are NaN:

cols_to_check = df.columns
df['is_na'] = df[cols_to_check].isnull().apply(lambda x: all(x), axis=1) 
df.head() 



To check whether the columns 'name' and 'rating' are both NaN:

cols_to_check = ['name', 'rating']
df['is_na'] = df[cols_to_check].isnull().apply(lambda x: all(x), axis=1) 
df.head()  
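
For what it's worth, the same row-wise check can be sketched without apply by calling .all(axis=1) on the boolean frame. The resulting mask is an ordinary boolean Series, so it can also be used to drop the matching rows from a different frame, which dropna alone cannot do. The second frame (other) and its votes column are hypothetical and assumed to share df's index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'movie': [np.nan, 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [np.nan, 4., 5., np.nan, np.nan, np.nan],
                   'name': ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan]})

cols_to_check = ['name', 'rating']

# Vectorised row-wise check: True where every checked column is NaN.
is_na = df[cols_to_check].isnull().all(axis=1)
print(is_na.tolist())  # [False, False, False, False, True, True]

# Hypothetical second frame with the same index; drop the same rows from it.
other = pd.DataFrame({'votes': [10, 20, 30, 40, 50, 60]}, index=df.index)
other_kept = other.loc[~is_na]
print(other_kept.index.tolist())  # [0, 1, 2, 3]
```

Note that the string 'N/A' counts as a real value here; only np.nan and None are treated as missing by isnull.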


Klutz answered 12/3, 2018 at 7:0 Comment(1)
You could avoid .apply, as df[cols_to_check].isnull().all(axis=1) does exactly the same. On a 2M-row dataframe with 3 cols_to_check, the apply version took 25.4 s, while the vectorised version needed just 106 ms. - Processional
M
2

Check if all columns in rows value is NaN

    # True if the df contains at least one row in which all values are NaN
    df.isnull().all(axis=1).any()

The answer given by @Ami still holds. This check is useful when dealing with derived values: before dropping such rows, you may need to re-evaluate your feature extraction logic.
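
To make the distinction concrete, here is a small sketch on a toy frame (the same shape as the one in @Ami's answer). Reducing over .values asks whether every cell in the whole frame is NaN, whereas reducing row-wise first asks whether some row is entirely NaN. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 1.0, np.nan], 'b': [np.nan, 1.0, 2.0]})

# True only if every single cell in the frame is NaN.
whole_frame_is_nan = bool(df.isnull().values.all())

# True if at least one row consists entirely of NaN.
has_all_nan_row = bool(df.isnull().all(axis=1).any())

# How many such rows there are.
n_all_nan_rows = int(df.isnull().all(axis=1).sum())

print(whole_frame_is_nan, has_all_nan_row, n_all_nan_rows)  # False True 1
```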

Milly answered 16/9, 2022 at 15:7 Comment(0)
