Count the number of non-NaN entries in every column of a DataFrame
I have a massive DataFrame, and I was wondering if there was a short (one or two liner) way to get a count of non-NaN entries in a DataFrame. I don't want to do this one column at a time as I have close to 1000 columns.

import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

    a   b   d
A   1   2 NaN
B NaN   4 NaN
C   5 NaN   7
D   5 NaN NaN

Output:

a: 3
b: 2
d: 1
Bouldin answered 30/4, 2015 at 14:57 Comment(3)
df1[df1.notnull()].count() — this seems to have worked. – Bouldin
The extra indexing with df1.notnull() is not necessary, since count ignores null values anyway. – Welldefined
Unlike series.value_counts(..., dropna=False), there is no option on df.count() to directly get NA counts. – Ogham
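As the last comment notes, df.count() has no dropna=False option; a common way to get the NA counts directly is df.isna().sum(). A quick sketch using the question's frame:

```python
import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

# Non-NaN counts per column (what the question asks for)
non_na = df1.count()   # a: 3, b: 2, d: 1

# NaN counts per column -- the complement, which df.count() cannot return directly
na = df1.isna().sum()  # a: 1, b: 2, d: 3
```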

The count() method returns the number of non-NaN values in each column:

>>> df1.count()
a    3
b    2
d    1
dtype: int64

Similarly, count(axis=1) returns the number of non-NaN values in each row.
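On the question's frame, the row-wise variant looks like this (a quick sketch):

```python
import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

# Number of non-NaN values in each row
row_counts = df1.count(axis=1)  # A: 2, B: 1, C: 2, D: 1
```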

Welldefined answered 30/4, 2015 at 15:01 Comment(6)
I don't believe that works if the column has strings. – Lowlife
@Lowlife: just tried and it works for me (pandas version 1.2.1). E.g. df = pd.DataFrame({"a": ["x", np.nan, "z"]}) then df.count() produces the expected value 2. Do you have an example where this method does not work? – Welldefined
Yes, if you manually create a df and place the np.nan it could work, I guess. But that is not how you typically create your columns. One often-used way, by me at least, is: df['C'] = np.where(df.A > df.B, 'some text', np.nan). This turns the np.nan into the string 'nan', which is no longer recognized as NaN. – Lowlife
I have a column with None values and this doesn't work. – Eggbeater
@Lowlife (very late reply, apologies) – in that example you don't end up with any NaN values in the column (you have a column of string values), so the .count() method works as intended. Some NumPy methods, especially with strings, don't fit well with pandas, and that's one of them, so it's better to use pandas methods like df["C"] = (df.A > df.B).map({True: 'some text', False: np.nan}) instead. – Welldefined
@Eggbeater: .count() should treat None as a null value and exclude it from the count – happy to debug if you give an example of what you're seeing. – Welldefined
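A small sketch of the pitfall discussed in the comments above: np.where promotes its result to a string dtype, so the np.nan branch becomes the literal string 'nan', which .count() happily counts (the frame and column names here are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [2, 1]})

# np.where coerces the result to a string array, turning np.nan into 'nan'
df["C"] = np.where(df.A > df.B, "some text", np.nan)
print(df["C"].tolist())  # ['nan', 'some text'] -- a string, not a missing value
print(df["C"].count())   # 2, even though one entry was meant to be NaN

# The pandas-native alternative suggested in the comments keeps real NaNs
df["C2"] = (df.A > df.B).map({True: "some text", False: np.nan})
print(df["C2"].count())  # 1
```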

If you want the total count of non-NaN values across the whole DataFrame, you can do:

np.sum(df.count())
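On the question's frame this yields 3 + 2 + 1 = 6; a quick sketch (df.count().sum() is equivalent):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

# Sum of the per-column non-NaN counts
total = np.sum(df1.count())  # 6; same as df1.count().sum()
```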
Jacobjacoba answered 31/1, 2020 at 21:49 Comment(0)

In case you are dealing with empty strings, you may want to count them as NA as well:

df.replace('', np.nan).count()

or, if you also want to treat whitespace-only (blank) strings as NA:

df.replace(r'^\s*$', np.nan, regex=True).count()
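A quick sketch of the difference on a toy column (the frame here is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["x", "", "   ", None]})

print(df.count())                                        # 3: '' and '   ' count as values
print(df.replace("", np.nan).count())                    # 2: empty strings dropped
print(df.replace(r"^\s*$", np.nan, regex=True).count())  # 1: blank strings dropped too
```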
Morez answered 17/2, 2021 at 17:47 Comment(0)

You can use the notna / notnull methods combined with sum:

df.notna().sum()

Output:

a    3
b    2
d    1
dtype: int64
Dingy answered 18/10, 2022 at 17:54 Comment(0)