Count the number of non-NaN entries in every column of a DataFrame
I have a massive DataFrame, and I was wondering if there was a short (one or two liner) way to get a count of non-NaN entries in a DataFrame. I don't want to do this one column at a time as I have close to 1000 columns.

import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

    a   b   d
A   1   2 NaN
B NaN   4 NaN
C   5 NaN   7
D   5 NaN NaN

Output:

a: 3
b: 2
d: 1
Bouldin answered 30/4, 2015 at 14:57 Comment(3)
df1[df1.notnull()].count() — this seems to have worked. – Bouldin
The extra indexing with df1.notnull() is not necessary, since count ignores null values anyway. – Welldefined
Unlike series.value_counts(..., dropna=False), there is no option on df.count() to directly get NA counts. – Ogham
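As the last comment notes, df.count() has no dropna=False option; a common way to get the NA counts directly is df.isna().sum(). A quick sketch using the question's frame:

```python
import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

# Non-NaN counts per column (what the question asks for)
non_na = df1.count()   # a: 3, b: 2, d: 1

# NaN counts per column -- the complement, which df.count() cannot return directly
na = df1.isna().sum()  # a: 1, b: 2, d: 3
```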

The count() method returns the number of non-NaN values in each column:

>>> df1.count()
a    3
b    2
d    1
dtype: int64

Similarly, count(axis=1) returns the number of non-NaN values in each row.
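On the question's frame, the row-wise variant looks like this (a quick sketch):

```python
import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

# Number of non-NaN values in each row
row_counts = df1.count(axis=1)  # A: 2, B: 1, C: 2, D: 1
```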

Welldefined answered 30/4, 2015 at 15:01 Comment(6)
I don't believe that works if the column has strings. – Lowlife
@Lowlife: just tried and it works for me (pandas version 1.2.1). E.g. df = pd.DataFrame({"a": ["x", np.nan, "z"]}) then df.count() produces the expected value 2. Do you have an example where this method does not work? – Welldefined
Yes, if you manually create a df and place the np.nan it could work, I guess. But that is not how you typically create your columns. One often-used way, by me at least, is: df['C'] = np.where(df.A > df.B, 'some text', np.nan). This turns the np.nan into the string 'nan', which is no longer recognized as NaN. – Lowlife
I have a column with None values and this doesn't work. – Eggbeater
@Lowlife (very late reply, apologies) – in that example you don't end up with any NaN values in the column (you have a column of string values), so the .count() method works as intended. Some NumPy methods, especially with strings, don't fit well with pandas, and that's one of them, so it's better to use pandas methods like df["C"] = (df.A > df.B).map({True: 'some text', False: np.nan}) instead. – Welldefined
@Eggbeater: .count() should treat None as a null value and exclude it from the count – happy to debug if you give an example of what you're seeing. – Welldefined
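A small sketch of the pitfall discussed in the comments above: np.where promotes its result to a string dtype, so the np.nan branch becomes the literal string 'nan', which .count() happily counts (the frame and column names here are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [2, 1]})

# np.where coerces the result to a string array, turning np.nan into 'nan'
df["C"] = np.where(df.A > df.B, "some text", np.nan)
print(df["C"].tolist())  # ['nan', 'some text'] -- a string, not a missing value
print(df["C"].count())   # 2, even though one entry was meant to be NaN

# The pandas-native alternative suggested in the comments keeps real NaNs
df["C2"] = (df.A > df.B).map({True: "some text", False: np.nan})
print(df["C2"].count())  # 1
```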

If you want the total count of non-NaN values across the whole DataFrame, you can do:

np.sum(df.count())
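On the question's frame this yields 3 + 2 + 1 = 6; a quick sketch (df.count().sum() is equivalent):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

# Sum of the per-column non-NaN counts
total = np.sum(df1.count())  # 6; same as df1.count().sum()
```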
Jacobjacoba answered 31/1, 2020 at 21:49 Comment(0)

In case you are dealing with empty strings, you may want to count them as NA as well:

df.replace('', np.nan).count()

or, if you also want to treat whitespace-only (blank) strings as NA:

df.replace(r'^\s*$', np.nan, regex=True).count()
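A quick sketch of the difference on a toy column (the frame here is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["x", "", "   ", None]})

print(df.count())                                        # 3: '' and '   ' count as values
print(df.replace("", np.nan).count())                    # 2: empty strings dropped
print(df.replace(r"^\s*$", np.nan, regex=True).count())  # 1: blank strings dropped too
```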
Morez answered 17/2, 2021 at 17:47 Comment(0)

You can use the notna / notnull methods combined with sum:

df.notna().sum()

Output:

a    3
b    2
d    1
dtype: int64
Dingy answered 18/10, 2022 at 17:54 Comment(0)