pandas aggregate count in dataframe

I have a DataFrame and I am using .aggregate({'col1': np.sum}), which sums the values in col1 into a single aggregated result. Is it possible to perform a count instead, something like .aggregate({'col1': some count function here})?
You can use 'size', 'count', or 'nunique' depending on your use case. The differences between them are:

- 'size': the count including NaN and repeated values.
- 'count': the count excluding NaN but including repeats.
- 'nunique': the count of unique values, excluding both repeats and NaN.
For example, consider the following DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': list('aabbcc'), 'col1': [1, 1, 2, np.nan, 3, 4]})

  col0  col1
0    a   1.0
1    a   1.0
2    b   2.0
3    b   NaN
4    c   3.0
5    c   4.0
Then using the three functions described:
df.groupby('col0')['col1'].agg(['size', 'count', 'nunique'])
      size  count  nunique
col0
a        2      2        1
b        2      1        1
c        2      2        2
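The same strings can also be passed in the dictionary form used in the question; a quick sketch on the df above, aggregating the whole column rather than per group:

df.agg({'col1': 'count'})    # col1    5  -> non-NaN values
df.agg({'col1': 'size'})     # col1    6  -> all rows, NaN included
df.agg({'col1': 'nunique'})  # col1    4  -> distinct non-NaN values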
I found something that is handy and easy to read as well:

(
    df
    .groupby('col0')
    .agg(total=('col1', 'count'))
)

As you can see, with named aggregation you can pick the column to aggregate and name the resulting column at the same time.
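A self-contained sketch of the same call, assuming the df from the answer above; the keyword on the left becomes the name of the output column:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': list('aabbcc'), 'col1': [1, 1, 2, np.nan, 3, 4]})

# Named aggregation: 'total' is the output column, ('col1', 'count') is (column, function)
df.groupby('col0').agg(total=('col1', 'count'))
#       total
# col0
# a         2
# b         1
# c         2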
{'col1': 'count'} or {'col1': 'size'} or {'col1': 'nunique'} depending on your use case. – Crick
len (the built-in), which I suggest is the most readable of the bunch. – Chemism
len is typically slower than 'size', as it's a Python built-in instead of numpy under the hood. – Crick
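For reference, a minimal sketch of the len approach mentioned in these comments, reusing the df from the accepted answer; it matches 'size' in result, but 'size' stays in pandas/numpy rather than calling a Python function per group:

# len counts all rows in each group, NaN included -- same result as 'size'
df.groupby('col0')['col1'].agg([len, 'size'])
#       len  size
# col0
# a       2     2
# b       2     2
# c       2     2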