I want to aggregate my pandas DataFrame, data. Specifically, I want the average and the sum of amount for each [origin, type] pair. For averaging and summing I tried the pandas Series methods below:
import numpy as np
import pandas as pd
groupbyvars = ['origin', 'type']  # group on each (origin, type) pair
result = data.groupby(groupbyvars).agg({'amount': [pd.Series.sum, pd.Series.mean]}).reset_index()
My issue is that the amount column contains NaNs, which causes the result of the above code to have a lot of NaN averages and sums.
I know both pd.Series.sum and pd.Series.mean have skipna=True by default, so why am I still getting NaNs here?
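To narrow down where the NaNs come from, here is a minimal check on made-up data (the DataFrame below is hypothetical, not the real data), assuming a recent pandas release; the pandas that shipped alongside NumPy 1.7.1 was much older and may behave differently. It uses the string aliases 'sum' and 'mean' suggested in the comments. Within each group NaNs are skipped, so a NaN mean should only survive for a group in which every amount is NaN; such a group's sum comes out as 0.0 because GroupBy.sum defaults to min_count=0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: group (A, x) has one NaN, group (B, y) is all NaN.
data = pd.DataFrame({
    "origin": ["A", "A", "B", "B"],
    "type": ["x", "x", "y", "y"],
    "amount": [1.0, np.nan, np.nan, np.nan],
})
groupbyvars = ["origin", "type"]

# String aliases let pandas use its built-in group reductions, which skip NaN.
result = data.groupby(groupbyvars).agg({"amount": ["sum", "mean"]}).reset_index()
print(result)

# Columns are a MultiIndex after agg with a dict of lists: ("amount", "sum") etc.
sums = result[("amount", "sum")].tolist()
means = result[("amount", "mean")].tolist()
```

If the real data shows NaN means only for groups whose amounts are entirely NaN, the aggregation is already skipping NaN as documented.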
I also tried this, which obviously did not work:
data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum(skipna=True), pd.Series.mean(skipna=True)]}).reset_index()
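That attempt fails because pd.Series.sum(skipna=True) is called immediately, with no Series to operate on, instead of being handed to agg as a function. agg expects callables (or string names). A minimal sketch, using hypothetical helper names nan_sum and nan_mean and made-up data, wraps each call in a named function that forwards skipna=True; the distinct def names also give the output columns distinct labels.

```python
import numpy as np
import pandas as pd


def nan_sum(s):
    # Hypothetical helper: forward skipna=True explicitly to Series.sum.
    return s.sum(skipna=True)


def nan_mean(s):
    # Hypothetical helper: forward skipna=True explicitly to Series.mean.
    return s.mean(skipna=True)


# Hypothetical sample data.
data = pd.DataFrame({
    "origin": ["A", "A", "B"],
    "type": ["x", "x", "y"],
    "amount": [2.0, np.nan, 5.0],
})
groupbyvars = ["origin", "type"]

# Pass the functions themselves (no call parentheses) so agg applies them per group.
result = data.groupby(groupbyvars).agg({"amount": [nan_sum, nan_mean]}).reset_index()
print(result)
```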
EDIT:
Following @Korem's suggestion, I also tried using a partial, as below:
from functools import partial
s_na_mean = partial(pd.Series.mean, skipna=True)
data.groupby(groupbyvars).agg({'amount': [np.nansum, s_na_mean]}).reset_index()
but got this error:
error: 'functools.partial' object has no attribute '__name__'
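The message suggests that agg, in the pandas version used here, reads each function's __name__ to label the output column, and functools.partial objects do not carry a __name__ of their own. Assuming that is the only obstacle, one possible workaround is to attach a name by hand, since partial objects accept attribute assignment. The sketch below uses made-up data and the string alias 'sum' in place of np.nansum.

```python
from functools import partial

import numpy as np
import pandas as pd

# Same partial as above, plus an explicit name for agg to use as the column label.
s_na_mean = partial(pd.Series.mean, skipna=True)
s_na_mean.__name__ = "s_na_mean"

# Hypothetical sample data.
data = pd.DataFrame({
    "origin": ["A", "A", "B"],
    "type": ["x", "x", "y"],
    "amount": [4.0, np.nan, 6.0],
})
groupbyvars = ["origin", "type"]

result = data.groupby(groupbyvars).agg({"amount": ["sum", s_na_mean]}).reset_index()
print(result)
```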
pd.Series.sum: just use 'sum' instead; the code should take a faster path. – Langford
I used pd.Series.sum just because it had a skipna option. Reading @Korem's answer, I now use np.nansum. But np.nanmean is not available in my version (1.7.1) of numpy. I will try to post representative data, which may take a while. – Heisenberg
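np.nanmean was added in NumPy 1.8.0, so on 1.7.1 a stand-in has to be built from older primitives. A minimal sketch, using a hypothetical helper name nanmean_compat: it averages only the non-NaN entries and returns NaN when there are none. As a named function it can also be passed to agg like any other callable.

```python
import numpy as np


def nanmean_compat(values):
    """Hypothetical stand-in for np.nanmean on NumPy versions that lack it."""
    arr = np.asarray(values, dtype=float)
    mask = ~np.isnan(arr)  # True where the value is not NaN
    count = mask.sum()
    if count == 0:
        # Every value is NaN, so there is nothing to average.
        return np.nan
    return arr[mask].sum() / count


print(nanmean_compat([1.0, np.nan, 3.0]))  # 2.0
```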