While others were fast to give the one-liner code snippet, I try to explain a bit more about kind of options you have, and what is the syntax understood by the pandas agg()
function.
What are we dealing with?
The type of the object you are dealing with is
type(df.set_index('STNAME').groupby(level=0)[['POPESTIMATE2010','POPESTIMATE2011']])
# pandas.core.groupby.generic.DataFrameGroupBy
Therefore, it might be a good start to time look at the documentation of the pandas.DataFrameGroupby.agg
What do the docs say?
The argument given to agg
can be
(1) string (function name)
(2) function
(3) list of functions
(4) dict of column names -> functions (or list of functions).
(1) string (function name)
If you give function name string as the argument, it must be a "function name that pandas understands". Understood function names are at least: 'sum','mean','std'
. Example:
In [24]: df.set_index('STNAME').groupby(level=0)[['POPESTIMATE2010','POPESTIMATE2011']].agg('mean')
Out[24]:
POPESTIMATE2010 POPESTIMATE2011
STNAME
Alabama 71420.313433 71658.328358
Alaska 24621.413793 24921.379310
Arizona 427213.866667 431248.800000
...
Since you want two things to be calculated: mean and sum, you need two function calls. One with 'mean' and one with 'sum'.
(2) function
You can also give any function as an argument. The function should take array-like data (pd.Series) as input, and produce scalar value out of it. Example:
In [25]: df.set_index('STNAME').groupby(level=0)[['POPESTIMATE2010','POPESTIMATE2011']].agg(np.mean)
Out[25]:
POPESTIMATE2010 POPESTIMATE2011
STNAME
Alabama 71420.313433 71658.328358
Alaska 24621.413793 24921.379310
...
Since you want two things to be calculated: mean and sum, you need two function calls. One with np.mean and one with np.sum.
(3) list of functions
You may also give a list of functions for the argument for agg()
. Example:
In [27]: df.set_index('STNAME').groupby(level=0)[['POPESTIMATE2010','POPESTIMATE2011']].agg([np.mean, np.sum])
Out[27]:
POPESTIMATE2010 POPESTIMATE2011
mean sum mean sum
STNAME
Alabama 71420.313433 4785161 71658.328358 4801108
Alaska 24621.413793 714021 24921.379310 722720
Arizona 427213.866667 6408208 431248.800000 6468732
...
This has the upside that you need only one function call. Use this, if you want the same operations to be calculated with multiple columns
(4) Dict to column names -> functions
If you give a dictionary as the argument to agg()
, then the keys must represent the column names in the dataframe, and the values should be either functions or lists of functions. Example:
In [30]: In [27]: df.set_index('STNAME').groupby(level=0)[['POPESTIMATE2010','POPESTIMATE2011']].agg({"POPESTIMATE2010": [np.mean, np.sum], "POPESTIMATE2011": [np.mean, np.sum]})
Out[30]:
POPESTIMATE2010 POPESTIMATE2011
average sum mean sum
STNAME
Alabama 71420.313433 4785161 71658.328358 4801108
Alaska 24621.413793 714021 24921.379310 722720
Arizona 427213.866667 6408208 431248.800000 6468732
...
This has the upside that you need only one function call. Use this, if you want different operations to be calculated with different columns