Assume I have the following DataFrame:
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='H')
n = 10
df = pd.DataFrame({
    "datetime": np.random.choice(rng, n),
    "cat": np.random.choice(['a', 'b', 'b'], n),
    "val": np.random.randint(0, 5, size=n),
})
If I now do a groupby:
gb = df.groupby(['cat','datetime']).sum()
I get the totals for each cat for each hour:
cat  datetime             val
a    2011-01-01 00:00:00    1
     2011-01-01 09:00:00    3
     2011-01-02 16:00:00    1
     2011-01-03 16:00:00    1
b    2011-01-01 08:00:00    4
     2011-01-01 15:00:00    3
     2011-01-01 16:00:00    3
     2011-01-02 04:00:00    4
     2011-01-02 05:00:00    1
     2011-01-02 12:00:00    4
However, I would like to have something like:
cat  datetime    val
a    2011-01-01    4
     2011-01-02    1
     2011-01-03    1
b    2011-01-01   10
     2011-01-02    9
I could get the desired result by adding another column called date
df['date'] = df.datetime.apply(lambda x: x.date())
and then do a similar groupby:
df.groupby(['cat','date']).sum()
But I would like to know whether there is a more Pythonic way to do it. In addition, I might want to aggregate at the month or year level. So, what would be the right way?
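For what it's worth, here is a minimal sketch of one way this could be done without a helper column. It assumes a fixed sample frame that mirrors the hourly output shown above (the literal values and the names daily, monthly and yearly are only illustrative). Day-level grouping uses pd.Grouper on the datetime column; month and year levels use the .dt accessor, so the same pattern covers all three granularities.

```python
import pandas as pd

# Illustrative data copied from the hourly output in the question
# (an assumption made so that the results are reproducible).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-01-01 00:00", "2011-01-01 09:00", "2011-01-02 16:00",
        "2011-01-03 16:00", "2011-01-01 08:00", "2011-01-01 15:00",
        "2011-01-01 16:00", "2011-01-02 04:00", "2011-01-02 05:00",
        "2011-01-02 12:00",
    ]),
    "cat": ["a"] * 4 + ["b"] * 6,
    "val": [1, 3, 1, 1, 4, 3, 3, 4, 1, 4],
})

# Day level: pd.Grouper bins the 'datetime' column into calendar days.
daily = df.groupby(["cat", pd.Grouper(key="datetime", freq="D")])["val"].sum()

# Month level: convert each timestamp to a monthly period and group on it.
monthly = df.groupby(["cat", df["datetime"].dt.to_period("M")])["val"].sum()

# Year level: group on the integer year taken from the timestamp.
yearly = df.groupby(["cat", df["datetime"].dt.year])["val"].sum()

print(daily)
print(monthly)
print(yearly)
```

The daily result should match the desired table in the question; the month and year variants differ only in the second grouping key.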
for instance. Or set the index to the date column, resample, and then group by 'cat' and perform the aggregations – Suicide
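The index-plus-resample idea from the comment could look roughly like the sketch below. It is only one reading of that suggestion: it groups by 'cat' first and resamples within each group, and it reuses illustrative data copied from the hourly output in the question (the name per_day is hypothetical).

```python
import pandas as pd

# Illustrative data copied from the hourly output in the question
# (an assumption made so that the results are reproducible).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-01-01 00:00", "2011-01-01 09:00", "2011-01-02 16:00",
        "2011-01-03 16:00", "2011-01-01 08:00", "2011-01-01 15:00",
        "2011-01-01 16:00", "2011-01-02 04:00", "2011-01-02 05:00",
        "2011-01-02 12:00",
    ]),
    "cat": ["a"] * 4 + ["b"] * 6,
    "val": [1, 3, 1, 1, 4, 3, 3, 4, 1, 4],
})

# Move the timestamps into the index, then resample each category
# separately into daily bins and sum the values in each bin.
per_day = (
    df.set_index("datetime")
      .groupby("cat")["val"]
      .resample("D")
      .sum()
)

print(per_day)
```

One difference from the plain groupby: resampling produces a row for every day between a category's first and last timestamp, so days with no observations would show up as well rather than being omitted.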