group by week in pandas
Asked Answered
R

7

104

I'm having this data frame:

Name   Date    Quantity
Apple  07/11/17  20
orange 07/14/17  20
Apple  07/14/17  70
Orange 07/25/17  40
Apple  07/20/17  30

I want to aggregate this by Name and Date to get sum of quantities Details:

Date: Group, the result should be at the beginning of the week (or just on Monday)

Quantity: Sum, if two or more records have same Name and Date (if falls on same interval)

The desired output is given below:

Name   Date    Quantity
Apple  07/10/17  90
orange 07/10/17  20
Apple  07/17/17  30
orange 07/24/17  40
Rogovy answered 24/7, 2017 at 12:58 Comment(0)
P
158

First, convert column date to_datetime and subtract one week as we want the sum for the week ahead of the date and not the week before that date.

Then use groupby with Grouper by W-MON and aggregate sum:

df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
df = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
    .sum()
    .reset_index()
    .sort_values('Date')
print (df)
  Name       Date  Quantity
0   Apple 2017-07-10        90
3  orange 2017-07-10        20
1   Apple 2017-07-17        30
2  Orange 2017-07-24        40
Penultimate answered 24/7, 2017 at 13:5 Comment(7)
Now it is correct, need substract one week only. Check edited answer.Penultimate
Why do you have to subtract one week?Kranz
@Kranz If I'm not mistaken it is because how W-MON works, if label='left' is not specify the W-MON would get the sum of the previous week, which is not normally the case. relevant Q: #49544921Chicoine
I just checked it and is the same substracting 7 days than adding label='left' on the arguments of pd.Grouper.Chicoine
The "W-MON" identifier was the magic bullet for me. In my data, weeks start on Tuesday...Scab
Can we just create the Week based on ISO format, and then groupby using that column?Broadloom
If you have date time values instead of just date, then subtracting 7 does not work correct. You need to subtract 6.Dissonance
N
44

Let's use groupby, resample with W-Mon, and sum:

df.groupby('Name').resample('W-Mon', on='Date').sum().reset_index().sort_values(by='Date')

Output:

     Name       Date  Quantity
0   Apple 2017-07-17        90
3  orange 2017-07-17        20
1   Apple 2017-07-24        30
2  Orange 2017-07-31        40
Numismatist answered 24/7, 2017 at 13:4 Comment(4)
Thanks for the reply!.But when I use your code it is showing, TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index' Do you know why? ThanksRogovy
Yes, you need to convert your data column to dtype datetime. Use df['Date'] = pd.to_datetime(df['Date']).Numismatist
Did not work for me. Full sample code would help reproducing. Does Quantity field need to be in the code?Bendigo
@MarkAndersen if you create a new question and paste the link here with your fulll code and sample dataset, I will try to answer your question.Numismatist
A
11

First convert column date to_datetime. This will group by week starting with Mondays. It will output the week number (but you can change that looking up in

http://strftime.org/

df.groupby(['name', df['date'].dt.strftime('%W')])['quantity'].sum()

Output:

name    date
apple   28      90
        29      30
orange  28      20
        30      40
Affra answered 27/6, 2018 at 13:21 Comment(2)
This fails as soon as you cross year boundaries, it groups everything by week number (0-52).Manhour
That is easily fixed by changing .dt.strftime('%W') by .dt.strftime('%W %Y')Gull
T
8

You already received a lot of good answers and the question is quite old, but, given the fact some of the solutions use deprecated functions and I encounted the same problem and found a different solution I think could be helpful to someone to share it.

Given the dataframe you proposed:

Name   Date    Quantity
Apple  07/11/17  20
orange 07/14/17  20
Apple  07/14/17  70
Orange 07/25/17  40
Apple  07/20/17  30

We have to convert the values in 'Date' as Pandas' Datetime since they are strings right now.
Then we can use the Series' dt property that allow us to handle DateTime-like series and extract informations.

df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')

By having a DateTime format allow us to use the dt parameters to extract the number of the week associated to the date. In order to do not loose any information I prefer to add a new column with the week number. Once retrieved the number of the week we can group by that week.

df['WeekNumber'] = df['Date'].dt.isocalendar().week
df.groupby(['Name', 'WeekNumber']).sum()

Name    WeekNumber  
Apple   28  90
        29  30
Orange  28  20
        30  40
Small problem: what if we consider different years?

There could be the case in whick our data have a range of years, in that situation we cannot consider only the week (otherwise we would mix up data from one year into another), so it would be useful to extract also the year column from isocalendar().

df['year'] = df['Date'].dt.isocalendar().year
df.groupby(['Name', 'WeekNumber', 'year']).sum()

Name    WeekNumber  year    Quantity
Apple   28          2017    90
        29          2017    30
Orange  28          2017    20
        30          2017    40
Thyroxine answered 26/5, 2022 at 13:2 Comment(0)
L
3

This groups every row on the previous Monday (if the date is already Monday, nothing is changed). This has the effect of grouping by week:

import pandas as pd, datetime as dt

# df = ...

df['WeekDate'] = df.apply(lambda row: row['Date'] - dt.timedelta(days=row['Date'].weekday()), axis=1)

perweek = df['WeekDate'].groupby(df['WeekDate']).count()

Exemple:

Date           WeekDate
2020-06-20     2020-06-15 <- monday
2020-06-21     2020-06-15
2020-06-24     2020-06-22 <- monday 
2020-06-25     2020-06-22
2020-06-26     2020-06-22
Ledesma answered 26/6, 2020 at 13:54 Comment(0)
P
1

You can use the to_period method to get the date truncated to the first day of the week (or month if you use the period M):

df["Week"] = df["Date"].dt.to_period("W").dt.to_timestamp()
df.groupby(["Name", "Week")).sum()
Pacesetter answered 30/8, 2022 at 14:55 Comment(0)
C
0

I think for completeness, one should mention the most compact way to do it:

df.groupby(['Name', df.Date.round('7d')])

Whis has less controls, but works across years.

(assumiing data is a datetime column. If not, replace df.Date with pd.to_datetime(df.Date))

Culberson answered 12/3, 2024 at 22:54 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.