group by week in pandas

Asked 24/7, 2017 at 12:58 Answered 12/3, 2024 at 22:54

104

I'm having this data frame:

Name   Date    Quantity
Apple  07/11/17  20
orange 07/14/17  20
Apple  07/14/17  70
Orange 07/25/17  40
Apple  07/20/17  30

I want to aggregate this by Name and Date to get sum of quantities Details:

Date: Group, the result should be at the beginning of the week (or just on Monday)

Quantity: Sum, if two or more records have same Name and Date (if falls on same interval)

The desired output is given below:

Name   Date    Quantity
Apple  07/10/17  90
orange 07/10/17  20
Apple  07/17/17  30
orange 07/24/17  40

Rogovy answered 24/7, 2017 at 12:58 Comment(0)

158

First, convert column date to_datetime and subtract one week as we want the sum for the week ahead of the date and not the week before that date.

Then use groupby with Grouper by W-MON and aggregate sum:

df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
df = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
    .sum()
    .reset_index()
    .sort_values('Date')
print (df)

  Name       Date  Quantity
0   Apple 2017-07-10        90
3  orange 2017-07-10        20
1   Apple 2017-07-17        30
2  Orange 2017-07-24        40

Penultimate answered 24/7, 2017 at 13:5 Comment(7)

Now it is correct, need substract one week only. Check edited answer. – Penultimate 25/7, 2017 at 11:8

Why do you have to subtract one week? – Kranz 30/9, 2019 at 17:18

@Kranz If I'm not mistaken it is because how W-MON works, if label='left' is not specify the W-MON would get the sum of the previous week, which is not normally the case. relevant Q: #49544921 – Chicoine 7/10, 2019 at 9:47

I just checked it and is the same substracting 7 days than adding label='left' on the arguments of pd.Grouper. – Chicoine 7/10, 2019 at 10:59

The "W-MON" identifier was the magic bullet for me. In my data, weeks start on Tuesday... – Scab 11/8, 2020 at 14:54

Can we just create the Week based on ISO format, and then groupby using that column? – Broadloom 10/8, 2021 at 14:14

If you have date time values instead of just date, then subtracting 7 does not work correct. You need to subtract 6. – Dissonance 20/1, 2022 at 13:40

Let's use groupby, resample with W-Mon, and sum:

df.groupby('Name').resample('W-Mon', on='Date').sum().reset_index().sort_values(by='Date')

Output:

     Name       Date  Quantity
0   Apple 2017-07-17        90
3  orange 2017-07-17        20
1   Apple 2017-07-24        30
2  Orange 2017-07-31        40

Numismatist answered 24/7, 2017 at 13:4 Comment(4)

Thanks for the reply!.But when I use your code it is showing, TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index' Do you know why? Thanks – Rogovy 25/7, 2017 at 9:36

Yes, you need to convert your data column to dtype datetime. Use df['Date'] = pd.to_datetime(df['Date']). – Numismatist 25/7, 2017 at 13:4

Did not work for me. Full sample code would help reproducing. Does Quantity field need to be in the code? – Bendigo 1/2, 2020 at 20:15

@MarkAndersen if you create a new question and paste the link here with your fulll code and sample dataset, I will try to answer your question. – Numismatist 1/2, 2020 at 23:3

First convert column date to_datetime. This will group by week starting with Mondays. It will output the week number (but you can change that looking up in

http://strftime.org/

df.groupby(['name', df['date'].dt.strftime('%W')])['quantity'].sum()

Output:

name    date
apple   28      90
        29      30
orange  28      20
        30      40

Affra answered 27/6, 2018 at 13:21 Comment(2)

This fails as soon as you cross year boundaries, it groups everything by week number (0-52). – Manhour 7/4, 2021 at 2:24

That is easily fixed by changing .dt.strftime('%W') by .dt.strftime('%W %Y') – Gull 24/5, 2022 at 4:26

You already received a lot of good answers and the question is quite old, but, given the fact some of the solutions use deprecated functions and I encounted the same problem and found a different solution I think could be helpful to someone to share it.

Given the dataframe you proposed:

Name   Date    Quantity
Apple  07/11/17  20
orange 07/14/17  20
Apple  07/14/17  70
Orange 07/25/17  40
Apple  07/20/17  30

We have to convert the values in 'Date' as Pandas' Datetime since they are strings right now.
Then we can use the Series' dt property that allow us to handle DateTime-like series and extract informations.

df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')

By having a DateTime format allow us to use the dt parameters to extract the number of the week associated to the date. In order to do not loose any information I prefer to add a new column with the week number. Once retrieved the number of the week we can group by that week.

df['WeekNumber'] = df['Date'].dt.isocalendar().week
df.groupby(['Name', 'WeekNumber']).sum()

Name    WeekNumber  
Apple   28  90
        29  30
Orange  28  20
        30  40

Small problem: what if we consider different years?

There could be the case in whick our data have a range of years, in that situation we cannot consider only the week (otherwise we would mix up data from one year into another), so it would be useful to extract also the year column from isocalendar().

df['year'] = df['Date'].dt.isocalendar().year
df.groupby(['Name', 'WeekNumber', 'year']).sum()

Name    WeekNumber  year    Quantity
Apple   28          2017    90
        29          2017    30
Orange  28          2017    20
        30          2017    40

Thyroxine answered 26/5, 2022 at 13:2 Comment(0)

This groups every row on the previous Monday (if the date is already Monday, nothing is changed). This has the effect of grouping by week:

import pandas as pd, datetime as dt

# df = ...

df['WeekDate'] = df.apply(lambda row: row['Date'] - dt.timedelta(days=row['Date'].weekday()), axis=1)

perweek = df['WeekDate'].groupby(df['WeekDate']).count()

Exemple:

Date           WeekDate
2020-06-20     2020-06-15 <- monday
2020-06-21     2020-06-15
2020-06-24     2020-06-22 <- monday 
2020-06-25     2020-06-22
2020-06-26     2020-06-22

Ledesma answered 26/6, 2020 at 13:54 Comment(0)

You can use the to_period method to get the date truncated to the first day of the week (or month if you use the period M):

df["Week"] = df["Date"].dt.to_period("W").dt.to_timestamp()
df.groupby(["Name", "Week")).sum()

Pacesetter answered 30/8, 2022 at 14:55 Comment(0)

I think for completeness, one should mention the most compact way to do it:

df.groupby(['Name', df.Date.round('7d')])

Whis has less controls, but works across years.

(assumiing data is a datetime column. If not, replace df.Date with pd.to_datetime(df.Date))

Culberson answered 12/3, 2024 at 22:54 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Small problem: what if we consider different years?

Recommended topics

Hot tags