Calculating Cumulative Sum in PostgreSQL
Asked Answered
S

1

160

I want to find the cumulative or running amount of field and insert it from staging to table. My staging structure is something like this:

ea_month    id       amount    ea_year    circle_id
April       92570    1000      2014        1
April       92571    3000      2014        2
April       92572    2000      2014        3
March       92573    3000      2014        1
March       92574    2500      2014        2
March       92575    3750      2014        3
February    92576    2000      2014        1
February    92577    2500      2014        2
February    92578    1450      2014        3          

I want my target table to look something like this:

ea_month    id       amount    ea_year    circle_id    cum_amt
February    92576    1000      2014        1           1000 
March       92573    3000      2014        1           4000
April       92570    2000      2014        1           6000
February    92577    3000      2014        2           3000
March       92574    2500      2014        2           5500
April       92571    3750      2014        2           9250
February    92578    2000      2014        3           2000
March       92575    2500      2014        3           4500
April       92572    1450      2014        3           5950

I am really very much confused with how to go about achieving this result. I want to achieve this result using PostgreSQL.

Can anyone suggest how to go about achieving this result-set?

Spectacle answered 3/4, 2014 at 14:47 Comment(3)
How do you get the cum_amount of 1000 in your target table? For circle_id, the amount seems to be 2000.Poulenc
@Poulenc He is grouping by ea_year, circle_id, ea_month. Then want to take cum_amtThornton
I think that amount column in target table are wrong and cum_amt of course too. E.g. id 92576 has amount 2000 not 1000. If I'm right please edit this - it is confusingGanges
T
241

Basically, you need a window function. That's a standard feature nowadays. In addition to genuine window functions, you can use any aggregate function as window function in Postgres by appending an OVER clause.

The special difficulty here is to get partitions and sort order right:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id
                         ORDER BY ea_year, ea_month) AS cum_amt
FROM   tbl
ORDER  BY circle_id, ea_year, ea_month;

And no GROUP BY.

The sum for each row is calculated from the first row in the partition to the current row - or quoting the manual to be precise:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.

Bold emphasis mine. This is the cumulative (or "running") sum you are after.

In default RANGE mode, rows with the same rank in the sort order are "peers" - same (circle_id, ea_year, ea_month) in this query. All of those show the same running sum with all peers added to the sum. But I assume your table is UNIQUE on (circle_id, ea_year, ea_month), then the sort order is deterministic and no row has peers. (And you might as well use the cheaper ROWS mode.)

Postgres 11 added tools to include / exclude peers with the new frame_exclusion options. See:

Now, ORDER BY ... ea_month won't work with strings for month names. Postgres would sort alphabetically according to the locale setting.

If you have actual date values stored in your table you can sort properly. If not, I suggest to replace ea_year and ea_month with a single column the_date of type date in your table.

  • Transform what you have with to_date():

      to_date(ea_year || ea_month , 'YYYYMonth') AS the_date
    
  • For display, you can get original strings with to_char():

      to_char(the_date, 'Month') AS ea_month
      to_char(the_date, 'YYYY')  AS ea_year
    

While stuck with the unfortunate design, this will work:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY the_date) AS cum_amt
FROM   (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS the_date FROM tbl) sub
ORDER  BY circle_id, mon;
Thermionics answered 3/4, 2014 at 16:9 Comment(11)
Thanks for the solution.. Can you help me with one more thing. I want to implement the same thing using a cursor with the logic being every circle will have just one record for a month of a year. And the function is supposed to run once every month. How can I achieve this?Spectacle
@YousufSultan: Most of the time there is a better solution than a cursor. That's definitely stuff for a new question. Please start a new question.Thermionics
I find this answer incomplete without at least a note that there is "framing" going on here which defaults to range unbounded preceding, which is the same as range between unbounded preceding and current row. This is why sum()when used as a window function produces a running total -- while other window functions don't have this default frame.Leucoma
@Colin'tHart: I added some more above to clarify.Thermionics
Here's a link to a similar question with a simpler query (the PARTITION is not always needed to create a running total): https://mcmap.net/q/28662/-count-cumulative-total-in-postgresqlLillia
Maybe ORDER BY circle_id, month; should be ORDER BY circle_id, ea_year, ea_month;?Vagarious
@Ciro Oh yes, thanks - matching the original table schema.Thermionics
@ErwinBrandstetter I think the last query is wrong. ORDER BY circle_id, the_date. Should also have a alias....Drome
@Drome Indeed, an alias for the subquery was missing, thanks!Thermionics
thanks for the pg reference link . Mysql doc dosen't say anything for order by inside windows function will impact the sum.Barranquilla
This answer is SO HELPFUL!!!!Gambill

© 2022 - 2024 — McMap. All rights reserved.