Calculating Cumulative Sum in PostgreSQL

ea_month   id     amount  ea_year  circle_id
April      92570  1000    2014     1
April      92571  3000    2014     2
April      92572  2000    2014     3
March      92573  3000    2014     1
March      92574  2500    2014     2
March      92575  3750    2014     3
February   92576  2000    2014     1
February   92577  2500    2014     2
February   92578  1450    2014     3

ea_month   id     amount  ea_year  circle_id  cum_amt
February   92576  1000    2014     1          1000
March      92573  3000    2014     1          4000
April      92570  2000    2014     1          6000
February   92577  3000    2014     2          3000
March      92574  2500    2014     2          5500
April      92571  3750    2014     2          9250
February   92578  2000    2014     3          2000
March      92575  2500    2014     3          4500
April      92572  1450    2014     3          5950

Basically, you need a window function. That's a standard feature nowadays. In addition to genuine window functions, you can use any aggregate function as a window function in Postgres by appending an OVER clause.
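To illustrate that last point, here is a minimal sketch using inline sample values (not the table from the question). The same sum() acts as a plain aggregate in the first query and as a window function in the second:

```sql
-- Plain aggregate: collapses all rows into a single row.
SELECT sum(amount) AS total
FROM  (VALUES (1000), (3000), (2000)) t(amount);

-- Same aggregate with an OVER clause: every input row is kept.
-- An empty OVER () makes the whole result set one window.
SELECT amount, sum(amount) OVER () AS total
FROM  (VALUES (1000), (3000), (2000)) t(amount);
```

The second query returns three rows, each carrying the overall total next to its own amount.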

The special difficulty here is to get partitions and sort order right:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id
                         ORDER BY ea_year, ea_month) AS cum_amt
FROM   tbl
ORDER  BY circle_id, ea_year, ea_month;

And no GROUP BY.

The sum for each row is calculated from the first row in the partition to the current row - or quoting the manual to be precise:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.

This is the cumulative (or "running") sum you are after.

In default RANGE mode, rows with the same rank in the sort order are "peers": rows with the same (circle_id, ea_year, ea_month) in this query. All peers show the same running sum, with every peer added to it. But I assume your table is UNIQUE on (circle_id, ea_year, ea_month). In that case the sort order is deterministic and no row has peers. (And you might as well use the cheaper ROWS mode.)
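The difference between the two modes only shows up when there are peers. A small sketch with hypothetical inline values, where two rows share the sort key k = 2:

```sql
SELECT k, amount
       -- Default frame (RANGE): both k = 2 rows are peers and get the same sum.
     , sum(amount) OVER (ORDER BY k) AS sum_range
       -- Explicit ROWS frame: the sum stops at the current row, peers or not.
     , sum(amount) OVER (ORDER BY k
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_rows
FROM  (VALUES (1, 10), (2, 20), (2, 30), (3, 40)) t(k, amount)
ORDER  BY k;
```

In sum_range both peer rows show 60. In sum_rows one of them shows an intermediate value, and which one is arbitrary because the sort order among peers is not defined. With a unique sort key, both columns are identical.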

Postgres 11 added tools to include / exclude peers with the new frame_exclusion options. See:

Aggregating all values not in the same group
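As a rough sketch of those options, assuming Postgres 11 or later and the same kind of hypothetical inline values (two peers with k = 2):

```sql
SELECT k, amount
       -- EXCLUDE TIES: drop the current row's peers from the frame, keep the row itself.
     , sum(amount) OVER (ORDER BY k
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                         EXCLUDE TIES)  AS sum_without_peers
       -- EXCLUDE GROUP: drop the current row and all of its peers.
     , sum(amount) OVER (ORDER BY k
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                         EXCLUDE GROUP) AS sum_before_group
FROM  (VALUES (1, 10), (2, 20), (2, 30), (3, 40)) t(k, amount)
ORDER  BY k;
```

With EXCLUDE GROUP the first row has an empty frame, so its sum is NULL rather than 0.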

Now, ORDER BY ... ea_month won't work with strings for month names. Postgres would sort alphabetically according to the locale setting.
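A quick check with the three month names from the sample data shows the problem:

```sql
-- Month names stored as text sort alphabetically, not chronologically.
SELECT m
FROM  (VALUES ('February'), ('March'), ('April')) t(m)
ORDER  BY m;
-- April comes first, before February and March.
```

A running sum ordered this way would start each year with April.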

If you have actual date values stored in your table, you can sort properly. If not, I suggest replacing ea_year and ea_month with a single column the_date of type date in your table.

Transform what you have with to_date():

  to_date(ea_year || ea_month , 'YYYYMonth') AS the_date

For display, you can get original strings with to_char():

  to_char(the_date, 'Month') AS ea_month
  to_char(the_date, 'YYYY')  AS ea_year
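Put together, a one-time migration could look roughly like this. It is a sketch under assumptions: the table is named tbl as in the queries here, every row has a parsable ea_year and ea_month, and nothing else depends on the two old columns.

```sql
BEGIN;

-- Add the new column and fill it from the existing year / month pair.
ALTER TABLE tbl ADD COLUMN the_date date;
UPDATE tbl SET the_date = to_date(ea_year || ea_month, 'YYYYMonth');

-- Fails if any row could not be converted (assumption: none are NULL).
ALTER TABLE tbl ALTER COLUMN the_date SET NOT NULL;

-- Drop the redundant columns once the_date is populated.
ALTER TABLE tbl DROP COLUMN ea_year, DROP COLUMN ea_month;

COMMIT;

-- Afterwards, sort by the_date and rebuild the strings only for display.
SELECT to_char(the_date, 'Month') AS ea_month, id, amount
     , to_char(the_date, 'YYYY')  AS ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY the_date) AS cum_amt
FROM   tbl
ORDER  BY circle_id, the_date;
```

Note that the 'Month' pattern pads names with blanks to 9 characters. Use 'FMMonth' if you want them without padding.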

While stuck with the unfortunate design, this will work:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY the_date) AS cum_amt
FROM   (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS the_date FROM tbl) sub
ORDER  BY circle_id, the_date;
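To try this out, here is a self-contained sketch that loads the sample rows from the question into a temporary table. The column types and the primary key are assumptions, since the question does not show the table definition.

```sql
-- Hypothetical scratch table; types are guessed from the sample data.
CREATE TEMP TABLE tbl (
   ea_month  text
 , id        int PRIMARY KEY
 , amount    int
 , ea_year   int
 , circle_id int
);

INSERT INTO tbl (ea_month, id, amount, ea_year, circle_id) VALUES
   ('April'   , 92570, 1000, 2014, 1)
 , ('April'   , 92571, 3000, 2014, 2)
 , ('April'   , 92572, 2000, 2014, 3)
 , ('March'   , 92573, 3000, 2014, 1)
 , ('March'   , 92574, 2500, 2014, 2)
 , ('March'   , 92575, 3750, 2014, 3)
 , ('February', 92576, 2000, 2014, 1)
 , ('February', 92577, 2500, 2014, 2)
 , ('February', 92578, 1450, 2014, 3);

-- Running sum per circle_id in chronological order.
SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY the_date) AS cum_amt
FROM  (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS the_date FROM tbl) sub
ORDER  BY circle_id, the_date;
```

Each circle_id starts its own running sum at February and accumulates through April.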
