How to get cumulative sum
Asked Answered
S

16

252
declare  @t table
    (
        id int,
        SomeNumt int
    )

insert into @t
select 1,10
union
select 2,12
union
select 3,3
union
select 4,15
union
select 5,23


select * from @t

the above select returns me the following.

id  SomeNumt
1   10
2   12
3   3
4   15
5   23

How do I get the following:

id  srome   CumSrome
1   10  10
2   12  22
3   3   25
4   15  40
5   23  63
Splurge answered 22/1, 2010 at 21:12 Comment(4)
Getting running totals in T-SQL is not hard, there are many correct answers, most of them pretty easy. What is not easy (or even possible at this time) is to write a true query in T-SQL for running totals that is efficient. They are all O(n^2), though they could easily be O(n), except that T-SQL does not optimize for this case. You can get O(n) using Cursors and/or While loops, but then you're using Cursors. (blech!)Literalism
Does this answer your question? Calculate a Running Total in SQL ServerGrieve
@Literalism this was probably true for old versions with self-join. I could not find proof that SUM() OVER() is quadratic.Melosa
@Melosa Yep, it isn’t. The newer windowed (“OVER()”) functions work just fine without the performance problems of what was availability to us back in 2010.Literalism
U
279
select t1.id, t1.SomeNumt, SUM(t2.SomeNumt) as sum
from @t t1
inner join @t t2 on t1.id >= t2.id
group by t1.id, t1.SomeNumt
order by t1.id

SQL Fiddle example

Output

| ID | SOMENUMT | SUM |
-----------------------
|  1 |       10 |  10 |
|  2 |       12 |  22 |
|  3 |        3 |  25 |
|  4 |       15 |  40 |
|  5 |       23 |  63 |

Edit: this is a generalized solution that will work across most db platforms. When there is a better solution available for your specific platform (e.g., gareth's), use it!

Unequaled answered 22/1, 2010 at 21:25 Comment(4)
@Franklin Only cost efficient for small tables. Cost grows proportional to the square of the number of rows. SQL Server 2012 allows this to be done much more efficiently.Gingergingerbread
FWIW, I've had my knuckles smacked when doing this by a DBA. I think the reason is it gets really expensive, really quick. That being said, this is a great interview question, as most data analysts/scientists should have had to solve this problem once or twice :)Inexpugnable
@Inexpugnable Agreed - I tend to provide generalized SQL solutions that will work on most db platforms. As always, when there is a better approach available, e.g., gareths, use it!Peyton
@Inexpugnable agreed - this seems to be a discovered solution first than a calculated one!Eritrea
R
275

The latest version of SQL Server (2012) permits the following.

SELECT 
    RowID, 
    Col1,
    SUM(Col1) OVER(ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId

or

SELECT 
    GroupID, 
    RowID, 
    Col1,
    SUM(Col1) OVER(PARTITION BY GroupID ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId

This is even faster. Partitioned version completes in 34 seconds over 5 million rows for me.

Thanks to Peso, who commented on the SQL Team thread referred to in another answer.

Rochelle answered 11/11, 2012 at 12:25 Comment(4)
For brevity, you may use ROWS UNBOUNDED PRECEDING instead of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.Granger
Note: If the column you want to sum cumulatively is itself already a sum or count, you can either wrap the whole thing as an inner query or you can actually do SUM(COUNT(*)) OVER (ORDER BY RowId ROWS UNBOUNDED PRECEDING) AS CumulativeSum. It wasn't immediately obvious to me if it would work but it did :-)Murky
Available in PostgreSQL as of 8.4: postgresql.org/docs/8.4/sql-select.htmlImpracticable
The partition solution also allowed me to sum across multiple variables. For example the cumulative total of invoice number per customer where each customer has multiple invoices and each invoice has multiple items.Simulation
P
60

For SQL Server 2012 onwards it could be easy:

SELECT id, SomeNumt, sum(SomeNumt) OVER (ORDER BY id) as CumSrome FROM @t

because ORDER BY clause for SUM by default means RANGE UNBOUNDED PRECEDING AND CURRENT ROW for window frame ("General Remarks" at https://msdn.microsoft.com/en-us/library/ms189461.aspx)

Purulent answered 31/8, 2015 at 20:8 Comment(1)
RANGE UNBOUNDED PRECEDING and ROWS UNBOUNDED PRECEDING are subtly different and can cause incorrect results if you choose the wrong onePenitence
A
23

Let's first create a table with dummy data:

Create Table CUMULATIVESUM (id tinyint , SomeValue tinyint)

Now let's insert some data into the table;

Insert Into CUMULATIVESUM
    Select 1, 10 union 
    Select 2, 2  union
    Select 3, 6  union
    Select 4, 10 

Here I am joining same table (self joining)

Select c1.ID, c1.SomeValue, c2.SomeValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Order By c1.id Asc

Result:

ID  SomeValue   SomeValue
-------------------------
1   10          10
2   2           10
2   2            2
3   6           10
3   6            2
3   6            6
4   10          10
4   10           2
4   10           6
4   10          10

Here we go now just sum the Somevalue of t2 and we`ll get the answer:

Select c1.ID, c1.SomeValue, Sum(c2.SomeValue) CumulativeSumValue
From CumulativeSum c1,  CumulativeSum c2
Where c1.id >= c2.ID
Group By c1.ID, c1.SomeValue
Order By c1.id Asc

For SQL Server 2012 and above (much better performance):

Select 
    c1.ID, c1.SomeValue, 
    Sum (SomeValue) Over (Order By c1.ID )
From CumulativeSum c1
Order By c1.id Asc

Desired result:

ID  SomeValue   CumlativeSumValue
---------------------------------
1   10          10
2   2           12
3   6           18
4   10          28

Drop Table CumulativeSum
Agreement answered 23/11, 2012 at 10:12 Comment(5)
please edit your answer and format the code to make it readableLiable
What if mi "ID" values are repeated? (they are obvoiusly not primary key in my table) I have no t been able to adapt this query to that case?Semanteme
AFAIK you need unique id for cumulative sum , and you can get it using row_number. check that code below: ;with NewTBLWITHUNiqueID as ( select row_number() over(order by id , somevalue) UniqueID , * From CUMULATIVESUMwithoutPK )Agreement
Thanks @NeerajPrasadSharma, I actually used rank() and another order by clause to solve it.Semanteme
+1 for the Sum (SomeValue) Over (Order By c1.ID ), I've just realized that it might be needed sometimesFennec
B
16

A CTE version, just for fun:

;
WITH  abcd
        AS ( SELECT id
                   ,SomeNumt
                   ,SomeNumt AS MySum
             FROM   @t
             WHERE  id = 1
             UNION ALL
             SELECT t.id
                   ,t.SomeNumt
                   ,t.SomeNumt + a.MySum AS MySum
             FROM   @t AS t
                    JOIN abcd AS a ON a.id = t.id - 1
           )
  SELECT  *  FROM    abcd
OPTION  ( MAXRECURSION 1000 ) -- limit recursion here, or 0 for no limit.

Returns:

id          SomeNumt    MySum
----------- ----------- -----------
1           10          10
2           12          22
3           3           25
4           15          40
5           23          63
Barta answered 23/1, 2010 at 18:11 Comment(0)
T
7

Late answer but showing one more possibility...

Cumulative Sum generation can be more optimized with the CROSS APPLY logic.

Works better than the INNER JOIN & OVER Clause when analyzed the actual query plan ...

/* Create table & populate data */
IF OBJECT_ID('tempdb..#TMP') IS NOT NULL
DROP TABLE #TMP 

SELECT * INTO #TMP 
FROM (
SELECT 1 AS id
UNION 
SELECT 2 AS id
UNION 
SELECT 3 AS id
UNION 
SELECT 4 AS id
UNION 
SELECT 5 AS id
) Tab


/* Using CROSS APPLY 
Query cost relative to the batch 17%
*/    
SELECT   T1.id, 
         T2.CumSum 
FROM     #TMP T1 
         CROSS APPLY ( 
         SELECT   SUM(T2.id) AS CumSum 
         FROM     #TMP T2 
         WHERE    T1.id >= T2.id
         ) T2

/* Using INNER JOIN 
Query cost relative to the batch 46%
*/
SELECT   T1.id, 
         SUM(T2.id) CumSum
FROM     #TMP T1
         INNER JOIN #TMP T2
                 ON T1.id > = T2.id
GROUP BY T1.id

/* Using OVER clause
Query cost relative to the batch 37%
*/
SELECT   T1.id, 
         SUM(T1.id) OVER( PARTITION BY id)
FROM     #TMP T1

Output:-
  id       CumSum
-------   ------- 
   1         1
   2         3
   3         6
   4         10
   5         15
Toughen answered 22/4, 2016 at 8:25 Comment(1)
I am not persuaded. "Query cost relative to the batch" is a meaningless thing for comparing performance of queries. Query costs are estimates used by the query planner to quickly weigh-up different plans and choose the least costly, but those costs are for comparing plans for the same query, and are not relevant or comparable between queries, not at all. This sample dataset is also too tiny to see any significant difference between the three methods. Try it again with 1m rows, look at the actual execution plans, try it with set io statistics on and compare the cpu and actual times.Tepee
T
6
Select 
    *, 
    (Select Sum(SOMENUMT) 
     From @t S 
     Where S.id <= M.id)
From @t M
Thordis answered 3/12, 2013 at 9:39 Comment(3)
It is a very smart way of achieving the result, and you can add multiple conditions to the sum.Produce
@Produce It's not great for performance though, it runs that correlated subquery for every single row of the result set, scanning more and more rows as it goes. It doesn't keep a running total and scan the data once like window functions can.Tepee
@Tepee you are right, if you use it gets very slow over 100,000 records.Produce
R
4

You can use this simple query for progressive calculation :

select 
   id
  ,SomeNumt
  ,sum(SomeNumt) over(order by id ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as CumSrome
from @t
Ryder answered 21/12, 2019 at 7:32 Comment(0)
E
3

There is a much faster CTE implementation available in this excellent post: http://weblogs.sqlteam.com/mladenp/archive/2009/07/28/SQL-Server-2005-Fast-Running-Totals.aspx

The problem in this thread can be expressed like this:

    DECLARE @RT INT
    SELECT @RT = 0

    ;
    WITH  abcd
            AS ( SELECT TOP 100 percent
                        id
                       ,SomeNumt
                       ,MySum
                       order by id
               )
      update abcd
      set @RT = MySum = @RT + SomeNumt
      output inserted.*

Edmondedmonda answered 23/5, 2012 at 22:41 Comment(0)
S
3

For Ex: IF you have a table with two columns one is ID and second is number and wants to find out the cumulative sum.

SELECT ID,Number,SUM(Number)OVER(ORDER BY ID) FROM T
Shem answered 28/5, 2020 at 16:3 Comment(1)
Best solution for those who want to write less, it's easy to also see current MAX per row or event current average: SUM(SomeNumt)OVER(ORDER BY id), MAX(SomeNumt)OVER(ORDER BY id), AVG(SomeNumt)OVER(ORDER BY id)Kileykilgore
F
1

The SQL solution wich combines "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "SUM" did exactly what i wanted to achieve. Thank you so much!

If it can help anyone, here was my case. I wanted to cumulate +1 in a column whenever a maker is found as "Some Maker" (example). If not, no increment but show previous increment result.

So this piece of SQL:

SUM( CASE [rmaker] WHEN 'Some Maker' THEN  1 ELSE 0 END) 
OVER 
(PARTITION BY UserID ORDER BY UserID,[rrank] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumul_CNT

Allowed me to get something like this:

User 1  Rank1   MakerA      0  
User 1  Rank2   MakerB      0  
User 1  Rank3   Some Maker  1  
User 1  Rank4   Some Maker  2  
User 1  Rank5   MakerC      2
User 1  Rank6   Some Maker  3  
User 2  Rank1   MakerA      0  
User 2  Rank2   SomeMaker   1  

Explanation of above: It starts the count of "some maker" with 0, Some Maker is found and we do +1. For User 1, MakerC is found so we dont do +1 but instead vertical count of Some Maker is stuck to 2 until next row. Partitioning is by User so when we change user, cumulative count is back to zero.

I am at work, I dont want any merit on this answer, just say thank you and show my example in case someone is in the same situation. I was trying to combine SUM and PARTITION but the amazing syntax "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" completed the task.

Thanks! Groaker

Fourflusher answered 24/5, 2016 at 0:8 Comment(0)
M
0

Once the table is created -

select 
    A.id, A.SomeNumt, SUM(B.SomeNumt) as sum
    from @t A, @t B where A.id >= B.id
    group by A.id, A.SomeNumt

order by A.id
Montagnard answered 4/7, 2013 at 9:22 Comment(0)
P
0

Above (Pre-SQL12) we see examples like this:-

SELECT
    T1.id, SUM(T2.id) AS CumSum
FROM 
    #TMP T1
    JOIN #TMP T2 ON T2.id < = T1.id
GROUP BY
    T1.id

More efficient...

SELECT
    T1.id, SUM(T2.id) + T1.id AS CumSum
FROM 
    #TMP T1
    JOIN #TMP T2 ON T2.id < T1.id
GROUP BY
    T1.id
Partida answered 14/11, 2016 at 14:11 Comment(0)
S
-1

Try this

select 
    t.id,
    t.SomeNumt, 
    sum(t.SomeNumt) Over (Order by t.id asc Rows Between Unbounded Preceding and Current Row) as cum
from 
    @t t 
group by
    t.id,
    t.SomeNumt
order by
    t.id asc;
Stpierre answered 28/6, 2013 at 2:12 Comment(1)
This works with SQL Server 2012 and up, 2008 has limited support for window functions.Aphonic
H
-1

Try this:

CREATE TABLE #t(
 [name] varchar NULL,
 [val] [int] NULL,
 [ID] [int] NULL
) ON [PRIMARY]

insert into #t (id,name,val) values
 (1,'A',10), (2,'B',20), (3,'C',30)

select t1.id, t1.val, SUM(t2.val) as cumSum
 from #t t1 inner join #t t2 on t1.id >= t2.id
 group by t1.id, t1.val order by t1.id
Hoffer answered 17/2, 2014 at 3:37 Comment(0)
D
-1

Without using any type of JOIN cumulative salary for a person fetch by using follow query:

SELECT * , (
  SELECT SUM( salary ) 
  FROM  `abc` AS table1
  WHERE table1.ID <=  `abc`.ID
    AND table1.name =  `abc`.Name
) AS cum
FROM  `abc` 
ORDER BY Name
Dachia answered 12/6, 2017 at 7:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.