Optimizing a Vertica SQL query to do running totals
Asked Answered
H

2

8

I have a table S with time series data like this:

key   day   delta

For a given key, it's possible but unlikely that days will be missing.

I'd like to construct a cumulative column from the delta values (positive INTs), for the purposes of inserting this cumulative data into another table. This is what I've got so far:

SELECT key, day,
   SUM(delta) OVER (PARTITION BY key ORDER BY day asc RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
   delta
FROM S

In my SQL flavor, default window clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, but I left that in there to be explicit.

This query is really slow, like order of magnitude slower than the old broken query, which filled in 0s for the cumulative count. Any suggestions for other methods to generate the cumulative numbers?

I did look at the solutions here: Running total by grouped records in table

The RDBMs I'm using is Vertica. Vertica SQL precludes the first subselect solution there, and its query planner predicts that the 2nd left outer join solution is about 100 times more costly than the analytic form I show above.

Hukill answered 5/8, 2011 at 1:8 Comment(4)
What projections you have defined for this table ? How many rows and distinct keys? How slow is "very slow"?Circinus
What many rows do you have in the table? How are you partitioning / segmenting the master projection for that table?Furlong
Very difficult for me to understand! Could you please provide sample data with required output?Bleb
https://mcmap.net/q/109735/-how-to-get-cumulative-sum this might helpInexpensive
S
1

I think you're essentially there. You may just need to update the syntax a bit:

SELECT s_qty, 
   Sum(s_price) 
     OVER( 
       partition BY NULL 
       ORDER BY s_qty ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum" 
FROM   sample_sales;

Output:

S_QTY | Cumulative Sum 
------+----------------
1     | 1000
100   | 11000
150   | 26000
200   | 28000
250   | 53000
300   | 83000
2000  | 103000
(7 rows)

reference link:

https://dwgeek.com/vertica-cumulative-sum-average-and-example.html/
Sheba answered 27/4, 2020 at 16:21 Comment(0)
T
-1

Sometimes it's faster to just use a correlated subquery:

SELECT 
    [key]
    , [day]
    , delta
    , (SELECT SUM(delta) FROM S WHERE [key] < t1.[key]) AS DeltaSum
FROM S t1
Talkie answered 27/5, 2014 at 17:28 Comment(1)
Vertica doesn't support correlated subqueries of this nature.Corydon

© 2022 - 2024 — McMap. All rights reserved.