SQL Query for 7 Day Rolling Average in SQL Server
Asked Answered
L

2

32

I have a table of hourly product usage (how many times the product is used) data –

ID (bigint)| ProductId (tinyint)| Date (int - YYYYMMDD) | Hour (tinyint)| UsageCount (int)
#|1 | 20140901 | 0 | 10
#|1 | 20140901 | 1 | 15
#|1 | 20140902 | 5 | 25
#|1 | 20140903 | 5 | 25
#|1 | 20140904 | 3 | 25
#|1 | 20140905 | 7 | 25
#|1 | 20140906 | 10 | 25
#|1 | 20140907 | 9 | 25
#|1 | 20140908 | 5 | 25
#|2 | 20140903 | 16 | 10
#|2 | 20140903 | 13 | 115

Likewise, I have the usage data for 4 different products (ProductId from 1 through 4) stored for every hour in the product_usage table. As you can imagine, it is constantly growing as the nightly ETL process dumps the data for the entire previous day. If a product is not used on any hour of a day, the record for that hour won’t appear in this table. Similarly, if a product is not used for the entire day, there won’t be any record for that day in the table. I need to generate a report that gives daily usage and last 7 days’ rolling average –

For example:

ProductId | Date | DailyUsage | RollingAverage
1 | 20140901 | sum of usages of that day | (Sum of usages from 20140901 through 20140826) / 7
1 | 20140901 | sum of usages of that day | (Sum of usages from 20140901 through 20140826) / 7
1 | 20140902 | sum of usages of that day | (Sum of usages from 20140902 through 20140827) / 7
2 | 20140902 | sum of usages of that day | (Sum of usages from 20140902 through 20140827) / 7

And so on.. I am planning to create an Indexed View in SQL server 2014. Can you think of an efficient SQL query to do this?

Lover answered 18/9, 2014 at 21:8 Comment(1)
What you are calling a "rolling average" is really a sum.Fusspot
R
45

Try:

select x.*,
       avg(dailyusage) over(partition by productid order by productid, date rows between 6 preceding and current row) as rolling_avg
  from (select productid, date, sum(usagecount) as dailyusage
          from tbl
         group by productid, date) x

Fiddle:

http://sqlfiddle.com/#!6/f674a7/4/0

Replace "avg(dailusage) over...." with sum (rather than avg) if what you really want is the sum for the past week. In your title you say you want the average but later you say you want the sum. The query should be the same other than that, so use whichever you actually want.

As was pointed out by Gordon this is basically the average of the past 6 dates in which the product was used, which might be more than just the past 6 days if there are days without any rows for that product on the table because it wasn't used at all. To get around that you could use a date table and your products table.

Revival answered 18/9, 2014 at 22:2 Comment(3)
Hey, do you know how to reword your sql code for msql 2008? I try running it in 2008 and i keep getting errors.Sleazy
SQL Server 2008 does not support ROWS BETWEEN syntax. You need a slightly different approach, for example #26618853Pomerania
Wow - I knew part of the power of the "over(partition ...)" function, but I apparently glossed right over the 'preceding' element. Thanks!Wellfixed
F
10

You have to be careful if you can be missing data on some days. If I assume that there is data for some product on each day, then this approach will work:

select p.productid, d.date, sum(usagecount),
       sum(sum(usagecount)) over (partition by p.productid order by d.date
                                  rows between 6 preceding and current row) as Sum7day
from (select distinct productid from hourly) p cross join
     (select distinct date from hourly) d left join
     hourly h
     on h.productid = p.productid and h.date = p.date
group by p.productid, d.date;
Fusspot answered 18/9, 2014 at 22:15 Comment(1)
Exactly. The technique of 'ordering' and taking the last N observations breaks down if an observations is missing.Denationalize

© 2022 - 2024 — McMap. All rights reserved.