SQL moving average
Asked Answered
W

13

16

How do you create a moving average in SQL?

Current table:

Date             Clicks 
2012-05-01       2,230
2012-05-02       3,150
2012-05-03       5,520
2012-05-04       1,330
2012-05-05       2,260
2012-05-06       3,540
2012-05-07       2,330

Desired table or output:

Date             Clicks    3 day Moving Average
2012-05-01       2,230
2012-05-02       3,150
2012-05-03       5,520          4,360
2012-05-04       1,330          3,330
2012-05-05       2,260          3,120
2012-05-06       3,540          3,320
2012-05-07       2,330          3,010
Wetzel answered 16/5, 2012 at 18:58 Comment(3)
What database system are you using?Justen
@BrianWebster: he said that in a comment to my (now deleted) post: He is using Hive. But you removed the tag for it.Hungerford
Ok, fixed - I honestly didn't realize that was a database systemJusten
J
8

One way to do this is to join on the same table a few times.

select
 (Current.Clicks 
  + isnull(P1.Clicks, 0)
  + isnull(P2.Clicks, 0)
  + isnull(P3.Clicks, 0)) / 4 as MovingAvg3
from
 MyTable as Current
 left join MyTable as P1 on P1.Date = DateAdd(day, -1, Current.Date)
 left join MyTable as P2 on P2.Date = DateAdd(day, -2, Current.Date)
 left join MyTable as P3 on P3.Date = DateAdd(day, -3, Current.Date)

Adjust the DateAdd component of the ON-Clauses to match whether you want your moving average to be strictly from the past-through-now or days-ago through days-ahead.

  • This works nicely for situations where you need a moving average over only a few data points.
  • This is not an optimal solution for moving averages with more than a few data points.
Justen answered 16/5, 2012 at 19:0 Comment(3)
left join those. (see the first two have none)Fluecure
Isn't doing 4 joins going to be a pretty costly operation for large tables?Wetzel
Depends on the data, but in my experience this is a pretty speedy operation.Justen
P
19

This is an Evergreen Joe Celko question. I ignore which DBMS platform is used. But in any case Joe was able to answer more than 10 years ago with standard SQL.

Joe Celko SQL Puzzles and Answers citation: "That last update attempt suggests that we could use the predicate to construct a query that would give us a moving average:"

SELECT S1.sample_time, AVG(S2.load) AS avg_prev_hour_load
FROM Samples AS S1, Samples AS S2
WHERE S2.sample_time
BETWEEN (S1.sample_time - INTERVAL 1 HOUR)
AND S1.sample_time
GROUP BY S1.sample_time;

Is the extra column or the query approach better? The query is technically better because the UPDATE approach will denormalize the database. However, if the historical data being recorded is not going to change and computing the moving average is expensive, you might consider using the column approach.

MS SQL Example:

CREATE TABLE #TestDW
( Date1 datetime,
  LoadValue Numeric(13,6)
);

INSERT INTO #TestDW VALUES('2012-06-09' , '3.540' );
INSERT INTO #TestDW VALUES('2012-06-08' , '2.260' );
INSERT INTO #TestDW VALUES('2012-06-07' , '1.330' );
INSERT INTO #TestDW VALUES('2012-06-06' , '5.520' );
INSERT INTO #TestDW VALUES('2012-06-05' , '3.150' );
INSERT INTO #TestDW VALUES('2012-06-04' , '2.230' );

SQL Puzzle query:

SELECT S1.date1,  AVG(S2.LoadValue) AS avg_prev_3_days
FROM #TestDW AS S1, #TestDW AS S2
WHERE S2.date1
    BETWEEN DATEADD(d, -2, S1.date1 )
    AND S1.date1
GROUP BY S1.date1
order by 1;
Propound answered 16/5, 2012 at 19:22 Comment(2)
Thanks for the info - but I'm having a hard time translating this to see how it solves the question. Can you give the query you would use for the table in the question?Wetzel
This is even better as it can be modified to find out Moving average of N months..Varmint
J
8

One way to do this is to join on the same table a few times.

select
 (Current.Clicks 
  + isnull(P1.Clicks, 0)
  + isnull(P2.Clicks, 0)
  + isnull(P3.Clicks, 0)) / 4 as MovingAvg3
from
 MyTable as Current
 left join MyTable as P1 on P1.Date = DateAdd(day, -1, Current.Date)
 left join MyTable as P2 on P2.Date = DateAdd(day, -2, Current.Date)
 left join MyTable as P3 on P3.Date = DateAdd(day, -3, Current.Date)

Adjust the DateAdd component of the ON-Clauses to match whether you want your moving average to be strictly from the past-through-now or days-ago through days-ahead.

  • This works nicely for situations where you need a moving average over only a few data points.
  • This is not an optimal solution for moving averages with more than a few data points.
Justen answered 16/5, 2012 at 19:0 Comment(3)
left join those. (see the first two have none)Fluecure
Isn't doing 4 joins going to be a pretty costly operation for large tables?Wetzel
Depends on the data, but in my experience this is a pretty speedy operation.Justen
S
3
select t2.date, round(sum(ct.clicks)/3) as avg_clicks
from
(select date from clickstable) as t2,
(select date, clicks from clickstable) as ct
where datediff(t2.date, ct.date) between 0 and 2
group by t2.date

Example here.

Obviously you can change the interval to whatever you need. You could also use count() instead of a magic number to make it easier to change, but that will also slow it down.

Stowe answered 27/6, 2013 at 10:50 Comment(1)
Your first two entries are 1- and 2-day averages. The question asks for those entries to be NULL.Walkup
S
2

General template for rolling averages that scales well for large data sets

WITH moving_avg AS (
  SELECT 0 AS [lag] UNION ALL
  SELECT 1 AS [lag] UNION ALL
  SELECT 2 AS [lag] UNION ALL
  SELECT 3 AS [lag] --ETC
)
SELECT
  DATEADD(day,[lag],[date]) AS [reference_date],
  [otherkey1],[otherkey2],[otherkey3],
  AVG([value1]) AS [avg_value1],
  AVG([value2]) AS [avg_value2]
FROM [data_table]
CROSS JOIN moving_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];

And for weighted rolling averages:

WITH weighted_avg AS (
  SELECT 0 AS [lag], 1.0 AS [weight] UNION ALL
  SELECT 1 AS [lag], 0.6 AS [weight] UNION ALL
  SELECT 2 AS [lag], 0.3 AS [weight] UNION ALL
  SELECT 3 AS [lag], 0.1 AS [weight] --ETC
)
SELECT
  DATEADD(day,[lag],[date]) AS [reference_date],
  [otherkey1],[otherkey2],[otherkey3],
  AVG([value1] * [weight]) / AVG([weight]) AS [wavg_value1],
  AVG([value2] * [weight]) / AVG([weight]) AS [wavg_value2]
FROM [data_table]
CROSS JOIN weighted_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];
Superstition answered 12/12, 2012 at 0:40 Comment(3)
interesting approach for weighted. Wouldn't work (well) for more discrete time points (timestamp rather date) thoughKristy
@Kristy Outside of academic exercises, what purpose would a fixed weight rolling average over non-uniform intervals have? Wouldn't you either histo the data first or calculate weight as a function of interval size?Superstition
by all means uniform. You just throw to the appropriate weight bucket depending on the distance from the current time point. For instance "take weight=1 for datapoints within 24hrs from current datapoint; weight=0.5 for datapoints within 48hrs…". That case it matters how much consecutive datapoints (like 6:12am and 11:48pm ) are distant from each other… A use case I can think of would be an attempt to smooth the histogram wherever datapoints are not dense enoughKristy
M
2
select *
        , (select avg(c2.clicks) from #clicks_table c2 
            where c2.date between dateadd(dd, -2, c1.date) and c1.date) mov_avg
from #clicks_table c1
Mochun answered 13/12, 2012 at 12:56 Comment(0)
P
1

assume x is the value to be averaged and xDate is the date value:

SELECT avg(x) from myTable WHERE xDate BETWEEN dateadd(d, -2, xDate) and xDate

Poole answered 16/5, 2012 at 19:4 Comment(0)
G
1

Use a different join predicate:

SELECT current.date
       ,avg(periods.clicks)
FROM current left outer join current as periods
       ON current.date BETWEEN dateadd(d,-2, periods.date) AND periods.date
GROUP BY current.date HAVING COUNT(*) >= 3

The having statement will prevent any dates without at least N values from being returned.

Guardi answered 16/5, 2012 at 19:27 Comment(1)
This will not show the May 1 and May 2 rows, which the questioner wants to see NULLs for.Walkup
D
1

In hive, maybe you could try

select date, clicks, avg(clicks) over (order by date rows between 2 preceding and current row) as moving_avg from clicktable;
Devil answered 11/1, 2018 at 6:23 Comment(0)
T
0

For the purpose, I'd like to create an auxiliary/dimensional date table like

create table date_dim(date date, date_1 date, dates_2 date, dates_3 dates ...)

while date is the key, date_1 for this day, date_2 contains this day and the day before; date_3...

Then you can do the equal join in hive.

Using a view like:

select date, date               from date_dim
union all
select date, date_add(date, -1) from date_dim
union all
select date, date_add(date, -2) from date_dim
union all
select date, date_add(date, -3) from date_dim
Thrift answered 24/9, 2012 at 14:42 Comment(0)
V
0

NOTE: THIS IS NOT AN ANSWER but an enhanced code sample of Diego Scaravaggi's answer. I am posting it as answer as the comment section is insufficient. Note that I have parameter-ized the period for Moving aveage.

declare @p int = 3
declare @t table(d int, bal float)
insert into @t values
(1,94),
(2,99),
(3,76),
(4,74),
(5,48),
(6,55),
(7,90),
(8,77),
(9,16),
(10,19),
(11,66),
(12,47)

select a.d, avg(b.bal)
from
       @t a
       left join @t b on b.d between a.d-(@p-1) and a.d
group by a.d
Varmint answered 14/11, 2012 at 16:27 Comment(0)
R
0
--@p1 is period of moving average, @01 is offset

declare @p1 as int
declare @o1 as int
set @p1 = 5;
set @o1 = 3;

with np as(
select *, rank() over(partition by cmdty, tenor order by markdt) as r
from p_prices p1
where
1=1 
)
, x1 as (
select s1.*, avg(s2.val) as avgval from np s1
inner join np s2 
on s1.cmdty = s2.cmdty and s1.tenor = s2.tenor
and s2.r between s1.r - (@p1 - 1) - (@o1) and s1.r - (@o1)
group by s1.cmdty, s1.tenor, s1.markdt, s1.val, s1.r
)
Rapping answered 22/9, 2015 at 10:28 Comment(0)
G
0

I'm not sure that your expected result (output) shows classic "simple moving (rolling) average" for 3 days. Because, for example, the first triple of numbers by definition gives:

ThreeDaysMovingAverage = (2.230 + 3.150 + 5.520) / 3 = 3.6333333

but you expect 4.360 and it's confusing.

Nevertheless, I suggest the following solution, which uses window-function AVG. This approach is much more efficient (clear and less resource-intensive) than SELF-JOIN introduced in other answers (and I'm surprised that no one has given a better solution).

-- Oracle-SQL dialect 
with
  data_table as (
     select date '2012-05-01' AS dt, 2.230 AS clicks from dual union all
     select date '2012-05-02' AS dt, 3.150 AS clicks from dual union all
     select date '2012-05-03' AS dt, 5.520 AS clicks from dual union all
     select date '2012-05-04' AS dt, 1.330 AS clicks from dual union all
     select date '2012-05-05' AS dt, 2.260 AS clicks from dual union all
     select date '2012-05-06' AS dt, 3.540 AS clicks from dual union all
     select date '2012-05-07' AS dt, 2.330 AS clicks from dual  
  ),
  param as (select 3 days from dual)
select
   dt     AS "Date",
   clicks AS "Clicks",

   case when rownum >= p.days then 
       avg(clicks) over (order by dt
                          rows between p.days - 1 preceding and current row)
   end    
          AS "3 day Moving Average"
from data_table t, param p;

You see that AVG is wrapped with case when rownum >= p.days then to force NULLs in first rows, where "3 day Moving Average" is meaningless.

Gelhar answered 23/2, 2016 at 13:12 Comment(0)
W
-1

We can apply Joe Celko's "dirty" left outer join method (as cited above by Diego Scaravaggi) to answer the question as it was asked.

declare @ClicksTable table  ([Date] date, Clicks int)
insert into @ClicksTable
    select '2012-05-01', 2230 union all
    select '2012-05-02', 3150 union all
    select '2012-05-03', 5520 union all
    select '2012-05-04', 1330 union all
    select '2012-05-05', 2260 union all
    select '2012-05-06', 3540 union all
    select '2012-05-07', 2330

This query:

SELECT
    T1.[Date],
    T1.Clicks,
    -- AVG ignores NULL values so we have to explicitly NULLify
    -- the days when we don't have a full 3-day sample
    CASE WHEN count(T2.[Date]) < 3 THEN NULL
        ELSE AVG(T2.Clicks) 
    END AS [3-Day Moving Average] 
FROM @ClicksTable T1
LEFT OUTER JOIN @ClicksTable T2
    ON T2.[Date] BETWEEN DATEADD(d, -2, T1.[Date]) AND T1.[Date]
GROUP BY T1.[Date]

Generates the requested output:

Date             Clicks    3-Day Moving Average
2012-05-01       2,230
2012-05-02       3,150
2012-05-03       5,520          4,360
2012-05-04       1,330          3,330
2012-05-05       2,260          3,120
2012-05-06       3,540          3,320
2012-05-07       2,330          3,010
Walkup answered 9/1, 2016 at 0:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.