Moving Average based on Timestamps in PostgreSQL
Asked Answered
B

4

22

I wanted to perform moving average through timestamps. I have two columns: Temperature and timestamps (time-date) and I want to perform the moving average based on every 15 minutes successive temperature observations. In other words, selecting data to perform the average based on 15 minutes time interval. Moreover, it is possible to have different number of observations for different time sequences. I meant all the window sizes are equal (15 minutes) but it is possible to have different number of observations in each window. For example: For a first window we have to calculate the average of n observation and for second window calculate the average of the observation for n+5 observation.

Data Sample:

ID   Timestamps          Temperature
1    2007-09-14 22:56:12 5.39
2    2007-09-14 22:58:12 5.34
3    2007-09-14 23:00:12 5.16
4    2007-09-14 23:02:12 5.54
5    2007-09-14 23:04:12 5.30
6    2007-09-14 23:06:12 5.20
7    2007-09-14 23:10:12 5.39
8    2007-09-14 23:12:12 5.34
9    2007-09-14 23:20:12 5.16
10   2007-09-14 23:24:12 5.54
11   2007-09-14 23:30:12 5.30
12   2007-09-14 23:33:12 5.20
13   2007-09-14 23:40:12 5.39
14   2007-09-14 23:42:12 5.34
15   2007-09-14 23:44:12 5.16
16   2007-09-14 23:50:12 5.54
17   2007-09-14 23:52:12 5.30
18   2007-09-14 23:57:12 5.20

Main Challenges:

How I can learn the code to discriminate every 15 minute while there are not exact 15 minutes time intervals due to different sampling frequency.

Bentonbentonite answered 11/12, 2012 at 10:40 Comment(3)
Does the rolling average "restart" if a new 15-minutes window starts? Or should the average calculate the "last" 15 minutes?Newsstand
@a_horse_with_no_name, Actually, the data-set includes 4 weeks historic data and I need the moving average results as a new data-set.Bentonbentonite
That doesn't answer my question.Newsstand
N
9

Assuming you want to restart the rolling average after each 15 minute interval:

select id, 
       temp,
       avg(temp) over (partition by group_nr order by time_read) as rolling_avg
from (       
  select id, 
         temp,
         time_read, 
         interval_group,
         id - row_number() over (partition by interval_group order by time_read) as group_nr
  from (
    select id, 
           time_read, 
           'epoch'::timestamp + '900 seconds'::interval * (extract(epoch from time_read)::int4 / 900) as interval_group,
           temp
    from readings
  ) t1
) t2
order by time_read;

It is based on Depesz's solution to group by "time ranges":

Here is an SQLFiddle example: http://sqlfiddle.com/#!1/0f3f0/2

Newsstand answered 11/12, 2012 at 12:32 Comment(2)
Since id - row_number() is used as group_nr, I think this will only work when the IDs are contiguous (i.e. there must be no gaps in the IDs)Laureenlaurel
"Assuming you want to restart the rolling average after each 15 minute interval" - that is not how rolling averages work.Morelli
S
12

You can join your table with itself:

select l1.id, avg( l2.Temperature )
from l l1
inner join l l2 
   on l2.id <= l1.id and
      l2.Timestamps + interval '15 minutes' > l1.Timestamps
group by l1.id
order by id
;

Results:

| ID |            AVG |
-----------------------
|  1 |           5.39 |
|  2 |          5.365 |
|  3 | 5.296666666667 |
|  4 |         5.3575 |
|  5 |          5.346 |
|  6 | 5.321666666667 |
|  7 | 5.331428571429 |

Notice: Only 'hard work' is made. You should join result with original table or append new columns to query. I don't know your final query needed. Adapt this solution or ask for more help.

Stardom answered 11/12, 2012 at 11:0 Comment(0)
N
9

Assuming you want to restart the rolling average after each 15 minute interval:

select id, 
       temp,
       avg(temp) over (partition by group_nr order by time_read) as rolling_avg
from (       
  select id, 
         temp,
         time_read, 
         interval_group,
         id - row_number() over (partition by interval_group order by time_read) as group_nr
  from (
    select id, 
           time_read, 
           'epoch'::timestamp + '900 seconds'::interval * (extract(epoch from time_read)::int4 / 900) as interval_group,
           temp
    from readings
  ) t1
) t2
order by time_read;

It is based on Depesz's solution to group by "time ranges":

Here is an SQLFiddle example: http://sqlfiddle.com/#!1/0f3f0/2

Newsstand answered 11/12, 2012 at 12:32 Comment(2)
Since id - row_number() is used as group_nr, I think this will only work when the IDs are contiguous (i.e. there must be no gaps in the IDs)Laureenlaurel
"Assuming you want to restart the rolling average after each 15 minute interval" - that is not how rolling averages work.Morelli
R
4

Here's an approach that utilises the facility to use an aggregation function as a window function. The aggregate function keeps the last 15 minutes' worth of observations in an array, along with the current running total. The state transition function shifts elements off the array that have fallen behind the 15-minute window, and pushes on the latest observation. The final function simply computes the mean temperature in the array.

Now, as to whether this is a benefit or not... it depends. It focuses on the plgpsql-execution part of postgresql rather than database-access part, and my own experiences is that plpgsql is not fast. If you can easily do lookups back to the table to find the previous 15 minutes' rows for each observation, a self-join (as in @danihp answer) will do well. However, this approach can deal with the observations coming from some more complex source, where those lookups aren't practical. As ever, trial and compare on your own system.

-- based on using this table definition
create table observation(id int primary key, timestamps timestamp not null unique,
                         temperature numeric(5,2) not null);

-- note that I'm reusing the table structure as a type for the state here
create type rollavg_state as (memory observation[], total numeric(5,2));

create function rollavg_func(state rollavg_state, next_in observation) returns rollavg_state immutable language plpgsql as $$
declare
  cutoff timestamp;
  i int;
  updated_memory observation[];
begin
  raise debug 'rollavg_func: state=%, next_in=%', state, next_in;
  cutoff := next_in.timestamps - '15 minutes'::interval;
  i := array_lower(state.memory, 1);
  raise debug 'cutoff is %', cutoff;
  while i <= array_upper(state.memory, 1) and state.memory[i].timestamps < cutoff loop
    raise debug 'shifting %', state.memory[i].timestamps;
    i := i + 1;
    state.total := state.total - state.memory[i].temperature;
  end loop;
  state.memory := array_append(state.memory[i:array_upper(state.memory, 1)], next_in);
  state.total := coalesce(state.total, 0) + next_in.temperature;
  return state;
end
$$;

create function rollavg_output(state rollavg_state) returns float8 immutable language plpgsql as $$
begin
  raise debug 'rollavg_output: state=% len=%', state, array_length(state.memory, 1);
  if array_length(state.memory, 1) > 0 then
    return state.total / array_length(state.memory, 1);
  else
    return null;
  end if;
end
$$;

create aggregate rollavg(observation) (sfunc = rollavg_func, finalfunc = rollavg_output, stype = rollavg_state);

-- referring to just a table name means a tuple value of the row as a whole, whose type is the table type
-- the aggregate relies on inputs arriving in ascending timestamp order
select rollavg(observation) over (order by timestamps) from observation;
Retiform answered 11/12, 2012 at 12:28 Comment(0)
M
0

Based on dani herrera's answer:

select l1.id,
       l1.time_read, 
       l1.temp ,
       avg( l2.Temp ) as rolling_avg
  from readings l1
 inner join readings l2 
    on l2.id <= l1.id and
       l2.time_read + interval '15 minutes' > l1.time_read
 group by l1.id
 order by time_read;

Here is an SQLFiddle: http://sqlfiddle.com/#!17/9db74/161 and the data in a chart would look like this:

chart

Morelli answered 6/10, 2022 at 5:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.