Django aggregation to a lower resolution by grouping on a date range
horrible title, but let me explain: I've got this Django model containing a timestamp (date) and the attribute to log (value), e.g. the number of users consuming some resource:

class Viewers(models.Model):
    date = models.DateTimeField()
    value = models.IntegerField()

every 10 seconds the table gets a row with the current number of users, something like this:

| date | value |
|------|-------|
|  t1  |   15  |
|  t2  |   18  |
|  t3  |   27  |
|  t4  |   25  |
|  ..  |   ..  |
|  t30 |   38  |
|  t31 |   36  |
|  ..  |   ..  |

now I want to generate different statistics from this data, each at a different resolution. E.g. for a chart of the last day I don't need the 10-second resolution, so I want 5-minute steps (built by averaging the values (and maybe also the dates) of the rows from t1 to t29, t30 to t59, ...), so that I'll get:

| date | value |
|------|-------|
|  t15 |   21  |
|  t45 |   32  |
|  ..  |   ..  |

the attributes to keep variable are the start and end timestamps and the resolution (like 5 minutes). Is there a way to do this with the Django ORM/queryset API, and if not, how can I achieve it with custom SQL?
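For reference, the bucketing being asked for can be sketched in plain Python, assuming the rows are available as (datetime, value) pairs (the `downsample` helper and the pair format are illustrative, not part of the question):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def downsample(rows, start, end, resolution):
    """Average (timestamp, value) rows into fixed-size buckets.

    rows: iterable of (datetime, number) pairs; resolution: a timedelta.
    Returns a list of (bucket_midpoint, average_value) pairs, in order.
    """
    step = resolution.total_seconds()
    buckets = defaultdict(list)
    for ts, value in rows:
        if start <= ts < end:
            # Index of the fixed-size bucket this sample falls into.
            idx = int((ts - start).total_seconds() // step)
            buckets[idx].append(value)
    result = []
    for idx in sorted(buckets):
        # Report the middle of the bucket as its timestamp.
        mid = start + timedelta(seconds=(idx + 0.5) * step)
        result.append((mid, sum(buckets[idx]) / len(buckets[idx])))
    return result
```

This does the whole job in Python after fetching the raw rows; the answers below push the same grouping into the database instead.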

Meadowsweet answered 7/6, 2011 at 14:3

I've been trying to solve this problem in the most 'Django' way possible. I've settled on the following, which averages the values in 15-minute time slots between start_date and end_date, where the column name is 'date':

from django.db.models import Avg

readings = Reading.objects.filter(date__range=(start_date, end_date)) \
    .extra(select={'date_slice': "FLOOR(EXTRACT(EPOCH FROM date) / '900')"}) \
    .values('date_slice') \
    .annotate(value_avg=Avg('value'))

It returns a queryset of dictionaries like:

 {'value_avg': 1116.4925373134329, 'date_slice': 1546512.0}
 {'value_avg': 1001.2028985507246, 'date_slice': 1546513.0}
 {'value_avg': 1180.6285714285714, 'date_slice': 1546514.0}

The core of the idea comes from this answer to the same question for PHP/SQL. The expression passed to extra() is for a Postgres DB.

Zoospore answered 12/2, 2014 at 23:6 Comment(1)
To use this with MySQL, replace the call to extra() with: .extra(select={"date_slice": "from_unixtime(unix_timestamp(time) - unix_timestamp(time) mod 900)"}) (Boycie)
from django.db.models import Avg

Viewers.objects.filter(date__range=(start_time, end_time)).aggregate(average=Avg('value'))

That will get you the average of all the values between start_time and end_time, returned as a dictionary in the form of { 'average': <the average> }.

start_time and end_time need to be Python datetime objects. So if you have a timestamp, or something, you'll need to convert it first. You can also use datetime.timedelta to calculate the end_time based on the start_time. For a five minute resolution, something like this:

from datetime import timedelta

end_time = start_time + timedelta(minutes=5)
Smirch answered 7/6, 2011 at 15:2 Comment(4)
ayo, that's the simple range filter. I think I didn't explain explicitly enough that between start_time and end_time there can be (or rather, are) many periods/timedeltas of e.g. 5 minutes. So the result is not a single aggregated value but a table like the second one in my question: a timestamp (the middle of each 5-minute period) and the averaged value for that period. Get it? :) It actually seems to be a simple compacting job, but I've been trying to achieve it for 3 hours now :( (Meadowsweet)
You essentially need a loop, and that's really not going to be possible with a single DB query, straight SQL or otherwise. You could always use the above code in a loop, increasing the start and end times by 5 minutes in each iteration, but that's probably highly inefficient. (Smirch)
As chrisdpratt said, I don't think you could do that with SQL. You should get a queryset with all the values (or a subset) and calculate the averages in Python. (Iain)
hrrhrr, look at my answer (Meadowsweet)

Have you looked at the range filter?

https://docs.djangoproject.com/en/dev/ref/models/querysets/#range

The example given in the docs seems similar to your situation.

Schorl answered 7/6, 2011 at 14:50 Comment(1)
hey, thanks for the response. As you said, it's just a filter with which I can pick out one of the periods (5 minutes). But I want to do this for potentially many periods. (Meadowsweet)

Slightly improving upon the answer by @Richard Corden, in PostgreSQL you can do:

from django.db import models
from django.db.models import Avg, F, Func
from django.db.models.functions import Extract, Floor

def for_interval(self, start=None, end=None, interval=60):
    # (Check start and end values...)
    return (
        self
        .filter(timestamp__range=(start, end))
        # Floor the epoch seconds to the start of the interval...
        .annotate(unix_timestamp=Floor(Extract('timestamp', 'epoch') / interval) * interval)
        # ...then convert that back to a timestamp.
        .annotate(time=Func(F('unix_timestamp'), function='TO_TIMESTAMP',
                            output_field=models.DateTimeField()))
        .values('time')
        # 'avg_value' rather than 'value': an annotation may not share its
        # name with a model field.
        .annotate(avg_value=Avg('value'))
        .order_by('time')
    )

I would also recommend storing the floor of the interval rather than its midpoint.

Nica answered 8/8, 2022 at 12:37 Comment(0)

After a lot of trying I made it work as an SQL statement:

SELECT FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date))), SUM(value)
FROM `my_table`
WHERE date BETWEEN SUBTIME(NOW(), '0:30:00') AND NOW()
GROUP BY UNIX_TIMESTAMP(date) DIV 300
ORDER BY date DESC

with

start_time = SUBTIME(NOW(), '0:30:00')
end_time = NOW()
period = 300 # in seconds

in the end it was not really hard, and it is indeed independent of the time resolution of the samples in the original table.
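The GROUP BY UNIX_TIMESTAMP(date) DIV 300 clause maps every row to its 300-second slot. The same arithmetic in plain Python, for clarity (the `slot_start` helper name is illustrative):

```python
def slot_start(epoch_seconds, period=300):
    """Floor an epoch timestamp to the start of its period-second slot,
    mirroring GROUP BY UNIX_TIMESTAMP(date) DIV period."""
    return epoch_seconds - epoch_seconds % period
```

All rows whose slot_start is equal end up in the same group, regardless of how densely they were sampled.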

Meadowsweet answered 8/6, 2011 at 7:3 Comment(0)
