SQL Time Series Group with max amount of results
I have timeseries data in a table using Timescaledb.

Data is as follows:

    time                    locationid parameterid unitid value
2022-04-18T10:00:00.000Z      "1"        "1"        "2"    2.2
2022-04-18T10:00:00.000Z      "2"        "1"        "2"    3.0
2022-04-18T09:00:00.000Z      "1"        "1"        "2"    1.2
2022-04-18T09:00:00.000Z      "2"        "1"        "2"    4.0
2022-04-18T08:00:00.000Z      "1"        "1"        "2"    2.6
2022-04-18T08:00:00.000Z      "2"        "1"        "2"    3.1
2022-04-18T07:00:00.000Z      "1"        "1"        "2"    2.1
2022-04-18T07:00:00.000Z      "2"        "1"        "2"    2.7

I have 1000s of rows with time series IOT data that I am putting into graphs using HighCharts.

My question is: is there a way to limit the number of items returned in my results, but not with a classic LIMIT? I'd like to return 256 data groups at all times. So if I had 2,560 rows, my query would group (by date_trunc / time_bucket) every 10 rows, but if I had 512 rows it would group every 2 rows, so that I always return 256 groups no matter what.

My current query:

SELECT time_bucket('4 hours', time) AS "t",
       locationid,
       avg(extract(epoch FROM time)) AS "x",
       avg(value) AS "y"
FROM probe_data
WHERE locationid = '${q.locationid}' AND parameterid = '${q.parameterid}'
  AND time > '${q.startDate}' AND time < '${q.endDate}'
GROUP BY "t", locationid
ORDER BY "t" DESC;

It seems like I should be able to use my min date and max date to count the number of possible returns and then divide by 256? Is this the best way to do it?
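The interval arithmetic for that idea does work in plain PostgreSQL; as an untested sketch against the table above:

```sql
-- The width of 256 equal buckets across the stored range is just
-- (max - min) / 256: PostgreSQL supports dividing an interval by a number.
SELECT (max(time) - min(time)) / 256 AS bucket_width
FROM probe_data;
```

The resulting interval can then be passed as the first argument of time_bucket.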

Crayon answered 22/4, 2022 at 21:15 Comment(5)
Are the times all already equally spaced? – Hermia
Yes, every reading for each locationid is 1 hour apart. – Crayon
@Crayon this was a great question (I am from TimescaleDB). I hope it's OK with you if I add it to our forum so other users can find it... if you would like the credit and prefer to write it up yourself, though, let me know! – Mozarab
@Mozarab go for it! – Crayon
Took me a while but I've done it now, thank you! – Mozarab

There are a few different ways you can do something like this:

  1. You can change the time bucket dynamically in your query with time_bucket. You can do arithmetic on intervals and get another interval back, e.g. SELECT (now() - '2022-04-21') / 256; returns an interval, which can be the first input to time_bucket. So something like:
SELECT time_bucket((enddate - startdate) / 256, time) as "t" 
...
GROUP BY time_bucket((enddate - startdate) / 256, time)

Should do what you're looking for to a large extent...
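Plugged into the query from the question (using the same ${q.…} placeholders; an untested sketch):

```sql
-- Dynamic bucket width: (endDate - startDate) / 256 yields an interval,
-- which time_bucket accepts as its width argument.
SELECT time_bucket(
         ('${q.endDate}'::timestamptz - '${q.startDate}'::timestamptz) / 256,
         time) AS "t",
       locationid,
       avg(value) AS "y"
FROM probe_data
WHERE locationid = '${q.locationid}' AND parameterid = '${q.parameterid}'
  AND time > '${q.startDate}' AND time < '${q.endDate}'
GROUP BY "t", locationid
ORDER BY "t" DESC;
```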

However, it does mean that you'll be averaging over arbitrarily larger groups of data as you zoom out, it doesn't really allow you to cache things or the like, and it probably isn't a great representation of the underlying process.

Another option would be:

  2. You can do an average at a set time_bucket that is relevant to your data analysis and then downsample that using an algorithm like Largest Triangle Three Buckets (LTTB), which maintains the visual accuracy of a graph while downsampling the data. It's one of the experimental hyperfunctions we have in TimescaleDB.

This would allow you to also use something like continuous aggregates to downsample the data with a set time_bucket and then get the number of points you need for your graph more quickly using the LTTB algorithm.
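For illustration, a call to the toolkit's LTTB hyperfunction might look roughly like this (the function ships in the timescaledb_toolkit extension, and its exact call shape has changed across toolkit versions, so treat this as a sketch and check the hyperfunctions docs):

```sql
-- Downsample one location's series to 256 visually representative points.
SELECT time, value
FROM unnest((
    SELECT lttb(time, value, 256)
    FROM probe_data
    WHERE locationid = '1'
));
```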

So it sort of depends what you're looking for. In some cases, using LTTB on its own without the average, or even something like ASAP smoothing (another experimental hyperfunction), might be a better way to do what you're looking for; both are built in for this type of work. The docs pages have more info on the algorithms and what they're useful for, but LTTB and ASAP are designed specifically for graphing applications, so I thought I'd point them out!

Mayfield answered 25/4, 2022 at 16:30 Comment(0)

No - SQL doesn't support dynamic grouping. To do what you ask, you'd have to fetch the result set and check the number of records returned, then run the query again with your grouping logic.

Effectively, you have a presentation level issue - not a SQL issue.

Basophil answered 22/4, 2022 at 21:15 Comment(0)

Probably something with NTILE; I'm not sure if this would work, but I'd imagine doing it something like this:

SELECT to_timestamp(avg(extract(epoch FROM sub.time))) AS x,
       avg(sub.value) AS y
FROM (
    SELECT
        time,
        value,
        NTILE(256) OVER (
            ORDER BY time
        ) AS bucket_no
    FROM
        probe_data
) sub
GROUP BY sub.bucket_no
ORDER BY x;

The inner query breaks all of your data into 256 groups; each row gets a bucket_no column, and the outer query then groups by that bucket_no.

EDIT: just realized the mysql tag on this question is probably inaccurate, but I'll leave the answer as it might point you in the right direction for timescaledb.

Italia answered 22/4, 2022 at 21:24 Comment(1)
Good call, I just removed the mysql tag. I appreciate the help; I'll take a look at this and see what I can come up with. Thanks! – Crayon
