InfluxDB "GROUP BY time" shifts time
I'm having a problem that's limiting me quite a bit. We are trying to downsample our data by grouping on time. We have millions of points and want to fetch every Nth point in a given interval. Our current solution calculates the duration of the interval divided by the desired number of points, and groups by that duration to get back the correct number of points:

SELECT last(value) as value FROM measurement WHERE time >= '...' AND time <= '...' GROUP BY time(calculated_time) fill(none)
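The interval calculation described above can be sketched as follows. The helper `group_by_interval` and the target point count are illustrative assumptions, not part of the original query:

```python
from datetime import datetime, timezone

def group_by_interval(start: datetime, end: datetime, n_points: int) -> int:
    """Return a GROUP BY time() interval (in seconds) that yields
    approximately n_points buckets over [start, end]."""
    total_seconds = (end - start).total_seconds()
    return max(1, int(total_seconds // n_points))

start = datetime(2016, 1, 1, tzinfo=timezone.utc)
end = datetime(2017, 1, 1, tzinfo=timezone.utc)

# One bucket per hour of 2016 (a leap year: 366 days * 24 hours)
interval = group_by_interval(start, end, 366 * 24)
print(interval)  # 3600 -> GROUP BY time(3600s)
```

The resulting number of seconds is then interpolated into the query as `GROUP BY time(<interval>s)`.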

The number of points returned is correct, but the timestamps are not.

See the results below:

Without sampling

> SELECT value FROM "measurement" WHERE time >= '2016-01-01T00:00:00Z' AND time <= '2017-01-01T00:00:00Z' LIMIT 5;
name: measurement
time                 value
----                 -----
2016-01-01T00:00:00Z 61.111
2016-01-01T01:00:00Z 183.673
2016-01-01T02:00:00Z 200
2016-01-01T03:00:00Z 66.667
2016-01-01T04:00:00Z 97.959

With sampling

> SELECT last(value) as value FROM "measurement" WHERE time >= '2016-01-01T00:00:00Z' AND time <= '2017-01-01T00:00:00Z' GROUP BY time(23m) fill(none) LIMIT 5;
name: measurement
time                 value
----                 -----
2015-12-31T23:44:00Z 61.111
2016-01-01T00:53:00Z 183.673
2016-01-01T01:39:00Z 200
2016-01-01T02:48:00Z 66.667
2016-01-01T03:57:00Z 97.959

I expect the returned points to keep their original timestamps as stored in the database, regardless of the interval used in the aggregation. Instead, the returned times are aligned to multiples of the GROUP BY interval: with GROUP BY time(7m), for example, the returned timestamps are snapped to 7-minute boundaries.
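The shift comes from how InfluxQL assigns timestamps: GROUP BY time() windows are aligned to preset, epoch-based boundaries, and each result row carries its window's start time rather than the underlying point's time. A minimal Python sketch of that bucketing, reproducing the first shifted timestamp in the output above:

```python
from datetime import datetime, timezone

INTERVAL = 23 * 60  # GROUP BY time(23m), in seconds

def bucket_start(ts: datetime, interval: int) -> datetime:
    """Floor a timestamp to its epoch-aligned GROUP BY time() window start."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % interval, tz=timezone.utc)

point = datetime(2016, 1, 1, 0, 0, tzinfo=timezone.utc)
print(bucket_start(point, INTERVAL))  # 2015-12-31 23:44:00+00:00
```

InfluxQL's `GROUP BY time(interval, offset_interval)` form can shift those boundaries, but the returned time is still the window start, not the raw point's timestamp, so it does not solve the problem described here.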


If there is no solution to my problem with influx, is there an alternative database I can use where I can accomplish this? The data in this example is uniform and evenly distributed, but this is not always the case. More often than not it will be randomly distributed (spans of seconds to minutes).

Painkiller answered 24/10, 2017 at 8:28 Comment(2)
Did you solve the problem and, if yes, how? :) I have done similar grouping but have not encountered such an issue. – Hyracoid
I cannot remember, it was a long time ago :-). – Painkiller
For anyone who is still interested: as of InfluxDB 2.0, a new query language was introduced, Flux.

Flux has a sample() function that accomplishes this (here, sampling every 10th point):

from(bucket: "my-bucket")
  |> range(start: 2016-01-01T00:00:00Z, stop: 2017-01-01T00:00:00Z)
  |> filter(fn: (r) => r._measurement == "measurement")
  |> filter(fn: (r) => r._field == "value")
  |> sample(n: 10)
  |> yield()
Lyckman answered 12/4, 2023 at 13:20 Comment(0)
