I'm having a problem thats limiting me quite a bit. We are trying to sample our data by grouping time. We have millions of points and want to fetch every Nth point in a given interval. We have implemented a solution that calculates the time difference in this interval and then groups by it to receive the correct amount of points.
SELECT last(value) as value FROM measurement WHERE time >= '...' AND time <= '...' GROUP BY time(calculated_time) fill(none)
The amount of points returned seems to be correct, but the dates are not.
See the results below:
Without sampling
> SELECT value FROM "measurement" WHERE time >= '2016-01-01T00:00:00Z' AND time <= '2017-01-01T00:00:00Z' LIMIT 5;
name: measurement
time value
---- -----
2016-01-01T00:00:00Z 61.111
2016-01-01T01:00:00Z 183.673
2016-01-01T02:00:00Z 200
2016-01-01T03:00:00Z 66.667
2016-01-01T04:00:00Z 97.959
With Sampling
> SELECT last(value) as value FROM "measurement" WHERE time >= '2016-01-01T00:00:00Z' AND time <= '2017-01-01T00:00:00Z' GROUP BY time(23m) fill(none) LIMIT 5;
name: measurement
time value
---- -----
2015-12-31T23:44:00Z 61.111
2016-01-01T00:53:00Z 183.673
2016-01-01T01:39:00Z 200
2016-01-01T02:48:00Z 66.667
2016-01-01T03:57:00Z 97.959
I expect the data to be returned to have the correct timestamp as in the database, regardless of the time used in the aggregation function. Instead, the time returned seems to be a multiple of the aggregated time. That is, if my aggregation is GROUP BY time(7m)
then the points seem to a multiple of 7 apart.
If there is no solution to my problem with influx, is there an alternative database I can use where I can accomplish this? The data in this example is uniform and evenly distributed, but this is not always the case. More often than not it will be randomly distributed (spans of seconds to minutes).