Aggregating timeseries from sensors

I have about 500 sensors which emit a value about once a minute each. It can be assumed that the value for a sensor remains constant until the next value is emitted, thus creating a time series. The sensors are not synchronized in terms of when they emit data (so the observation timestamps vary), but it's all collected centrally and stored per sensor (to allow filtering by subset of sensors).

How can I produce an aggregate time series that gives the sum of the data across the sensors? I need to create the series over one day's set of observations, so it will need to take into account roughly 24x60x500 observations per day. The calculations also need to be fast, preferably running in < 1s.

Example - raw input:

q)n:10
q)tbl:([]time:n?.z.t;sensor:n?3;val:n?100.0)   / 10 random rows: random times, sensor ids 0-2, random values
q)select from tbl
time         sensor val
----------------------------    
01:43:58.525 0      33.32978
04:35:12.181 0      78.75249
04:35:31.388 0      1.898088    
02:31:11.594 1      16.63539
07:16:40.320 1      52.34027
00:49:55.557 2      45.47007
01:18:57.918 2      42.46532
02:37:14.070 2      91.98683
03:48:43.055 2      41.855
06:34:32.414 2      9.840246

The output I'm looking for should show the same timestamps, with the sum across sensors at each one. If a sensor doesn't have a record at a matching timestamp, its previous value should be used (the records only mark the times when a sensor's output changes).

Expected output, sorted by time:

time         aggregatedvalue
----------------------------    
00:49:55.557   45.47007  / 0 (sensor 0) + 0 (sensor 1) + 45.47007 (sensor 2)
01:18:57.918   42.46532  / 0 (sensor 0) + 0 (sensor 1) + 42.46532 (new value on sensor 2)
01:43:58.525   75.7951   / 33.32978 + 0 + 42.46532
02:31:11.594   92.43049  / 33.32978 + 16.63539 + 42.46532
02:37:14.070   141.952   / 33.32978 + 16.63539 + 91.98683
03:48:43.055   91.82017  / 33.32978 + 16.63539 + 41.855
04:35:12.181   137.24288 / 78.75249 + 16.63539 + 41.855
04:35:31.388   60.388478 / 1.898088 + 16.63539 + 41.855   
06:34:32.414   28.373724 / 1.898088 + 16.63539 + 9.840246 
07:16:40.320   64.078604 / 1.898088 + 52.34027 + 9.840246
Metaphysics answered 3/5, 2013 at 19:18 Comment(0)

I'm assuming the records arrive in time order, so tbl will be sorted by time. If that's not the case, sort the table by time first.
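
In q that's one line (xasc sorts a table ascending on the given column):

q)tbl:`time xasc tbl   / sort tbl ascending on time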

d is a dictionary of the last value by sensor, updated at each time. The solution below is probably not the most elegant, and I can imagine a more performant method is available that would not require the each (one vectorized possibility is sketched after the output below).

q)d:(`long$())!`float$()   / empty dictionary: sensor id -> last value
q)f:{d[x]::y;sum d}        / :: updates the global d with sensor x's new value y, then sum all last values
q)update agg:f'[sensor;val] from tbl
time         sensor val      agg     
-------------------------------------
00:34:28.887 2      53.47096 53.47096
01:05:42.696 2      40.66642 40.66642
01:26:21.548 1      41.1597  81.82612
01:53:10.321 1      51.70911 92.37553
03:42:39.320 1      17.80839 58.47481
05:15:26.418 2      51.59796 69.40635
05:47:49.777 0      30.17723 99.58358
11:32:19.305 0      39.27524 108.6816
11:37:56.091 0      71.11716 140.5235
12:09:18.458 1      78.5033  201.2184
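
One vectorized possibility (a sketch; it assumes tbl is sorted by time as above): treat each record as a delta against that sensor's previous value, zero before its first record, so a running sum of the deltas over the time-sorted records gives the aggregate.

q)t:`time xasc tbl
q)t:update delta:val-0.0^prev val by sensor from t   / per-sensor change; a first record counts in full
q)select time, agg:sums delta from t                 / cumulative sum = running total across all sensors

This reproduces the expected output in the question.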

Your data set of 720k records per day is relatively small, so any aggregation should run well under a second. If you're storing many days of data you may want to consider some of the techniques (splaying, partitioning, etc.) covered in the kdb+ documentation.
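
Roughly, with a hypothetical database root `:db (table and column names follow the question; .Q.en enumerates any symbol columns before splaying, and .Q.dpft writes a single date partition with the p# attribute on the given field):

q)`:db/tbl/ set .Q.en[`:db] tbl     / splay tbl to disk as a single table
q).Q.dpft[`:db;.z.d;`sensor;`tbl]   / or write today's data as one date partition, parted by sensor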

Caldarium answered 3/5, 2013 at 19:39 Comment(3)
thanks for the quick reply, but I'm not getting the output quite right. I have updated the question with the expected output. I have about 3 years' worth of data, so will need to look at splaying/partitioning as well. – Metaphysics
just updated my answer. As I mention, there are probably more efficient ways of doing this in vector form. – Caldarium
ok, that gives the right output. Unfortunately there's no guarantee of arrival order (only approximate time order). – Metaphysics

It's been a while since I spent a lot of time with this, but would it help to go back once you have a large batch, compute interpolated values at fixed intervals, and store that data? I have worked on sensor data that arrives ordered by time, where the sensors only send data when the value actually changes. To accelerate reporting and other calculations, we aggregate the data into fixed periods (such as 1 second, 30 seconds, or 1 minute), often doing the averaging you are talking about along the way, and perform linear interpolation at the same time.
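
A rough sketch of that idea in q, reusing the question's tbl (note: per the question's step semantics this forward-fills rather than linearly interpolating, and the one-minute bucket size and intermediate names are my own):

q)t:`time xasc tbl
q)bars:select last val by bucket:60000 xbar time, sensor from t   / last value per sensor per minute bucket
q)grid:([]bucket:asc distinct 60000 xbar t`time) cross ([]sensor:asc distinct t`sensor)
q)filled:update fills val by sensor from grid lj bars             / carry each sensor's value forward
q)select agg:sum 0.0^val by bucket from filled                    / sum across sensors per bucket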

The downside is that it requires additional storage space, but the performance gains are significant.

Looks like you have a great proposed solution already.

Amp answered 6/5, 2013 at 0:51 Comment(0)
