I have a series of measurements which are time-stamped and irregularly spaced. The values always represent changes of the measured quantity, i.e. a new value is only recorded when the measurement changes. A simple example of such a series would be:
23:00:00.100 10
23:00:01.200 8
23:00:01.600 0
23:00:06.300 4
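For reference, here is the example as a pandas Series (the date part is arbitrary; the name data is reused in the snippets further down):

import pandas as pd

# the example series: each entry is the new value of the measurement
# at the moment it changed
data = pd.Series(
    [10, 8, 0, 4],
    index=pd.to_datetime([
        '2013-01-01 23:00:00.100',
        '2013-01-01 23:00:01.200',
        '2013-01-01 23:00:01.600',
        '2013-01-01 23:00:06.300',
    ]),
)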
What I want to obtain is an equally spaced series of time-weighted averages. For the given example I might aim at a one-second frequency and hence a result like the following:
23:00:01 NaN ( the first 100ms are missing )
23:00:02 5.2 ( 10*0.2 + 8*0.4 + 0*0.4 )
23:00:03 0
23:00:04 0
23:00:05 0
23:00:06 0
23:00:07 2.8 ( 0*0.3 + 4*0.7 )
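To make the intended computation precise, here is a brute-force sketch that reproduces the numbers above (the function name is my own; bins are left-closed and labelled by their end time, as in the listing). It treats the series as a step function where each value holds until the next change, and returns NaN whenever a bin is not fully covered:

import numpy as np
import pandas as pd

def time_weighted_bins(data, freq='1s'):
    # bin edges covering the whole series, aligned to full seconds
    start = data.index[0].floor(freq)
    end = data.index[-1].ceil(freq)
    edges = pd.date_range(start, end, freq=freq)

    # segments of the step function: values[i] holds from index[i] until index[i+1]
    seg_starts = list(data.index)
    seg_ends = list(data.index[1:]) + [end]
    seg_values = list(data.values)

    averages = []
    for left, right in zip(edges[:-1], edges[1:]):
        bin_len = (right - left).total_seconds()
        weighted = 0.0
        covered = 0.0
        for s, e, v in zip(seg_starts, seg_ends, seg_values):
            overlap = (min(e, right) - max(s, left)).total_seconds()
            if overlap > 0:
                weighted += v * overlap
                covered += overlap
        # NaN when part of the bin is not covered by any observation
        averages.append(weighted / bin_len if np.isclose(covered, bin_len) else np.nan)

    return pd.Series(averages, index=edges[1:])

time_weighted_bins(data)  # NaN, 5.2, 0, 0, 0, 0, 2.8 for 23:00:01 ... 23:00:07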
I am searching for a Python library that solves this problem. To me this seems to be a standard problem, but so far I could not find such functionality in standard libraries like pandas.
The algorithm needs to take two things into account:
- time-weighted averaging
- considering values recorded before the current interval ( and possibly even before the first interval ): e.g. the value 10 set at 23:00:00.100 still determines part of the bin ending at 23:00:02
Using pandas
data.resample('S').mean().ffill()  # forming a series of seconds, padding the empty bins
does part of the work. Replacing the mean with a user-defined aggregation function would allow forming time-weighted averages, but because the part of each interval before its first sample is ignored, this average is still incorrect. Even worse: the holes in the series are filled by padding with the previous bin's aggregate, which in the example above makes seconds 3, 4 and 5 non-zero instead of zero.
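This is roughly what I mean by a user-defined aggregation (my own sketch, name invented): it only sees the samples that fall inside one bin, weights each of them by the time until the next sample or the bin end, and therefore misses both the value carried in from before the bin and the bins containing no sample at all. For the second from 23:00:01 to 23:00:02 it yields 4 instead of 5.2.

import numpy as np
import pandas as pd

def twa_inside_bin(samples):
    # 'samples' holds only the observations falling inside one 1-second bin
    if len(samples) == 0:
        return np.nan                       # empty bins stay NaN (and are then padded)
    bin_end = samples.index[0].floor('1s') + pd.Timedelta('1s')
    # weight each sample by the time until the next one (or the bin end);
    # the stretch between the bin start and the first sample is simply lost
    ends = samples.index[1:].append(pd.DatetimeIndex([bin_end]))
    weights = (ends - samples.index).total_seconds()
    return np.average(samples.values, weights=weights)

data.resample('S').apply(twa_inside_bin).ffill()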
data = data.resample('L').ffill()  # upsampling to a series of milliseconds
data.resample('S').mean()          # aggregating back to seconds
does the trick with a certain accuracy, but depending on the required accuracy it is very expensive. In my case, too expensive.
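Just to quantify the cost (starting again from the original four-point series): the millisecond detour materialises one row per millisecond of the covered time span.

upsampled = data.resample('L').ffill()
len(upsampled)   # about 6200 rows for the 6.2 s example span; an hour of data
                 # at millisecond resolution is already 3.6 million rows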