How to aggregate timeseries in Python?

I have two different timeseries with partially overlapping timestamps:

import scikits.timeseries as ts
from datetime import datetime 
a = ts.time_series([1,2,3], dates=[datetime(2010,10,20), datetime(2010,10,21), datetime(2010,10,23)], freq='D')
b = ts.time_series([4,5,6], dates=[datetime(2010,10,20), datetime(2010,10,22), datetime(2010,10,23)], freq='D')

which represents following data:

Day:   20. 21. 22. 23.
  a:    1   2   -   3
  b:    4   -   5   6

I would like to calculate a weighted average on every day with coefficients a(0.3) and b(0.7), while ignoring missing values:

Day 20.: (0.3 * 1 + 0.7 * 4) / (0.3 + 0.7) = 3.1 / 1.  = 3.1
Day 21.: (0.3 * 2          ) / (0.3      ) = 0.6 / 0.3 = 2
Day 22.: (          0.7 * 5) / (      0.7) = 3.5 / 0.7 = 5
Day 23.: (0.3 * 3 + 0.7 * 6) / (0.3 + 0.7) = 3.1 / 1.  = 5.1

when I first try to align these timeseries:

a1, b1 = ts.aligned(a, b)

I get correctly masked timeseries:

timeseries([1 2 -- 3],
  dates = [20-Oct-2010 ... 23-Oct-2010],
  freq  = D)

timeseries([4 -- 5 6],
  dates = [20-Oct-2010 ... 23-Oct-2010],
  freq  = D)

but when I do a1 * 0.3 + b1 * 0.7, it ignores values, that are present in one timeseries only:

timeseries([3.1 -- -- 5.1],
   dates = [20-Oct-2010 ... 23-Oct-2010],
   freq  = D)

What should I do to receive the awaited?

timeseries([3.1 2. 5. 5.1],
   dates = [20-Oct-2010 ... 23-Oct-2010],
   freq  = D)

EDIT: The answer should be applicable also to more than two initial timeseries with different weights and differently missing values.

So if we have four timeseries with weights T1(0.1), T2(0.2), T3(0.3) and T4(0.4), their weights at a given timestamp will be:

            |  T1 |  T2 |  T3 |  T4 |
weight      | 0.1 | 0.2 | 0.3 | 0.4 |
-------------------------------------
all present | 10% | 20% | 30% | 40% |
T1 missing  |     | 22% | 33% | 45% |
T1,T2 miss. |     |     | 43% | 57% |
T4 missing  | 17% | 33% | 50% |     |
etc.

aWgt = 0.3 bWgt = 0.7 print (np.where(a1.mask, 0., a1.data * aWgt) + np.where(b1.mask, 0., b1.data * bWgt)) / (np.where(a1.mask, 0., aWgt) + np.where(b1.mask, 0., bWgt)) # array([ 3.1, 2. , 5. , 5.1])

def weightedAvg(weightedTimeseries): sumA = np.sum((np.where(ts.mask, 0., ts.data * weight) for ts, weight in weightedTimeseries), axis=0) sumB = np.sum((np.where(ts.mask, 0., weight) for ts, weight in weightedTimeseries), axis=0) return np.divide(sumA, sumB) weightedAvg(((a1, 0.3), (bb, 0.7))) # array([ 3.1, 2. , 5. , 5.1])

Recommended topics

Hot tags