I think I can guess what is happening:
In [481]: df=pd.DataFrame( { 'x':[0,0,.1,.2,0,0] } )
In [482]: df2 = pd.rolling_sum(df,window=2)
In [483]: df2
Out[483]:
              x
0           NaN
1  0.000000e+00
2  1.000000e-01
3  3.000000e-01
4  2.000000e-01
5  2.775558e-17
It looks OK, except for the last one, right? In fact, the display rounding hides the fact that some of the other entries are not as clean as they appear at first glance. The default display format only reveals this when a value is very close to zero, as the last one is.
In [493]: for i in range(6):
   ...:     print '%22.19f' % df2.ix[i,'x']
nan
0.0000000000000000000
0.1000000000000000056
0.3000000000000000444
0.2000000000000000389
0.0000000000000000278
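Python's `decimal` module can show the exact binary value stored for each float, which is where the stray trailing digits in the loop output come from. A minimal sketch in Python 3 syntax (unlike the Python 2 session above); the helper name `exact_value` is hypothetical, just for illustration:

```python
from decimal import Decimal


def exact_value(x):
    """Return the exact decimal expansion of the binary double stored for x."""
    # Decimal(float) converts without rounding, so every stored digit shows up.
    return str(Decimal(x))


for value in (0.1, 0.2, 0.1 + 0.2):
    # '%.19f' mirrors the format used in the loop above.
    print('%.19f' % value, exact_value(value))
```

The 19-digit format rounds these exact expansions, which is why 0.1 prints as 0.1000000000000000056.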
What's happening here is that rolling_sum does not actually compute a fresh sum for each window. Rather, it updates a running sum by adding the newest number and subtracting the oldest one. In this trivial example with window=2, that doesn't save anything, but if the window is much larger it can speed up the computation considerably, so it makes sense to do it that way.
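The difference between the two strategies can be sketched in plain Python. This is a simplified model of the add-newest/subtract-oldest update described above, not pandas's actual implementation (which may differ between versions); both function names are hypothetical:

```python
import math


def fresh_rolling_sum(values, window):
    """Sum each window from scratch; errors do not carry over between windows."""
    out = []
    for i in range(len(values)):
        if i < window - 1:
            out.append(math.nan)  # not enough values yet, like pandas' NaN
        else:
            out.append(sum(values[i - window + 1:i + 1]))
    return out


def incremental_rolling_sum(values, window):
    """Update a running total: add the newest value, subtract the oldest."""
    out = []
    total = 0.0
    for i, x in enumerate(values):
        total += x
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        out.append(math.nan if i < window - 1 else total)
    return out


data = [0.0, 0.0, 0.1, 0.2, 0.0, 0.0]
print(fresh_rolling_sum(data, 2))
print(incremental_rolling_sum(data, 2))
```

With `data` matching the question's column, the fresh version ends in 0.0 while the incremental one ends in 2.7755575615628914e-17, the same value as in the session above. The trade-off is speed: the incremental version does a constant amount of work per step regardless of the window size, whereas the fresh version does work proportional to the window.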
However, that means some unexpected results can happen. You're expecting the last rolling sum to be the result of 0+0, but it's not; it's actually something like this:
In [492]: (.0+.0)+(.1-.0)+(.2-.0)+(.0-.1)+(.0-.2)
Out[492]: 2.7755575615628914e-17
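The leftover is not arbitrary noise: it is exactly one unit in the last place (ulp) of 0.2, i.e. 2**-55, because the intermediate total after dropping 0.1 lands on the double just above 0.2 instead of on 0.2 itself. A quick check, assuming Python 3.9+ for `math.ulp` and `math.nextafter`:

```python
import math

# Replay the update sequence from the session above, left to right.
residue = (.0 + .0) + (.1 - .0) + (.2 - .0) + (.0 - .1) + (.0 - .2)

print(residue)                                       # the value shown in Out[492]
print(residue == 2.0 ** -55)                         # the error is exactly 2**-55 ...
print(residue == math.ulp(0.2))                      # ... which is one ulp of 0.2
print((.1 + .2) - .1 == math.nextafter(0.2, 1.0))    # intermediate total is 0.2's upper neighbor
```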
Bottom line: your results are basically fine. It just happens that the way you did it (with these data) revealed the precision issues inherent in floating-point arithmetic. This happens a lot, but the default display generally hides errors this far out, around the 13th decimal place and beyond.
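When checking results like these in code, comparing with a tolerance is usually more robust than testing for exact equality. A minimal sketch using the standard library's `math.isclose`; the `abs_tol` value of 1e-9 is an arbitrary choice that should be matched to the scale of your data:

```python
import math

rolling_result = 2.7755575615628914e-17  # last entry from the session above

print(rolling_result == 0.0)                            # False: exact test fails
print(math.isclose(rolling_result, 0.0))                # False: relative tolerance alone never matches 0
print(math.isclose(rolling_result, 0.0, abs_tol=1e-9))  # True: absolute tolerance absorbs the error
```

Note that the default relative tolerance cannot match anything against exactly zero, which is why an explicit `abs_tol` is needed here.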
Edit to add: Based on Korem's comment, small negative numbers are in fact causing a problem. I think the best thing to do in this case is to use numpy's around function and replace the second step above with:
df2 = np.around(pd.rolling_sum(df,window=2),decimals=5)
That will force all numbers smaller in magnitude than about 0.000005 (positive or negative) to zero. I think that's a pretty safe general solution, as long as you don't need more than five decimal places of precision. If all your data have integer values you could recast as integers, but that's not a very general solution, obviously.
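`pd.rolling_sum` (and `.ix`, used in the loop above) have since been removed from pandas. In current versions, the same rounding fix can be written with the `.rolling()` method and `DataFrame.round`; a sketch, assuming a recent pandas. The exact unrounded values may differ between pandas versions, so treat the rounding as a safeguard:

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 0, .1, .2, 0, 0]})

# Modern equivalent of pd.rolling_sum(df, window=2).
raw = df.rolling(window=2).sum()

# Round away anything smaller than about 5e-6 in magnitude, like np.around above.
df2 = raw.round(5)

# Python 3 version of the '%22.19f' inspection loop, iterating over the column.
for value in df2['x']:
    print('%22.19f' % value)

# Any tiny negative value rounds to -0.0, which still compares >= 0.
print((df2['x'].dropna() >= 0).all())
```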
Comments:
… x.sort(ascending=False) before rolling_sum fix the issue? – Telesis
… pandas.rolling_sum work? (Instead of pandas.stats.moments.rolling_sum) – Telesis
… print "%20.15f" % x[0] – Alternate
… 1.0e-13. Unless you actually care about that level of precision, I'm not sure you really have a problem. E.g.:
In [304]: .1 + .1 + .1 - .3
Out[304]: 5.551115123125783e-17
– Alternate