Sliding window iterator using rolling in pandas

Asked 21/7, 2016 at 16:4 Answered 11/2, 2022 at 13:12

Solved python pandas numpy dataframe pandas-groupby

If it's single row, I can get the iterator as following

import pandas as pd
import numpy as np

a = np.zeros((100,40))
X = pd.DataFrame(a)

for index, row in X.iterrows():
    print index
    print row

Now I want each iterator will return a subset X[0:9, :], X[5:14, :], X[10:19, :] etc. How do I achieve this with rolling (pandas.DataFrame.rolling)?

Mak answered 21/7, 2016 at 16:4 Comment(9)

what is X? what is rolling? – Chamberlin 21/7, 2016 at 16:7

Please see updated question. – Mak 21/7, 2016 at 16:14

OK. What does X[0:9] mean? – Chamberlin 21/7, 2016 at 16:15

It's row 0 to 9 of X, that is X[0:9, 0:40]. I want to get iterator, instead of slicing as it is slow for big dataset. – Mak 21/7, 2016 at 16:17

Why?? iterrows is slower than indexing? – Mak 21/7, 2016 at 16:31

Yes, to my knowledge iterrows is always slower than using index slicing. How big is your dataset. – Yanirayank 21/7, 2016 at 16:33

It's about 2000000x500 ints – Mak 21/7, 2016 at 16:35

You may need to prune the 500 columns. Work on a using only what you need vs everything. – Yanirayank 21/7, 2016 at 16:37

I updated my solution to return the iterable such that you don't need to remember to stack. – Berkow 21/7, 2016 at 17:6

I'll experiment with the following dataframe.

Setup

import pandas as pd
import numpy as np
from string import uppercase

def generic_portfolio_df(start, end, freq, num_port, num_sec, seed=314):
    np.random.seed(seed)
    portfolios = pd.Index(['Portfolio {}'.format(i) for i in uppercase[:num_port]],
                          name='Portfolio')
    securities = ['s{:02d}'.format(i) for i in range(num_sec)]
    dates = pd.date_range(start, end, freq=freq)
    return pd.DataFrame(np.random.rand(len(dates) * num_sec, num_port),
                        index=pd.MultiIndex.from_product([dates, securities],
                                                         names=['Date', 'Id']),
                        columns=portfolios
                       ).groupby(level=0).apply(lambda x: x / x.sum())    


df = generic_portfolio_df('2014-12-31', '2015-05-30', 'BM', 3, 5)

df.head(10)

I'll now introduce a function to roll a number of rows and concatenate into a single dataframe where I'll add a top level to the column index that indicates the location in the roll.

Solution Step-1

def rolled(df, n):
    k = range(df.columns.nlevels)
    _k = [i - len(k) for i in k]
    myroll = pd.concat([df.shift(i).stack(level=k) for i in range(n)],
                       axis=1, keys=range(n)).unstack(level=_k)
    return [(i, row.unstack(0)) for i, row in myroll.iterrows()]

Though its hidden in the function, myroll would look like this

Now we can use it just like an iterator.

Solution Step-2

for i, roll in rolled(df.head(5), 3):
    print roll
    print

                    0   1   2
Portfolio                    
Portfolio A  0.326164 NaN NaN
Portfolio B  0.201597 NaN NaN
Portfolio C  0.085340 NaN NaN

                    0         1   2
Portfolio                          
Portfolio A  0.278614  0.326164 NaN
Portfolio B  0.314448  0.201597 NaN
Portfolio C  0.266392  0.085340 NaN

                    0         1         2
Portfolio                                
Portfolio A  0.258958  0.278614  0.326164
Portfolio B  0.089224  0.314448  0.201597
Portfolio C  0.293570  0.266392  0.085340

                    0         1         2
Portfolio                                
Portfolio A  0.092760  0.258958  0.278614
Portfolio B  0.262511  0.089224  0.314448
Portfolio C  0.084208  0.293570  0.266392

                    0         1         2
Portfolio                                
Portfolio A  0.043503  0.092760  0.258958
Portfolio B  0.132221  0.262511  0.089224
Portfolio C  0.270490  0.084208  0.293570

Berkow answered 21/7, 2016 at 16:53 Comment(1)

Did you look at np.strides at all? Might use the above on something that Im working on. – Yanirayank 21/7, 2016 at 18:0

That's not how rolling works. It "provides rolling transformations" (from the docs).

You can loop and use pandas indexing?

for i in range((X.shape[0] + 9) // 10):
    X_subset = X.iloc[i * 10: (i + 1) * 10]

Pittsburgh answered 21/7, 2016 at 16:21 Comment(1)

How can I get an iterator instead, similar to iterrows()? – Mak 21/7, 2016 at 16:27

It can be done with rolling but is inefficient (due to iterating all windows) and requires Pandas 1.1:

for index, row in enumerate(list(X.rolling(10))[::5]):

Bit faster (iteration in C)

from itertools import islice
for index, row in enumerate(islice(X.rolling(10), None, None, 5)):

Cookhouse answered 11/2, 2022 at 13:12 Comment(0)

Setup

Solution Step-1

Solution Step-2

Recommended topics

Hot tags