Multiindex duplicated when rolling() applied on a groupby pandas object
I get a bug with:

x.field.rolling(window=5,min_periods=1).mean() where x is a pandas.core.groupby.groupby.DataFrameGroupBy object.

I tried the solution proposed on this page, so I did this:

x.field.apply(lambda x: x.rolling(window=5,min_periods=1).mean())

Contrary to what that page describes, I still get the same bug:

+---------+---------+-------+--------------------+
| machin  | machin  | truc  | a column of series |
+---------+---------+-------+--------------------+
| machin1 | machin1 | truc1 | 1                  |
|         |         | truc2 | 2                  |
|         |         | truc3 | 3                  |
|         |         | truc4 | 4                  |
| machin2 | machin2 | truc1 | 100                |
|         |         | truc2 | 99                 |
|         |         | truc3 | 98                 |
+---------+---------+-------+--------------------+

As you can see, the index level 'machin' is duplicated, whereas before calling the rolling method it appears only once.

For instance, x.field.apply(lambda x: x+1) returns:

+---------+-------+--------------------+
| machin  | truc  | a column of series |
+---------+-------+--------------------+
| machin1 | truc1 | 2                  |
|         | truc2 | 3                  |
|         | truc3 | 4                  |
|         | truc4 | 5                  |
| machin2 | truc1 | 101                |
|         | truc2 | 100                |
|         | truc3 | 99                 |
+---------+-------+--------------------+

So there is no duplication and no bug, which suggests the issue really comes from the rolling() method.

Here is some code to reproduce my computation:

import pandas as pd

#creation of records
rec=[{'machin':'machin1',
    'truc':['truc1','truc2','truc3','truc4'],
    'a column':[1,2,3,4]},
    {'machin':'machin2',
    'truc':['truc1','truc2','truc3'],
    'a column':[100,99,98]}]

#creation of pandas dataframe
df=pd.concat([pd.DataFrame(rec[0]),pd.DataFrame(rec[1])])

#creation of multi-index
df.set_index(['machin','truc'],inplace=True)

#creation of a groupby object
x=df.groupby(by='machin')

#rolling computation. Note that x.field and x['field'] are equivalent; I checked that both give the same bug.
x['a column'].rolling(window=5,min_periods=1).mean()

#rolling with apply and lambda, gives same bug
x['a column'].apply(lambda x:x.rolling(window=5,min_periods=1).mean())

#making apply and lambda alone gives no bug
a=x['a column'].apply(lambda x: x+1)

Other solutions I tried

I tried to reset the index of the series, doc here.

a.reset_index(name='machin')

It raises an exception: ValueError: cannot insert machin, already exists

even though 'machin' appears twice among the names of the MultiIndex:

a.index
MultiIndex(levels=[['machin1', 'machin2'], ['machin1', 'machin2'],  ['truc1', 'truc2', 'truc3', 'truc4']],
       labels=[[0, 0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2]],
       names=['machin', 'machin', 'truc'])

I also tried drop, doc here:

a.drop(index='machin')
a.drop(index=0)

These raise KeyError: 'machin' and KeyError: 0, respectively.
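
For what it's worth, both attempts seem to fail for reasons unrelated to rolling itself: in Series.reset_index, name is the label given to the values column (it does not select a level), and drop(index=...) looks for row labels, not level names. Below is a minimal sketch of dropping the duplicated level by position instead. It assumes a recent pandas where Series.droplevel exists (it was added in 0.24, so it is not available in 0.23.4), and the series s is a hand-built stand-in for the rolling result, not the actual output:

```python
import pandas as pd

# Hand-built stand-in for the rolling result: 'machin' appears twice in the names.
groups = ["machin1"] * 4 + ["machin2"] * 3
idx = pd.MultiIndex.from_arrays(
    [groups, groups, ["truc1", "truc2", "truc3", "truc4", "truc1", "truc2", "truc3"]],
    names=["machin", "machin", "truc"],
)
s = pd.Series([1.0, 1.5, 2.0, 2.5, 100.0, 99.5, 99.0], index=idx, name="a column")

# The name 'machin' is ambiguous, so address the extra level by position.
fixed = s.droplevel(0)
print(fixed.index.names)
```

Selecting the level by its integer position avoids the ambiguity that a duplicated name would cause.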

My versions

Python 3.7.1 (default, Dec 14 2018, 19:28:38) in an anaconda environment, even in terminal: [GCC 7.3.0] :: Anaconda, Inc. on linux

pandas 0.23.4

Fearfully answered 4/4, 2019 at 13:28 Comment(0)
Use the group_keys argument of groupby:

df.groupby('machin', group_keys=False).rolling(window=5, min_periods=1).mean()

Alternatively, you can drop the 0th level, which rolling inserts, with reset_index:

df.groupby('machin').rolling(window=5, min_periods=1).mean().reset_index(level=0, drop=True)   

Output for either:

               a column
machin  truc           
machin1 truc1       1.0
        truc2       1.5
        truc3       2.0
        truc4       2.5
machin2 truc1     100.0
        truc2      99.5
        truc3      99.0
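
Since the behaviour of group_keys with rolling appears to vary between pandas versions, here is a hedged sketch of the second option that only drops the leading level when rolling actually added one. The frame is rebuilt from the question's data; the nlevels check is my own addition, not something pandas requires:

```python
import pandas as pd

# Rebuild the question's frame, indexed by ('machin', 'truc').
df = pd.concat(
    [
        pd.DataFrame({"machin": "machin1",
                      "truc": ["truc1", "truc2", "truc3", "truc4"],
                      "a column": [1, 2, 3, 4]}),
        pd.DataFrame({"machin": "machin2",
                      "truc": ["truc1", "truc2", "truc3"],
                      "a column": [100, 99, 98]}),
    ]
).set_index(["machin", "truc"])

rolled = df.groupby("machin").rolling(window=5, min_periods=1).mean()

# Drop the leading group-key level only if rolling prepended one,
# so the snippet does not rely on version-specific behaviour.
if rolled.index.nlevels > df.index.nlevels:
    rolled = rolled.reset_index(level=0, drop=True)

print(rolled)
```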
Unexpressed answered 4/4, 2019 at 13:45 Comment(5)
The first option did not work for me, but the second works very well. Thank you. – Fearfully
The first option didn't work for me either. – Skulduggery
I think pandas has been active in this module recently. I just upgraded to version 1.3.5 and think this bug has been reintroduced. I get it even with group_keys=False. – Eastertide
The first solution is not working for me either. I am using pandas 2.2.0. – Tragedy
I reported the group_keys=False bug here: github.com/pandas-dev/pandas/issues/59881 – Floruit
If you have a transformation that results in the same shape, you can use transform, which does not add an extra index level for the group:

# parentheses are needed so the chained calls can span several lines
(
    df
    .groupby('machin')
    .transform(
        lambda x: x.rolling(window=5, min_periods=1).mean()
    )
)

Otherwise, use apply, which respects the group_keys parameter:

# parentheses are needed so the chained calls can span several lines
(
    df
    .groupby('machin', group_keys=False)
    .apply(
        lambda x: x.rolling(window=5, min_periods=1).mean()
    )
)
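
As a quick sanity check, here is a self-contained sketch that runs both variants on the question's data and compares them. The variable names via_transform and via_apply are mine, and I am assuming the frame is indexed by ('machin', 'truc') as in the question:

```python
import pandas as pd

# Rebuild the question's frame, indexed by ('machin', 'truc').
df = pd.concat(
    [
        pd.DataFrame({"machin": "machin1",
                      "truc": ["truc1", "truc2", "truc3", "truc4"],
                      "a column": [1, 2, 3, 4]}),
        pd.DataFrame({"machin": "machin2",
                      "truc": ["truc1", "truc2", "truc3"],
                      "a column": [100, 99, 98]}),
    ]
).set_index(["machin", "truc"])


def rolling_mean(group):
    # Rolling mean within a single group.
    return group.rolling(window=5, min_periods=1).mean()


# transform keeps the original index, with no extra group level.
via_transform = df.groupby("machin").transform(rolling_mean)

# apply with group_keys=False should not prepend the group key either.
via_apply = df.groupby("machin", group_keys=False).apply(rolling_mean)

print(via_transform)
print(via_apply)
```

Both results should carry the original two-level index, so they can be assigned straight back to a column of df.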

There is an ongoing discussion to make this more obvious.

Floruit answered 24/9 at 6:33 Comment(0)
