TypeError: incompatible index of inserted column with frame index when applying a custom function
I want to apply a function on groups of a data frame and get the function output as a new column.

Here is the function that I wrote:

import numpy as np
import pandas as pd

def get_centroids(sample):
    # Ideally, re = complex_function(sample), which returns a 1-D array with the same length as sample.
    # For simplicity, use np.random.rand(len(sample)) instead.
    re = pd.DataFrame({'B': np.random.rand(len(sample))})
    print(re)
    print(re.index)
    return re

The function prints:

   B
0  0.176083
1  0.984371

RangeIndex(start=0, stop=2, step=1)

Let's look at this data frame. For simplicity, it has only one group 'a'.

df = pd.DataFrame({'A': 'a a'.split(),
                   'B': [1,43],
                   'C': [4,2]})

    A   B   C
0   a   1   4
1   a   43  2

print(df.index)
RangeIndex(start=0, stop=2, step=1)

When I apply the function as below,

df['test'] = df.groupby('A')[['B']].apply(get_centroids)

it throws "TypeError: incompatible index of inserted column with frame index", even though df and re have the same type of index. Any help would be appreciated.

Paranoiac answered 28/7, 2021 at 6:15 Comment(6)
Try passing group_keys=False to groupby. Please also see the documentation and experiment with the group_keys parameter by printing the groupby result without assigning it to a column. - Woodnote
Thanks for the suggestion. I gave group_keys=False a quick try, but it still gives the same error. I will dig into it further. - Paranoiac
I tried it and it didn't give any error: df["test"] = df.groupby("A", group_keys=False)[["B"]].apply(get_centroids) on the sample data you provided above. - Woodnote
Thanks mate! But it still throws the error... Did you run the entire statement, i.e. together with the df["test"] = part? - Paranoiac
Yes, the entire statement, and no error. I use pandas version 1.2.4. - Woodnote
Mine is 1.1.2. However, I also tried version 1.2.4 here (programiz.com/python-programming/online-compiler), and it still throws the error. - Paranoiac

While playing around with the suggestions, I realised that df.groupby('A')[['B']].apply(get_centroids) on its own works fine; it is the assignment that causes the error.

In other words, df cannot accept the result of df.groupby('A')[['B']].apply(get_centroids) as a new column. I then checked df.groupby('A')[['B']].apply(get_centroids).index, which is

MultiIndex([('a', 0),
            ('a', 1)],
           names=['A', None])

The index of df is RangeIndex(start=0, stop=2, step=1). Therefore, the mismatch between the RangeIndex of df and the MultiIndex of the groupby result caused the issue.
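A minimal sketch of that check, assuming a deterministic stand-in for get_centroids (the name fresh_index_result and its values are made up for illustration). Printing both index types side by side makes the mismatch visible before any assignment is attempted.

```python
import numpy as np
import pandas as pd

def fresh_index_result(sample):
    # Hypothetical stand-in for get_centroids: same length as the group,
    # but built with a brand-new default index, like the original function.
    return pd.DataFrame({'B': np.arange(len(sample), dtype=float)})

df = pd.DataFrame({'A': ['a', 'a'], 'B': [1, 43], 'C': [4, 2]})

# Default group_keys=True: the group label is prepended to each result's index.
result = df.groupby('A')[['B']].apply(fresh_index_result)

print(type(df.index).__name__)      # index type of the frame being assigned to
print(type(result.index).__name__)  # index type of the groupby result
print(result.index.nlevels)         # number of index levels in the result
```

If the two index types differ, pandas has to align the result to df.index during assignment, and that alignment is where the error is raised.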

This can be solved by resetting the index of the result of df.groupby('A')[['B']].apply(get_centroids), setting level_1 as the new index, and dropping column A, as below.

df['test'] = df.groupby('A')[['B']].apply(get_centroids).reset_index().set_index('level_1').drop('A',axis=1)

The same solution has been proposed here: https://mcmap.net/q/1357938/-groupby-pandas-incompatible-index-of-inserted-column-with-frame-index.
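One caveat with resetting the index: level_1 holds each row's position within its group, so it lines up with df's index only in cases like this one, where a single group covers rows 0 to n-1. With several groups the positions repeat. A possible alternative, assuming the function can be changed to reuse the group's own index, is to return the result indexed by sample.index and pass group_keys=False, so the output keeps the original row labels. The function name centered_values, its body, and the sample data below are placeholders, not the original complex_function.

```python
import pandas as pd

def centered_values(sample):
    # Placeholder for the real computation: any 1-D result with the
    # same length as the group would work here.
    values = (sample['B'] - sample['B'].mean()).to_numpy()
    # Reuse the group's own index so the rows keep their original labels.
    return pd.DataFrame({'B': values}, index=sample.index)

# Made-up data with two interleaved groups.
df = pd.DataFrame({'A': ['a', 'b', 'a', 'b'],
                   'B': [1, 10, 43, 30],
                   'C': [4, 2, 7, 9]})

# group_keys=False keeps the group label out of the result's index.
out = df.groupby('A', group_keys=False)[['B']].apply(centered_values)

# Selecting the single column gives a Series that aligns with df.index.
df['test'] = out['B']
print(df)
```

Returning a DataFrame rather than a Series from the function is deliberate in this sketch: when every group returns a Series with an identical index, apply can arrange the results as one row per group instead of one row per original row.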

Paranoiac answered 6/8, 2021 at 23:5 Comment(0)
