TypeError: incompatible index of inserted column with frame index when applying a custom function

Asked 28/7, 2021 at 6:15 Answered 6/8, 2021 at 23:5

Solved python pandas dataframe group-by apply

I want to apply a function on groups of a data frame and get the function output as a new column.

Here is the function that I wrote:

import numpy as np
import pandas as pd

def get_centroids(sample):
    # Ideally, re = complex_function(sample), which returns a 1-D array of the same length as sample.
    # For simplicity, use np.random.rand(len(sample)) instead.
    re = pd.DataFrame({'B': np.random.rand(len(sample))})
    print(re)
    print(re.index)
    return re

The function prints:

   B
0  0.176083
1  0.984371

RangeIndex(start=0, stop=2, step=1)

Let's look at this data frame. For simplicity, it has only one group 'a'.

df = pd.DataFrame({'A': 'a a'.split(),
                   'B': [1,43],
                   'C': [4,2]})

    A   B   C
0   a   1   4
1   a   43  2

print(df.index)
RangeIndex(start=0, stop=2, step=1)

When I apply the function as below,

df['test'] = df.groupby('A')[['B']].apply(get_centroids)

it throws "TypeError: incompatible index of inserted column with frame index", even though df and re appear to have the same type of index (both print as a RangeIndex). Any help would be appreciated.

Paranoiac answered 28/7, 2021 at 6:15 Comment(6)

Try passing group_keys=False to groupby and please see the documentation & experiment with group_keys parameter via printing groupby result without assigning it to a column. – Woodnote 28/7, 2021 at 6:22

Thanks for the suggestion. I gave a quick try with group_keys=False, but it still gives the same error. I will dig more with it. – Paranoiac 28/7, 2021 at 10:16

I tried but it didn't give any error: df["test"] = df.groupby("A", group_keys=False)[["B"]].apply(get_centroids) on the sample data you provided above. – Woodnote 28/7, 2021 at 10:46

Thanks mate! But it still throws the error... did you run the entire statement? as in altogether with df["test"] = – Paranoiac 29/7, 2021 at 2:9

Yes the entire statement and no error. I use pandas version 1.2.4. – Woodnote 29/7, 2021 at 6:2

Mine is 1.1.2. However, I tried with the version 1.2.4 in here (programiz.com/python-programming/online-compiler), but still throws the error. – Paranoiac 6/8, 2021 at 22:34

While I was playing around with the suggestions, I realised that df.groupby('A')[['B']].apply(get_centroids) on its own works fine, and that it is the assignment to df['test'] that raises the error.

In other words, df cannot accept the result of df.groupby('A')[['B']].apply(get_centroids) as a new column. I then checked df.groupby('A')[['B']].apply(get_centroids).index, which is

MultiIndex([('a', 0),
            ('a', 1)],
           names=['A', None])

The index of df was RangeIndex(start=0, stop=2, step=1). Therefore, the mismatch between the RangeIndex of df and the MultiIndex of the groupby result caused the issue.
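The comparison above can be reproduced with a short sketch. Two things here are assumptions rather than part of the original code: the random values are replaced with a deterministic np.arange so the run is repeatable, and group_keys=True is written out explicitly because the default handling of group keys has differed between pandas versions.

```python
import numpy as np
import pandas as pd

def get_centroids(sample):
    # Deterministic stand-in for np.random.rand(len(sample)) (assumption, for repeatability)
    return pd.DataFrame({'B': np.arange(len(sample), dtype=float)})

df = pd.DataFrame({'A': 'a a'.split(),
                   'B': [1, 43],
                   'C': [4, 2]})

# group_keys=True is spelled out so the group key is prepended to the result index
result = df.groupby('A', group_keys=True)[['B']].apply(get_centroids)

print(type(df.index).__name__)      # index type of the original frame
print(type(result.index).__name__)  # index type of the groupby/apply result
print(result.index.names)           # outer level is the group key 'A'
```

The two index types differ, so pandas has no flat labels to align on when the result is assigned as a column of df.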

This can be solved by resetting and setting the index of df.groupby('A')[['B']].apply(get_centroids) as below.

df['test'] = df.groupby('A')[['B']].apply(get_centroids).reset_index().set_index('level_1').drop('A',axis=1)
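A shorter way to get the same flat index is to drop the group-key level directly with droplevel. This is only a sketch under the same setup as the question; the deterministic values and the explicit group_keys=True are assumptions added for repeatability.

```python
import numpy as np
import pandas as pd

def get_centroids(sample):
    # Deterministic stand-in for the random values in the question (assumption)
    return pd.DataFrame({'B': np.arange(len(sample), dtype=float) + 0.5})

df = pd.DataFrame({'A': 'a a'.split(),
                   'B': [1, 43],
                   'C': [4, 2]})

result = df.groupby('A', group_keys=True)[['B']].apply(get_centroids)

# Drop the outer 'A' level, leaving only the inner labels 0 and 1
flat = result.droplevel('A')

# Assign the single column as a Series; it aligns with df on the remaining labels
df['test'] = flat['B']
print(df)
```

Note that this relies on the inner index level matching the index of df. That holds for the single-group example, but with several groups each group returns its own 0..n-1 labels, so the inner labels would repeat.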

The same solution has been proposed here https://mcmap.net/q/1357938/-groupby-pandas-incompatible-index-of-inserted-column-with-frame-index.
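For data with more than one group, one option is to have the function reuse the index of the group it receives, so the result already carries the labels of df. The sketch below is an assumption-laden variant, not the original code: the two-group frame, the deterministic values standing in for complex_function, and the use of group_keys=False are all introduced here for illustration.

```python
import numpy as np
import pandas as pd

def get_centroids(sample):
    # Hypothetical stand-in for complex_function(sample): a 1-D array as long as the group
    values = np.arange(len(sample), dtype=float)
    # Reuse the group's own index so each value keeps the row label it belongs to
    return pd.DataFrame({'B': values}, index=sample.index)

# Two groups ('a' and 'b') with interleaved rows (hypothetical data)
df = pd.DataFrame({'A': 'a b a b'.split(),
                   'B': [1, 43, 5, 7],
                   'C': [4, 2, 3, 1]})

# group_keys=False keeps the group key out of the result index
result = df.groupby('A', group_keys=False)[['B']].apply(get_centroids)

# The result is labelled with df's own row labels, so assignment aligns row by row
df['test'] = result['B']
print(df)
```

Because alignment happens on the original row labels, the order in which the groups are concatenated does not matter.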

Paranoiac answered 6/8, 2021 at 23:5 Comment(0)
