groupby pandas : incompatible index of inserted column with frame index
U

4

8

I have performed a groupby in pandas and I want to apply a complex function that takes several inputs and returns a pandas Series, which I then want to write back into my original DataFrame. This is a familiar procedure to me and has worked very well, except in this last case (I apologize in advance for not being able to post the code in its entirety). Essentially I get a TypeError: incompatible index of inserted column with frame index. But, as shown below, I shouldn't get one.

The groupby part:

all_in_data_risk['weights_of_the_sac'] = all_in_data_risk.groupby(['ptf', 'ac'])[['sac', 'unweighted_weights_by_sac', 'instrument_id', 'risk_budgets_sac']].apply(lambda x: wrapper_new_risk_budget(x, temp_fund_all_ret, method_compute_cov))

where the function is:

def wrapper_new_risk_budget(x, temp_fund_all_ret, method_compute_cov):
    print(x.index)
    ...
    print(result.index)
    return result.loc[:, 'res']

which raised this error:

    raise TypeError('incompatible index of inserted column '
TypeError: incompatible index of inserted column with frame index

The problem is this:

print(np.array_equal(result.index, x.index))

prints True for every group. This should guarantee that the indexes match, so the error should simply not occur.

Now, I understand the information I am providing is scarce, to say the least, but do you happen to have any insight into where the problem lies?

P.S.: I have already tried converting the result to a DataFrame, and also recasting the output as pd.Series(result.loc[:, 'res'].values, index=result.index).

Unbound answered 8/9, 2016 at 7:17 Comment(0)
R
4

I ran into this problem and found a way to solve it. In my case I needed to do df.groupby('id').apply(func), which returns an n×1 DataFrame whose number of rows is exactly df.shape[0], yet the same error occurred.

This happens because the result of the groupby has a MultiIndex (the group key plus the original index), which differs from the index of df.

You can solve the problem by resetting the index and restoring the original one, for example:

df['a']=df.groupby('id').apply(lambda x:func(x)).reset_index().set_index('level_1').drop('id',axis=1)

By the way, be very careful with the function: the DataFrame it returns must have the same index as df.

Roter answered 19/3, 2021 at 13:26 Comment(2)
Solved this #68555785 using this answer! Scrimpy
This fixed it for me too, thanks! I also found the group_keys param to groupby, which is another way to solve it (e.g. df["new_col"] = df.groupby("group_col", group_keys=False).apply(func)) Falgout
U
1

OK, for reasons beyond my understanding, when I performed a merge inside the code, the indexes differed in some way in pandas' eyes even though their numpy representations were equivalent. I replaced the merge with a workaround (longer and less efficient), and with these more traditional means it now works.

I won't be able to post the complete example today, since I am very pressed for time with a deadline looming, but I will complete it as soon as possible, both out of respect for those who answered or tried to, and for any other users who might benefit from the resolution of this problem.

Unbound answered 8/9, 2016 at 7:59 Comment(2)
So where is the solution? Ingrid
Also having that issue somehow... hm Whodunit
C
1

Setting the grouping columns as the index resolved this for me:

df.set_index(grouping_items, inplace=True)
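
A sketch of how this might be used end to end. The answer does not show the data or the groupby call, so the example frame, the contents of grouping_items, and the groupby over index levels are all assumptions:

```python
import numpy as np
import pandas as pd

# Made-up example data
df = pd.DataFrame({"foo": [0, 1] * 2, "foo2": np.zeros(4).astype(int), "bar": np.arange(4)})
grouping_items = ["foo", "foo2"]  # assumed: the columns used for grouping

df.set_index(grouping_items, inplace=True)

# The frame and the aggregated result are now both indexed by (foo, foo2),
# so the assignment can align one group value to every matching row.
df["bar_max"] = df.groupby(level=grouping_items)["bar"].max()

# Optionally turn the grouping levels back into ordinary columns
df = df.reset_index()
print(df)
```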

Caudill answered 16/8 at 15:15 Comment(0)
C
0

To simplify the problem:

The original question attempts something like this:

df['new_column'] = df.groupby(...).aggregationfunction()

This usually does not raise the error if at least one of these conditions is fulfilled:

  1. The groupby is over only one column.
  2. The groupby function does not reduce the number of rows (e.g. cumcount()).

If neither condition holds, the error "TypeError: incompatible index of inserted column with frame index" may arise.
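
The two conditions can be illustrated with a small sketch on a made-up frame; the column names max_by_foo and n_in_group are hypothetical. One caveat worth hedging: with a single grouping column the assignment does not fail, but pandas aligns the result on the group keys rather than on the rows, so the values may not land where intended:

```python
import numpy as np
import pandas as pd

# Made-up example data
df = pd.DataFrame({"foo": [0, 1] * 2, "foo2": np.zeros(4).astype(int), "bar": np.arange(4)})

# Condition 1: a single grouping column. No error is raised, but the result is
# indexed by the values of foo (0 and 1), which are matched against the row
# labels 0 and 1; rows 2 and 3 find no match and receive NaN.
df["max_by_foo"] = df.groupby("foo")["bar"].max()

# Condition 2: cumcount() returns one value per row, indexed like df,
# so it can be assigned even with several grouping columns.
df["n_in_group"] = df.groupby(["foo", "foo2"]).cumcount()

print(df)
```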

Example that raises the error

See the following example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': [0, 1] * 2, 'foo2': np.zeros(4).astype(int), 'bar': np.arange(4)})
df

>     foo    foo2     bar
> 0     0       0       0
> 1     1       0       1
> 2     0       0       2
> 3     1       0       3

df['bar_max'] = df.groupby(['foo','foo2'])['bar'].max()
> TypeError: incompatible index of inserted column with frame index

Solution

With as_index=False in the groupby, you can create a DataFrame that you can then merge into the original one:

df_grouped = df.groupby(['foo','foo2'], as_index=False)['bar'].max().rename(columns={'bar': 'bar_max'})
df = df.merge(df_grouped, on=['foo','foo2'])
df

>   foo     foo2    bar     bar_max
>0  0       0       0       2
>1  0       0       2       2
>2  1       0       1       3
>3  1       0       3       3
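
As a possible alternative to the merge (not part of the original answer), transform broadcasts each group's aggregate back onto the rows of that group, so the result is indexed like df and can be assigned directly. A sketch on the same example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"foo": [0, 1] * 2, "foo2": np.zeros(4).astype(int), "bar": np.arange(4)})

# transform("max") returns one value per original row, so the index of the
# result matches the index of df and the original row order is kept.
df["bar_max"] = df.groupby(["foo", "foo2"])["bar"].transform("max")
print(df)
```

Unlike the merge, this leaves the order of the rows untouched.
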
Chaussure answered 21/7, 2021 at 10:51 Comment(0)
