I want to apply a function to each group of a data frame and store the function's output as a new column.
Here is the function that I wrote:
import numpy as np
import pandas as pd

def get_centroids(sample):
    # Ideally, re = complex_function(sample), which returns a 1d array of the same length as sample
    # For simplicity, let's use np.random.rand(len(sample))
    re = pd.DataFrame({'B': np.random.rand(len(sample))})
    print(re)
    print(re.index)
    return re
The function prints:
          B
0  0.176083
1  0.984371
RangeIndex(start=0, stop=2, step=1)
Let's look at this data frame. For simplicity, it has only one group 'a'.
df = pd.DataFrame({'A': 'a a'.split(),
                   'B': [1, 43],
                   'C': [4, 2]})
   A   B  C
0  a   1  4
1  a  43  2
print(df.index)
RangeIndex(start=0, stop=2, step=1)
When I apply the function as below,
df['test'] = df.groupby('A')[['B']].apply(get_centroids)
it throws "TypeError: incompatible index of inserted column with frame index", even though df and re have the same type of index. Any help would be appreciated.
Pass group_keys=False to groupby, and please see the documentation and experiment with the group_keys parameter by printing the groupby result without assigning it to a column. – Woodnote
For example:
df["test"] = df.groupby("A", group_keys=False)[["B"]].apply(get_centroids)
on the sample data you provided above. – Woodnote
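To make the group_keys suggestion concrete, here is a minimal sketch rather than a definitive fix. It rests on a few assumptions that are not in the question: a second group 'b' is added to the sample data, complex_function is replaced by a deterministic stand-in (deviation from the group mean), and the two helper names are hypothetical. It shows that the default group_keys=True puts the group label in front of the returned index, which no longer matches df.index, and that returning the result with the group's own index lets the assignment align row by row even when there are several groups.

```python
import pandas as pd

# Sample data from the question, plus one extra group 'b' (an assumption)
df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 43, 7], "C": [4, 2, 9]})

def centroids_fresh_index(sample):
    # Hypothetical stand-in for complex_function: deviation from the group mean.
    # Like the original get_centroids, the result gets a fresh RangeIndex.
    values = sample["B"].to_numpy() - sample["B"].mean()
    return pd.DataFrame({"B": values})

def centroids_keep_index(sample):
    # Same computation, but the result reuses the group's own row labels
    values = sample["B"].to_numpy() - sample["B"].mean()
    return pd.DataFrame({"B": values}, index=sample.index)

# Default group_keys=True: the group label is prepended, giving a MultiIndex
with_keys = df.groupby("A")[["B"]].apply(centroids_fresh_index)
print(with_keys.index)

# group_keys=False plus a preserved index: labels match df.index
without_keys = df.groupby("A", group_keys=False)[["B"]].apply(centroids_keep_index)
print(without_keys.index)

# Assigning the Series aligns on the index, so each row gets its own value
df["test"] = without_keys["B"]
print(df)
```

If the function really returns a plain array of the group's length, wrapping it with index=sample.index as above is one way to keep the labels. With a fresh RangeIndex per group, the labels would repeat across groups and the assignment could not align them.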