pandas: get all groupby values in an array [duplicate]

I'm sure this has been asked before, sorry if duplicate. Suppose I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)}, columns=['key', 'data'])

>>
    key data
0   A   0
1   B   1
2   C   2
3   A   3
4   B   4
5   C   5

I know that after grouping on 'key' we can aggregate, e.g. df.groupby('key').sum() gives:

>> 
    data
key 
A   3
B   5
C   7

What is the easiest way to get all the split data for each group in an array?

>> 
    data
key 
A   [0, 3]
B   [1, 4]
C   [2, 5]

I'm not necessarily grouping by just one key but by several other columns as well ('year' and 'month', for example), which is why I'd like to use the groupby function while preserving all the grouped values in an array.

G answered 12/3, 2019 at 15:51 Comment(0)

You can use apply(list):

print(df.groupby('key').data.apply(list).reset_index())

  key    data
0   A  [0, 3]
1   B  [1, 4]
2   C  [2, 5]
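
As a minimal sketch of how this extends to the several-keys case mentioned in the question: the 'year' column below is a made-up addition to the example frame, used only for illustration. Bracket selection (['data']) is used instead of attribute access (.data); both pick the same column here.

```python
import pandas as pd

# Same toy frame as in the question, plus a hypothetical 'year' column
# (assumed here only to illustrate grouping by more than one key).
df = pd.DataFrame({
    'key': ['A', 'B', 'C', 'A', 'B', 'C'],
    'year': [2018, 2018, 2018, 2018, 2019, 2019],
    'data': range(6),
})

# Select the column, then collect each group's values into a list.
single = df.groupby('key')['data'].apply(list).reset_index()

# Grouping by several keys works the same way: pass a list of column names.
multi = df.groupby(['key', 'year'])['data'].apply(list).reset_index()

print(single)
print(multi)
```

With several keys, each distinct (key, year) combination that occurs in the data gets its own row and its own list.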
Heddle answered 12/3, 2019 at 15:51 Comment(8)
For arrays instead of lists you can do df.groupby('key').data.apply(np.array), which was more convenient for my operations. - G
What if one has multiple columns and wants to aggregate all the values from multiple columns into one list? - Dap
@Dap df.groupby("Column Name").agg(list) should help. Another way is a pivot table (not required, though): df.pivot_table(index="Column Name", aggfunc=list) - Heddle
This is what worked for me, as I needed distinct list/array items: df.groupby('key').data.unique().reset_index() - Facer
Does this preserve the order of the items in the resulting list? - Governess
Hey, I am getting the error 'DataFrameGroupBy' object has no attribute 'data'. My line of code is main_group = main.groupby(["new-date", 'seller_identifier', 'affiliate_name']).data.apply(np.array).reset_index() Any solutions to this? - Pond
@NischayaSharma remove the .data from your code and try again. - Heddle
@NischayaSharma I ran into this also, and it took me way longer than it should have to figure it out: .data is a column name, not a pandas API field. You'll get 'DataFrameGroupBy' object has no attribute 'data' if your DataFrame has different column names, and the solution is just to replace data with the name of your actual column. - Thrash
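
The points raised in the comments can be sketched together as below. This is only an illustration, not anyone's actual code: the frame and its 'value' and 'other' column names are assumptions. On the ordering question, the pandas documentation for groupby states that the order of rows within each group is preserved, which is what the example relies on.

```python
import pandas as pd

# Hypothetical frame; 'value' and 'other' are illustrative column names.
df = pd.DataFrame({
    'key': ['A', 'B', 'A', 'B', 'A'],
    'value': [3, 1, 3, 2, 0],
    'other': list('vwxyz'),
})

# Every non-key column collected into per-group lists in one call.
all_lists = df.groupby('key').agg(list)

# Distinct values per group, in order of first appearance.
distinct = df.groupby('key')['value'].unique()

# Attribute access only works when a column with that name exists.
# There is no 'data' column in this frame, so .data raises AttributeError.
try:
    df.groupby('key').data
    attribute_worked = True
except AttributeError as exc:
    attribute_worked = False
    print(exc)

print(all_lists)
print(distinct)
```

Selecting with brackets, e.g. df.groupby('key')['value'], avoids the attribute confusion and also works for column names that are not valid Python identifiers, such as 'new-date'.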
