Looking for pandas "ungroup by" operation opposite to .groupby in the following string aggregation?

5

68

Suppose we take a pandas dataframe...

    name  age  family
0   john    1       1
1  jason   36       1
2   jane   32       1
3   jack   26       2
4  james   30       2

Then do a groupby() followed by an aggregate/summarize operation (in my example, my function name_join joins the names):

def name_join(list_names, concat='-'):
    return concat.join(list_names)

group_df = df.groupby('family')
group_df = group_df.aggregate({'name': name_join, 'age': 'mean'})

The grouped summarized output is thus:

        age             name
family                      
1        23  john-jason-jane
2        28       jack-james

Question:

Is there a quick, efficient way to get to the following from the aggregated table?

    name  age  family
0   john   23       1
1  jason   23       1
2   jane   23       1
3   jack   28       2
4  james   28       2

(Note: the age column values are just examples; I don't care about the information I lose by averaging in this specific example.)

Craniate answered 21/11, 2013 at 13:38 Comment(4)
possible duplicate of pandas: How do I split text in a column into multiple columns? - Microbiology
@AndyHayden: perhaps but that question's title sucks; this one is straightforward. (So if only the example use-case needs improving, best to improve it instead of closing this) - Deadlight
"A table, stored in a pandas dataframe" is circumlocution. Just learn to see a dataframe as a table (if that is what it represents). - Deadlight
The question is slightly unwieldy: instead of doing the aggregate/summarize operation then reversing it, just stop after the groupby(), do some averaging on age if necessary, then do reset_index(). - Deadlight
63

The rough equivalent is .reset_index(), but it may not be helpful to think of it as the "opposite" of groupby().

You are splitting a string into pieces while maintaining each piece's association with 'family'. This old answer of mine does the job.

Just set 'family' as the index column first, refer to the link above, and then reset_index() at the end to get your desired result.
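The linked answer is not reproduced on this page, so here is a minimal sketch of the same idea under stated assumptions: it starts from the aggregated table in the question and uses str.split() with DataFrame.explode() rather than the apply/pd.Series/stack combination from the linked answer. The variable names agg and ungrouped are illustrative only.

```python
import pandas as pd

# Aggregated table from the question, with 'family' as the index.
agg = pd.DataFrame(
    {"age": [23, 28], "name": ["john-jason-jane", "jack-james"]},
    index=pd.Index([1, 2], name="family"),
)

# Split each joined string into a list, then give every list element its own row.
# explode() repeats the index ('family') and the other columns ('age') per element.
ungrouped = (
    agg.assign(name=agg["name"].str.split("-"))
    .explode("name")
    .reset_index()[["name", "age", "family"]]
)
print(ungrouped)
```

Keeping 'family' in the index until the final reset_index() is what preserves each name's association with its group.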

Labourite answered 21/11, 2013 at 13:58 Comment(1)
Brilliant! I'm still looking at what the combination of apply, lambda, pd.Series and stack does, but it works exactly as intended. Thanks! - Craniate
21

It turns out that DataFrame.groupby() returns an object that keeps the original data in its obj attribute. So ungrouping is just pulling out the original data.

group_df = df.groupby('family')
group_df.obj

Example

>>> dat_1 = df.groupby("category_2")
>>> dat_1
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fce78b3dd00>
>>> dat_1.obj
    order_date          category_2     value
1   2011-02-01  Cross Country Race  324400.0
2   2011-03-01  Cross Country Race  142000.0
3   2011-04-01  Cross Country Race  498580.0
4   2011-05-01  Cross Country Race  220310.0
5   2011-06-01  Cross Country Race  364420.0
..         ...                 ...       ...
535 2015-08-01          Triathalon   39200.0
536 2015-09-01          Triathalon   75600.0
537 2015-10-01          Triathalon   58600.0
538 2015-11-01          Triathalon   70050.0
539 2015-12-01          Triathalon   38600.0

[531 rows x 3 columns]
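For the dataframe in the question, the same idea can be sketched as follows. This assumes obj keeps behaving as it does here; it is an attribute of the groupby object, not a dedicated ungroup method.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "jason", "jane", "jack", "james"],
    "age": [1, 36, 32, 26, 30],
    "family": [1, 1, 1, 2, 2],
})

group_df = df.groupby("family")

# .obj holds the DataFrame the groupby object was created from.
restored = group_df.obj
print(restored.equals(df))
```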
Transhumance answered 30/3, 2021 at 22:40 Comment(4)
This is a good hack, but I'm afraid it may not be future proof. I have in mind Hadley Wickham's talk about maintainable code. He warned against off-label usage of functions. The function maintainer might not be aware that end users use the function this way, so he/she might modify the function behavior, unaware that it might break existing downstream code. What do you think? - Phytohormone
@HanyNagaty Yes, it's of course a possibility. It would be smart of us to request that an ungroup() method be added to pandas, which would simply return the grouped_df.obj. They would add unit tests to make sure a test fails if the ungroup() method doesn't work. - Transhumance
@HanyNagaty I've opened a GitHub Issue on Pandas here. Please support it if you'd like this feature. github.com/pandas-dev/pandas/issues/43902 - Transhumance
@MaddDancho Yes I like it and I made a comment there. - Phytohormone
7

Here's a complete example that recovers the original dataframe from the grouped object:

import pandas

def name_join(list_names, concat='-'):
    # Join all names in a group into one string
    return concat.join(list_names)

print('create dataframe\n')
df = pandas.DataFrame({'name':['john', 'jason', 'jane', 'jack', 'james'], 'age':[1,36,32,26,30], 'family':[1,1,1,2,2]})
df.index.name='indexer'
print(df)
print('create group_by object')
group_obj_df = df.groupby('family')
print(group_obj_df)

print('\nrecover grouped df')
group_joined_df = group_obj_df.aggregate({'name': name_join, 'age': 'mean'})
print(group_joined_df)


create dataframe

          name  age  family
indexer                    
0         john    1       1
1        jason   36       1
2         jane   32       1
3         jack   26       2
4        james   30       2
create group_by object
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fbfdd9dd048>

recover grouped df 
                   name  age
family                      
1       john-jason-jane   23
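2            jack-james   28

The original rows can then be recovered by concatenating the groups. They come back in group order, which matches the original order here because the data is already sorted by family:

print('\nRecover the original dataframe')
print(pandas.concat([group_obj_df.get_group(key) for key in group_obj_df.groups]))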

Recover the original dataframe
          name  age  family
indexer                    
0         john    1       1
1        jason   36       1
2         jane   32       1
3         jack   26       2
4        james   30       2
Blanchblancha answered 2/12, 2019 at 19:48 Comment(0)
1

There are a few ways to undo DataFrame.groupby. One is to call filter(lambda x: True) on the grouped object, for example df.groupby('family').filter(lambda x: True); this returns the rows of the original DataFrame.
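A minimal sketch of the filter approach, using the dataframe from the question. The predicate keeps every group, so nothing is dropped; the result is a new DataFrame rather than the original object.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "jason", "jane", "jack", "james"],
    "age": [1, 36, 32, 26, 30],
    "family": [1, 1, 1, 2, 2],
})

group_df = df.groupby("family")

# Keep every group: the predicate returns True regardless of the group's content.
restored = group_df.filter(lambda g: True)
print(restored)
```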

Displacement answered 12/9, 2019 at 6:13 Comment(2)
AttributeError: 'function' object has no attribute 'filter' - Brigitta
In the context of the question it would be group_df.filter(lambda x:True). Worked for me. Has the overhead of copying to a new Dataframe. - Judson
0

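You can use transform() instead of aggregate(). It returns a result aligned with the original rows, so there is nothing to ungroup afterwards.

group_df = df.groupby('family')
# transform() on a groupby takes one function (or function name) at a time, not a dict
df['name'] = group_df['name'].transform(name_join)
df['age'] = group_df['age'].transform('mean')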
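As a self-contained sketch of this approach applied to the table asked for in the question: only age is transformed, so the names stay untouched while each row receives its family's mean age. The variable name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "jason", "jane", "jack", "james"],
    "age": [1, 36, 32, 26, 30],
    "family": [1, 1, 1, 2, 2],
})

result = df.copy()
# Broadcast each family's mean age back onto that family's rows.
result["age"] = df.groupby("family")["age"].transform("mean")
print(result)
```

Note that the mean comes back as a float column (23.0 and 28.0 here) rather than integers.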
Iphigenia answered 2/3 at 23:35 Comment(0)
