Is groupby from pandas commutative?

I would like to know if the rows selected by:

groupby(['a', 'b']) 

are the same as the rows selected by:

groupby(['b', 'a'])

In this case the order of the rows doesn't matter.

Is there any case in which groupby does not fulfill the commutative property?

Ahearn answered 17/12, 2019 at 13:2 Comment(1)
In terms of the number of groups and the results, they are the same. For understanding the data, though, with ['a', 'b'] a is the group and b is the subgroup; this also shows in the index, which makes the resulting table more readable. Vachell
H
4

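I think the order of the keys does not matter for the groups or their aggregated values. It only determines the order of the index levels (or of the leading columns) in the output, which follows the order of the keys in the list.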

import pandas as pd

df = pd.DataFrame({
    'a': list('aaaaaa'),
    'b': [4, 5, 4, 5, 5, 4],
    'c': [7, 8, 9, 4, 2, 3],
})

Order of levels after groupby aggregation:

df1 = df.groupby(['a', 'b']).sum()
print (df1)
      c
a b    
a 4  19
  5  14

df2 = df.groupby(['b', 'a']).sum()
print (df2)
      c
b a    
4 a  19
5 a  14
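
To check that the two results differ only in the order of their index levels, one option is to reorder the levels of the second result and compare. This is a minimal sketch that reuses the sample data from this answer; the names df2_aligned and same are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': list('aaaaaa'),
    'b': [4, 5, 4, 5, 5, 4],
    'c': [7, 8, 9, 4, 2, 3],
})

df1 = df.groupby(['a', 'b']).sum()
df2 = df.groupby(['b', 'a']).sum()

# Put the levels of df2 in the same order as df1, then sort the rows
# so that both frames list the groups in the same sequence.
df2_aligned = df2.reorder_levels(['a', 'b']).sort_index()

# Compare values and index of the two frames.
same = df1.equals(df2_aligned)
print(same)
```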

And the order of the columns when as_index=False is used:

df3 = df.groupby(['a', 'b'], as_index=False).sum()
print (df3)
   a  b   c
0  a  4  19
1  a  5  14

df4 = df.groupby(['b', 'a'], as_index=False).sum()
print (df4)
   b  a   c
0  4  a  19
1  5  a  14

If you use transform to create a new column with the same length as the original DataFrame, the result is the same:

df['new1'] = df.groupby(['a', 'b'])['c'].transform('sum')
df['new2'] = df.groupby(['b', 'a'])['c'].transform('sum')
print (df)
   a  b  c  new1  new2
0  a  4  7    19    19
1  a  5  8    14    14
2  a  4  9    19    19
3  a  5  4    14    14
4  a  5  2    14    14
5  a  4  3    19    19
Heterotrophic answered 17/12, 2019 at 13:5 Comment(0)
C
8

By definition, and given the logic pandas applies in groupby, it will always be commutative:

A groupby operation involves some combination of splitting the object, applying a function, and combining the results.

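This combination is linear, hence commutative: each row is assigned to the group identified by its combination of key values, and that set of rows is the same whether the keys are listed as ['a', 'b'] or ['b', 'a']. What does matter is that, when passing multiple by values, the levels of the new index follow the order of the keys, which should be kept in mind when addressing them.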
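
As an illustration of addressing the result, the sketch below looks up the same group in both variants. The sample data is borrowed from the other answer and the variable names are assumptions made for this example. The tuple passed to .loc has to follow the key order, while selecting by level name with xs does not depend on it.

```python
import pandas as pd

df = pd.DataFrame({
    'a': list('aaaaaa'),
    'b': [4, 5, 4, 5, 5, 4],
    'c': [7, 8, 9, 4, 2, 3],
})

by_ab = df.groupby(['a', 'b'])['c'].sum()
by_ba = df.groupby(['b', 'a'])['c'].sum()

# The index tuples follow the order of the grouping keys.
total_ab = by_ab.loc[('a', 4)]
total_ba = by_ba.loc[(4, 'a')]
print(total_ab, total_ba)  # both are 19

# Selecting by level name works the same way for either key order.
via_name_ab = by_ab.xs(4, level='b')
via_name_ba = by_ba.xs(4, level='b')
```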

From Wikipedia's articles on linear combination and the commutative property:

In mathematics, a linear combination is an expression constructed from a set of terms by multiplying each term by a constant and adding the results. The idea that simple operations, such as the multiplication and addition of numbers, are commutative was for many years implicitly assumed.
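
Commutativity in the sense asked about here can also be checked directly by comparing which rows end up together. The following is a rough sketch on made-up data; row_partition is a hypothetical helper written for this example, not part of pandas. It relies on the groups attribute of a GroupBy object, which maps each key to the index labels of its rows.

```python
import pandas as pd

def row_partition(frame, keys):
    """Return the set of row-label sets produced by grouping on keys."""
    groups = frame.groupby(keys).groups
    # Drop the group keys and keep only which rows belong together.
    return {frozenset(labels) for labels in groups.values()}

# Hypothetical sample data with several values in both key columns.
df = pd.DataFrame({
    'a': ['x', 'x', 'y', 'y', 'x'],
    'b': [1, 2, 1, 1, 2],
})

partition_ab = row_partition(df, ['a', 'b'])
partition_ba = row_partition(df, ['b', 'a'])
print(partition_ab == partition_ba)
```

Because the helper discards the keys themselves, the comparison ignores both the order of the keys and the order in which the groups are listed.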

Candent answered 17/12, 2019 at 13:10 Comment(0)
C
3

Yes, the final groups will always be the same.

The only difference is the order in which the rows will be shown.
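
A small sketch of that difference, using made-up data in which both key columns take more than one value. With the default sort=True the groups are listed in sorted key order, so swapping the keys changes the row order but not the groups or their sums.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'a': ['x', 'y', 'x', 'y'],
    'b': [2, 1, 1, 2],
    'c': [1, 2, 3, 4],
})

ab = df.groupby(['a', 'b'], as_index=False).sum()
ba = df.groupby(['b', 'a'], as_index=False).sum()

# Represent each result row as an (a, b, c) tuple, in output order.
rows_ab = list(zip(ab['a'], ab['b'], ab['c']))
rows_ba = list(zip(ba['a'], ba['b'], ba['c']))

same_groups = set(rows_ab) == set(rows_ba)  # same groups and sums
same_order = rows_ab == rows_ba             # same row order?
print(same_groups, same_order)  # True False
```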

Confidant answered 17/12, 2019 at 13:5 Comment(0)
