How to groupby().transform() to value_counts() in pandas?

Asked 20/12, 2017 at 4:25 Answered 8/1, 2023 at 0:13

Solved python pandas dataframe group-by pandas-groupby

I am processing a pandas dataframe df1 with prices of items.

  Item    Price  Minimum Most_Common_Price
0 Coffee  1      1       2
1 Coffee  2      1       2
2 Coffee  2      1       2
3 Tea     3      3       4
4 Tea     4      3       4
5 Tea     4      3       4

I create Minimum using:

df1["Minimum"] = df1.groupby(["Item"])['Price'].transform(min)

How do I create Most_Common_Price?

df1["Minimum"] = df1.groupby(["Item"])['Price'].transform(value_counts()) # Doesn't work

In the moment, I use a multi-step approach:

for item in df1.Item.unique().tolist(): # Pseudocode
 df1 = df1[df1.Price == Item]           # Pseudocode
 df1.Price.value_counts().max()         # Pseudocode

which is overkill. There must be a more simple way, ideally in one line

How to groupby().transform() to value_counts() in pandas?

Raylenerayless answered 20/12, 2017 at 4:25 Comment(0)

You could use groupby + transform with value_counts and idxmax.

df['Most_Common_Price'] = (
    df.groupby('Item')['Price'].transform(lambda x: x.value_counts().idxmax()))

df

     Item  Price  Minimum  Most_Common_Price
0  Coffee      1        1                  2
1  Coffee      2        1                  2
2  Coffee      2        1                  2
3     Tea      3        3                  4
4     Tea      4        3                  4
5     Tea      4        3                  4

An improvement involves the use of pd.Series.map,

# Thanks, Vaishali!
df['Item'] = (df['Item'].map(df.groupby('Item')['Price']
                        .agg(lambda x: x.value_counts().idxmax()))
df

     Item  Price  Minimum  Most_Common_Price
0  Coffee      1        1                  2
1  Coffee      2        1                  2
2  Coffee      2        1                  2
3     Tea      3        3                  4
4     Tea      4        3                  4
5     Tea      4        3                  4

Tetartohedral answered 20/12, 2017 at 4:36 Comment(9)

@Raylenerayless notice , this method will also work for the object :-) – Transmitter 20/12, 2017 at 4:44

@Wen Thank you, that's an important consideration I didn't think of! – Tetartohedral 20/12, 2017 at 4:45

as you can see in my reputation, I am a rookie. Could you kindly elaborate what you mean with object here? – Raylenerayless 20/12, 2017 at 4:49

@Raylenerayless For example, if Price was a column of strings and you wanted to find the string with the highest count per group, this would still work. Whereas mode only works for numbers. – Tetartohedral 20/12, 2017 at 4:50

get it - I was confused because a number is an (int or float) object too – Raylenerayless 20/12, 2017 at 4:53

@Raylenerayless pandas objects are typed like numpy, they can contain either arrays of primitive types, e.g. numpy.int64, or, dtype=object, in which case, they can contain any python object. Note, that usually comes at the expense of efficiency. – Scrappy 20/12, 2017 at 4:57

Use map instead of transform and performance improves further. df['Item'].map(df.groupby('Item').Price.agg(lambda x: x.value_counts().idxmax())) – Istle 20/12, 2017 at 6:19

What if two values have the same occurrences, and in that case, we need to put if-else statement, i.e get the mode only if 1 value has maximum occurrence. If 2 values have same occurrences, go to else statement. How'd we do that. – Albertina 11/3, 2021 at 22:57

I think in the second code block, it should also read:

df['Most_Common_Price'] = (df['Item'].map(df.groupby('Item')['Price']                         .agg(lambda x: x.value_counts().idxmax()))

– Si 14/12, 2022 at 10:25

A nice way is to use pd.Series.mode, if you want the most common element (i.e. the mode).

In [32]: df
Out[32]:
     Item  Price  Minimum
0  Coffee      1        1
1  Coffee      2        1
2  Coffee      2        1
3     Tea      3        3
4     Tea      4        3
5     Tea      4        3

In [33]: df['Most_Common_Price'] = df.groupby(["Item"])['Price'].transform(pd.Series.mode)

In [34]: df
Out[34]:
     Item  Price  Minimum  Most_Common_Price
0  Coffee      1        1                  2
1  Coffee      2        1                  2
2  Coffee      2        1                  2
3     Tea      3        3                  4
4     Tea      4        3                  4
5     Tea      4        3                  4

As @Wen noted, pd.Series.mode can returns a pd.Series of values, so just grab the first:

Out[67]:
     Item  Price  Minimum
0  Coffee      1        1
1  Coffee      2        1
2  Coffee      2        1
3     Tea      3        3
4     Tea      4        3
5     Tea      4        3
6     Tea      3        3

In [68]: df[df.Item =='Tea'].Price.mode()
Out[68]:
0    3
1    4
dtype: int64

In [69]: df['Most_Common_Price'] = df.groupby(["Item"])['Price'].transform(lambda S: S.mode()[0])

In [70]: df
Out[70]:
     Item  Price  Minimum  Most_Common_Price
0  Coffee      1        1                  2
1  Coffee      2        1                  2
2  Coffee      2        1                  2
3     Tea      3        3                  3
4     Tea      4        3                  3
5     Tea      4        3                  3
6     Tea      3        3                  3

Scrappy answered 20/12, 2017 at 4:39 Comment(3)

a minor change df.groupby(["Item"])['Price'].transform(lambda x : x.mode()[0]), in case have two same one :-) – Transmitter 20/12, 2017 at 4:45

Is it possible that pandas changed the evaluation of your first solution? df.groupby(["Item"])['Price'].transform(pd.Series.mode) returns ValueError: Length of passed values is 1, index implies 3 on my machine. – Huygens 30/3, 2020 at 12:32

@00schneider: that's because the mode in your case applied to two or more values; use BENY's suggestion – Villeneuve 29/1, 2021 at 10:40


    #Initial dataframe having Milk as Nan value to produce the scenario if we have any group nan value 
    data_stack_try = [['Coffee',1],['Coffee',2],['Coffee',2],['Tea',3],['Tea',4],['Tea',4],['Milk', np.nan]]
    df_stack_try = pd.DataFrame(data_stack_try, columns=["Item","Price"])
    print("---Before Min---")
    print(df_stack_try)
    #Created Minimum column with transform function with 'min'
    df_stack_try["Minimum"] = df_stack_try.groupby(["Item"])['Price'].transform(min)
    print("---After Min----")
    print(df_stack_try)

    #Function written to take care of null values (Milk item is np.nan)
    def mode_group(grp):
        try:
            #return mode of each group passed for each row
            return grp.mode()[0]
        except BaseException as e:
            # This exception will be raised if there is no mode value 
            # In this case it will appear for Milk value as because of nan, it can't have mode value
            print("Exception!!!")
    df_stack_try["Most_Common_Price"] = df_stack_try.groupby('Item')['Price'].transform(lambda x: mode_group(x))
    print("---After Mode----")
    print(df_stack_try)


---Before Min---
     Item  Price
0  Coffee    1.0
1  Coffee    2.0
2  Coffee    2.0
3     Tea    3.0
4     Tea    4.0
5     Tea    4.0
6    Milk    NaN
---After Min----
     Item  Price  Minimum
0  Coffee    1.0      1.0
1  Coffee    2.0      1.0
2  Coffee    2.0      1.0
3     Tea    3.0      3.0
4     Tea    4.0      3.0
5     Tea    4.0      3.0
6    Milk    NaN      NaN
Exception!!!
---After Mode----
     Item  Price  Minimum  Most_Common_Price
0  Coffee    1.0      1.0                2.0
1  Coffee    2.0      1.0                2.0
2  Coffee    2.0      1.0                2.0
3     Tea    3.0      3.0                4.0
4     Tea    4.0      3.0                4.0
5     Tea    4.0      3.0                4.0
6    Milk    NaN      NaN                NaN

Stepup answered 8/1, 2023 at 0:13 Comment(1)

Please, add text description to the code provided – Irfan 11/1, 2023 at 9:24

Recommended topics

Hot tags