Check if all elements in a group are equal using pandas GroupBy

Is there a pythonic way to group by a field and check if all elements of each resulting group have the same value?

Sample data:

              datetime rating  signal
0  2018-12-27 11:33:00     IG       0
1  2018-12-27 11:33:00     HY      -1
2  2018-12-27 11:49:00     IG       0
3  2018-12-27 11:49:00     HY      -1
4  2018-12-27 12:00:00     IG       0
5  2018-12-27 12:00:00     HY      -1
6  2018-12-27 12:49:00     IG       0
7  2018-12-27 12:49:00     HY      -1
8  2018-12-27 14:56:00     IG       0
9  2018-12-27 14:56:00     HY      -1
10 2018-12-27 15:12:00     IG       0
11 2018-12-27 15:12:00     HY      -1
12 2018-12-20 15:14:00     IG       0
13 2018-12-20 15:14:00     HY      -1
14 2018-12-20 15:50:00     IG      -1
15 2018-12-20 15:50:00     HY      -1
16 2018-12-27 13:26:00     IG       0
17 2018-12-27 13:26:00     HY      -1
18 2018-12-27 13:44:00     IG       0
19 2018-12-27 13:44:00     HY      -1
20 2018-12-27 15:06:00     IG       0
21 2018-12-27 15:06:00     HY      -1
22 2018-12-20 15:48:00     IG       0
23 2018-12-20 15:48:00     HY      -1

The grouping part can be done by

df.groupby([df.datetime.dt.date, 'rating'])

However, I'm sure there must be a simple way to leverage the grouper and use a transform statement to return 1 if all the values from signal are the same.

Desired output

2018-12-20  HY            True
            IG            False
2018-12-27  HY            True
            IG            True
Triable answered 27/12, 2018 at 21:7 Comment(7)
Could you check len(set(your_values)) == 1? - Shambles
I don't see any 'temp' key in your input df. - Downstairs
Should this be [True, False, True, False]? - Trochee
I should have False for 2018-12-20, IG, and True for everything else. - Triable
@Downstairs it is being generated from the assign statement. - Triable
Hmm, I see 2018-12-27/IG has [0, -1] as the unique values. Can you take a look? - Trochee
@coldspeed fixed the sample data, thanks! - Triable
T
37

Use groupby and nunique, and check whether the result is 1:

df.groupby([df.datetime.dt.date, 'rating']).signal.nunique().eq(1)

datetime    rating
2018-12-20  HY         True
            IG        False
2018-12-27  HY         True
            IG         True
Name: signal, dtype: bool

Or, similarly, using apply with set conversion:

(df.groupby([df.datetime.dt.date, 'rating']).signal
   .apply(lambda x: len(set(x)) == 1))

datetime    rating
2018-12-20  HY         True
            IG        False
2018-12-27  HY         True
            IG         True
Name: signal, dtype: bool

P.S. You don't need to assign a temporary column: groupby accepts a Series (here df.datetime.dt.date) as a grouping key alongside column names.
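Since the question asks about transform specifically, here is a minimal sketch of a row-aligned variant. The small DataFrame and the all_equal column name are made up for illustration; only the grouping keys and the nunique check come from the answer above.

```python
import pandas as pd

# Small hypothetical sample in the same shape as the question's data.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2018-12-20 15:14", "2018-12-20 15:14",
        "2018-12-20 15:50", "2018-12-20 15:50",
        "2018-12-27 11:33", "2018-12-27 11:33",
    ]),
    "rating": ["IG", "HY", "IG", "HY", "IG", "HY"],
    "signal": [0, -1, -1, -1, 0, -1],
})

# Group keys: the calendar date (derived on the fly) plus the rating column.
keys = [df["datetime"].dt.date, "rating"]

# transform broadcasts each group's nunique back onto the original rows,
# so the result is aligned with df's index rather than with the groups.
df["all_equal"] = df.groupby(keys)["signal"].transform("nunique").eq(1)
print(df)
```

The aggregated form in the answer gives one value per group; this form gives one value per row, which is handy for filtering the original frame.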

Trochee answered 27/12, 2018 at 21:14 Comment(3)
Follow-up: is there an easy way to recover the index of the 'odd-one-out'? We can assume there's only one per day. - Triable
@Triable do you mean per day there would only be one odd one out (either true or false)? Is it possible there can be no odd one out? - Trochee
Yes, it is possible that there is no odd one out. I have a pretty ugly way of doing it that involves using groupby twice :S - Triable
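One possible sketch for the follow-up, under an assumption the thread does not confirm: the "odd one out" is taken to be any row whose signal differs from the most frequent signal in its (date, rating) group. The sample frame and the names common and odd_idx are hypothetical.

```python
import pandas as pd

# Hypothetical sample: on 2018-12-20 the IG group is [0, -1, 0].
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2018-12-20 15:14", "2018-12-20 15:14",
        "2018-12-20 15:50", "2018-12-20 15:50",
        "2018-12-20 15:48", "2018-12-20 15:48",
        "2018-12-27 11:33", "2018-12-27 11:33",
    ]),
    "rating": ["IG", "HY", "IG", "HY", "IG", "HY", "IG", "HY"],
    "signal": [0, -1, -1, -1, 0, -1, 0, -1],
})

keys = [df["datetime"].dt.date, "rating"]

# Most frequent signal per group, broadcast back to every row.
# mode() returns its values sorted, so ties resolve to the smallest one.
common = df.groupby(keys)["signal"].transform(lambda s: s.mode().iloc[0])

# Index labels of rows that disagree with their group's most common value.
odd_idx = df.index[df["signal"].ne(common)]
print(list(odd_idx))
```

If every group is uniform, odd_idx is simply empty, which covers the "no odd one out" case. This needs only one groupby call.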
E
5

Just for fun, here is an alternative that doesn't use groupby on the original frame:

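# The level= argument of sum()/any() was removed in pandas 2.0,
# so the columns are grouped explicitly here instead.
s = pd.crosstab(df.datetime.dt.date, [df.rating, df.signal])

# For each date and rating, count the signal values that actually occur;
# the group is uniform when exactly one does.
s.gt(0).T.groupby(level='rating').sum().T.eq(1).stack()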
Out[556]: 
datetime    rating
2018-12-20  HY         True
            IG        False
2018-12-27  HY         True
            IG         True
dtype: bool
Etna answered 27/12, 2018 at 21:38 Comment(1)
I have a challenge for you: make it false only if it's not the last entry of the day :) - Triable
T
0

I think this would be more efficient than using the nunique method:

import numpy as np

# Compare every value in the group against the group's first value.
df.groupby([df.datetime.dt.date, 'rating'])['signal'].agg(
    lambda x: np.all(x.to_numpy() == x.iloc[0])
)
Tiphani answered 25/6, 2024 at 17:31 Comment(0)
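Apart from speed, the two approaches can disagree when a group contains missing values, because nunique drops NaN by default. A small sketch with made-up data (the key column and the variable names are hypothetical) that shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical groups: "a" is uniform, "b" mixes a value with NaN.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "signal": [1.0, 1.0, 2.0, np.nan],
})
g = df.groupby("key")["signal"]

# nunique() ignores NaN by default, so group "b" counts one unique value.
by_nunique = g.nunique().eq(1)

# With dropna=False, NaN counts as its own value.
by_nunique_keepna = g.nunique(dropna=False).eq(1)

# Comparing against the first element: NaN never equals anything.
by_first = g.agg(lambda x: bool(np.all(x.to_numpy() == x.iloc[0])))

print(pd.DataFrame({
    "nunique": by_nunique,
    "nunique_keepna": by_nunique_keepna,
    "first_element": by_first,
}))
```

Here group "b" is reported as uniform by the default nunique check but not by the other two, so the choice depends on how missing signals should be treated.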
