This is my DataFrame:
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'z', 'z', 'z', 'p', 'p', 'p', 'p'],
        'b': [1, -1, 1, 1, -1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1]
    }
)
And this is the expected output. I want to create column c:
    a  b       c
0   x  1   first
1   x -1   first
2   x  1   first
3   x  1   first
4   y -1  second
5   y  1  second
6   y  1  second
7   y -1  second
11  p  1   first
12  p  1   first
13  p  1   first
14  p  1   first
Groups are defined by column a. I want to filter df and keep only the groups whose first b is 1 or whose second b is 1.
I did this with the following code:
df1 = df.groupby('a').filter(lambda x: (x.b.iloc[0] == 1) | (x.b.iloc[1] == 1))
To create column c for df1, the groups should again be defined by a. If the first b of a group is 1, then c is first for every row of that group, and if the second b is 1, then c is second.
Note that for group p both the first and the second b are 1. For such groups I want c to be first.
Maybe the way that I approach the issue is totally wrong.
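For reference, here is a minimal sketch of one way the rules described above could be expressed without a per-group lambda. It is only one possible approach, not necessarily the best one. The names pos, first_is_1 and second_is_1 are made up for illustration, and it assumes that "first" and "second" mean the row position inside each group in the current row order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'z', 'z', 'z', 'p', 'p', 'p', 'p'],
        'b': [1, -1, 1, 1, -1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1]
    }
)

# Position of each row inside its group: 0 for the first row, 1 for the second, ...
pos = df.groupby('a').cumcount()

# Flag every row of a group whose first (or second) b equals 1.
# Groups with a single row simply never match the "second" condition.
first_is_1 = (df['b'].eq(1) & pos.eq(0)).groupby(df['a']).transform('any')
second_is_1 = (df['b'].eq(1) & pos.eq(1)).groupby(df['a']).transform('any')

# Keep groups that satisfy either condition.
df1 = df[first_is_1 | second_is_1].copy()

# 'first' takes priority when both conditions hold (as for group p).
df1['c'] = np.where(first_is_1.loc[df1.index], 'first', 'second')

print(df1)
```

Because every kept group satisfies at least one of the two conditions, any row where first_is_1 is False must belong to a group whose second b is 1, which is why a single np.where is enough here.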