Find rows of dataframe with the same column value in Pandas

Consider a dataframe with two columns, for simplicity. The first column, id, is the key. The second column, code, is not a key, but it is very rare for two entries to share the same value.

I want to find the rows that share the same code value but, of course, have different id values.

How can I do that in Pandas?

Farseeing answered 5/6, 2019 at 11:24 Comment(0)

I believe you need DataFrame.duplicated with keep=False to flag all duplicates in the column, and DataFrame.sort_values for ordering:

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'code': list('abcdac'),
})

print(df)
   id code
0   1    a
1   2    b
2   3    c
3   4    d
4   5    a
5   6    c

# keep=False flags every row whose code occurs more than once
df1 = df[df.duplicated('code', keep=False)].sort_values('code')
print(df1)
   id code
0   1    a
4   5    a
2   3    c
5   6    c
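
The keep=False argument is what makes this return both rows of each pair. As a minimal sketch on the same sample data (the names later_only and all_dupes are only illustrative), compare it with the default keep='first', which flags only the later occurrences:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'code': list('abcdac'),
})

# Default keep='first': the first row of each code is NOT flagged,
# so only the later repeats are returned.
later_only = df[df.duplicated('code')]

# keep=False: every row whose code appears more than once is flagged.
all_dupes = df[df.duplicated('code', keep=False)]

print(later_only['id'].tolist())  # [5, 6]
print(all_dupes['id'].tolist())   # [1, 3, 5, 6]
```

With the default you would lose ids 1 and 3, which is usually not what you want when looking for all rows that share a code.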

Or, if you need the ids as lists, use groupby with list:

df2 = (df[df.duplicated('code', keep=False)]
       .groupby('code')['id']
       .apply(list)
       .reset_index())
print(df2)
  code      id
0    a  [1, 5]
1    c  [3, 6]
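
If you would rather filter on how often each code occurs, one possible alternative (not part of the approach above, just a sketch on the same sample data with illustrative names counts and shared) is to broadcast the group size back onto each row with groupby and transform:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'code': list('abcdac'),
})

# Number of rows sharing each row's code, aligned to the original index
counts = df.groupby('code')['code'].transform('size')

# Keep rows whose code occurs more than once, grouped together by code
shared = df[counts > 1].sort_values(['code', 'id'])

print(shared['id'].tolist())  # [1, 5, 3, 6]
```

This gives the same rows as duplicated with keep=False here, and the threshold can be changed, for example counts > 2, if you only care about codes that occur three or more times.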
Spheroidal answered 5/6, 2019 at 11:27 Comment(0)
