Say I have the pandas DataFrame below:
     A      B  C   D
1  foo    one  0   0
2  foo    one  2   4
3  foo    two  4   8
4  cat    one  8   4
5  bar   four  6  12
6  bar  three  7  14
7  bar   four  7  14
I would like to select all the rows that share the same value in A but have different values in B. The output of my code should be:
     A      B  C   D
1  foo    one  0   0
3  foo    two  4   8
5  bar  three  7  14
6  bar   four  7  14
What's the most efficient way to do this? I have approximately 11,000 rows with a lot of variation in the column values, and this situation comes up often. In my dataset, rows with equal values in column A should also have equal values in column B. Because of mislabeling, this is not always the case. I would like to find and fix these rows, and doing so one by one would be impractical.
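For reference, here is a minimal sketch of one way this selection might be expressed, assuming the frame above is rebuilt by hand. The names `distinct_b`, `conflicting`, and `result` are illustrative only, and keeping the first row for each (A, B) pair is an assumption about which duplicate should survive.

```python
import pandas as pd

# Rebuild the example frame from the question (index 1..7).
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "cat", "bar", "bar", "bar"],
        "B": ["one", "one", "two", "one", "four", "three", "four"],
        "C": [0, 2, 4, 8, 6, 7, 7],
        "D": [0, 4, 8, 4, 12, 14, 14],
    },
    index=range(1, 8),
)

# Count the distinct B values within each A group, aligned back to every row.
distinct_b = df.groupby("A")["B"].transform("nunique")

# Rows whose A value is paired with more than one distinct B value.
conflicting = df[distinct_b > 1]

# Assumption: keep only the first row seen for each (A, B) pair.
result = conflicting.drop_duplicates(subset=["A", "B"])

print(result)
```

Here `conflicting` holds every row from a mislabeled group, which may be the more useful view when correcting the labels, while `result` reduces that to one row per distinct (A, B) pair. Passing `keep="last"` to `drop_duplicates` would keep the last row of each pair instead.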