I am trying to figure out the fastest way to perform search and sort on a pandas dataframe. Below are before and after dataframes of what I am trying to accomplish.
Before:
flightTo flightFrom toNum fromNum toCode fromCode
ABC DEF 123 456 8000 8000
DEF XYZ 456 893 9999 9999
AAA BBB 473 917 5555 5555
BBB CCC 917 341 5555 5555
After search/sort:
flightTo flightFrom toNum fromNum toCode fromCode
ABC XYZ 123 893 8000 9999
AAA CCC 473 341 5555 5555
In this example I am essentially trying to filter out 'flights' that exist in between end destinations. This should be done by using some sort of drop duplicates method but what leaves me confused is how to handle all of the columns. Would a binary search be the best way to accomplish this? Hints appreciated, trying hard to figure this out.
possible edge case:
What if the data is switched up and our end connections are in the same column?
flight1 flight2 1Num 2Num 1Code 2Code
ABC DEF 123 456 8000 8000
XYZ DEF 893 456 9999 9999
After search/sort:
flight1 flight2 1Num 2Num 1Code 2Code
ABC XYZ 123 893 8000 9999
This case logically shouldn't happen. After all how can you go DEF-ABC and DEF-XYZ? You can't, but the 'endpoints' would still be ABC-XYZ
df['flightFrom'].shift() != df['fightTo']
? – HolloweyedfromNum, fromCode
expected output, that's what makes this question complex imo. – Dumdum