Pandas: Sort a dataframe based on multiple columns

I have a pandas dataframe with two columns, Department and Employee_Count. I need to sort it by Employee_Count in descending order, but if two rows tie on Employee_Count, they should be sorted alphabetically by Department.

   Department Employee_Count
0    abc          10
1    adc          10
2    bca          11
3    cde          9
4    xyz          15

Required output:

   Department Employee_Count
0    xyz          15
1    bca          11
2    abc          10
3    adc          10
4    cde          9

This is what I've tried.

df = df.sort_values(['Department','Employee_Count'],ascending=[True,False])

But this just sorts the departments alphabetically.

I've also tried to sort by Department first and then by Employee_Count. Like this:

df = df.sort_values(['Department'],ascending=[True])
df = df.sort_values(['Employee_Count'],ascending=[False])

This doesn't give me correct output either:

   Department Employee_Count
4    xyz          15
2    bca          11
1    adc          10
0    abc          10
3    cde          9

It gives 'adc' first and then 'abc'.

Reine answered 4/11, 2019 at 8:53 Comment(0)

Swap the order of the columns in the list, and swap the values in the ascending parameter to match:

Explanation:

The order of the column names is the sorting priority. The rows are first sorted by Employee_Count in descending order, and only rows with duplicate Employee_Count values are then sorted by Department in ascending order.

df1 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, True])
print(df1)
  Department  Employee_Count
4        xyz              15
2        bca              11
0        abc              10 <-
1        adc              10 <-
3        cde               9

As a test, if the second value is also False, the rows with duplicate Employee_Count values are sorted by Department in descending order instead:

df2 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, False])
print(df2)
  Department  Employee_Count
4        xyz              15
2        bca              11
1        adc              10 <-
0        abc              10 <-
3        cde               9
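
For a self-contained check, here is a minimal sketch that rebuilds the dataframe from the question and applies the same sort. The construction of df is an assumption, since the question only shows the printed frame. ignore_index=True (available from pandas 1.0) is added only to renumber the rows 0..4 as in the required output.

```python
import pandas as pd

# Assumed reconstruction of the question's data; only the printed frame is shown there.
df = pd.DataFrame({
    'Department': ['abc', 'adc', 'bca', 'cde', 'xyz'],
    'Employee_Count': [10, 10, 11, 9, 15],
})

# Primary key: Employee_Count, descending.
# Tie-breaker: Department, ascending.
# ignore_index=True renumbers the rows 0..n-1 to match the required output.
result = df.sort_values(['Employee_Count', 'Department'],
                        ascending=[False, True],
                        ignore_index=True)
print(result)
```

Without ignore_index=True the original row labels (4, 2, 0, 1, 3) are kept, as in the output printed above.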
Bluegreen answered 4/11, 2019 at 8:56 Comment(2)
Thanks a ton. Exactly the output I desired. Can you please tell me the rationale behind swapping the columns? - Reine
It sorts by the columns in the order they appear in the list. - Kosak

It's a good 4 years too late, but OP's second attempt (two successive sort_values() calls) would work if the second sort were stable. By default, pandas sort_values() uses 'quicksort', which is not guaranteed to be stable, so rows that tie on Employee_Count can lose the Department order established by the first call. If the second sort_values() call uses a 'stable' sort, it produces the expected output.

df = df.sort_values('Department', ascending=True)
df = df.sort_values('Employee_Count', kind='stable', ascending=False, ignore_index=True)
#                                     ^^^^^^^^^^^^^  <--- stable sort here
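
As a rough, self-contained sketch of the same idea on the question's data (the construction of df is an assumption, since the question only shows the printed frame; the names by_department, two_pass and one_pass are illustrative), the two-pass stable sort can be compared against the single multi-column sort:

```python
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({
    'Department': ['abc', 'adc', 'bca', 'cde', 'xyz'],
    'Employee_Count': [10, 10, 11, 9, 15],
})

# Pass 1: sort by the tie-breaker (the least significant key).
by_department = df.sort_values('Department', ascending=True)

# Pass 2: stable sort by the primary key; rows that tie on
# Employee_Count keep the Department order from pass 1.
two_pass = by_department.sort_values('Employee_Count', kind='stable',
                                     ascending=False, ignore_index=True)

# Reference: a single sort on both columns.
one_pass = df.sort_values(['Employee_Count', 'Department'],
                          ascending=[False, True], ignore_index=True)

print(two_pass.equals(one_pass))
```

The pattern is to sort by the least significant key first and finish with a stable sort on the most significant key.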

You can check on a random dataframe that this produces the same result as sorting by a list of two columns:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(1000, 2)), columns=['A', 'B'])
a = df.sort_values('A', ascending=True).sort_values('B', kind='stable', ascending=False, ignore_index=True)
b = df.sort_values(['B', 'A'], ascending=[False, True], ignore_index=True)
a.equals(b)   # True
Pianist answered 17/2 at 10:34 Comment(0)
