How to sort pandas dataframe by one column
B

15

667

I have a dataframe like this:

        0          1     2
0   354.7      April   4.0
1    55.4     August   8.0
2   176.5   December  12.0
3    95.5   February   2.0
4    85.6    January   1.0
5     152       July   7.0
6   238.7       June   6.0
7   104.8      March   3.0
8   283.5        May   5.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0

As you can see, months are not in calendar order. So I created a second column to get the month number corresponding to each month (1-12). From there, how can I sort this dataframe according to calendar months' order?

Brynnbrynna answered 13/6, 2016 at 10:44 Comment(0)
D
813

Use sort_values to sort the df by a specific column's values:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

If you want to sort by two columns, pass a list of column labels to sort_values with the column labels ordered according to sort priority. If you use df.sort_values(['2', '0']), the result would be sorted by column 2 then column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.
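As a minimal sketch of the two-column case, assuming the column labels are the strings '0', '1' and '2' as in the calls above, and using a made-up frame that contains a tie in column '2' (the OP's data has none):

```python
import pandas as pd

# Hypothetical data: two rows share the value 1.0 in column '2'.
df = pd.DataFrame({
    '0': [85.6, 95.5, 12.3],
    '1': ['January', 'February', 'January'],
    '2': [1.0, 2.0, 1.0],
})

# Sort by column '2' first; rows that tie on '2' are ordered by column '0'.
result = df.sort_values(['2', '0'])
print(result)
```

The two January rows come first, ordered by their value in column '0', followed by the February row.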

Denys answered 13/6, 2016 at 10:45 Comment(1)
For descending order, use the parameter ascending=False. - Complected
A
294

I tried the solutions above without success, so I found a different solution that works for me. ascending=False sorts the dataframe in descending order; the default is True. I am using Python 3.6.6 and pandas 0.23.4.

final_df = df.sort_values(by=['2'], ascending=False)

You can see more details in the pandas documentation for DataFrame.sort_values.

Agitprop answered 14/11, 2018 at 14:42 Comment(0)
S
78

Using column name worked for me.

sorted_df = df.sort_values(by=['Column_name'], ascending=True)
Sardine answered 27/8, 2020 at 9:57 Comment(0)
C
40

pandas' sort_values does the work.

There are various parameters one can pass, such as ascending (bool or list of bool):

Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.
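To illustrate the list form of ascending, here is a small sketch with invented data and hypothetical column names 'year' and 'value'. It sorts one column ascending and the other descending:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2021, 2020, 2021, 2020],
    'value': [10, 40, 30, 20],
})

# One bool per entry in `by`: 'year' ascending, 'value' descending.
result = df.sort_values(by=['year', 'value'], ascending=[True, False])
print(result)
```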

As the default is ascending, and OP's goal is to sort ascending, one doesn't need to specify that parameter (see the last note below for the way to solve descending), so one can use one of the following ways:

  • Performing the operation in-place, and keeping the same variable name. This requires one to pass inplace=True as follows:

    df.sort_values(by=['2'], inplace=True)
    
    # or
    
    df.sort_values(by = '2', inplace = True)
    
    # or
    
    df.sort_values('2', inplace = True)
    
  • If doing the operation in-place is not a requirement, one can assign the change (sort) to a variable:

    • With the same name of the original dataframe, df as

      df = df.sort_values(by=['2'])
      
    • With a different name, such as df_new, as

      df_new = df.sort_values(by=['2'])
      

All of the previous operations would give the following output

        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5     152       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0
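The difference between the in-place and the assignment forms can be checked with a short sketch (the three-row frame is a shortened, illustrative version of the OP's data). Without inplace=True, sort_values returns a new dataframe and leaves the original untouched; with inplace=True it modifies the original and returns None.

```python
import pandas as pd

df = pd.DataFrame({'1': ['April', 'August', 'January'],
                   '2': [4.0, 8.0, 1.0]})

df.sort_values('2')                # result is discarded; df itself is unchanged
unchanged = df.index.tolist()      # row order is still the original one

returned = df.sort_values('2', inplace=True)  # sorts df itself and returns None
changed = df.index.tolist()        # row order now follows column '2'

print(unchanged, returned, changed)
```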

Finally, one can reset the index with pandas.DataFrame.reset_index, to get the following

df.reset_index(drop = True, inplace = True)

# or

df = df.reset_index(drop = True)

[Out]:

        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0

A one-liner that sorts ascending, and resets the index would be as follows

df = df.sort_values(by=['2']).reset_index(drop = True)

[Out]:

        0          1     2
0    85.6    January   1.0
1    95.5   February   2.0
2   104.8      March   3.0
3   354.7      April   4.0
4   283.5        May   5.0
5   238.7       June   6.0
6     152       July   7.0
7    55.4     August   8.0
8   212.7  September   9.0
9   249.6    October  10.0
10  278.8   November  11.0
11  176.5   December  12.0
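Since pandas 1.0, sort_values also accepts ignore_index=True, which relabels the result 0, 1, ..., n-1 and so covers both steps in one call. A sketch on a shortened, illustrative version of the data:

```python
import pandas as pd

df = pd.DataFrame({'1': ['April', 'August', 'January'],
                   '2': [4.0, 8.0, 1.0]})

# Sort, then drop the old index.
a = df.sort_values(by=['2']).reset_index(drop=True)

# Same result in a single call (requires pandas >= 1.0).
b = df.sort_values(by=['2'], ignore_index=True)

print(b)
```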

Notes:

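  • If one is not doing the operation in-place, forgetting to assign the result back to a variable may leave one (as happened to another user) unable to get the expected result, because sort_values then returns a new dataframe and leaves the original unchanged.

  • There are strong opinions on using inplace, so one might want to read up on that discussion before relying on it.

  • One is assuming that column 2 is not a string. If it is, one will have to convert it to a numeric type (for example with pd.to_numeric) before sorting.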

  • If one wants in descending order, one needs to pass ascending=False as

     df = df.sort_values(by=['2'], ascending=False)
    
     # or
    
     df.sort_values(by = '2', ascending=False, inplace=True)
    
     [Out]:
    
            0          1     2
    2   176.5   December  12.0
    9   278.8   November  11.0
    10  249.6    October  10.0
    11  212.7  September   9.0
    1    55.4     August   8.0
    5     152       July   7.0
    6   238.7       June   6.0
    8   283.5        May   5.0
    0   354.7      April   4.0
    7   104.8      March   3.0
    3    95.5   February   2.0
    4    85.6    January   1.0
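Regarding the note about column 2 being a string: strings sort lexicographically, so '10.0' comes before '2.0'. A minimal sketch, with made-up data, of converting the column with pd.to_numeric before sorting:

```python
import pandas as pd

df = pd.DataFrame({'1': ['February', 'October', 'January'],
                   '2': ['2.0', '10.0', '1.0']})   # numbers stored as text

# Sorting the text column compares character by character.
as_text = df.sort_values('2')['1'].tolist()

# Convert to a numeric dtype, then sort again.
df['2'] = pd.to_numeric(df['2'])
as_number = df.sort_values('2')['1'].tolist()

print(as_text)
print(as_number)
```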
    
Contrastive answered 5/2, 2021 at 14:10 Comment(0)
U
28

Just as another solution:

Instead of creating the second column, you can convert your string data (the month names) to an ordered categorical and sort by that, like this:

df.rename(columns={1:'month'},inplace=True)
df['month'] = pd.Categorical(df['month'],categories=['December','November','October','September','August','July','June','May','April','March','February','January'],ordered=True)
df = df.sort_values('month',ascending=False)

It will give you the ordered data by month name as you specified while creating the Categorical object.
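A variant of the same idea, as a sketch: if the categories are listed in calendar order, a plain ascending sort is enough. Taking the month names from the standard calendar module is an assumption of this example, not part of the answer, and it presumes an English locale. The column is assumed to be named 'month' already.

```python
import calendar
import pandas as pd

df = pd.DataFrame({'month': ['April', 'August', 'December', 'February', 'January'],
                   'value': [354.7, 55.4, 176.5, 95.5, 85.6]})

# calendar.month_name has an empty string at position 0, so skip it.
months = list(calendar.month_name)[1:]

df['month'] = pd.Categorical(df['month'], categories=months, ordered=True)
result = df.sort_values('month')
print(result)
```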

Urethroscope answered 30/6, 2019 at 5:34 Comment(0)
Q
12

Just adding some more operations on the data. Suppose we have a dataframe df; we can chain several operations to get the desired output:

ID         cost      tax    label
1       216590      1600    test      
2       523213      1800    test 
3          250      1500    experiment

(df['label'].value_counts().to_frame().reset_index()).sort_values('label', ascending=False)

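gives the label counts as a dataframe, sorted in descending order. Note that in pandas 2.0 and later the count column is named 'count' and the label column is named 'label', so there one would sort by 'count' instead.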

    index   label
0   test        2
1   experiment  1
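Because the name of the count column produced by value_counts differs between pandas versions, a sketch that names both columns explicitly may be more robust. The column name 'n' is arbitrary and chosen only for this example:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3],
                   'cost': [216590, 523213, 250],
                   'tax': [1600, 1800, 1500],
                   'label': ['test', 'test', 'experiment']})

counts = (df['label'].value_counts()
            .rename_axis('label')       # name the index explicitly
            .reset_index(name='n')      # put the counts into column 'n'
            .sort_values('n', ascending=False))
print(counts)
```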
Queri answered 17/7, 2018 at 16:19 Comment(0)
M
11

This worked for me

df.sort_values(by='Column_name', inplace=True, ascending=False)
Mismatch answered 18/12, 2020 at 4:16 Comment(0)
B
9

You probably need to reset the index after sorting:

df = df.sort_values('2')
df = df.reset_index(drop=True)
Bass answered 26/12, 2021 at 6:47 Comment(1)
df.sort_values('2', inplace=True) will do the job. - Complected
I
8

Here is the signature of sort_values according to the pandas documentation.

DataFrame.sort_values(by, axis=0,
                          ascending=True,
                          inplace=False,
                          kind='quicksort',
                          na_position='last',
                          ignore_index=False, key=None)

In this case it will be like this.

df.sort_values(by=['2'])

API Reference pandas.DataFrame.sort_values

Igorot answered 10/8, 2020 at 12:20 Comment(0)
M
7

Just adding a few more insights

df = raw_df['2'].sort_values()  # sorts only one column (i.e. '2') and returns it as a Series

but

df = raw_df.sort_values(by=["2"], ascending=False)  # sorts the whole df in descending order based on column "2"
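To make the difference concrete, here is a short sketch in which raw_df is a made-up three-row frame:

```python
import pandas as pd

raw_df = pd.DataFrame({'1': ['April', 'August', 'January'],
                       '2': [4.0, 8.0, 1.0]})

# A Series containing only the sorted values of column '2'.
only_col = raw_df['2'].sort_values()

# A DataFrame with every column, rows ordered by column '2' descending.
whole = raw_df.sort_values(by=['2'], ascending=False)

print(type(only_col).__name__, type(whole).__name__)
```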
Messier answered 3/7, 2022 at 8:8 Comment(1)
If ['2'] works, the column label is the string '2'; if [2] works, the label is the integer 2. That is the only difference. - Messier
S
6

If you want to sort a column in a custom order rather than alphabetically, you can build a helper column of ranks with np.select and sort by it, as in the solution below.

Problem : sort column "col1" in this sequence ['A', 'C', 'D', 'B']

import pandas as pd
import numpy as np

## Sample DataFrame ##
df = pd.DataFrame({'col1': ['A', 'B', 'D', 'C', 'A']})

>>> df
   col1
0    A
1    B
2    D
3    C
4    A
## Solution ##

conditions = []
values = []

for i,j in enumerate(['A','C','D','B']):
    conditions.append((df['col1'] == j))
    values.append(i)

df['col1_Num'] = np.select(conditions, values)

df.sort_values(by='col1_Num',inplace = True)

>>> df

    col1  col1_Num
0    A         0
4    A         0
3    C         1
2    D         2
1    B         3
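As an alternative sketch (not the approach used in this answer), the same custom order can be expressed with a dictionary of ranks passed through the key= parameter, which avoids the helper column. This assumes pandas 1.1 or later; the names order and rank are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'D', 'C', 'A']})

order = ['A', 'C', 'D', 'B']
rank = {value: position for position, value in enumerate(order)}

# key= receives the column as a Series and must return a Series of sort keys.
# mergesort is stable, so tied rows keep their original relative order.
result = df.sort_values('col1', key=lambda s: s.map(rank), kind='mergesort')
print(result)
```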
Sprayberry answered 24/11, 2022 at 14:44 Comment(0)
P
1

This one worked for me:

df=df.sort_values(by=[2])

Whereas:

df=df.sort_values(by=['2']) 

does not work.
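Which of the two calls works depends on whether the column label is the integer 2 or the string '2'. A sketch for checking this: the frame below is built without column names, so pandas assigns integer labels, and sorting by the string '2' raises a KeyError.

```python
import pandas as pd

# No column names given, so the labels default to the integers 0, 1, 2.
df = pd.DataFrame([[354.7, 'April', 4.0],
                   [85.6, 'January', 1.0]])

label_types = [type(c).__name__ for c in df.columns]

by_int = df.sort_values(by=[2])      # works: 2 is an existing label

try:
    df.sort_values(by=['2'])         # '2' (a string) is not a label here
    string_label_works = True
except KeyError:
    string_label_works = False

print(label_types, string_label_works)
```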

Pulverize answered 15/2, 2021 at 7:55 Comment(0)
A
1

Sort using a key

Since pandas 1.1.0, we can pass a key= parameter which admits a function as a sorting key much like the key argument in the builtin sorted() function in Python. However, unlike the function passed to sorted's key, this function has to be vectorized, which means it must output a Series/DataFrame to be used to sort the input.

For the example in the OP, instead of creating column '2' to sort by column '1', we could directly apply a sorting key to column '1'. Because the column(s) passed as by= arguments are operated on internally in .sort_values(), we can create a month name-to-number mapper dictionary and pass a lambda that maps this dictionary to the column '1'.

import calendar   # <--- the builtin calendar module
month_to_number_mapper = {m:i for i,m in enumerate(calendar.month_name)}
df1 = df.sort_values(by='1', key=lambda col: col.map(month_to_number_mapper))

As you see, this is reminiscent of the following sorted() call in vanilla Python:

li = sorted(df.values, key=lambda row: month_to_number_mapper[row[1]])

For the example in the OP, since column '1' is a column of month names, we can treat it as if it were a datetime column to sort the dataframe. To do that we can pass pandas' to_datetime function as key.

df1 = df.sort_values(by='1', key=lambda col: pd.to_datetime(col, format='%B'))

This is reminiscent of the following sorted() call in vanilla Python:

from datetime import datetime
li = sorted(df.values, key=lambda row: datetime.strptime(row[1], '%B'))

Sort by index

Pandas' .loc[] rearranges rows according to values passed to it. So another way to sort could be to sort column '1' using whatever sorting key and then pass the sorted object's index to loc[].

sorted_index = pd.to_datetime(df['1'], format='%B').sort_values().index
df1 = df.loc[sorted_index]
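The approaches can be compared side by side in a small sketch. The four-row frame is a shortened, illustrative version of the OP's data, the column labels are assumed to be strings, and the month names are assumed to be in English (both calendar.month_name and the '%B' format depend on the locale).

```python
import calendar
import pandas as pd

df = pd.DataFrame({'0': [354.7, 55.4, 85.6, 104.8],
                   '1': ['April', 'August', 'January', 'March']})

mapper = {m: i for i, m in enumerate(calendar.month_name)}

# 1. key= with a name-to-number mapping (pandas >= 1.1)
by_map = df.sort_values(by='1', key=lambda col: col.map(mapper))

# 2. key= that parses the month names as dates
by_date = df.sort_values(by='1', key=lambda col: pd.to_datetime(col, format='%B'))

# 3. sort the parsed column on its own, then reorder the rows with .loc
by_loc = df.loc[pd.to_datetime(df['1'], format='%B').sort_values().index]

print(by_map)
```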

All three ways listed above sort the rows of the dataframe into calendar-month order.

Annalisaannalise answered 9/11, 2023 at 0:13 Comment(0)
O
0

I hope these will be helpful :

df.sort_values(by=['col1','col2','col3'],ascending = False)

If you have NaN values and want them placed first, use this:

df.sort_values(by=['col1','col2','col3'], ascending=False, na_position='first')
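A small sketch of the effect of na_position, using a made-up single-column frame. The value must be passed as a string, 'first' or 'last'; missing values go last by default regardless of the sort direction.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [2.0, np.nan, 1.0, 3.0]})

# Default: NaN rows are placed at the end, even when sorting descending.
last = df.sort_values(by='col1', ascending=False)

# na_position='first' moves the NaN rows to the top instead.
first = df.sort_values(by='col1', ascending=False, na_position='first')

print(last)
print(first)
```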
Oscilloscope answered 28/1 at 11:28 Comment(0)
S
-1

Example: Assume you have a column with values 1 and 0 and you want to separate and use only one value, then:

# furniture is one of the columns in the csv file; pan is assumed to be pandas imported as pan.
 

allrooms = data.groupby('furniture')['furniture'].agg('count')
allrooms


myrooms1 = pan.DataFrame(allrooms, columns = ['furniture'], index = [1])

myrooms2 = pan.DataFrame(allrooms, columns = ['furniture'], index = [0])

print(myrooms1);print(myrooms2)
Schenck answered 18/7, 2021 at 16:20 Comment(0)
