Sort dataframe by string length

G

6

55

I want to sort by name length. There doesn't appear to be a key parameter for sort_values so I'm not sure how to accomplish this. Here is a test df:

import pandas as pd
df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

Glower answered 28/2, 2017 at 18:54 Comment(3)

Possible duplicate of sort dataframe by length of string in a column – Potential 12/9, 2017 at 13:34

@jezrael Please read my reason. I mentioned it explicitly: #46177862 – Potential 12/9, 2017 at 14:1

There are more options there. If not, you can edit this answer and include all those other solutions. – Potential 12/9, 2017 at 14:2

A

52

You can compute the string lengths with str.len(), sort them with sort_values, and then pass the resulting index to reindex:

print (df.name.str.len())
0    5
1    2
2    6
3    4
Name: name, dtype: int64

print (df.name.str.len().sort_values())
1    2
3    4
0    5
2    6
Name: name, dtype: int64

s = df.name.str.len().sort_values().index
print (s)
Int64Index([1, 3, 0, 2], dtype='int64')

print (df.reindex(s))
     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2

df1 = df.reindex(s)
df1 = df1.reset_index(drop=True)
print (df1)
     name  score
0      Al      4
1    Greg      3
2   Steve      2
3  Markus      2

Ashworth answered 28/2, 2017 at 18:56 Comment(2)

Great answer. I tried this approach with lists too (Sorting a DataFrame by list length), since .str.len() also works with lists, as mentioned in the question "Pythonic way for calculating length of lists in pandas dataframe column". – Mcmahan 9/7, 2017 at 23:22

This is clever, but you should note it's only safe to do when it's ok to trash the existing index. – Lynx 16/4, 2024 at 2:20

K

50

Since pandas 1.1.0, DataFrame.sort_values accepts a key argument. We can pass it an anonymous (lambda) function that computes the string length with the .str.len() Series method:

df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'], 
    'score': [2, 4, 2, 3]
})
print(df)

     name  score
0   Steve      2
1      Al      4
2  Markus      2
3    Greg      3

df.sort_values(by="name", key=lambda x: x.str.len())

     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
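
If several names have the same length, the key can be combined with a second column as a tie-breaker. This is only a sketch, assuming pandas 1.1.0 or later. It relies on sort_values calling the key once per column listed in by. The length_key helper and the extra 'Bo' row are made up for illustration and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg', 'Bo'],  # 'Bo' added to create a tie with 'Al'
    'score': [2, 4, 2, 3, 1],
})

def length_key(col):
    # Hypothetical helper: the key is applied to each column in `by` separately,
    # so only the 'name' column is mapped to string lengths.
    return col.str.len() if col.name == 'name' else col

# Shortest names first; ties on length are broken by ascending score.
result = df.sort_values(by=['name', 'score'], key=length_key)
print(result)
```

Checking col.name inside the key keeps the score column untouched, so it is compared by its own values.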

Kaylil answered 20/9, 2020 at 19:48 Comment(1)

Thanks. Just in case someone needs to lowercase and sort by the index: df.sort_index(key=lambda x: x.str.lower().str.len()) – Seward 7/1, 2023 at 16:48

T

16

I found this solution more intuitive, especially if you want to do something that depends on the string length later on.

df['length'] = df['name'].str.len()
df.sort_values('length', ascending=False, inplace=True)

Now your dataframe has a column named length that holds the string length of each value in the name column, and the whole dataframe is sorted by that column in descending order.
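
If the helper column is only needed for the sort, one possible variation is to add it with assign and drop it again afterwards, which also avoids inplace=True. A minimal sketch using the test df from the question (the df_sorted name is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

# Add a temporary 'length' column, sort by it (longest first), then drop it again.
df_sorted = (
    df.assign(length=df['name'].str.len())
      .sort_values('length', ascending=False)
      .drop(columns='length')
)
print(df_sorted)
```

The original df is left unchanged, so the length column never appears in it.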

Trover answered 3/10, 2019 at 19:1 Comment(1)

This should be the accepted answer. Much simpler and easily reused. – Backgammon 7/8, 2020 at 14:13

P

3

@jezrael's answer is great and explains the approach well. Here is the final result:

index_sorted = df.name.str.len().sort_values(ascending=True).index
df_sorted = df.reindex(index_sorted)
df_sorted = df_sorted.reset_index(drop=True)

Preoccupy answered 5/2, 2020 at 14:3 Comment(0)

L

3

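A fancy and minimal solution (note that iloc is positional, so passing it index labels like this only works when the dataframe has the default integer index):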

df.iloc[df.agg({"name":len}).sort_values('name').index]



     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
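
Since iloc expects integer positions rather than labels, a variation that stays positional is to argsort the lengths. This is a sketch, not the answer's method. The custom index with a duplicate label is made up here to show that it does not depend on the default index.

```python
import pandas as pd

# Same data as the question, but with a non-default index containing a duplicate label.
df = pd.DataFrame(
    {'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]},
    index=['a', 'b', 'a', 'c'],
)

# argsort returns integer positions, which is what iloc expects.
# kind='stable' keeps rows with equal lengths in their original order.
order = df['name'].str.len().to_numpy().argsort(kind='stable')
df_sorted = df.iloc[order]
print(df_sorted)
```

The existing index labels travel with their rows, so nothing has to be reset or thrown away.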

Leopold answered 10/7, 2020 at 14:31 Comment(1)

Nice one! thanxx !! – Fango 15/11, 2020 at 20:11

O

0

It's worth using the key argument to avoid creating unnecessary columns:

df.sort_values("column_name", ascending=True, key=lambda col: col.str.len())
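
One thing to keep in mind with the key approach is missing values: .str.len() returns NaN for them, and sort_values places those rows according to na_position. A small sketch, assuming pandas 1.1.0 or later; the None entry is added here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', None, 'Markus', 'Al'], 'score': [2, 4, 2, 3]})

# str.len() yields NaN for the missing name; na_position decides where that row ends up.
nulls_last = df.sort_values('name', key=lambda col: col.str.len())
nulls_first = df.sort_values('name', key=lambda col: col.str.len(), na_position='first')
print(nulls_last)
print(nulls_first)
```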

Onepiece answered 22/9, 2023 at 13:55 Comment(0)
