Sort dataframe by string length
G

6

55

I want to sort by name length. There doesn't appear to be a key parameter for sort_values so I'm not sure how to accomplish this. Here is a test df:

import pandas as pd
df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})
Glower answered 28/2, 2017 at 18:54 Comment(3)
Possible duplicate of "sort dataframe by length of string in a column" - Potential
@jezrael Please read my reason. I mentioned it explicitly: #46177862 - Potential
There are more options there. If not, you can edit this answer and include all those other solutions. - Potential
A
52

You can compute the string lengths with str.len(), sort them with sort_values, and pass the resulting index to reindex:

print (df.name.str.len())
0    5
1    2
2    6
3    4
Name: name, dtype: int64

print (df.name.str.len().sort_values())
1    2
3    4
0    5
2    6
Name: name, dtype: int64

s = df.name.str.len().sort_values().index
print (s)
Int64Index([1, 3, 0, 2], dtype='int64')

print (df.reindex(s))
     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2

df1 = df.reindex(s)
df1 = df1.reset_index(drop=True)
print (df1)
     name  score
0      Al      4
1    Greg      3
2   Steve      2
3  Markus      2
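
The steps above can be condensed into a single chain. The following is a minimal sketch, not the answerer's exact code: the string labels passed to `index` are hypothetical and are there only to show that the original labels travel with their rows when `reset_index` is skipped. It assumes the index labels are unique, since `reindex` raises on duplicate labels.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]},
    index=['a', 'b', 'c', 'd'],  # hypothetical labels, not part of the question's df
)

# Sort the lengths, take the labels in that order, then reorder the rows by label.
order = df['name'].str.len().sort_values().index
df_sorted = df.reindex(order)
print(df_sorted)
```

Calling `df_sorted.reset_index(drop=True)` afterwards discards these labels, so that step is only appropriate when the index carries no information.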
Ashworth answered 28/2, 2017 at 18:56 Comment(2)
Great answer. I tried this approach with lists too (Sorting a DataFrame by list length), since .str.len() also works on lists, as mentioned in the question "Pythonic way for calculating length of lists in pandas dataframe column". - Mcmahan
This is clever, but note that it's only safe when it's OK to discard the existing index. - Lynx
K
50

Using DataFrame.sort_values, we can pass an anonymous (lambda) function that computes string length (via the .str.len() Series method) to the key argument, which is available since pandas 1.1.0:

df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'], 
    'score': [2, 4, 2, 3]
})
print(df)

     name  score
0   Steve      2
1      Al      4
2  Markus      2
3    Greg      3

df.sort_values(by="name", key=lambda x: x.str.len())

     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
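
As a hedged variation on the call above, `sort_values` also accepts `ascending` and `ignore_index` alongside `key`, so the longest names can come first and the rows can be renumbered in one step. The variable name `longest_first` is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'],
    'score': [2, 4, 2, 3]
})

# key receives the whole column as a Series and must return
# a Series of the same length (here, the string lengths).
longest_first = df.sort_values(
    by="name",
    key=lambda col: col.str.len(),
    ascending=False,    # longest names first
    ignore_index=True,  # renumber rows 0..n-1 instead of keeping the old labels
)
print(longest_first)
```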
Kaylil answered 20/9, 2020 at 19:48 Comment(1)
Thanks. Just in case someone needs to lowercase and sort: df.sort_index(key=lambda x: x.str.lower().str.len()) - Seward
T
16

I found this solution more intuitive, especially if you want to do something that depends on the string length later on.

df['length'] = df['name'].str.len()
df.sort_values('length', ascending=False, inplace=True)

Your dataframe now has a column named length that holds the string length of each value in the name column, and the whole dataframe is sorted by it in descending order.
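
If the helper column is only needed for the sort itself, a non-mutating variant is possible. This is a sketch under the assumption that no column named `length` already exists in the dataframe; `df_sorted` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

df_sorted = (
    df.assign(length=df['name'].str.len())   # temporary helper column on a copy
      .sort_values('length', ascending=False)
      .drop(columns='length')                # remove the helper after sorting
)
print(df_sorted)
```

Because `assign` returns a new dataframe, the original `df` keeps its row order and gains no extra column.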

Trover answered 3/10, 2019 at 19:1 Comment(1)
This should be the accepted answer. Much simpler and easily reused. - Backgammon
P
3

@jezrael's answer is great and well explained. Here is the final result:

index_sorted = df.name.str.len().sort_values(ascending=True).index
df_sorted = df.reindex(index_sorted)
df_sorted = df_sorted.reset_index(drop=True)
Preoccupy answered 5/2, 2020 at 14:3 Comment(0)
L
3

A fancy and minimal solution:

df.iloc[df.agg({"name":len}).sort_values('name').index]



     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
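
Note that the one-liner above hands index labels to `iloc`, which is positional, so it relies on the default 0..n-1 index. A hedged alternative that stays purely positional is sketched below using `numpy.argsort`; the duplicate labels in `index` are hypothetical and only demonstrate that the approach does not depend on the index at all.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]},
    index=[0, 0, 1, 1],  # hypothetical duplicate labels, not from the question
)

# Row positions that would sort the lengths; a stable sort keeps ties in original order.
lengths = df['name'].str.len().to_numpy()
positions = np.argsort(lengths, kind='stable')

df_sorted = df.iloc[positions]
print(df_sorted)
```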
Leopold answered 10/7, 2020 at 14:31 Comment(1)
Nice one! Thanks! - Fango
O
0

It's worth using the key argument to avoid creating unnecessary columns:

df.sort_values("column_name", ascending=True, key=lambda col: col.str.len())
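
When several names share a length, a second sort column can break the tie. The sketch below relies on `key` being applied to each sort column separately, so the function checks the column name and maps only `name` to lengths. The row `'Bo'` is made-up data added to create a tie, and `length_for_name` is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg', 'Bo'],  # 'Bo' is a made-up row that ties with 'Al'
    'score': [2, 4, 2, 3, 1],
})

def length_for_name(col):
    # Called once per sort column: map 'name' to string lengths, leave 'score' as is.
    return col.str.len() if col.name == 'name' else col

# Sort by name length first, then by score within equal lengths.
df_sorted = df.sort_values(['name', 'score'], key=length_for_name)
print(df_sorted)
```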
Onepiece answered 22/9, 2023 at 13:55 Comment(0)
