Python analog of PHP's natsort function (sort a list using a "natural order" algorithm) [duplicate]
P

3

24

Is there something in Python similar to PHP's natsort function?

l = ['image1.jpg', 'image15.jpg', 'image12.jpg', 'image3.jpg']
l.sort()

gives:

['image1.jpg', 'image12.jpg', 'image15.jpg', 'image3.jpg']

but I would like to get:

['image1.jpg', 'image3.jpg', 'image12.jpg', 'image15.jpg']

UPDATE

Solution based on this link:

import re
from functools import cmp_to_key

def try_int(s):
    "Convert to integer if possible."
    try:
        return int(s)
    except ValueError:
        return s

def natsort_key(s):
    "Used internally to get a list by which s is sorted."
    return [try_int(part) for part in re.findall(r'(\d+|\D+)', s)]

def natcmp(a, b):
    "Natural string comparison, case sensitive."
    ka, kb = natsort_key(a), natsort_key(b)
    return (ka > kb) - (ka < kb)  # cmp() no longer exists in Python 3

def natcasecmp(a, b):
    "Natural string comparison, ignores case."
    return natcmp(a.lower(), b.lower())

# A comparison function must be wrapped with cmp_to_key in Python 3.
l.sort(key=cmp_to_key(natcasecmp))
Precancel answered 30/3, 2010 at 13:29 Comment(3)
Not built in, and not in the standard library AFAIK. There's a recipe for it here, and other implementations can be found via Google.Denby
You can check this link: Compact python human sortBrunhild
It's a natural order; image3.jpg is in its place.Precancel
A
52

From my answer to Natural Sorting algorithm:

import re
def natural_key(string_):
    """See https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/"""
    return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_)]

Example:

>>> L = ['image1.jpg', 'image15.jpg', 'image12.jpg', 'image3.jpg']
>>> sorted(L)
['image1.jpg', 'image12.jpg', 'image15.jpg', 'image3.jpg']
>>> sorted(L, key=natural_key)
['image1.jpg', 'image3.jpg', 'image12.jpg', 'image15.jpg']

To support Unicode strings, .isdecimal() should be used instead of .isdigit(). See example in @phihag's comment. Related: How to reveal Unicodes numeric value property.

.isdigit() may also fail (return True for a value that int() does not accept) for a bytestring on Python 2 in some locales, e.g., '\xb2' ('²') in the cp1252 locale on Windows.
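
A minimal sketch of the .isdecimal() variant described above, assuming Python 3 str input. The name natural_key_unicode is only illustrative and is not part of the original answer:

```python
import re

def natural_key_unicode(text):
    """Illustrative variant of natural_key that tests chunks with str.isdecimal()."""
    # The capturing group makes re.split keep the digit runs in the result.
    return [int(part) if part.isdecimal() else part
            for part in re.split(r'(\d+)', text)]

names = ['image1.jpg', 'image15.jpg', 'image12.jpg', 'image3.jpg']
result = sorted(names, key=natural_key_unicode)

# '\u00b2' (superscript two) passes .isdigit() but not .isdecimal(),
# so it stays a string here instead of being handed to int().
superscript_key = natural_key_unicode('\u00b2')
```

With .isdigit() in place of .isdecimal(), the last call would raise ValueError, because int() rejects the superscript character.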

Alexei answered 13/6, 2010 at 18:11 Comment(10)
@phihag: It works on Python 3.Alexei
Oops, you're totally right. I messed up the test case - the error has nothing to do with Python 3. \d and isdigit just match values that int does not accept. Observe [u'²'].sort(key=natural_key).Inspector
Caveat: works for the specific example shown but fails for cases like ['elm1', 'Elm2'] and ['0.501', '0.55'] and [0.01, 0.1, 1] ... see #4837210 for lower() and my more general solution for Python natural sort order.Scurrility
@ScottLawton: it works as expected. It is ok to use different definitions of what "natural sorting" is. It is not ok to claim that other (widely used) definitions are wrong.Alexei
May I ask a follow-up: if my array is a 2D array like [['image1.jpg', 'pathToImage1'], ['image15.jpg', 'pathToImage15'], ['image12.jpg', 'pathToImage12'], ['image3.jpg', 'pathToImage3']], and I want it sorted the same way (by the numeric value of the first element of each sub-array, returning [['image1.jpg', 'pathToImage1'], ['image3.jpg', 'pathToImage3'], ['image12.jpg', 'pathToImage12'], ['image15.jpg', 'pathToImage15']]), where should I tune this code to make it work? Thanks! (Do I need to open a new post for this question?)Splenic
@Hang: it is a very simple variation: sorted(L, lambda sublist: natural_key(sublist[0])). If it is unclear, work through the Sorting HOW TO examples.Alexei
Thanks @jfs! I read through the examples and changed lambda sublist: natural_key(sublist[0]) to key=lambda sublist: natural_key(sublist[0]) so the code could run, but it seems like the order of the sublists doesn't get changed at all. I will try more and put feedback here :D PS: a repl here repl.it/@hanglearning/testSortSublistsSplenic
@Hang: sorted(L) returns a new list (L is not changed). L.sort() modifies the list in place (L is changed). This is stated at the very top of the link that I've provided (under the "Sorting Basics" header).Alexei
@Alexei Oh! Sorry my bad! That's right! Make it equal to a new list and now it works!!! Thanks!!!Splenic
Great stuff! natsort from PyPI is great, too, but with this I just have to add a single line of code instead of a whole new package to my app. And it absolutely does the job for file version comparison à la major_minor_patch.Peroxidize
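
For readers following the sublist question in the comments above, here is a minimal sketch of the key= form together with the sorted() versus list.sort() distinction. natural_key is copied from the answer; the variable names pairs, ordered, and in_place are only illustrative:

```python
import re

def natural_key(string_):
    # Same key function as in the answer above.
    return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_)]

pairs = [['image1.jpg', 'pathToImage1'], ['image15.jpg', 'pathToImage15'],
         ['image12.jpg', 'pathToImage12'], ['image3.jpg', 'pathToImage3']]

# sorted() returns a new list and leaves `pairs` untouched,
# so the result has to be assigned to a name.
ordered = sorted(pairs, key=lambda sublist: natural_key(sublist[0]))

# list.sort() reorders the list in place and returns None.
in_place = list(pairs)  # shallow copy so `pairs` keeps its original order
in_place.sort(key=lambda sublist: natural_key(sublist[0]))
```

Note that key must be passed by keyword: in Python 3, sorted() accepts no positional arguments after the iterable.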
W
18

You can check out the third-party natsort library on PyPI:

>>> import natsort
>>> l = ['image1.jpg', 'image15.jpg', 'image12.jpg', 'image3.jpg']
>>> natsort.natsorted(l)
['image1.jpg', 'image3.jpg', 'image12.jpg', 'image15.jpg']

Full disclosure, I am the author.

Winograd answered 24/8, 2013 at 5:41 Comment(4)
I wanted to use it, but I didn't find it for python 3.5Measures
@Measures It is compatible with both python 2 and python 3. I am curious how you concluded that it is not available for python 3.Winograd
I tried to use it and natsort was not available. So I asked MacPorts to install it, but it wanted to force me to install python 3.4 or 2.7 along with natsort, which I don't want because python 3.5 is already installed.Measures
@Measures It sounds like something to report to the MacPorts folks. natsort works on all modern versions of python. You can use pip, or if you are on Mac I would consider changing to Homebrew.Winograd
I
2

This function can be used as the key= argument for sorted in Python 2.x and 3.x:

import re

def sortkey_natural(s):
    return tuple(int(part) if re.match(r'[0-9]+$', part) else part
                 for part in re.split(r'([0-9]+)', s))
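
A short usage sketch for the function above, under the assumption that the inputs are Python 3 str values. The sample lists and the lower() wrapper for case-insensitive ordering are illustrative additions, not part of the original answer:

```python
import re

def sortkey_natural(s):
    # Digit runs become ints; everything else stays a string.
    return tuple(int(part) if re.match(r'[0-9]+$', part) else part
                 for part in re.split(r'([0-9]+)', s))

files = ['image1.jpg', 'image15.jpg', 'image12.jpg', 'image3.jpg']
ordered = sorted(files, key=sortkey_natural)

# The key is case sensitive: 'E' sorts before 'e', so 'Elm2' comes first.
mixed_case = ['Elm2', 'elm1']
case_sensitive = sorted(mixed_case, key=sortkey_natural)

# Lowercasing before building the key gives a case-insensitive order.
case_insensitive = sorted(mixed_case, key=lambda s: sortkey_natural(s.lower()))
```

Because re.split with a capturing group always alternates text and digit chunks, starting with a (possibly empty) text chunk, two keys compare str with str and int with int at every position, which avoids TypeError on Python 3.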
Inspector answered 21/5, 2012 at 13:24 Comment(3)
.isdecimal() is a Unicode-only method. It won't work on bytestrings. .isdecimal() matches the same set of characters ([Nd]) as \d, which is larger than [0-9] in the Unicode case.Alexei
I have no idea what the semantics of sorting two byte strings would be, so I didn't consider it. But you're right, the test is faulty. Switched to re.match.Inspector
+1. You don't use proper Unicode sorting, so I don't see why you would reject bytestrings. Btw, on *nix, filenames are just bytes. You don't want ls to break just because there is a funny filename in a directory.Alexei
