How do you sort files numerically?

I'm processing some files in a directory and need the files to be sorted numerically. I found some examples of sorting, specifically ones using the lambda pattern, at wiki.python.org, and I put this together:

import re

file_names = """ayurveda_1.tif
ayurveda_11.tif
ayurveda_13.tif
ayurveda_2.tif
ayurveda_20.tif
ayurveda_22.tif""".split('\n')

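# Raw string, so \d and \. reach the regex engine unchanged (a plain string
# triggers an invalid-escape warning on recent Python versions)
num_re = re.compile(r'_(\d{1,2})\.')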

file_names.sort(
    key=lambda fname: int(num_re.search(fname).group(1))
)

Is there a better way to do this?

Swashbuckler asked 7/1, 2011 at 7:34. Comments:
+1 for a proper question title. (Spiffy)
The right way to do what you're doing is to just ask the question in the question bit, then add your answer in an answer bit. Then sit back and wait ... (Tomblin)
@paxdiablo: Thank you for the instruction. I had read the FAQ to make sure I could answer my own question; I just wasn't quite sure about the mechanics. I'll do it right next time. (Swashbuckler)
No probs, Zachary, it's just that "How do I xyzzy?" is a much more useful question (as in more likely to elicit a wide range of possible answers) than "I have xyzzyed. What do you think of my method?" :-) (Tomblin)

This is called "natural sorting" or "human sorting" (as opposed to lexicographical sorting, which is the default). Ned Batchelder wrote up a quick version of one:

import re

def tryint(s):
    # int() raises ValueError for chunks that aren't numbers
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks.
        "z23a" -> ["z", 23, "a"]
    """
    return [ tryint(c) for c in re.split('([0-9]+)', s) ]

def sort_nicely(l):
    """ Sort the given list in the way that humans expect.
    """
    l.sort(key=alphanum_key)

It's similar to what you're doing, but a bit more general: it picks up numbers anywhere in the name instead of relying on a single number between '_' and '.'.
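
A quick check on the file names from the question. This sketch repeats tryint() and alphanum_key() so that it runs on its own, with tryint() catching only ValueError:

```python
import re

def tryint(s):
    # Numeric chunks become ints; everything else stays a string.
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    # The capturing group makes re.split keep the digit runs in its result.
    return [tryint(c) for c in re.split('([0-9]+)', s)]

file_names = ["ayurveda_1.tif", "ayurveda_11.tif", "ayurveda_13.tif",
              "ayurveda_2.tif", "ayurveda_20.tif", "ayurveda_22.tif"]

print(sorted(file_names))                    # plain lexicographic order
print(sorted(file_names, key=alphanum_key))  # natural order
```

The first call leaves ayurveda_11.tif ahead of ayurveda_2.tif; the second orders the numbers 1, 2, 11, 13, 20, 22. This also works on Python 3, because re.split with a capturing group always alternates text and numbers, so matching positions in two keys hold the same type.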

Rastus answered 7/1, 2011 at 7:48. Comments:
Thank you, Daniel! This was just what I was looking for. I followed the link you included and down the rabbit hole I went... weeee!!! I learned a little bit about the performance of try/except, and (of course) pre-compiling regexps. :) (Swashbuckler)
Will this work if we return a generator rather than a list comprehension? (Oletaoletha)
Doesn't handle negative embedded numbers properly. (Volitive)
@martineau: I understand that, since the regexp splits only at the digits, any sign character would end up in the chunk before the number. Since this is just an indexed list of files starting at 1, I don't think this will be an issue. (Swashbuckler)
@Zachary Young: I suspected that handling negative numbers wasn't important to you, but made the comment only to draw attention to it for others for whom it might be (after all, your question just says "numerically"). It's easy to fix: just use re.split(r'(-?[0-9]+)', s) instead. More generally, it can be made to handle [signed] real numbers, like -3.14, by using re.split(r'(-?\d+(?:\.\d+)?)', s). Lastly, if you don't want to define a separate function like sort_nicely(), you can always use tiffFiles.sort(key=alphanum_key) as you did in the code in your question. (Volitive)
If using real numbers, one should also convert the number to float, not int (i.e. make a tryfloat(s) function instead of tryint(s)). (Smedley)
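
Combining the last two suggestions (signed numbers, then float conversion), a key for signed real numbers might look like the sketch below. The pattern and the name signed_real_key are assumptions made for this illustration. Instead of a tryfloat() helper, it converts by position: with one capturing group, re.split returns text at even indices and matched numbers at odd ones. Note that a hyphen used as a separator (as in scan-2.tif) would be read as a minus sign.

```python
import re

# Assumed pattern: optional minus sign, ASCII digits, optional fraction.
SIGNED_REAL = re.compile(r'(-?[0-9]+(?:\.[0-9]+)?)')

def signed_real_key(s):
    parts = SIGNED_REAL.split(s)
    # Text sits at even indices and matched numbers at odd indices,
    # so only the odd ones are converted to float.
    return [float(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["t_2.txt", "t_-1.5.txt", "t_0.25.txt", "t_-10.txt"]
print(sorted(names, key=signed_real_key))
```

Converting by position rather than with try/except also keeps text such as "inf" or "nan" from being turned into a float by accident.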

Just use:

import re
tiffFiles.sort(key=lambda var: [int(x) if x.isdecimal() else x for x in re.split(r'(\d+)', var)])

Testing each chunk with str.isdecimal() instead of catching exceptions from int() makes this faster than using try/except.
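
Tokenizing with re.findall(r'[^0-9]|[0-9]+', ...) produces one item per non-digit character plus one per digit run, so on Python 3 two keys can end up comparing a str with an int and raise TypeError. The sketch below contrasts that tokenizer with a re.split-based one; the names findall_key and split_key and the sample names are illustrative:

```python
import re

def findall_key(var):
    # Tokenizer using findall: each non-digit character, or a run of digits.
    return [int(x) if x.isdigit() else x
            for x in re.findall(r'[^0-9]|[0-9]+', var)]

def split_key(var):
    # re.split with a capturing group alternates text and digit runs, so
    # a given position holds the same type in every key.
    return [int(x) if x.isdecimal() else x for x in re.split(r'(\d+)', var)]

names = ["ab", "a1"]
try:
    print(sorted(names, key=findall_key))
except TypeError as exc:
    # ['a', 'b'] vs ['a', 1] ends up comparing 'b' with 1
    print("findall_key failed:", exc)

print(sorted(names, key=split_key))
```

str.isdecimal() accepts exactly the characters that \d matches (Unicode category Nd), all of which int() can parse, whereas str.isdigit() also accepts characters such as "②" that int() rejects.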

Sonatina answered 24/3, 2016 at 14:42. Comments:
This will fail if the file name contains, for example, the "②" character. (Trounce)

@April provided a good solution in "How is Python's glob.glob ordered?" that you could try:

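# First, get the files:
import glob
import re

# img_folder and output_image_format (e.g. '.png') are assumed to be defined.
# glob.glob1() is undocumented; glob.glob() with root_dir (Python 3.10+)
# likewise returns bare file names relative to img_folder.
files = glob.glob('*' + output_image_format, root_dir=img_folder)

# Sort files according to the first run of digits in each file name
files = sorted(files, key=lambda x: int(re.findall(r'\d+', x)[0]))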
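
The key above looks only at the first run of digits, and a name without any digits makes [0] raise IndexError. Because Python's sort is stable, names that share the first number keep their input order rather than being ordered by later numbers. A quick check with made-up names (first_number_key is an illustrative name for the same lambda):

```python
import re

def first_number_key(name):
    # Same idea as the lambda above: only the first run of digits counts.
    return int(re.findall(r'\d+', name)[0])

names = ["scan2_page10.png", "scan2_page9.png", "scan10_page1.png"]
print(sorted(names, key=first_number_key))
```

Here scan2_page10.png stays ahead of scan2_page9.png because both have the key 2.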

Sternson answered 7/6, 2020 at 3:12

If you are using key= in your sort call, you shouldn't also use cmp, which has been removed in Python 3. key should be set to a function that takes one element as input and returns an object that compares in the order you want the list sorted. It doesn't need to be a lambda function and might be clearer as a standalone function. Also, regular expressions can be slow to evaluate.
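
On Python 3, list.sort() and sorted() accept only key=; an existing cmp-style function can still be adapted with functools.cmp_to_key(). The sketch below sorts the same names both ways. numeric_suffix() and compare_suffixes() are hypothetical helpers written for this illustration, and both assume names shaped like prefix_<digits>.ext:

```python
import functools

def numeric_suffix(name):
    # Key function: 'ayurveda_11.tif' -> '11.tif' -> '11' -> 11
    return int(name.rsplit('_', 1)[1].split('.', 1)[0])

def compare_suffixes(a, b):
    # Old-style comparison function: negative, zero, or positive.
    x, y = numeric_suffix(a), numeric_suffix(b)
    return (x > y) - (x < y)

files = ["ayurveda_11.tif", "ayurveda_2.tif", "ayurveda_1.tif"]
print(sorted(files, key=numeric_suffix))
print(sorted(files, key=functools.cmp_to_key(compare_suffixes)))
```

The key function is called once per element, while the wrapped comparison function runs for every pairwise comparison, which is one reason key= is preferred.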

You could try something like the following to isolate and return the integer part of the file name:

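def getint(name):
    # partition() returns a 3-tuple: ('ayurveda_11', '.', 'tif')
    basename, _, _ = name.partition('.')
    # 'ayurveda_11' -> 'ayurveda', '11'
    alpha, num = basename.split('_')
    return int(num)

tiffFiles.sort(key=getint)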

Isolation answered 7/1, 2011 at 8:05. Comments:
Thank you, Don. I really appreciate your explanation: very understandable. --Zachary (Swashbuckler)
@Don O'Donnell I got the error AttributeError: 'tuple' object has no attribute 'split', so I modified your code a bit: I changed basename = name.partition('.') to basename = name.split('.') (important: this works only for file names without extra dots), and alpha, num = basename.split('_') to alpha, num = basename[0].split('_'). Anyway, you made my day. Thanks! (Pelite)

str.partition() returns a tuple, so unpack all three parts:

def getint(name):
    # partition() always returns a 3-tuple: (before, separator, after)
    (basename, part, ext) = name.partition('.')
    (alpha, num) = basename.split('_')
    return int(num)
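
The difference between str.partition() and str.split() explains the ValueError in the comment below: partition() always returns a 3-tuple, while split('.') returns one more item than there are dots. A short demonstration on the question's file name:

```python
name = "ayurveda_11.tif"

# partition() always yields (before, separator, after).
print(name.partition('.'))      # ('ayurveda_11', '.', 'tif')
print("README".partition('.'))  # ('README', '', '')

# split() yields only two pieces here, so three-way unpacking fails.
print(name.split('.'))          # ['ayurveda_11', 'tif']
try:
    a, b, c = name.split('.')
except ValueError as exc:
    print("ValueError:", exc)
```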

Rigorism answered 15/7, 2015 at 9:53. Comments:
Did you actually try that? (a, b, c) = 'ayurveda_11.tif'.split('.') gives ValueError: need more than 2 values to unpack. (Swashbuckler)

This is a modified version of @Don O'Donnell's answer, because I couldn't get it working as-is; I think it's the best answer here because it's well explained.

def getint(name):
    # 'ayurveda_11.tif' -> 'ayurveda', '11.tif'
    _, num = name.split('_')
    # '11.tif' -> '11', 'tif'
    num, _ = num.split('.')
    return int(num)

print(sorted(tiffFiles, key=getint))

Changes:

1) The alpha string isn't stored, as it's not needed (hence _, num)

2) Use num.split('.') to separate the number from .tif

3) Use sorted() instead of list.sort() (sorted() returns a new list and leaves tiffFiles unchanged), per https://docs.python.org/2/howto/sorting.html
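
Both this version and Don's assume exactly one underscore and one dot, so a name like my_scan_12.tif makes the unpacking fail. A sketch of a more tolerant variant, assuming the number always sits between the last underscore and the extension (the sample list is made up):

```python
import os

def getint(name):
    # 'my_scan_12.tif' -> ('my_scan_12', '.tif')
    stem, _ext = os.path.splitext(name)
    # Split only at the last underscore: 'my_scan_12' -> ['my_scan', '12']
    _, num = stem.rsplit('_', 1)
    return int(num)

tiffFiles = ["my_scan_12.tif", "ayurveda_2.tif", "ayurveda_11.tif"]
print(sorted(tiffFiles, key=getint))
```

Names without any underscore still raise ValueError here, which may be preferable to silently misplacing them.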

Msg answered 18/10, 2018 at 16:28

© 2022 - 2024 — McMap. All rights reserved.