Non-alphanumeric list order from os.listdir()
O

14

209

I often use Python to process directories of data. Recently, I have noticed that the default order of the lists has changed to something almost nonsensical. For example, if my current directory contains the subdirectories run01, run02, ... run19, run20, and I generate a list with the following command:

dir = os.listdir(os.getcwd())

then I usually get a list in this order:

dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08', ... ]

and so on. The order used to be alphanumeric. But this new order has remained with me for a while now.

What is determining the (displayed) order of these lists?

Osteopath answered 27/1, 2011 at 5:30 Comment(0)
W
91

I think the order has to do with the way the files are indexed on your filesystem. If you really want it to adhere to some order, you can always sort the list after getting the files.

Watery answered 27/1, 2011 at 5:41 Comment(0)
F
248

You can use the builtin sorted function to sort the strings however you want. Based on what you describe, this should work:

sorted(os.listdir(whatever_directory))

Alternatively, you can use the .sort method of a list:

lst = os.listdir(whatever_directory)
lst.sort()

I think that should do the trick.

Note that the order in which os.listdir returns the filenames is probably completely dependent on your filesystem.

Ferryboat answered 21/2, 2013 at 13:35 Comment(10)
Does not change the order if dealing with number-first filenames (ie 59.9780radps-0096 is still before 9.9746radps-0082). I think it's because everything is a string, so the decimal is not treated properly.Phillipphillipe
@Phillipphillipe -- Correct, It's sorting lexicographically as strings. To get it to sort some other way, you'd need to define a key function that determined the sort order. In your case, you'd want the key function to look at the string and return 59.9780 or 9.9746 (as float) for your filenames respectively.Ferryboat
Or use the natsort library, which I just found.Phillipphillipe
Only sorted(listdir) worked for me. listdir.sort() gave me: TypeError: 'NoneType' object is not iterableAngkor
@Angkor -- listdir.sort() won't work in statements like for i in listdir.sort(), because list.sort() sorts the list IN PLACE: it modifies the list itself and returns None. So you need to use a_list = listdir('some_path'); a_list.sort() and then do for i in a_listPreceptor
Do you know how to change the order to ascending or descending using .sort ?Bonkers
@AlexB -- sure ... just pass reverse=True to make it descending sort.Ferryboat
@Ferryboat is it possible to do it in like one line? something like lst = os.listdir(whatever_directory).sort() - this of course will just make lst = None, but do we need to do it in two lines?Shampoo
@user3895596 -- I think that the sorted thing written first does it in a single line OK?Ferryboat
You can use key in sorted to parse more complex filenames. A simple example of sorting a list like this ['0001.txt', '0002.txt'] is: sorted(os.listdir(path), key=lambda filename: int(filename.split('.')[0]))Nammu
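The key-function idea from the comments above can be sketched as follows. This is a minimal example, assuming filenames start with a decimal number such as 59.9780radps-0096; the helper name leading_number is hypothetical and not part of any library.

```python
import re

def leading_number(filename):
    """Return the decimal number at the start of filename as a float."""
    match = re.match(r"\d+(?:\.\d+)?", filename)
    # Names without a leading number sort after all numbered names.
    return float(match.group()) if match else float("inf")

names = ["59.9780radps-0096", "9.9746radps-0082", "120.5radps-0001"]
ordered = sorted(names, key=leading_number)
# reverse=True gives the descending order asked about in the comments.
descending = sorted(names, key=leading_number, reverse=True)
```

Because sorted returns a new list, this also stays a one-liner when wrapped around os.listdir(path), unlike list.sort(), which returns None.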
R
64

Per the documentation:

os.listdir(path)

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.

Order cannot be relied upon and is an artifact of the filesystem.

To sort the result, use sorted(os.listdir(path)).
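As a small illustration of the point above, the following sketch creates a temporary directory and compares the raw listing with the sorted one. The helper name list_sorted and the directory names are assumptions made up for this example.

```python
import os
import tempfile

def list_sorted(path):
    """Return the entry names in path in lexicographic order."""
    return sorted(os.listdir(path))

with tempfile.TemporaryDirectory() as tmp:
    for name in ("run03", "run01", "run02"):
        os.mkdir(os.path.join(tmp, name))
    raw = os.listdir(tmp)       # order is arbitrary and filesystem-dependent
    ordered = list_sorted(tmp)  # same names, in a predictable order
```

Whatever order raw happens to have on a given machine, ordered contains the same names sorted as strings.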

Ramires answered 27/1, 2011 at 7:26 Comment(0)
A
63

Python for whatever reason does not come with a built-in way to have natural sorting (meaning 1, 2, 10 instead of 1, 10, 2), so you have to write it yourself:

import re
def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(data, key=alphanum_key)

You can now use this function to sort a list:

dirlist = sorted_alphanumeric(os.listdir(...))

PROBLEMS: In case you use the above function to sort strings (for example folder names) and want them sorted like Windows Explorer does, it will not work properly in some edge cases.
This sorting function will return incorrect results on Windows, if you have folder names with certain 'special' characters in them. For example this function will sort 1, !1, !a, a, whereas Windows Explorer would sort !1, 1, !a, a.

So if you want to sort exactly like Windows Explorer does in Python you have to use the Windows built-in function StrCmpLogicalW via ctypes (this of course won't work on Unix):

from ctypes import wintypes, windll
from functools import cmp_to_key

def winsort(data):
    _StrCmpLogicalW = windll.Shlwapi.StrCmpLogicalW
    _StrCmpLogicalW.argtypes = [wintypes.LPWSTR, wintypes.LPWSTR]
    _StrCmpLogicalW.restype  = wintypes.INT

    cmp_fnc = lambda psz1, psz2: _StrCmpLogicalW(psz1, psz2)
    return sorted(data, key=cmp_to_key(cmp_fnc))

This function is slightly slower than sorted_alphanumeric().

Bonus: winsort can also sort full paths on Windows.

Alternatively, especially if you use Unix, you can use the natsort library (pip install natsort) to sort by full paths in a correct way (meaning subfolders at the correct position).

You can use it like this to sort full paths:

from natsort import natsorted, ns
dirlist = natsorted(dirlist, alg=ns.PATH | ns.IGNORECASE)

Starting with version 7.1.0, natsort supports os_sorted, which internally uses either the aforementioned Windows API or Linux sorting and should be used instead of natsorted().

Audie answered 30/12, 2017 at 2:7 Comment(5)
Works perfectly fine. print( sorted_alphanumeric(["1", "10", "2", "foo_10", "foo_8"]) ) -> ['1', '2', '10', 'foo_8', 'foo_10']. Exactly as expected.Audie
There is a longstanding open issue on natsorted to get Windows Explorer matching functionality implemented. Perhaps you should contribute a solution? github.com/SethMMorton/natsort/issues/41Parlour
The winsort function was exactly what I needed :)Newfeld
worked great on files name finishing a _1, _2,..._10, _11, etcEsemplastic
This is really helpful!Scandium
M
22

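With no key, sorted() compares strings character by character by code point (ASCII value), so 'run10' comes before 'run2'. When the names share a prefix and differ only in an unpadded trailing number, sorting by length first and by name second works around this:

dir = sorted(os.listdir(os.getcwd()), key=lambda name: (len(name), name))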
Musser answered 9/7, 2019 at 6:24 Comment(1)
None of the above worked for me, that "key-len" seemed to be the last remaining trick, thanks so much.Besmear
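A short sketch of why the length trick needs a tie-breaker: key=len alone only groups names by length, and since sorted is stable, names of equal length keep whatever order they arrived in. The sample names below are made up for illustration.

```python
names = ["run10", "run2", "run1", "run12", "run9"]

# Length only: names of equal length keep their incoming (arbitrary) order.
by_length = sorted(names, key=len)

# Length first, then the name itself as a tie-breaker.
by_length_then_name = sorted(names, key=lambda name: (len(name), name))
```

This only behaves like a numeric sort when every name has the same prefix and the numbers are not zero-padded; for anything more varied, a natural-sort key is the safer choice.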
A
13

Use natsort library:

Install the library with the following command for Ubuntu and other Debian versions

Python 2

sudo pip install natsort

Python 3

sudo pip3 install natsort

Details of how to use this library are found here

from natsort import natsorted

files = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08']
natsorted(files)

[out]:
['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']
  • This is not a duplicate of answer. natsort was added as an edit on 2020-01-27.
Alsatia answered 22/8, 2018 at 13:36 Comment(1)
That is more accurate than sorted()! ThanksMiceli
A
5

It's probably just the order that C's readdir() returns. Try running this C program:

#include <dirent.h>
#include <stdio.h>

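int main(void){
   DIR *dirp;
   struct dirent* de;
   dirp = opendir(".");
   if (dirp == NULL) {   // opendir() failed, e.g. no read permission
      perror("opendir");
      return 1;
   }
   while ((de = readdir(dirp)) != NULL) // assign, then test for end of directory
        printf("%s\n", de->d_name);
   closedir(dirp);
   return 0;
}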

The build line should be something like gcc -o foo foo.c.

P.S. Just ran this and your Python code, and they both gave me sorted output, so I can't reproduce what you're seeing.

Anamorphoscope answered 27/1, 2011 at 5:46 Comment(1)
The reason that you're seeing sorted output may depend on a lot of factors, such as OS, filesystem, time of creation of files, actions during the last defragmentation, ...Fafnir
M
5
import os
import re
aaa = ['row_163.pkl', 'row_394.pkl', 'row_679.pkl', 'row_202.pkl', 'row_1449.pkl', 'row_247.pkl', 'row_1353.pkl', 'row_749.pkl', 'row_1293.pkl', 'row_1304.pkl', 'row_78.pkl', 'row_532.pkl', 'row_9.pkl', 'row_1435.pkl']
sorted(aaa, key=lambda x: int(os.path.splitext(x.split('_')[1])[0]))

In my case the names look like row_163.pkl. Here x.split('_')[1] gives '163.pkl', and os.path.splitext then strips the extension, leaving '163' to convert to an int.

but in case of your requirement you can do something like

sorted(aa, key=lambda x: (int(re.sub(r'\D', '', x)), x))

where

aa = ['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']

and also for directory retrieving you can do sorted(os.listdir(path))

and for names like 'run01.txt' or 'run01.csv', strip the extension first and then keep only the digits:

sorted(files, key=lambda x: int(re.sub(r'\D', '', os.path.splitext(x)[0])))
Maxentia answered 6/9, 2017 at 12:46 Comment(1)
Unarguably best answer here.Anecdotal
S
3

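The proposed combination of os.listdir and sorted generates the same order as the ls -l command under Linux in the example below. (ls sorts according to the current locale, so the two can differ for names with mixed case or non-ASCII characters.) The following example shows this: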

user@user-PC:/tmp/test$ touch 3a 4a 5a b c d1 d2 d3 k l p0 p1 p3 q 410a 409a 408a 407a
user@user-PC:/tmp/test$ ls -l
total 0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 3a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 407a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 408a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 409a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 410a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 4a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 5a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 b
-rw-rw-r-- 1 user user 0 Feb  15 10:31 c
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d2
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 k
-rw-rw-r-- 1 user user 0 Feb  15 10:31 l
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 q

user@user-PC:/tmp/test$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir( './' )
['d3', 'k', 'p1', 'b', '410a', '5a', 'l', 'p0', '407a', '409a', '408a', 'd2', '4a', 'p3', '3a', 'q', 'c', 'd1']
>>> sorted( os.listdir( './' ) )
['3a', '407a', '408a', '409a', '410a', '4a', '5a', 'b', 'c', 'd1', 'd2', 'd3', 'k', 'l', 'p0', 'p1', 'p3', 'q']
>>> exit()
user@user-PC:/tmp/test$ 

So, for someone who wants to reproduce the order of the well-known ls -l command in their Python code, sorted( os.listdir( DIR ) ) works pretty well for simple names like these.
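Since ls orders names by the collation rules of the current locale, a closer match may be obtained by sorting with locale.strxfrm instead of plain string comparison. The following is a sketch under that assumption; the helper name sorted_by_locale is hypothetical, and the result depends on which locales are installed on the system.

```python
import locale

def sorted_by_locale(names):
    """Sort names using the collation rules of the current LC_COLLATE locale."""
    return sorted(names, key=locale.strxfrm)

try:
    # Adopt the locale from the environment, as command-line tools do.
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    # Fall back to the plain "C" locale if the environment's locale is unavailable.
    locale.setlocale(locale.LC_COLLATE, "C")

names = ["b", "3a", "410a", "4a", "d1", "c"]
ordered = sorted_by_locale(names)
```

In the "C" locale this gives the same order as sorted(names); in other locales, case and accented characters may be ordered differently.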

Sing answered 15/2, 2017 at 8:45 Comment(0)
V
3

From the documentation:

The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.

This means that the order is probably OS/filesystem dependent, has no particularly meaningful order, and is therefore not guaranteed to be anything in particular. As many answers mentioned: if preferred, the retrieved list can be sorted.

Cheers :)

Variometer answered 24/10, 2019 at 0:36 Comment(0)
G
2

I found that "sort" does not always do what I expected. E.g., I have a directory as below, and "sort" gives me a very strange result:

>>> os.listdir(pathon)
['2', '3', '4', '5', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472']
>>> sorted([ f for f in os.listdir(pathon)])
['2', '3', '4', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472', '5']

It seems it compares the first character first, if that is the biggest, it would be the last one.

Ganger answered 29/1, 2014 at 7:37 Comment(3)
This is expected behavior. ('5' > '403') is True.Shephard
@Shephard is correct, because at this point you're comparing the alphanumeric sort, not quantitative values of the numbers. In order to get a sort similar to your expectation, you may want to use number padding on your folders... ['002', '003', '004', '005', '403', '404', '405', '406']Colonialism
Don't use padding, just use a string to number conversion as sorting key: sorted(os.listdir(pathon), key=int) will properly return ['2', '3', '4', '5', '403', ...].Uropod
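Note that key=int raises ValueError as soon as one entry is not purely numeric. A minimal sketch that tolerates mixed directories, assuming numeric names should come first by value and everything else afterwards by name (the helper name numeric_first and the sample names are made up for illustration):

```python
def numeric_first(name):
    """Sort key: all-digit names by numeric value, then other names as strings."""
    if name.isdecimal():
        return (0, int(name), "")
    return (1, 0, name)

names = ["5", "403", "2", "notes", "472", "3", "archive"]
ordered = sorted(names, key=numeric_first)
```

Each key is a tuple with the same shape, so numbers and strings are never compared with each other directly.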
G
1
In [6]: os.listdir?

Type:       builtin_function_or_method
String Form:<built-in function listdir>
Docstring:
listdir(path) -> list_of_strings
Return a list containing the names of the entries in the directory.
path: path of directory to list
The list is in **arbitrary order**.  It does not include the special
entries '.' and '..' even if they are present in the directory.
Gnarly answered 21/2, 2013 at 13:36 Comment(4)
This explains why they are seeing the behaviour, without offering a solution.Equal
OP just want to know why, not how.Gnarly
@Gnarly thanks for pointing this out - I didn't notice it beforeCrumby
@DanielWatkins OK, Not it isnt.)Gnarly
W
1

To answer the question directly, you can use the following code.

dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08']
for file in sorted(dir, key=lambda x:int(x.replace('run', ''))):
    print(file)

It will print:

run01
run08
run11
run12
run13
run14
run18

This approach uses the Python built-in function sorted and, through the key argument, specifies the sorting criterion: the list item without 'run', cast to an integer.

Writein answered 26/1, 2022 at 14:1 Comment(0)
I
0

By default, ls lists files sorted by name. (ls options can be used to sort by date, size, ...)

files = list(os.popen("ls"))
files = [file.strip("\n") for file in files]

Using ls may perform better when the directory contains a very large number of files.
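Sorting by size or date can also be done without a subprocess. The sketch below uses os.scandir and sorts on a stat attribute in ascending order; the helper name names_sorted_by and the file names are assumptions for this example, not an existing API.

```python
import os
import tempfile

def names_sorted_by(path, attribute):
    """Return entry names in path ordered by a stat attribute such as 'st_size'."""
    with os.scandir(path) as entries:
        ordered = sorted(entries, key=lambda e: getattr(e.stat(), attribute))
        return [e.name for e in ordered]

with tempfile.TemporaryDirectory() as tmp:
    for name, size in (("big", 300), ("small", 1), ("medium", 20)):
        with open(os.path.join(tmp, name), "wb") as handle:
            handle.write(b"x" * size)
    by_size = names_sorted_by(tmp, "st_size")
```

Passing "st_mtime" instead of "st_size" orders the entries by modification time, and it avoids parsing ls output, which breaks on file names containing newlines.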

Imbue answered 26/1, 2021 at 10:32 Comment(0)
