In what order does os.walk iterate? [duplicate]
S

3

91

I am concerned about the order of files and directories given by os.walk(). If I have these directories, 1, 10, 11, 12, 2, 20, 21, 22, 3, 30, 31, 32, what is the order of the output list?

Is it sorted by numeric values?

1 2 3 10 11 12 20 21 22 30 31 32

Or sorted by ASCII values, like what is given by ls?

1 10 11 12 2 20 21 22 3 30 31 32

Additionally, how can I get a specific sort?

Scottiescottish answered 16/8, 2013 at 21:27 Comment(2)
Why not touch those files or mkdir those directories (you can do either all on one line) and find out? – Kea
FYI: On Linux/ext3 this is currently the same as ls -U. – Durkee
T
133

os.walk uses os.listdir (since Python 3.5 it uses os.scandir, which likewise yields entries in arbitrary order). Here is the docstring for os.listdir:

listdir(path) -> list_of_strings

Return a list containing the names of the entries in the directory.

path: path of directory to list

The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.

(my emphasis on "arbitrary order").

You could, however, use sort to ensure the order you desire.

for root, dirs, files in os.walk(path):
    for dirname in sorted(dirs):
        print(dirname)

(Note that the dirnames are strings, not ints, so sorted(dirs) sorts them as strings, which is desirable for once.)

As Alfe and Ciro Santilli point out, if you want the directories to be recursed in sorted order, then modify dirs in-place:

for root, dirs, files in os.walk(path):
    dirs.sort()
    for dirname in dirs:
        print(os.path.join(root, dirname))
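To see how the in-place sort affects the recursion itself, here is a minimal sketch on a throwaway tree. The helper name visited_roots and the directory names are made up for illustration; only os.walk, os.makedirs and tempfile come from the standard library.

```python
import os
import tempfile


def visited_roots(top):
    """Return the directories os.walk visits, relative to top, in visiting order."""
    visited = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # in place, so os.walk descends in this order
        rel = os.path.relpath(root, top)
        visited.append(rel.replace(os.sep, "/"))  # normalise separators for display
    return visited


with tempfile.TemporaryDirectory() as top:
    # hypothetical nested layout, created in deliberately unsorted order
    for sub in ("b/y", "b/x", "a/z", "c"):
        os.makedirs(os.path.join(top, *sub.split("/")))
    order = visited_roots(top)

print(order)
```

Each level is sorted before os.walk descends into it, so the whole traversal comes out in sorted depth-first order. Iterating over sorted(dirs) instead would only change what you print, not where walk goes next.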

You can test this yourself:

import os

os.chdir('/tmp/tmp')  # assumes this directory already exists
for dirname in '1 10 11 12 2 20 21 22 3 30 31 32'.split():
    try:
        os.makedirs(dirname)
    except OSError:
        pass  # directory already exists


for root, dirs, files in os.walk('.'):
    for dirname in sorted(dirs):
        print(dirname)

prints

1
10
11
12
2
20
21
22
3
30
31
32

If you wanted to list them in numeric order use:

for dirname in sorted(dirs, key=int):
    print(dirname)

To sort alphanumeric strings, use natural sort.
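One possible natural sort key, as a rough sketch. The name natural_key is made up here, not a standard library function; it splits each name into text and digit runs and compares the digit runs as integers.

```python
import re


def natural_key(name):
    """Build a sort key in which digit runs compare as integers ('2' before '10')."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are always the digit runs
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


names = "1 10 11 12 2 20 21 22 3 30 31 32".split()
numeric_like = sorted(names, key=natural_key)
mixed = sorted(["file10", "file2", "File1"], key=natural_key)
print(numeric_like)
print(mixed)
```

Unlike key=int, this also works when the names mix letters and digits. It can be passed to dirs.sort(key=natural_key) in the same way.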

Teresiateresina answered 16/8, 2013 at 21:29 Comment(3)
The reason Python goes out of its way to avoid documenting any reliable order is that it uses different functions on different platforms (FindNextFileW, DosFindNext, readdir), and those functions are themselves documented to punt to the filesystem on most platforms, and the filesystems generally either don't document an order or give you something completely useless. – Nubilous
I think this does not sort multi-level hierarchies because sorted is not in-place. To do that, use sort as explained by Alfe. – Polyunsaturated
Why does dirs.sort() work? For the answer, see the docs: docs.python.org/3/library/…. – Leialeibman
K
54

In each step, os.walk() yields the directories it is about to descend into. You can influence the order of the following steps by sorting those lists the way you want them. Quoting the 2.7 manual:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting

So sorting the dirNames will influence the order in which they will be visited:

for rootName, dirNames, fileNames in os.walk(path):
    dirNames.sort()  # you may want to use the args key and reverse here (cmp exists only in Python 2)

After this, dirNames is sorted in place, and the values walk yields next will follow that order.

Of course you can also sort the list of fileNames, but that won't influence any further steps (because files have no descendants for walk to visit).

And of course you can iterate through sorted versions of these lists as unutbu's answer proposes, but that won't influence the further progress of the walk itself.

os.walk does not define the order of the unmodified values, so it can be "any" order. You should not rely on what you experience today. In practice it will probably be whatever the underlying file system returns; on some file systems this happens to be alphabetical.
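The in-place approach can be wrapped in a small generator so the walk stays lazy. This is only a sketch: sorted_walk and its key parameter are made-up names, and the demo tree contains only numerically named directories so that key=int is safe to use.

```python
import os
import tempfile


def sorted_walk(top, key=None):
    """Yield (root, dirs, files) like os.walk, with both lists sorted by key."""
    for root, dirs, files in os.walk(top):
        dirs.sort(key=key)   # in place: decides which subdirectory is visited next
        files.sort(key=key)  # only changes the list handed to the caller
        yield root, dirs, files


with tempfile.TemporaryDirectory() as top:
    for name in ("10", "2", "1"):
        os.makedirs(os.path.join(top, name))
    # [1:] drops the top directory itself, leaving the subdirectories in visiting order
    lexical = [os.path.basename(root) for root, _, _ in sorted_walk(top)][1:]
    numeric = [os.path.basename(root) for root, _, _ in sorted_walk(top, key=int)][1:]

print(lexical)
print(numeric)
```

Because nothing is collected up front, the caller can still prune dirs after each yield, exactly as with plain os.walk.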

Keyte answered 16/8, 2013 at 21:47 Comment(3)
Downvoter: Please comment on what you didn't like. – Keyte
To have natural sorting (1, 2, 10), have a look at: https://mcmap.net/q/41116/-non-alphanumeric-list-order-from-os-listdir – Tsarevna
I got it at the end. The trick is that on the first level of iteration there is only one single rootName, which does not need to be sorted. Every next rootName is inside the already-sorted dirNames of that top-level rootName, which guarantees a proper recursive sort and the possibility of in-place modification at the same time! – Kew
R
51

The simplest way is to sort the return values of os.walk(), e.g. using:

for rootName, dirNames, fileNames in sorted(os.walk(path)):
    # the (rootName, dirNames, fileNames) tuples arrive sorted by rootName;
    # the dirNames and fileNames lists themselves are not sorted
    pass

Rotberg answered 16/9, 2015 at 12:15 Comment(8)
I don't know why people are ignoring this answer, it's the cleanest and simplest solution... TY – Panhellenic
Sadly this didn't work for me :( – Loma
I required both this and sorting of the lists I was interested in (fileNames in my case). Then it worked consistently across platforms. Thanks :) – Bon
This will first collect all values that os.walk() delivers into a list, then sort that list, then run the for loop. This list can become very large, and collecting it can take a lot of time. Effectively, the advantages of the generator features of os.walk() are destroyed by this. Sorting the results for each directory in-place (see my answer) may seem a little more complicated, but I think keeping the generator advantages is worth the effort. – Keyte
I used this to sort directories and files: for subdir, dirs, files in sorted(os.walk(rootDir)): for file in sorted(files): – Walworth
This sorts the individual yield returns, but does not sort recursively by root path if you're yielding the return values in a function. E.g., if you were returning for f in files: yield root + f, the result of the function wouldn't be sorted. – Diclinous
You can always use os.walk() with internal sorting instead. for path,dirs,files in sorted(os.walk(...)): work() results always in the same as for path,dirs,files in os.walk(...): \\ dirs.sort() \\ files.sort() \\ work() (the \\ being newlines). (Of course that sorted(walk) cannot be meant verbatim, but I guess you understand what I mean.) The sorting can always take place in the iteration itself. – Keyte
@Alfe, you are right. The internal sorting works for all cases. A simple sorted(os.walk()) makes in-place modification impossible, which is a serious disadvantage. – Kew
