Can I force os.walk to visit directories in alphabetical order?

I would like to know if it's possible to force os.walk in Python 3 to visit directories in alphabetical order. For example, here is a directory and some code that walks it:

ryan:~/bktest$ ls -1 sample
CD01
CD02
CD03
CD04
CD05

--------

def main_work_subdirs(gl):
    for root, dirs, files in os.walk(gl['pwd']):
        if root == gl['pwd']:
            for d2i in dirs:
                print(d2i)

When the Python code runs on the directory above, here is the output:

ryan:~/bktest$ ~/test.py sample
CD03
CD01
CD05
CD02
CD04

I would like to force os.walk to visit these directories in alphabetical order: 01, 02, ... 05. The Python 3 documentation for os.walk says:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting

Does that mean that I can impose an alphabetical visiting order on os.walk? If so, how?

Archoplasm answered 12/7, 2011 at 19:37 Comment(0)

Yes. Sort dirs in place inside the loop; os.walk then recurses into the subdirectories in that order.

import os

def main_work_subdirs(gl):
    for root, dirs, files in os.walk(gl['pwd']):
        dirs.sort()  # sort in place so os.walk recurses in this order
        if root == gl['pwd']:
            for d2i in dirs:
                print(d2i)
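
A minimal, self-contained sketch of the same idea, in case you want to try it without the gl dictionary. The helper name visit_order and the throwaway directory names are assumptions made up for this illustration; only os.walk and the in-place dirs.sort() come from the answer.

```python
import os
import tempfile


def visit_order(top):
    """Return directory paths, relative to top, in the order os.walk visits them."""
    visited = []
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        dirs.sort()  # in place: os.walk reads this same list when it recurses
        visited.append(os.path.relpath(root, top))
    return visited


# Build a throwaway tree (hypothetical names) and print the visiting order.
with tempfile.TemporaryDirectory() as top:
    for name in ("CD03", "CD01", "CD02"):
        os.makedirs(os.path.join(top, name, "b"))
        os.makedirs(os.path.join(top, name, "a"))
    for path in visit_order(top):
        print(path)
```

Because the sort happens at every level, the nested directories are visited alphabetically too, not just the top-level ones.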
Hamford answered 12/7, 2011 at 20:47 Comment(5)
So, that's pretty awesome. I thought the only thing you could do with generators was iterate over them. - Archoplasm
@ryan_m: That is all you can do. But since the next step in the iteration isn't generated until after you are done with the current one, it allows tricks like this. :-) - Hamford
Just to be clear, dirs is a list, not a generator. - Aerothermodynamics
A subtle point I think is worth mentioning: it relies on the dirs object returned being modified in place, because os.walk continues to work with that list. Hence, if you pass topdown=False, so that the dirs are yielded after their contents, then it will not work properly. Similarly, if you did dirs = sorted(dirs) instead of dirs.sort(), it would also not work properly. - Noblewoman
Good point, @Noblewoman. More on the difference between sorted(x) and x.sort(): #22442878 - Cavour
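
To illustrate the in-place point from the comments: if you want sorted() (for example, to pass a key or reverse), slice assignment keeps the change in the list that os.walk holds. This is only a sketch; walk_dirs and its parameters are hypothetical names, not part of os.

```python
import os
import tempfile


def walk_dirs(top, key=None, reverse=False):
    """Yield each visited directory while imposing an order on the traversal."""
    for root, dirs, files in os.walk(top):  # must stay topdown=True (the default)
        # Slice assignment replaces the contents of the list that os.walk holds.
        # A plain `dirs = sorted(dirs)` would only rebind the local name and
        # leave os.walk's own list, and so the traversal order, untouched.
        dirs[:] = sorted(dirs, key=key, reverse=reverse)
        yield root


# Case-insensitive ordering on a throwaway tree (hypothetical names).
with tempfile.TemporaryDirectory() as top:
    for name in ("b", "A", "c"):
        os.mkdir(os.path.join(top, name))
    for root in walk_dirs(top, key=str.lower):
        print(os.path.relpath(root, top))
```

With topdown=False the subdirectories have already been walked by the time the list is yielded, so neither form can change the order there.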

I know this has already been answered, but I wanted to add one little detail, and adding more than a single line of code in the comments is wonky.

In addition to wanting the directories sorted, I also wanted the files sorted so that my iteration through gl['pwd'] was consistent and predictable. This required one more sort:

for root, dirs, files in os.walk(gl['pwd']):
  dirs.sort()
  for filename in sorted(files):
    print(os.path.join(root, filename))

And, with the benefit of learning more about Python, a different (better) way:

from pathlib import Path

# Directories, per the original question.
for p in sorted(Path(gl['pwd']).glob('**/*')):
    if p.is_dir():
        print(p)
# Files, like I usually need.
for p in sorted(Path(gl['pwd']).glob('**/*')):
    if p.is_file():
        print(p)
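
If you need both listings, a single pass avoids globbing the tree twice. This is a rough sketch rather than a drop-in replacement: sorted_dirs_and_files is a made-up helper name, and Path.rglob("*") is used as the equivalent of glob('**/*').

```python
from pathlib import Path
import tempfile


def sorted_dirs_and_files(top):
    """Return (dirs, files) found under top, each sorted, using one rglob pass."""
    dirs, files = [], []
    for p in sorted(Path(top).rglob("*")):
        if p.is_dir():
            dirs.append(p)
        elif p.is_file():
            files.append(p)
    return dirs, files


# Try it on a throwaway tree (hypothetical names).
with tempfile.TemporaryDirectory() as top:
    for name in ("CD02", "CD01"):
        (Path(top) / name).mkdir()
        (Path(top) / name / "track.txt").touch()
    dirs, files = sorted_dirs_and_files(top)
    for p in dirs + files:
        print(p.relative_to(top))
```

Note that this collects the whole tree before sorting, so unlike os.walk it cannot prune directories along the way.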
Elboa answered 7/1, 2018 at 16:31 Comment(0)

This answer is not specific to this question, and the problem is a little different, but the solution can be used in either case. Suppose you have the files "one1.txt", "one2.txt", and "one10.txt", each containing the string "default".

I want to loop through a directory that contains these files, find a specific string in every file, and replace it with the name of the file. The methods already mentioned here and in other questions (dirs.sort(), sorted(files), sorted(dirs)) compare names as plain strings, so "one10.txt" sorts before "one2.txt", and the result will be something like this:

"one1.txt"--> "one10"
"one2.txt"--> "one1"
"one10.txt" --> "one2"

But we want it to be:

"one1.txt"--> "one1"
"one2.txt"--> "one2"
"one10.txt" --> "one10"

I found this method, which processes the files in natural ("human") order instead:

import re, os, fnmatch

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    return [atoi(c) for c in re.split(r'(\d+)', text)]

def findReplace(directory, find, replace, filePattern):
    count = 0
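    # Do not wrap os.walk() in sorted(): that would exhaust the generator up
    # front, so dirs.sort() could no longer steer the traversal.
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        dirs.sort()
        for filename in sorted(fnmatch.filter(files, filePattern), key=natural_keys):
            count += 1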
            filepath = os.path.join(path, filename)
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace+str(count)+".png")
            with open(filepath, "w") as f:
                f.write(s)

Then run this line:

findReplace(os.getcwd(), "default", "one", "*.xml")
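
To see the difference between the two orderings without touching any file contents, here is a small sketch. natural_key and walk_natural are hypothetical helpers written for this illustration (natural_key is a compact variant of the natural_keys idea above), and the same key is applied to dirs so that os.walk also visits directories in natural order.

```python
import os
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(chunk) if chunk.isdecimal() else chunk
            for chunk in re.split(r"(\d+)", name)]


def walk_natural(top):
    """Yield (root, files) with directories and files in natural order."""
    for root, dirs, files in os.walk(top):
        dirs.sort(key=natural_key)  # in place, so os.walk follows this order
        yield root, sorted(files, key=natural_key)


names = ["one10.txt", "one1.txt", "one2.txt"]
print(sorted(names))                   # ['one1.txt', 'one10.txt', 'one2.txt']
print(sorted(names, key=natural_key))  # ['one1.txt', 'one2.txt', 'one10.txt']
```

The key always starts with a text chunk (possibly empty) and then alternates text and integers, which is why two keys can be compared without mixing str and int at the same position.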
Kaliski answered 17/7, 2018 at 21:33 Comment(0)
