How to skip directories in os.walk (Python 2.7)

I have written an image carving script to assist with my work. The tool carves images by a specified extension and compares them against a hash database.

The tool is used to search across mounted drives, some of which have operating systems on them.

The problem I am having is that when a drive is mounted with an OS, it is searching across the 'All Users' directory, and so is including images from my local disc.

I can't figure out how to skip the 'All Users' directory and just stick to the mounted drive.

My section for os.walk is as follows:

for path, subdirs, files in os.walk(root):
    for name in files:
        if re.match(pattern, name.lower()):
            appendfile.write(os.path.join(path, name))
            appendfile.write('\n')
            log(name)
            i = i + 1

Any help is much appreciated

Havre answered 16/7, 2015 at 8:58 Comment(2)
Is All Users the name of the directory? - Colvert
Apologies, the path I want to skip is always C:\Users\All Users. - Havre

Assuming All Users is the name of the directory, you can remove the directory from your subdirs list, so that os.walk() does not iterate over it.

Example -

for path, subdirs, files in os.walk(root):
    # Removing the name in place stops os.walk from descending into it.
    if 'All Users' in subdirs:
        subdirs.remove('All Users')
    for name in files:
        if re.match(pattern, name.lower()):
            appendfile.write(os.path.join(path, name))
            appendfile.write('\n')
            log(name)
            i = i + 1

If you only want to skip All Users inside a particular parent directory, you can add a check on path to the if condition above.
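A minimal sketch of that parent check, assuming the parent is identified by its directory name alone. The function name walk_skipping and its parameters parent_name and skip_name are hypothetical and not part of the question's script. The comparison is case-sensitive.

```python
import os


def walk_skipping(root, parent_name, skip_name):
    """Yield file paths under root, skipping skip_name only when it sits
    directly inside a directory called parent_name (hypothetical helper)."""
    for path, subdirs, files in os.walk(root):
        # normpath drops a trailing separator so basename is never empty.
        current = os.path.basename(os.path.normpath(path))
        # Prune only when the directory being visited is the named parent.
        if current == parent_name and skip_name in subdirs:
            subdirs.remove(skip_name)
        for name in files:
            yield os.path.join(path, name)
```

With this, walk_skipping(root, 'Users', 'All Users') would leave an All Users directory elsewhere in the tree untouched and only skip the one directly under Users.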

From os.walk documentation -

os.walk(top, topdown=True, onerror=None, followlinks=False)

Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False is ineffective, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.

topdown defaults to True unless you specify otherwise.

Colvert answered 16/7, 2015 at 9:6 Comment(8)
Will this skip all sub-directories within 'All Users'? - Havre
Yes, it will skip the sub-directories within 'All Users' as well. - Colvert
@adrianus Did you even try this? It will never walk into the All Users directory, so no files or folders from it will be read. - Colvert
Yes, I tried this, and all the files from the excluded directory still show up. - Eat
That would not happen unless you have some other issue in your code. - Colvert
When you remove a directory from the subdirs list, os.walk does not go inside it (walk inside it). - Colvert
I'm sorry, you're right. It will not walk through that directory and the files will be excluded; that is the correct solution. - Eat
Would really like the other downvoters to comment on why they think this is a bad solution. - Colvert

If you have more than one directory to exclude, you can use a slice assignment in order to remove the excluded directories from the list of subdirectories:

excl_dirs = {'All Users', 'some other dir'}

for path, dirnames, files in os.walk(root):
    dirnames[:] = [d for d in dirnames if d not in excl_dirs]
    ...

As the documentation states:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; ..
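To make the difference between slice assignment and plain assignment concrete, here is a small sketch. The function walk_files and its in_place flag are hypothetical, added only for illustration: with in_place=True it prunes through dirnames[:], and with in_place=False it rebinds the loop variable, which os.walk never sees.

```python
import os


def walk_files(root, excl_dirs, in_place=True):
    """Return the sorted file names found under root (illustrative helper).

    in_place=True prunes with slice assignment (dirnames[:] = ...).
    in_place=False rebinds the local name instead, so nothing is pruned.
    """
    found = []
    for path, dirnames, files in os.walk(root):
        kept = [d for d in dirnames if d not in excl_dirs]
        if in_place:
            dirnames[:] = kept  # mutates the list object os.walk holds
        else:
            dirnames = kept     # only rebinds the loop variable
        found.extend(files)
    return sorted(found)
```

Run against a tree that contains an excluded directory, the in_place=False variant should still report the files inside it, because os.walk keeps iterating over its own, unmodified list.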

Paterfamilias answered 9/4, 2019 at 18:32 Comment(3)
Why must it be written like this, with [:]? - Tie
@AbdullahSaid Added a bit of explanation. Additionally: this is a slice assignment; it modifies the list in place (i.e. does not create a new list). - Paterfamilias
Thanks for this info: When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames. - Prosimian
