Efficiently removing subdirectories in dirnames from os.walk

On a Mac, in Python 2.7, when walking through directories using os.walk, my script goes into 'apps' (i.e. appname.app), since those are really just directories themselves. Later on in processing I hit errors when going through them. I don't want to go through them anyway, so for my purposes it would be best just to ignore those types of 'directories'.

So this is my current solution:

import os

for root, subdirs, files in os.walk(directory, topdown=True):
    for subdir in subdirs:
        if '.' in subdir:
            subdirs.remove(subdir)
    # do more stuff

As you can see, the inner for loop runs again for every directory os.walk visits, which seems unnecessary, since the first pass already removes everything I want to remove.

There must be a more efficient way to do this. Any ideas?

Maceio answered 16/5, 2012 at 14:30 Comment(4)
For those unaware of this feature: removing a directory from the subdirs list returned by os.walk causes os.walk not to recurse into that directory (in the default top-down mode). - Occultism
The way os.walk works, you won't iterate into the subdirectories that you remove from the list, so I don't understand why you're concerned. - Morgen
@Occultism Exactly! That's why I don't think bottom-up would work for me. What I am doing in my example is exactly what I want to do, as Mark Ransom says; I'm just asking whether there is a more efficient way to do it, since the for loop will be repeated for each of the valid subdirectories I iterate through. That seems inefficient to me, albeit not much of a performance hit anyway. My question really revolves around what best practice would look like. Is this it? - Maceio
@Mark Ransom, I am only concerned with the fact that the second for loop would run on each iteration over the valid subdirs. - Maceio

You can do something like this (assuming you want to ignore directories containing '.'):

subdirs[:] = [d for d in subdirs if '.' not in d]

The slice assignment (rather than just subdirs = ...) is necessary because you need to modify the same list that os.walk is using, not create a new one.
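To see why the slice assignment matters, here is a small Python 3 sketch (not from the original answer; the helper name `walk_dirs` and the temporary directory layout are hypothetical). It walks the same tree twice: once pruning with `subdirs[:] = ...` and once with plain rebinding. Only the slice assignment stops os.walk from descending into `Foo.app`.

```python
import os
import tempfile

def walk_dirs(top, use_slice):
    """Hypothetical helper: return relative paths of every directory os.walk visits."""
    visited = []
    for root, subdirs, files in os.walk(top):
        visited.append(os.path.relpath(root, top))
        kept = [d for d in subdirs if '.' not in d]
        if use_slice:
            subdirs[:] = kept  # mutates the list os.walk will recurse into
        else:
            subdirs = kept     # only rebinds the local name; os.walk never sees it
    return sorted(visited)

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'docs', 'notes'))
    os.makedirs(os.path.join(top, 'Foo.app', 'Contents'))
    pruned = walk_dirs(top, use_slice=True)      # Foo.app is never entered
    unpruned = walk_dirs(top, use_slice=False)   # Foo.app and Foo.app/Contents are visited

print(pruned)
print(unpruned)
```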

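Note that your original code is incorrect because you modify the list while iterating over it, which is not safe: each remove() shifts the later items left, so the loop skips the item immediately after every one it removes.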
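A minimal Python sketch of that skipping behaviour, using a plain list in place of os.walk's subdirs (the names and helper functions are made up for illustration). Removing inside the loop leaves `'b.app'` behind, because it slides into the slot the loop has already passed; the list comprehension has no such problem.

```python
def remove_while_iterating(names):
    names = list(names)  # work on a copy so the caller's list is untouched
    for name in names:
        if '.' in name:
            names.remove(name)  # shifts later items left; the loop skips the next one
    return names

def filter_with_comprehension(names):
    # Builds a new list in one pass; nothing is skipped.
    return [n for n in names if '.' not in n]

names = ['a.app', 'b.app', 'src', 'c.app']
print(remove_while_iterating(names))     # ['b.app', 'src']
print(filter_with_comprehension(names))  # ['src']
```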

Occultism answered 16/5, 2012 at 14:41 Comment(5)
This would still run on each iteration over whatever is left in subdirs, though, correct? Because it would be placed inside the first for loop? - Maceio
Yep, I just confirmed that it does run for each subdir that is still going to be processed. - Maceio
@PatrickBateman: It has to execute for each directory, because each directory has different subdirectories. - Occultism
Oh, I see, and I can show that too! This looks much better than what I was doing. - Maceio
Assuming the OP wants to skip Unix "hidden" directories starting with ".", I would suggest: subdirs[:] = [d for d in subdirs if not d.startswith('.')] - Shively
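Combining the startswith('.') suggestion with a check aimed specifically at macOS bundles might look like the Python 3 sketch below. The helpers `is_skipped` and `walk_pruned` are hypothetical, and the `.app` suffix test is an assumption about what the OP wants to skip; unlike `'.' in d`, it keeps ordinary directories such as `my.project`.

```python
import os
import tempfile

def is_skipped(name):
    """Hypothetical filter: hidden dirs (leading '.') and macOS '.app' bundles."""
    # lower() because macOS filesystems are usually case-insensitive
    return name.startswith('.') or name.lower().endswith('.app')

def walk_pruned(top):
    """Yield (root, subdirs, files) like os.walk, without descending into skipped dirs."""
    for root, subdirs, files in os.walk(top):
        subdirs[:] = [d for d in subdirs if not is_skipped(d)]
        yield root, subdirs, files

with tempfile.TemporaryDirectory() as top:
    for name in ('.git', 'Foo.app', 'my.project', 'src'):
        os.mkdir(os.path.join(top, name))
    visited = sorted(os.path.relpath(root, top) for root, _, _ in walk_pruned(top))

print(visited)  # ['.', 'my.project', 'src']
```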

Perhaps this example from the Python docs for os.walk will be helpful. It walks the tree bottom-up (topdown=False), deleting as it goes.

# Delete everything reachable from the directory named in "top",
# assuming there are no symbolic links.
# CAUTION:  This is dangerous!  For example, if top == '/', it
# could delete all your disk files.
import os
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))
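Bottom-up mode is also why this approach cannot be used for skipping directories: the os.walk documentation says that modifying dirnames when topdown is False has no effect, because the subdirectories are walked before their parent is yielded. A small Python 3 sketch (the helper `visited_dirs` and the directory layout are hypothetical):

```python
import os
import tempfile

def visited_dirs(top, topdown):
    """Hypothetical helper: prune '.app' dirs and record every directory actually visited."""
    visited = []
    for root, subdirs, files in os.walk(top, topdown=topdown):
        visited.append(os.path.relpath(root, top))
        subdirs[:] = [d for d in subdirs if not d.endswith('.app')]
    return sorted(visited)

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'Foo.app', 'Contents'))
    top_down = visited_dirs(top, topdown=True)    # pruning works: only the root
    bottom_up = visited_dirs(top, topdown=False)  # pruning is too late: bundle already walked

print(top_down)   # ['.']
print(bottom_up)  # also contains 'Foo.app' and its 'Contents' subdirectory
```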

I am a bit confused about your goal: are you trying to remove a directory subtree and running into errors, or are you trying to walk a tree and just list plain file names (excluding directory names)?

Echevarria answered 16/5, 2012 at 14:31 Comment(3)
The question is about not traversing into certain directories, not about deleting them. - Occultism
@Occultism Oops... I got caught up on "removing subdirectories" in the question. I'll re-read it and update my answer as needed. Thanks for pointing this out. - Echevarria
@Echevarria I'm not encountering an error removing anything. What I am actually doing is building a syncing app (for my own learning; I know programs that do this already exist) to take data from a target and copy it to a source. I am getting errors during the copy saying the file doesn't exist, even though I already went through the process of finding the files beforehand. This only happens when the process goes through folders with a .app extension (Mac). Since I don't want to copy those anyway, I thought just skipping those folders would be best. - Maceio

I think all that is required is to remove the directory before iterating over it:

import os

for root, subdirs, files in os.walk(directory, topdown=True):
    if '.' in subdirs:
        subdirs.remove('.')
    for subdir in subdirs:
        pass  # do more stuff
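One caveat worth checking: `'.' in subdirs` is list membership, so it only matches an entry whose name is exactly '.', and os.walk never reports '.' or '..' (os.listdir and os.scandir exclude them). Testing each name for a dot needs a per-name check. A minimal Python 3 sketch with made-up names:

```python
import os
import tempfile

names = ['Foo.app', 'src', 'my.project']
exact_match = '.' in names                 # list membership: is any entry exactly '.'?
with_dot = [d for d in names if '.' in d]  # substring test on each name

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, 'Foo.app'))
    listed = [d for _, subdirs, _ in os.walk(top) for d in subdirs]

print(exact_match)    # False
print(with_dot)       # ['Foo.app', 'my.project']
print('.' in listed)  # False: os.walk never reports '.' or '..'
```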
Trituration answered 17/2, 2022 at 13:19 Comment(0)
