A Python walker that can ignore directories

I need a file system walker that I can instruct to skip directories I want to leave untouched, including all subdirectories below them. os.walk and os.path.walk don't seem to do this.

Rafaelita answered 29/5, 2009 at 8:52 Comment(0)
1

So I made this home-rolled walker function:

import os
from os.path import join, isdir, islink

def mywalk(top, topdown=True, onerror=None, ignore_list=('.ignore',)):
    try:
        names = os.listdir(top)
    except OSError as err:
        # Report unreadable directories through the optional callback.
        if onerror is not None:
            onerror(err)
        return
    # Skip this directory, and everything below it, if it contains an
    # entry whose name is in ignore_list (e.g. a '.ignore' marker file).
    if any(name in ignore_list for name in names):
        return
    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)

    if topdown:
        yield top, dirs, nondirs
    for name in dirs:
        path = join(top, name)
        # Do not descend into symlinked directories.
        if not islink(path):
            for x in mywalk(path, topdown, onerror, ignore_list):
                yield x
    if not topdown:
        yield top, dirs, nondirs
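
For comparison, the same marker-file behaviour can probably be had on top of os.walk itself. This is only a sketch, not the function above: the name walk_skipping_marked, the markers parameter, and the keep/skip directory names are made up for illustration, and it assumes Python 3 for tempfile.TemporaryDirectory.

```python
import os
import tempfile

def walk_skipping_marked(top, markers=(".ignore",)):
    """Like os.walk, but skip any directory that contains a marker entry."""
    for dirpath, dirnames, filenames in os.walk(top):
        if any(name in markers for name in dirnames + filenames):
            dirnames[:] = []  # prune in place: os.walk will not descend
            continue          # and this directory itself is not reported
        yield dirpath, dirnames, filenames

def demo():
    # Build a throwaway tree: keep/sub is walked, skip/ holds a marker file.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "keep", "sub"))
        os.makedirs(os.path.join(top, "skip", "sub"))
        open(os.path.join(top, "skip", ".ignore"), "w").close()
        return sorted(os.path.relpath(d, top)
                      for d, _, _ in walk_skipping_marked(top))

if __name__ == "__main__":
    print(demo())
```

As with mywalk, the parent still lists the marked directory in its dirnames; only the marked directory's own entry and its subtree are left out.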
Rafaelita answered 29/5, 2009 at 8:59 Comment(0)
9

Actually, os.walk may do exactly what you want. Say ignore holds the paths of the directories to skip, as a list (or perhaps a set). Then this should work:

import os

def my_walk(top_dir, ignore):
    for dirpath, dirnames, filenames in os.walk(top_dir):
        # Slice assignment changes the list in place, so os.walk sees it.
        dirnames[:] = [
            dn for dn in dirnames
            if os.path.join(dirpath, dn) not in ignore]
        yield dirpath, dirnames, filenames
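
One thing to watch: the entries in ignore have to be written the same way os.path.join(dirpath, dn) produces them, i.e. starting with top_dir. A minimal sketch of a call, assuming Python 3 and a throwaway tree whose directory names (src, build) are invented for the example:

```python
import os
import tempfile

def my_walk(top_dir, ignore):
    for dirpath, dirnames, filenames in os.walk(top_dir):
        dirnames[:] = [
            dn for dn in dirnames
            if os.path.join(dirpath, dn) not in ignore]
        yield dirpath, dirnames, filenames

def demo():
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "src", "pkg"))
        os.makedirs(os.path.join(top, "build", "tmp"))
        # The ignored path is built from top, matching what os.walk yields.
        ignore = {os.path.join(top, "build")}
        return sorted(os.path.relpath(dirpath, top)
                      for dirpath, _, _ in my_walk(top, ignore))

if __name__ == "__main__":
    print(demo())
```

A set keeps the membership test cheap when there are many ignored paths; a bare name such as "build" would not match here, because the comparison is against the joined path.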
Bondwoman answered 29/5, 2009 at 10:6 Comment(6)
I somehow forgot about slice assignment, I took the liberty of adding that to my code. - Grote
This is the expected way of doing so, even says so in the documentation of os.path.walk(). - Franza
No, I mean full slice assignment as a way of modifying the whole list, not the fact that you can change it. - Grote
@Torsten Marek: you start your comment with “No”, while you don't say anything different than unwind, who mentioned the docs, and I quote: “When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment)”. - Microreader
@TZ...: I believe that @Torsten was treating @unwind's comment as a response to @Torsten's initial comment, in which case it makes perfect sense (to me, at least). - Bondwoman
I think it should be if os.path.join(dirpath, dn)... dirname is not defined. - Marti
7

It is possible to modify the second element of os.walk's return values in-place:

[...] the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search [...]

import os

def fwalk(root, predicate):
    for dirpath, dirnames, filenames in os.walk(root):
        # Keep only the subdirectories the predicate accepts.
        dirnames[:] = [d for d in dirnames if predicate(dirpath, d)]
        yield dirpath, dirnames, filenames

Now, you can just hand in a predicate for subdirectories:

>>> ignore_list = [...]
>>> list(fwalk("some/root", lambda r, d: d not in ignore_list))
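
A slightly fuller sketch of the predicate idea, assuming the predicate is called with (dirpath, name) and returns True for directories to keep. The keep function, the IGNORED_NAMES set, and the directory names are hypothetical, and Python 3 is assumed:

```python
import os
import tempfile

def fwalk(root, predicate):
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if predicate(dirpath, d)]
        yield dirpath, dirnames, filenames

IGNORED_NAMES = {"build"}  # hypothetical names to prune

def keep(dirpath, name):
    # Keep a subdirectory unless it is listed or hidden (dot-prefixed).
    # dirpath is unused here but available for path-based rules.
    return name not in IGNORED_NAMES and not name.startswith(".")

def demo():
    with tempfile.TemporaryDirectory() as root:
        for parts in (("src", "pkg"), ("build", "tmp"), (".cache",)):
            os.makedirs(os.path.join(root, *parts))
        return sorted(os.path.relpath(dirpath, root)
                      for dirpath, _, _ in fwalk(root, keep))

if __name__ == "__main__":
    print(demo())
```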
Grote answered 29/5, 2009 at 10:5 Comment(0)
2

Here's a simple solution.

import os

def walk(ignores):
    path = os.getcwd()
    for root, dirs, files in os.walk(path):
        # Remove ignored names in place so os.walk does not descend into them.
        for ignore in ignores:
            if ignore in dirs:
                dirs.remove(ignore)
        print(root)
        print(dirs)
        print(files)

walk(['.git', '.svn'])

Remember, if you remove a folder name from dirs, that folder won't be explored by os.walk.
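
This only holds for the default top-down walk. With topdown=False the subdirectories have already been generated by the time you see dirs, so editing the list has no effect. A small sketch to illustrate, assuming Python 3; the helper name visited_dirs and the tree layout are made up:

```python
import os
import tempfile

def visited_dirs(top, ignores, topdown):
    """Return the directories os.walk reports while we try to prune."""
    seen = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        dirs[:] = [d for d in dirs if d not in ignores]
        seen.append(os.path.relpath(root, top))
    return sorted(seen)

def demo():
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, ".git", "objects"))
        os.makedirs(os.path.join(top, "src"))
        pruned = visited_dirs(top, {".git"}, topdown=True)
        unpruned = visited_dirs(top, {".git"}, topdown=False)
        return pruned, unpruned

if __name__ == "__main__":
    pruned, unpruned = demo()
    print(pruned)    # .git is skipped in the top-down walk
    print(unpruned)  # .git and .git/objects still show up bottom-up
```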

Hope it helps.

Orna answered 11/1, 2012 at 23:55 Comment(0)
