Travel directory tree with limited recursion depth

I need to process all files in a directory tree recursively, but with a limited depth.

That means for example to look for files in the current directory and the first two subdirectory levels, but not any further. In that case, I must process e.g. ./subdir1/subdir2/file, but not ./subdir1/subdir2/subdir3/file.

What is the best way to do this in Python 3?

Currently I use os.walk to process all files, with unlimited depth, in a loop like this:

for root, dirnames, filenames in os.walk(args.directory):
    for filename in filenames:
        path = os.path.join(root, filename)
        # do something with that file...

I could count the directory separators (/) in root to determine the current file's level in the hierarchy and break out of the loop when that level exceeds the desired maximum.

I suspect this approach is fragile, and probably quite inefficient when there is a large number of subdirectories to ignore. What would be the optimal approach here?
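
For reference, the separator-counting idea could be sketched roughly as follows. This is only an illustration under assumptions: the name walk_limited is hypothetical, and the depth is taken from the path relative to the starting directory rather than from root directly. It relies on the documented os.walk behavior that, in top-down mode, emptying dirnames in place stops the descent, so directories below the limit are not visited at all.

```python
import os

def walk_limited(top, maxdepth):
    """Yield file paths under top, descending at most maxdepth levels.

    Hypothetical helper: maxdepth=0 means only top itself.
    """
    for root, dirnames, filenames in os.walk(top):
        # Depth of root relative to top: "." -> 0, "a" -> 1, "a/b" -> 2, ...
        rel = os.path.relpath(root, top)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= maxdepth:
            # Clearing the list in place tells os.walk not to descend further.
            dirnames[:] = []
        for filename in filenames:
            yield os.path.join(root, filename)
```

With maxdepth=2 this would yield ./subdir1/subdir2/file but never open ./subdir1/subdir2/subdir3.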

Marivaux answered 10/2, 2016 at 12:54 Comment(1)
Related: List all subdirectories on given level - Tuddor

I think the easiest and most stable approach would be to copy the functionality of os.walk straight out of the source and insert your own depth-controlling parameter.

import os
import os.path as path

def walk(top, topdown=True, onerror=None, followlinks=False, maxdepth=None):
    islink, join, isdir = path.islink, path.join, path.isdir

    try:
        names = os.listdir(top)
    except OSError as err:  # Python 3 syntax (the original source used "except OSError, err")
        if onerror is not None:
            onerror(err)
        return

    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)

    if topdown:
        yield top, dirs, nondirs

    if maxdepth is None or maxdepth > 1:
        for name in dirs:
            new_path = join(top, name)
            if followlinks or not islink(new_path):
                for x in walk(new_path, topdown, onerror, followlinks, None if maxdepth is None else maxdepth-1):
                    yield x
    if not topdown:
        yield top, dirs, nondirs

for root, dirnames, filenames in walk(args.directory, maxdepth=2):
    pass  # process root, dirnames, filenames here

If you're not interested in all those optional parameters, you can pare down the function pretty substantially:

import os

def walk(top, maxdepth):
    dirs, nondirs = [], []
    for name in os.listdir(top):
        (dirs if os.path.isdir(os.path.join(top, name)) else nondirs).append(name)
    yield top, dirs, nondirs
    if maxdepth > 1:
        for name in dirs:
            for x in walk(os.path.join(top, name), maxdepth-1):
                yield x

for x in walk(".", 2):
    print(x)
Migdaliamigeon answered 10/2, 2016 at 13:14 Comment(3)
That's a pretty long piece of code for a small problem... I'd prefer a more compact solution if possible. And I think you mean for ... in walk(...): in the second-to-last line instead of os.walk, don't you? - Marivaux
Funny, I was just composing a shorter version :-) and you're right about the errant os. on the penultimate line; fixed. - Migdaliamigeon
That short version looks cool. I modified it to yield only files (I don't need the directories) and to test maxdepth != 0 instead, so that 0 means only the current directory and a negative value traverses the entire directory structure. - Marivaux
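
A minimal sketch of the variant described in the last comment might look like this. It is a guess at the modification, not the commenter's actual code: the name walk_files is hypothetical, and it yields full file paths. As in the short version above, os.path.isdir follows symbolic links, so a link pointing back up the tree could cause endless recursion when a negative maxdepth is used.

```python
import os

def walk_files(top, maxdepth):
    """Yield paths of non-directory entries under top.

    maxdepth == 0 lists only top itself; a negative maxdepth never
    reaches 0 while counting down, so the whole tree is traversed.
    """
    subdirs = []
    for name in os.listdir(top):
        full = os.path.join(top, name)
        if os.path.isdir(full):
            subdirs.append(full)
        else:
            yield full
    if maxdepth != 0:
        for subdir in subdirs:
            yield from walk_files(subdir, maxdepth - 1)
```

For the case in the question (current directory plus two subdirectory levels) the call would be walk_files(".", 2).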

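Starting with Python 3.5, os.walk uses os.scandir instead of os.listdir. scandir is usually several times faster, because each directory entry already carries its file type and so most entries need no extra stat call. I adapted @kevin's sample accordingly.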

import os

def walk(top, maxdepth):
    dirs, nondirs = [], []
    for entry in os.scandir(top):
        (dirs if entry.is_dir() else nondirs).append(entry.path)
    yield top, dirs, nondirs
    if maxdepth > 1:
        for path in dirs:
            for x in walk(path, maxdepth-1):
                yield x

for x in walk(".", 2):
    print(x)
Jasmin answered 19/11, 2018 at 18:10 Comment(4)
It's much faster on Windows, and there is a backport (the scandir module) for Python < 3.5. - Brach
walkMaxDepth is not defined. Should it be walk? - Bituminous
It's funny that in two years no one noticed this mistake. I took the code from two different places, and the copy-paste resulted in different names. This is recursion, so instead of walkMaxDepth there should be the name of the enclosing function, walk. I have fixed this in the code. Thank you for pointing it out. I myself suffer a lot when a finished snippet does not work. - Jasmin
To be sufficiently like os.walk, the nondirs list should consist of basenames only. In these solutions, it contains full paths. This is a tad ugly, but it would make walk and os.walk produce similar results: (dirs if entry.is_dir() else nondirs).append(entry.path if entry.is_dir() else os.path.basename(entry.path)) - Eon
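
Along the lines of the last comment, here is one possible sketch that mirrors os.walk more closely. It goes a step further than the comment suggests, as an assumption on my part: os.walk reports basenames in both lists, so this version stores entry.name for directories too and joins the path only when recursing. The with-statement form of os.scandir needs Python 3.6 or later.

```python
import os

def walk(top, maxdepth):
    """Yield (top, dirs, nondirs) like os.walk, limited to maxdepth levels.

    Both lists hold basenames; maxdepth=1 means only top itself.
    """
    dirs, nondirs = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # entry.name is the basename; entry.path would be the joined path.
            (dirs if entry.is_dir() else nondirs).append(entry.name)
    yield top, dirs, nondirs
    if maxdepth > 1:
        for name in dirs:
            yield from walk(os.path.join(top, name), maxdepth - 1)
```

The loop body from the question then works unchanged, since os.path.join(root, filename) again produces the full path.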
