os.walk multiple directories at once [duplicate]
Possible Duplicate:
How to join two generators in Python?

Is there a way in Python to use os.walk to traverse multiple directories at once?

my_paths = []
path1 = '/path/to/directory/one/'
path2 = '/path/to/directory/two/'
for path, dirs, files in os.walk(path1, path2):
    my_paths.append(dirs)

The above example doesn't work (os.walk accepts only one directory), but I was hoping for a more elegant solution than calling os.walk twice (a single pass would also let me sort everything at once). Thanks.

Lackluster answered 28/9, 2011 at 19:36 Comment(7)
What about https://mcmap.net/q/108317/-how-to-join-two-generators-or-other-iterables-in-python/320726 ? – Enantiomorph
@Enantiomorph nice catch; it's an exact duplicate. – Ramires
And quite appropriate, as we now have three identical answers as well as it being an identical question. – Ramires
Exact duplicate question with exact duplicate answers! Nice! – Huffman
@DavidHeffernan I was first by 90 seconds :P – Ramires
@Ramires Actually, Phillip beat you by 14 months!! ;-) – Huffman
Apologies for not seeing that. – Lackluster
To treat multiple iterables as one, use itertools.chain (here via chain.from_iterable, which lazily chains one walker per directory):

import os
from itertools import chain

my_paths = []
paths = ('/path/to/directory/one/', '/path/to/directory/two/', 'etc.', 'etc.')
for path, dirs, files in chain.from_iterable(os.walk(p) for p in paths):
    my_paths.append(dirs)
Ramires answered 28/9, 2011 at 19:40 Comment(1)
Thanks, amazing! – Courtund
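As a runnable sketch of this approach (the directory paths are placeholders), collecting every subdirectory name and sorting the combined result at the end, as the question asks:

```python
import os
from itertools import chain

# Placeholder roots; replace with real directories.
paths = ('/path/to/directory/one/', '/path/to/directory/two/')

my_paths = []
for path, dirs, files in chain.from_iterable(os.walk(p) for p in paths):
    my_paths.extend(dirs)  # extend flattens each dirs list into one list of names

my_paths.sort()  # sort everything at once, across both trees
print(my_paths)
```

Note that nonexistent roots are silently skipped, since os.walk ignores errors unless an onerror callback is given.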
Use itertools.chain():

import itertools
import os

for path, dirs, files in itertools.chain(os.walk(path1), os.walk(path2)):
    my_paths.append(dirs)
Huffman answered 28/9, 2011 at 19:42 Comment(0)
Others have mentioned itertools.chain.

There's also the option of just nesting one level more:

import os

my_paths = []
for p in ['/path/to/directory/one/', '/path/to/directory/two/']:
    for path, dirs, files in os.walk(p):
        my_paths.append(dirs)
Aplomb answered 28/9, 2011 at 19:54 Comment(4)
I had thought about doing it that way, but I figured there was a more "pythonic" way of doing it. Thanks! – Lackluster
I like this solution since it does not require an extra import. – Stringfellow
For my use case this is perfectly simple, thanks. – Ailin
The chain solution works too, but personally I like this one: fewer imports. – Revelatory
Since nobody has mentioned it, in this or the other referenced post: multiprocessing.

http://docs.python.org/library/multiprocessing.html

>>> from multiprocessing import Pool
>>> p = Pool(5)
>>> def f(x):
...     return x*x
...
>>> p.map(f, [1,2,3])
[1, 4, 9]

In this case, you'd have a list of directories. The call to map returns one result list per directory; you can then flatten the results into a single list, or keep them clustered per directory.

import os
from multiprocessing import Pool

def t(p):
    my_paths = []
    for path, dirs, files in os.walk(p):
        my_paths.append(dirs)
    return my_paths  # without this return, Pool.map would collect only None


paths = ['p1', 'p2', 'etc']
p = Pool(len(paths))
dirs = p.map(t, paths)
Threlkeld answered 28/9, 2011 at 20:19 Comment(2)
He doesn't mean "at once" as in "at the same time" but as in "as a set" or "as a unit", so your answer doesn't really address his question. – Ramires
I believe it does both, right? Not only do you get back your search along multiple paths as a list, which is what everyone's chain() suggestion does, but this has the added benefit of running each search as a separate process. What if these paths are on different drives? In that case you get even better results using this method, since you are searching multiple drives simultaneously. – Threlkeld
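Building on the answer above, a runnable sketch (directory names are placeholders) that returns the per-root results from the worker and then flattens them into one list with itertools.chain.from_iterable:

```python
import os
from itertools import chain
from multiprocessing import Pool

def walk_dirs(p):
    # Collect the subdirectory lists under one root.
    # The worker must return its result for Pool.map to collect it.
    found = []
    for path, dirs, files in os.walk(p):
        found.append(dirs)
    return found

if __name__ == '__main__':
    paths = ['p1', 'p2']  # placeholder roots
    with Pool(len(paths)) as pool:
        per_root = pool.map(walk_dirs, paths)   # one list of lists per root
    flat = list(chain.from_iterable(per_root))  # merge into a single list
    print(flat)
```

The `__main__` guard is needed on platforms that spawn rather than fork worker processes.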

© 2022 - 2024 — McMap. All rights reserved.