Making os.walk work in a non-standard way

I'm trying to do the following, in this order:

Use os.walk() to go down each directory.
Each directory has subfolders, but I'm only interested in the first level of subfolders. For example, a directory looks like:

/home/RawData/SubFolder1/SubFolder2

In RawData2, I want to have folders that stop at the SubFolder1 level.

The thing is, it seems like os.walk() goes down through ALL of the RawData folder, and I'm not certain how to make it stop.

Below is what I have so far. I've tried a number of other combinations, swapping the dirs variable for root or files, but that doesn't get me what I want.

import os 

for root, dirs, files in os.walk("/home/RawData"): 

    os.chdir("/home/RawData2/")
    make_path("/home/RawData2/"+str(dirs))
Prove answered 17/10, 2015 at 16:52 Comment(5)
It's not clear what you mean; can you explain more? - Ultraconservative
Well, os.walk() goes through all levels of RawData and associated subfolders. I'm interested in it only going down one level, instead of all of them. Maybe a different function would be more appropriate? - Prove
Maybe glob would be a useful alternative? - Cirrus
So you mean that you just have the path of root and the name of that sub-folder, right? - Ultraconservative
Yeah, so instead of keeping the entire SubFolder1/SubFolder2/SubFolder3 structure, I want to cap it at SubFolder1. Eventually, I'm going to take all the files that are in SubFolder1, 2 and 3 and put them into this new folder instead. - Prove

I suggest you use glob instead.

As the help on glob describes:

glob(pathname)
    Return a list of paths matching a pathname pattern.

    The pattern may contain simple shell-style wildcards a la
    fnmatch. However, unlike fnmatch, filenames starting with a
    dot are special cases that are not matched by '*' and '?'
    patterns.

So, your pattern is every first level directory, which I think would be something like this:

/root_path/*/sub_folder1/sub_folder2

So, you start at your root, get everything in that first level, and then look for sub_folder1/sub_folder2. I think that works.

To put it all together:

from glob import glob

dirs = glob('/root_path/*/sub_folder1/sub_folder2')

# Then iterate for each path
for i in dirs:
    print(i)
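
If the goal is only to list the directories one level below the root (the SubFolder1 level in the question), a pattern with a single wildcard may be closer to what is needed. The following is a minimal sketch, not a definitive solution: the helper name first_level_dirs and the throwaway tree built in a temporary directory are assumptions made for illustration.

```python
import glob
import os
import tempfile

def first_level_dirs(top):
    """Return the names of the directories directly under top, sorted."""
    # A trailing separator in the pattern makes glob match directories only.
    # glob.escape protects any wildcard characters that top itself contains.
    pattern = os.path.join(glob.escape(top), "*", "")
    return sorted(os.path.basename(os.path.dirname(p)) for p in glob.glob(pattern))

# Build a small throwaway tree to try it on (names are illustrative only).
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "SubFolder1", "SubFolder2"))
os.makedirs(os.path.join(demo_root, "OtherFolder1", "Deeper"))
open(os.path.join(demo_root, "notes.txt"), "w").close()

print(first_level_dirs(demo_root))  # ['OtherFolder1', 'SubFolder1']
```

Note that, as the help text above says, names starting with a dot are not matched by '*', so hidden directories are skipped.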
Hook answered 17/10, 2015 at 17:30 Comment(2)
The thing is, I need sub_folder2 to be something which iterates over a bunch of different folders. I am unsure if your method would do that. - Prove
So what you are saying is that you want to go through all the subdirectories of the filtered match? So if you filtered on /root_path/*/sub1/sub2, you then want to take each of those matches and iterate over everything else under those paths? - Hook

Beware: Documentation for os.walk says:

don’t change the current working directory between resumptions of walk(). walk() never changes the current directory, and assumes that its caller doesn’t either

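so you should avoid os.chdir("/home/RawData2/") in the walk loop. Strictly, that warning applies when walk() is given a relative path, but the chdir is unnecessary here anyway because the code builds absolute paths.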

You can easily ask walk not to recurse by using topdown=True and clearing dirs:

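import os

for root, dirs, files in os.walk("/home/RawData", topdown=True):
    for rep in dirs:
        # make_path is the asker's own helper; os.makedirs would also work here
        make_path(os.path.join("/home/RawData2/", rep))
        # add processing here
    del dirs[:]  # empty the list in place to tell walk not to recurse into any subdirectory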
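
A self-contained way to check the pruning idea is to run it on a throwaway tree. This is only a sketch under stated assumptions: mirror_first_level is a hypothetical name, os.makedirs stands in for the asker's make_path helper, and the source and destination are temporary directories rather than /home/RawData and /home/RawData2.

```python
import os
import tempfile

def mirror_first_level(src, dst):
    """Create in dst one empty directory per immediate subdirectory of src."""
    created = []
    for root, dirs, files in os.walk(src, topdown=True):
        for name in dirs:
            # os.makedirs is used here as a stand-in for make_path.
            os.makedirs(os.path.join(dst, name), exist_ok=True)
            created.append(name)
        del dirs[:]  # empty the list in place: walk() will not descend further
    return sorted(created)

# Throwaway source tree with several nested levels (illustrative names).
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "SubFolder1", "SubFolder2", "SubFolder3"))
os.makedirs(os.path.join(src, "SubFolderA", "SubFolderB"))

print(mirror_first_level(src, dst))  # ['SubFolder1', 'SubFolderA']
print(sorted(os.listdir(dst)))       # ['SubFolder1', 'SubFolderA']
```

Because dirs is emptied on the first iteration, the loop body runs exactly once, for the top directory, and only the first-level names are created in the destination.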
Phycomycete answered 17/10, 2015 at 17:39 Comment(3)
I'll be trying this in a moment. Will report back on how it goes. - Prove
Well, this is better inasmuch as it now produces the "SubFolder1", but it still produces all folders on the SubFolder2 level as well. - Prove
@ZR: I should have tested it. A list is a mutable object, but l = [] does not change the original list; it just makes the name refer to a new empty list. It should be del l[:]. Post edited. - Phycomycete
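
The difference described in the last comment can be seen in a few lines. This is an illustrative sketch only; the function names rebind and clear_in_place are made up for the example.

```python
def rebind(names):
    names = []      # rebinds the local name only; the caller's list is untouched
    return names

def clear_in_place(names):
    del names[:]    # removes every element from the caller's list object
    return names

a = ["SubFolder2", "SubFolder3"]
b = ["SubFolder2", "SubFolder3"]
rebind(a)
clear_in_place(b)

print(a)  # ['SubFolder2', 'SubFolder3']
print(b)  # []
```

os.walk keeps a reference to the same list object it yields as dirs, so only an in-place change such as del dirs[:] affects which subdirectories it visits next.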
