Filtering os.walk() dirs and files
A

8

53

I'm looking for a way to include/exclude file patterns and exclude directories in an os.walk() call.

Here's what I'm doing so far:

import fnmatch
import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']

def _filter(paths):
    for path in paths:
        if os.path.isdir(path) and not path in excludes:
            yield path

        for pattern in (includes + excludes):
            if not os.path.isdir(path) and fnmatch.fnmatch(path, pattern):
                yield path

for root, dirs, files in os.walk('/home/paulo-freitas'):
    dirs[:] = _filter(map(lambda d: os.path.join(root, d), dirs))
    files[:] = _filter(map(lambda f: os.path.join(root, f), files))

    for filename in files:
        filename = os.path.join(root, filename)

        print(filename)

Is there a better way to do this? How?

Ashmore answered 28/2, 2011 at 11:36 Comment(0)
P
65

This solution uses fnmatch.translate to convert glob patterns to regular expressions (it assumes that includes is used for files only):

import fnmatch
import os
import os.path
import re

includes = ['*.doc', '*.odt'] # for files only
excludes = ['/home/paulo-freitas/Documents'] # for dirs and files

# transform glob patterns to regular expressions
includes = r'|'.join([fnmatch.translate(x) for x in includes])
excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'

for root, dirs, files in os.walk('/home/paulo-freitas'):

    # exclude dirs
    dirs[:] = [os.path.join(root, d) for d in dirs]
    dirs[:] = [d for d in dirs if not re.match(excludes, d)]

    # exclude/include files
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if not re.match(excludes, f)]
    files = [f for f in files if re.match(includes, f)]

    for fname in files:
        print(fname)
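
A small sketch of why the code above falls back to r'$.' when excludes is empty. The helper name build_regex is made up for this illustration and is not part of fnmatch; the paths are arbitrary examples.

```python
import fnmatch
import re

def build_regex(patterns):
    """Join glob patterns into one regex (build_regex is a hypothetical helper)."""
    # An empty list would join to '', and re.match('', s) succeeds for every s.
    # r'$.' asks for an end-of-string anchor followed by one more character,
    # which can never match, so nothing gets excluded.
    return '|'.join(fnmatch.translate(p) for p in patterns) or r'$.'

includes = build_regex(['*.doc', '*.odt'])
excludes = build_regex([])

# The translated '*' also spans '/', so full paths can be matched directly.
print(bool(re.match(includes, '/tmp/report.doc')))
print(bool(re.match(includes, '/tmp/report.docx')))
print(bool(re.match(excludes, '/tmp/anything')))
print(bool(re.match('', '/tmp/anything')))
```

With a non-empty excludes list the fallback is never used, because the joined pattern is a non-empty string.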
Pepper answered 28/2, 2011 at 12:15 Comment(7)
Ermm, we need an if excludes check around both re.match(excludes, ...) calls, no? If excludes = [], it'll match all entries. But I liked your approach, it's much clearer. :) Ashmore
@pf.me: You're right, I did not consider that case. So you either 1) wrap the exclude list comprehension in an if excludes, 2) prefix not re.match(excludes, ...) with not excludes or, or 3) set excludes to a never-matching regex if the original excludes is empty. I updated my answer using variant 3. Pepper
After some googling, it appears that the point of the [:] syntax in dirs[:] = [os.path.join(root, d) for d in dirs] is to use slice assignment, which alters the list in place instead of creating a new list. This caught me out: without the [:], it doesn't work. Scotsman
I still do not get the mechanics: how does dirs[:] alter the original list? All the manuals say that slice [:] returns a fresh copy of the list, with members pointing to the original list values. Here is a discussion on Stack about this. So how does dirs[:] alter the original list? Dillondillow
@Daniel: Slicing may be used not only to get values of a list but also to assign to selected items. As [:] denotes the complete list, assigning to this slice replaces the whole previous content of the list. See docs.python.org/2/library/stdtypes.html#mutable-sequence-types. Pepper
As stated below by @kojiro, I guess you need to pass topdown=True to os.walk so that dirs can be modified in place? Beavers
@gonvaled topdown=True is the default. Pepper
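
To illustrate the slice-assignment point from the comments above: os.walk keeps using the same list object it yielded, so only in-place changes are visible to it. A minimal sketch, with made-up function and variable names:

```python
def rebind(names):
    # Rebinding only changes the local name; the caller's list is untouched.
    names = [n for n in names if n != 'skip']

def assign_slice(names):
    # Slice assignment replaces the contents of the very same list object.
    names[:] = [n for n in names if n != 'skip']

dirs = ['keep', 'skip']

rebind(dirs)
after_rebind = list(dirs)

assign_slice(dirs)
after_slice = list(dirs)

print(after_rebind)  # ['keep', 'skip']
print(after_slice)   # ['keep']
```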
S
24

From docs.python.org:

os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])

When topdown is True, the caller can modify the dirnames list in-place … this can be used to prune the search …

for root, dirs, files in os.walk('/home/paulo-freitas', topdown=True):
    # excludes can be done with fnmatch.filter and complementary set,
    # but it's more annoying to read.
    dirs[:] = [d for d in dirs if d not in excludes]
    for pat in includes:
        for f in fnmatch.filter(files, pat):
            print(os.path.join(root, f))

I should point out that the above code assumes excludes holds bare directory names, not full paths. To match the OP's case, you would need to adjust the list comprehension to filter with if os.path.join(root, d) not in excludes.
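
A runnable sketch of the full-path variant described above. The function name find_files and the throwaway directory tree are assumptions made for the example; includes holds glob patterns and excludes holds full directory paths, as in the question.

```python
import fnmatch
import os
import tempfile

def find_files(top, includes, excludes):
    """Yield files under top matching any glob in includes (illustrative helper)."""
    for root, dirs, files in os.walk(top, topdown=True):
        # Prune in place, comparing full paths rather than bare names.
        dirs[:] = [d for d in dirs if os.path.join(root, d) not in excludes]
        for pat in includes:
            for f in fnmatch.filter(files, pat):
                yield os.path.join(root, f)

# Throwaway tree so the sketch can run anywhere.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'Documents'))
os.makedirs(os.path.join(top, 'Work'))
for rel in ('Documents/a.doc', 'Work/b.odt', 'Work/c.txt'):
    open(os.path.join(top, *rel.split('/')), 'w').close()

found = sorted(find_files(top, ['*.doc', '*.odt'],
                          [os.path.join(top, 'Documents')]))
print(found)
```

Only Work/b.odt should be reported here: a.doc sits in the excluded directory and c.txt matches no include pattern.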

Soper answered 28/2, 2011 at 12:2 Comment(2)
What do excludes and includes look like here? Is there an example to go with this answer? Bouchard
A dumb question: if I exclude a directory, does that exclude everything under it, or does it only skip that directory and still descend into its subdirectories? Sorry if this has been asked before. Basically, if I want to exclude the directory and everything under it, what should I be looking for? Overturf
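
On the last question: with topdown=True (the default), a directory removed from dirs is never entered, so everything beneath it is skipped as well. A small sketch on a temporary tree; the directory names are invented for the example.

```python
import os
import tempfile

top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'skipme', 'nested', 'deeper'))
os.makedirs(os.path.join(top, 'keep', 'nested'))

visited = []
for root, dirs, files in os.walk(top):
    # Removing 'skipme' here stops os.walk from descending into it at all.
    dirs[:] = [d for d in dirs if d != 'skipme']
    visited.append(os.path.relpath(root, top))

print(sorted(visited))
```

Neither skipme nor any of its subdirectories should appear in the visited list.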
T
12

why fnmatch?

import os
excludes=....
for ROOT,DIR,FILES in os.walk("/path"):
    for file in FILES:
        if file.endswith(('doc','odt')):
            print(file)
    for directory in DIR:
        if directory not in excludes:
            print(directory)

not exhaustively tested

Toitoiboid answered 28/2, 2011 at 11:42 Comment(5)
The endswith should check .doc and .odt instead, because a file named mydoc (with no file extension) would be returned by the above code. Also, I think this only handles the specific case the OP posted. The excludes may contain files too, and the includes may contain dirs, I guess. Respect
You need fnmatch if you have to make use of glob patterns (though this is not the case in the example given in the question). Pepper
@Oben Sonne, glob (IMO) has more "functionality" than fnmatch, e.g. path name expansion. You could do this, for example: glob.glob("/path/*/*/*.txt"). Toitoiboid
Good point. For simple include/exclude patterns, glob.glob() would probably be the better solution. Pepper
As good practice and to simplify debugging, I try not to use variable names that match built-in types, like your use of "file", which is a built-in type. Roseliaroselin
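
A sketch of the glob approach mentioned in the comments, using the recursive '**' wildcard available in Python 3.5 and later. The temporary tree and file names are made up. As far as I know, glob.glob has no option to prune directories, so exclusions would have to be filtered out of the results afterwards.

```python
import glob
import os
import tempfile

top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'a', 'b'))
for rel in (('a', 'x.doc'), ('a', 'b', 'y.odt'), ('a', 'b', 'mydoc')):
    open(os.path.join(top, *rel), 'w').close()

matches = []
for pattern in ('*.doc', '*.odt'):
    # '**' with recursive=True matches zero or more directory levels.
    matches += glob.glob(os.path.join(top, '**', pattern), recursive=True)

print(sorted(matches))
```

Because the patterns include the dot, the extensionless file mydoc is not picked up.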
S
1

dirtools is perfect for your use-case:

from dirtools import Dir

print(Dir('.', exclude_file='.gitignore').files())
Scold answered 18/7, 2014 at 21:42 Comment(0)
M
0

Here is one way to do that

import fnmatch
import os

excludes = ['/home/paulo-freitas/Documents']
matches = []
for path, dirs, files in os.walk(os.getcwd()):
    for eachpath in excludes:
        if eachpath in path:
            # excluded: break so that the else branch below is skipped
            break
    else:
        for result in [os.path.abspath(os.path.join(path, filename)) for
                filename in files if fnmatch.fnmatch(filename,'*.doc') or fnmatch.fnmatch(filename,'*.odt')]:
            matches.append(result)
print(matches)
Malamut answered 28/2, 2011 at 11:51 Comment(4)
There's a typo: filename.odt should be filename, '*.odt'. Pepper
Impractical if the number of include patterns grows. Also, it does not allow glob patterns for the dir names to exclude. Pepper
Oben, corrected the mistake. I agree about the include patterns; it can be coded more generically. Malamut
Should that continue under the "if eachpath in path" be a break? Laney
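
On the continue-versus-break question: the else branch of a for loop runs whenever the loop finishes without hitting break, so continue cannot suppress it. A minimal sketch with invented function names and paths:

```python
def is_excluded_with_continue(path, excludes):
    for prefix in excludes:
        if prefix in path:
            continue      # only moves on to the next prefix
    else:
        return False      # always reached: the loop never breaks
    return True

def is_excluded_with_break(path, excludes):
    for prefix in excludes:
        if prefix in path:
            break         # leaves the loop and skips the else branch
    else:
        return False      # reached only when no prefix matched
    return True

excludes = ['/home/u/Documents']
print(is_excluded_with_continue('/home/u/Documents/x', excludes))  # False
print(is_excluded_with_break('/home/u/Documents/x', excludes))     # True
```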
C
0
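import fnmatch
import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']

def file_search(path, pattern):
    # print every file below path whose name matches the glob pattern
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                print(os.path.join(root, name))

for pattern in includes:
    file_search(excludes[0], pattern)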
Candelabra answered 26/12, 2012 at 5:33 Comment(0)
S
0

This is an example of excluding directories and files with os.walk():

import os
import shutil

ignoreDirPatterns=[".git"]
ignoreFilePatterns=[".php"]
def copyTree(src, dest, onerror=None):
    src = os.path.abspath(src)
    src_prefix = len(src) + len(os.path.sep)
    for root, dirs, files in os.walk(src, onerror=onerror):
        for pattern in ignoreDirPatterns:
            if pattern in root:
                break
        else:
            # no directory pattern matched (the loop ended without a break)
            for file in files:
                for pattern in ignoreFilePatterns:
                    if pattern in file:
                        break
                else:
                    # no file pattern matched, so copy the file
                    dirpath = os.path.join(dest, root[src_prefix:])
                    try:
                        os.makedirs(dirpath,exist_ok=True)
                    except OSError as e:
                        if onerror is not None:
                            onerror(e)
                    filepath=os.path.join(root,file)
                    shutil.copy(filepath,dirpath)

Requires Python >= 3.2 because of exist_ok in os.makedirs.
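
For comparison, a sketch of a different technique from the standard library: shutil.copytree with shutil.ignore_patterns. This is not the method used above. It matches glob patterns against names instead of testing substrings, so results can differ (for example, ".php" in file also matches a name like x.php.bak). The temporary tree is invented for the example.

```python
import os
import shutil
import tempfile

src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, '.git'))
os.makedirs(os.path.join(src, 'pkg'))
for rel in (('.git', 'config'), ('pkg', 'index.php'), ('pkg', 'main.py')):
    open(os.path.join(src, *rel), 'w').close()

# copytree creates dest itself, so dest must not exist yet.
dest = os.path.join(tempfile.mkdtemp(), 'copy')
shutil.copytree(src, dest, ignore=shutil.ignore_patterns('.git', '*.php'))

copied = sorted(
    os.path.relpath(os.path.join(root, name), dest)
    for root, dirs, files in os.walk(dest)
    for name in files
)
print(copied)
```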

Superhighway answered 11/5, 2015 at 18:57 Comment(0)
F
0

The above methods did not work for me.

So this is what I came up with, an expansion of my original answer to another question.

What worked for me was:

if (not (str(root) + '/').startswith(tuple(exclude_foldr))):

which builds the directory path as a string and skips it when it starts with any of my listed folders.

This gave me the exact result I was looking for.
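
A small sketch of the startswith check on its own. The helper name is_excluded and the example paths are made up and are not taken from the full script. It adds a trailing slash to both sides, an assumption that goes slightly beyond the line above, so that a sibling folder sharing a name prefix is not excluded by accident.

```python
def is_excluded(root, exclude_folders):
    """Return True if root is one of exclude_folders or lies beneath one.

    is_excluded is an illustrative helper using POSIX-style '/' paths.
    """
    # str.startswith accepts a tuple and returns True if any prefix matches.
    # The trailing '/' keeps '.../GitHub' from matching '.../GitHubArchive'.
    prefixes = tuple(f.rstrip('/') + '/' for f in exclude_folders)
    return (root.rstrip('/') + '/').startswith(prefixes)

exclude = ['/Users/me/Documents/GitHub']
print(is_excluded('/Users/me/Documents/GitHub/repo', exclude))     # True
print(is_excluded('/Users/me/Documents/GitHubArchive', exclude))   # False
```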

My goal for this was to keep my Mac organized.

I can search any folder by path, locate and move specific file types, ignore subfolders, and ask the user up front whether they want to move the files.

NOTE: the prompt appears only once per run, NOT once per file.

The prompt defaults to NO when you hit Enter at [y/N]; in that case the script just lists the files that would be moved.

This is only a snippet from my GitHub; please visit it for the full script.

HINT: Read the script below; I added comments on each line explaining what I did.

#!/usr/bin/env python3
# =============================================================================
# Created On  : MAC OSX High Sierra 10.13.6 (17G65)
# Created On  : Python 3.7.0
# Created By  : Jeromie Kirchoff
# =============================================================================
"""THE MODULE HAS BEEN BUILT FOR KEEPING YOUR FILES ORGANIZED."""
# =============================================================================
from os import walk
from os import path
from shutil import move
import getpass
import click

mac_username = getpass.getuser()
includes_file_extensn = ([".jpg", ".gif", ".png", ".jpeg", ])
search_dir = path.dirname('/Users/' + mac_username + '/Documents/')
target_foldr = path.dirname('/Users/' + mac_username + '/Pictures/Archive/')
exclude_foldr = set([target_foldr,
                    path.dirname('/Users/' + mac_username +
                                 '/Documents/GitHub/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Random/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Stupid_Folder/'),
                     ])

if click.confirm("Would you like to move files?",
                 default=False):
    question_moving = True
else:
    question_moving = False


def organize_files():
    """THE MODULE HAS BEEN BUILT FOR KEEPING YOUR FILES ORGANIZED."""
    # topdown=True is the default; the filtering below relies on "root",
    # which had all the info I needed, rather than on the dir list.
    for root, dir, files in walk(search_dir, topdown=True):
        for file in files:
            # build the directory path as a string and skip excluded folders
            if (not (str(root) + '/').startswith(tuple(exclude_foldr))):
                # only handle the file types we are looking for
                if (file.endswith(tuple(includes_file_extensn))):
                    # using path.normpath as I found an issue with double //
                    # in file paths.
                    filetomove = path.normpath(str(root) + '/' +
                                               str(file))
                    # forward slash required for both to split
                    movingfileto = path.normpath(str(target_foldr) + '/' +
                                                 str(file))
                    # Answering "NO" only prints the files "TO BE Moved"
                    print('Files To Move: ' + str(filetomove))
                    # This uses the answer given to the prompt at the start
                    if question_moving is True:
                        print('Moving File: ' + str(filetomove) +
                              "\n To:" + str(movingfileto))
                        # This is the command that moves the file
                        move(filetomove, movingfileto)


if __name__ == '__main__':
    organize_files()

Example of running my script from terminal:

$ python3 organize_files.py
Exclude list: {'/Users/jkirchoff/Pictures/Archive', '/Users/jkirchoff/Documents/Stupid_Folder', '/Users/jkirchoff/Documents/Random', '/Users/jkirchoff/Documents/GitHub'}
Files found will be moved to this folder:/Users/jkirchoff/Pictures/Archive
Would you like to move files?
No? This will just list the files.
Yes? This will Move your files to the target folder.
[y/N]: 

Example of listing files:

Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
...etc

Example of moving files:

Moving File: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
To: /Users/jkirchoff/Pictures/Archive/1.custom-award-768x512.jpg
Moving File: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
To: /Users/jkirchoff/Pictures/Archive/10351458_318162838331056_9023492155204267542_n.jpg
...
Far answered 16/8, 2018 at 7:20 Comment(0)
