Python shutil copytree: use ignore function to keep specific files types
Asked Answered
S

2

30

I'm trying to figure out how to copy CAD drawings (".dwg", ".dxf) from a source directory with subfolders to a destination directory and maintaining the original directory and subfolders structure.

  • Original Directory: H:\Tanzania...\Bagamoyo_Single_line.dwg
  • Source Directory: H:\CAD\Tanzania...\Bagamoyo_Single_line.dwg

I found the following answer from @martineau within the following post: Python Factory Function

from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree

def include_patterns(*patterns):
    """Factory function that can be used with copytree() ignore parameter.

    Arguments define a sequence of glob-style patterns
    that are used to specify what files to NOT ignore.
    Creates and returns a function that determines this for each directory
    in the file hierarchy rooted at the source directory when used with
    shutil.copytree().
    """
    def _ignore_patterns(path, names):
        keep = set(name for pattern in patterns
                            for name in filter(names, pattern))
        ignore = set(name for name in names
                        if name not in keep and not isdir(join(path, name)))
        return ignore
    return _ignore_patterns

# sample usage

copytree(src_directory, dst_directory,
         ignore=include_patterns('*.dwg', '*.dxf'))

Updated: 18:21. The following code works as expected, except that I'd like to ignore folders that don't contain any include_patterns('.dwg', '.dxf')

Southey answered 27/2, 2017 at 13:57 Comment(2)
That code is already demonstrating how to do it. You pass the patterns to include_patterns, and the return is a callback that you pass to copytree. copytree does the work of passing paths and names to the resulting _ignore_patterns function as it traverses the tree.Bloater
Hi @Bloater I now understand how the following works. I need to amend the following only to copy the tree if there is a match based on my include_patterns so that I don't end up with empty directories.Southey
H
58

shutil already contains a function ignore_patterns, so you don't have to provide your own. Straight from the documentation:

from shutil import copytree, ignore_patterns

copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))

This will copy everything except .pyc files and files or directories whose name starts with tmp.

It's a bit tricky (and not strictly necessary) to explain what's going on: ignore_patterns returns a function _ignore_patterns as its return value, this function gets stuffed into copytree as a parameter, and copytree calls this function as needed, so you don't have to know or care how to call this function _ignore_patterns. It just means that you can exclude certain unneeded cruft files (like *.pyc) from being copied. The fact that the name of the function _ignore_patterns starts with an underscore is a hint that this function is an implementation detail you may ignore.

copytree expects that the folder destination doesn't exist yet. It is not a problem that this folder and its subfolders come into existence once copytree starts to work, copytree knows how to handle that.

Now include_patterns is written to do the opposite: ignore everything that's not explicitly included. But it works the same way: you just call it, it returns a function under the hood, and coptytree knows what to do with that function:

copytree(source, destination, ignore=include_patterns('*.dwg', '*.dxf'))
Halt answered 27/2, 2017 at 14:41 Comment(8)
Hi @Halt , the following function is to generate a dynamic ignore list based on files that I want to keep i.e. CAD (".dwg", ".dxf"), so all other file types are then ignored. I've got the following working, my last hurdle is to exclude folders that have no files within them based on the include_patterns(".dwg", ".dxf").Southey
Where is the include_patterns method defined?Gainer
@AK47 include_patterns is defined in the OP.Halt
Is there a way to combine both?Parcel
Let's say I have a list (i.e. mylist=[,*.txt, '*.o']). How do I pass this list as an argument to ignore_patterns()? ignore_patterns(mylist) doesn't seem to work here.Dygal
@Dygal Did you find a solution?Depone
I don't remember, check stackoverflow.com/search?q=user:5082463+[python]Dygal
@KcFnMi, you need to unpack the values from the list. use: ignore_patterns(*mylist). i tested it and it worked.Hinds
W
0
import os
import shutil



def has_extension(src: str, ext: str) -> bool:
    _, actual_extension = os.path.splitext(src)
    return ext == actual_extension

def copy_if(src: str, dst: str, *, 
    include_ext_patterns: tuple[str, ...]) -> None:
    if any(has_extension(src, ext) for ext in include_ext_patterns):
        print(f"Copying {src} to {dst}")
        shutil.copy2(src, dst)


def copy_files(
    *, src: str, dst: str, include_ext_patterns: tuple[str, ...]) -> None:
    if not isinstance(include_ext_patterns, tuple):
    raise TypeError(f"include_ext_patterns should be {tuple} not {type(include_ext_patterns)}")
    shutil.copytree(
                str(src),
                str(dst),
                dirs_exist_ok=True,
                copy_function=partial(copy_if, include_ext_patterns=include_ext_patterns)
                )

To copy dir and files from src to destination:

copy_files(src="path/to/src/dir", 
           dst="path/to/dest/dir",
           include_ext_patterns=('.dwg', '.dxf'))
Warrantable answered 7/5 at 15:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.