How to copy a directory recursively in Python and overwrite all?

6

112

I'm trying to copy /home/myUser/dir1/ and all its contents (and their contents, etc.) to /home/myuser/dir2/ in Python. Furthermore, I want the copy to overwrite everything in dir2/.

It looks like distutils.dir_util.copy_tree might be the right tool for the job, but I'm not sure if there's anything easier/more obvious to use for such a simple task.

If it is the right tool, how do I use it? According to the docs, it takes 8 parameters. Do I have to pass all 8, or just src, dst and update? And if so, how? (I'm brand new to Python.)

If there's something out there that's better, can someone give me an example and point me in the right direction?

Unscientific answered 2/10, 2012 at 2:37 Comment(5)
os.system("cp -rf /src/dir /dest/dir") would be pretty easy...Carisacarissa
Thanks @JoranBeasley (+1) - however, according to cp's docs, the -f arg ("force"): if an existing destination file cannot be opened, remove it and try again... this doesn't seem to be the same as "overwrite all". Can you confirm it is the same and that whatever dir1's contents are all get (recursively) copied to dir2's subtree? Thanks again!Unscientific
try it ... it should work fine :) ive never had a problem with it...Carisacarissa
@4herpsand7derpsago: cp overwrites files by default. There's a switch that prevents it from overwriting files, but not the other way around.Steffen
@JoranBeasley Is that method platform independent?Carapace
E
83

Notice:

distutils was deprecated in Python 3.10 and removed in Python 3.12. Consider looking at the other answers to this question if you are looking for a post-3.12 solution.


Original answer:

You can use distutils.dir_util.copy_tree. It works just fine and you don't have to pass every argument, only src and dst are mandatory.

However, in your case you can't use a similar tool like shutil.copytree, because it behaves differently: the destination directory must not already exist, so this function can't be used to overwrite its contents.

If you want to use the cp tool as suggested in the question comments, beware that the subprocess module is currently the recommended way to spawn new processes, as you can see in the documentation of the os.system function.
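
As a rough illustration of the subprocess route, here is a minimal sketch. It assumes a POSIX system with cp on the PATH, and the helper name cp_overwrite is hypothetical (it is not part of any library). Passing "src/." as the source asks cp to copy the directory's contents, so an existing destination is merged into instead of receiving a nested copy of the source directory.

```python
import os
import subprocess

def cp_overwrite(src, dst):
    """Copy the contents of src into dst with the external cp tool (POSIX only)."""
    # Hypothetical helper for illustration; make sure the destination exists first.
    os.makedirs(dst, exist_ok=True)
    # "src/." copies what is inside src rather than src itself.
    # check=True raises CalledProcessError if cp exits with a non-zero status.
    subprocess.run(["cp", "-R", os.path.join(src, "."), dst], check=True)
```

Files that exist only in the destination are left alone; files present in both are overwritten by cp. This is not platform independent, since it relies on cp being installed.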

Eugeneeugenia answered 2/10, 2012 at 8:4 Comment(9)
Thanks! Just a note, I needed two imports to get this to work. See #18909441Overbalance
This module sometimes conflicts with git files. For example, it cannot copy .git/COMMIT_EDITMSG files. I recommend shutil.copytree instead.Interglacial
What is the logical/philosophical difference between shutil and distutils.dir_util? Is one preferable to the other in general?Anurous
distutils.dir_util.copy_tree doesn't overwrite existing filesIntraatomic
Found the solution, you need to set the preserve_mode=0, otherwise, it doesn't overwrite anything.Intraatomic
Also, beware that if you call this method twice with the same arguments it will fail if you cleaned the target directory: bugs.python.org/issue22132Peipus
Isn't distutils deprecated, or is that only in favour of setuptools in setup.py?Sputum
distutils.dir_util.copy_tree() works wonders if you want to avoid shutil.copytree's "cannot copy when file exists" issue.Loretaloretta
As the documentation suggests, shutil.copytree now has dirs_exist_ok option. See docs.python.org/3.8/library/shutil.html#shutil.copytree.Cree
K
67

Have a look at the shutil package, especially rmtree and copytree. You can check whether a file / path exists with os.path.exists(<path>).

import shutil
import os

def copy_and_overwrite(from_path, to_path):
    if os.path.exists(to_path):
        shutil.rmtree(to_path)
    shutil.copytree(from_path, to_path)

Vincent was right about copytree not working if the directories already exist. So distutils is the nicer version. Below is a fixed version of shutil.copytree. It's basically copied 1-1, except that the first os.makedirs() call is guarded by an if statement:

import os
from shutil import Error, copy2, copystat

def copytree(src, dst, symlinks=False, ignore=None):
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    if not os.path.isdir(dst):  # This one line does the trick
        os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                # Will raise a SpecialFileError for unsupported file types
                copy2(srcname, dstname)
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error as err:
            errors.extend(err.args[0])
        except OSError as why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError as why:
        # Copying file access times may fail on Windows, so only record
        # errors that are not Windows-specific
        if getattr(why, 'winerror', None) is None:
            errors.append((src, dst, str(why)))
    if errors:
        raise Error(errors)
Klystron answered 2/10, 2012 at 9:2 Comment(7)
Your example doesn't overwrite anything; it simply replaces one dir with another. Replacing content and overwriting content are different things as far as I know. The question specifically asks for overwriting.Eugeneeugenia
Well, I guess that depends on your definition of overwrite: whether or not you want non-duplicate files in the target folder preserved. This version is not preserving anything, true.Klystron
I just had a look into the code of copytree. So if you just want this plain overwrite as mentioned by @Vincent, you can simply use shutil.copytree() and you're done. Present files are automatically overwritten.Klystron
shutil.copytree shouldn't work; if it does, that's a bug, because its docs say "The destination directory, named by dst, must not already exist".Eatage
how to use ignore param, can you share an example?Commorant
This won't merge contents.Biotechnology
@Biotechnology What WILL merge contents?Planography
S
60

In Python 3.8 the dirs_exist_ok keyword argument was added to shutil.copytree():

dirs_exist_ok dictates whether to raise an exception in case dst or any missing parent directory already exists.

So, the following will work in recent versions of Python, even if the destination directory already exists:

shutil.copytree(src, dest, dirs_exist_ok=True)  # 3.8+ only!

One major benefit is that it's more flexible than distutils.dir_util.copy_tree(), as it takes additional arguments on files to ignore, etc. (see the documentation). On top of that, the accepted PEP 632 also states that distutils will be deprecated and subsequently removed in future versions of Python 3.
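
To make the ignore argument concrete, here is a minimal sketch that combines dirs_exist_ok=True with shutil.ignore_patterns. The function name merge_tree and the patterns are illustrative assumptions, not something from the question.

```python
import shutil

def merge_tree(src, dest):
    """Copy src into dest, overwriting files that exist in both (Python 3.8+)."""
    # ignore_patterns builds the callable that copytree expects for `ignore`;
    # the patterns here are only examples.
    skip = shutil.ignore_patterns("*.pyc", "__pycache__")
    return shutil.copytree(src, dest, ignore=skip, dirs_exist_ok=True)
```

Files that exist only in dest are left untouched, so this merges and overwrites rather than replacing the whole destination tree.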

Schreibman answered 13/10, 2020 at 17:15 Comment(0)
T
39

Here's a simple solution to recursively overwrite a destination with a source, creating any necessary directories as it goes. This does not handle symlinks, but it would be a simple extension (see answer by @Michael above).

import os
import shutil

def recursive_overwrite(src, dest, ignore=None):
    if os.path.isdir(src):
        if not os.path.isdir(dest):
            os.makedirs(dest)
        files = os.listdir(src)
        if ignore is not None:
            ignored = ignore(src, files)
        else:
            ignored = set()
        for f in files:
            if f not in ignored:
                recursive_overwrite(os.path.join(src, f), 
                                    os.path.join(dest, f), 
                                    ignore)
    else:
        shutil.copyfile(src, dest)
Tonettetoney answered 5/4, 2013 at 0:54 Comment(7)
Thanks this worked for me. I like the fact that it doesn't remove existing files. I had to add a little piece to make sure that the destination directory exists for files in the else: part.Zawde
shouldn't it be else: shutil.copyfile(src, dest, ignore)?Junker
This was the only function I found working when I have to do repetitive copying and deleting.Immitigable
@JackJames No, it shouldn't, since copyfile() is only called for a single file. When using ignore, you pass a folder to src (otherwise, if you wanted to ignore the file you are passing to src, you would not have to call the function at all). When passing a folder, copyfile() will never be called for ignored files.Funeral
I replaced shutil.copyfile(src, dest) with try: copy2( src, dest ) except SameFileError: src.replace( dest ) to copy over the file metadata too.Seventh
Can you explain what is if ignore is not None: ignored = ignore(src, files)? Is ignore a function? But it has not been declared yet...Seventh
@SunBear The ignore parameter has exact same meaning as shutil.copytree()'s ignore parameter. The caller of recursive_overwrite provides an ignore callable, which will receive two parameters on callback: (1) a directory string, (2) all entries(file+dir) from that directory.Johann
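
For readers wondering what such an ignore callable looks like, here is a small sketch. The names skip_hidden and copy_without_hidden are hypothetical; the only assumption carried over from the comment above is the calling convention: the callable receives a directory path and the list of entry names in it, and returns the names to skip.

```python
import shutil

def skip_hidden(directory, entries):
    """Ignore callable: return the names in `entries` that should not be copied."""
    # `directory` is the folder currently being visited (unused in this example);
    # `entries` is the list of file and directory names inside it.
    return {name for name in entries if name.startswith(".")}

def copy_without_hidden(src, dest):
    """Copy src into dest, skipping dot-files and dot-directories (Python 3.8+)."""
    return shutil.copytree(src, dest, ignore=skip_hidden, dirs_exist_ok=True)
```

Since the calling convention is the same, skip_hidden could equally be passed as the ignore argument of a hand-written recursive copy that follows shutil.copytree's convention.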
D
1

My simple answer.

import os
import shutil

def get_files_tree(src="src_path"):
    """Return the path of every file under src, skipping .db files."""
    req_files = []
    for r, d, files in os.walk(src):
        for file in files:
            if file.endswith('.db'):
                continue
            req_files.append(os.path.join(r, file))

    return req_files

def copy_tree_force(src_path="", dest_path=""):
    """
    Copy every file under src_path into dest_path, overwriting existing files.
    """
    for cf in get_files_tree(src=src_path):
        # Build the destination from the path relative to src_path, so the
        # result does not depend on which slash characters the caller used.
        df = os.path.join(dest_path, os.path.relpath(cf, src_path))
        os.makedirs(os.path.dirname(df), exist_ok=True)
        shutil.copy2(cf, df)
Doersten answered 29/11, 2020 at 19:22 Comment(0)
F
1

What about SYNC.

How to synchronize two folders using python script

from dirsync import sync
source_path = '/Give/Source/Folder/Here'
target_path = '/Give/Target/Folder/Here'

sync(source_path, target_path, 'sync') #for syncing one way
sync(target_path, source_path, 'sync') #for syncing the opposite way

Frankenstein answered 12/7, 2023 at 20:29 Comment(0)
