How to Copy Files Fast
Asked Answered
H

4

31

What is the fastest way to copy files in a python program?

It takes at least 3 times longer to copy files with shutil.copyfile() versus to a regular right-click-copy > right-click-paste using Windows File Explorer or Mac's Finder. Is there any faster alternative to shutil.copyfile() in Python? What could be done to speed up a file copying process? (The files destination is on the network drive... if it makes any difference...).

EDITED LATER:

Here is what I have ended up with:

def copyWithSubprocess(cmd):        
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

win=mac=False
if sys.platform.startswith("darwin"):mac=True
elif sys.platform.startswith("win"):win=True

cmd=None
if mac: cmd=['cp', source, dest]
elif win: cmd=['xcopy', source, dest, '/K/O/X']

if cmd: copyWithSubprocess(cmd)
Hyaloplasm answered 27/2, 2014 at 19:37 Comment(6)
You can use the native command line options like cp for Linux & Mac and COPY for Windows. They should be as fast as when you use the GUI.Fernferna
On Windows SHFileOperation gives you the native shell file copySestertium
Depending on some factors not stated in the question it could be beneficial to pack the files into a compressed archive before transmission... Have you considered using something like rsync?Phantasy
If you are concerned with ownership and ACL don't use shutil for that reason alone: 'On Windows, file owners, ACLs and alternate data streams are not copied. 'Panzer
If I use the native operation system's commands (such as OSX cp) should I be then using subprocess? Is there any Python module to call cp directly without a need for a subprocess on Mac?Hyaloplasm
It's worth noting that in Python 3.8 functions that copy files and directories have been optimized to work faster on several major OS.Televise
B
20

The fastest version w/o overoptimizing the code I've got with the following code:

class CTError(Exception):
    def __init__(self, errors):
        self.errors = errors

try:
    O_BINARY = os.O_BINARY
except:
    O_BINARY = 0
READ_FLAGS = os.O_RDONLY | O_BINARY
WRITE_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | O_BINARY
BUFFER_SIZE = 128*1024

def copyfile(src, dst):
    try:
        fin = os.open(src, READ_FLAGS)
        stat = os.fstat(fin)
        fout = os.open(dst, WRITE_FLAGS, stat.st_mode)
        for x in iter(lambda: os.read(fin, BUFFER_SIZE), ""):
            os.write(fout, x)
    finally:
        try: os.close(fin)
        except: pass
        try: os.close(fout)
        except: pass

def copytree(src, dst, symlinks=False, ignore=[]):
    names = os.listdir(src)

    if not os.path.exists(dst):
        os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignore:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                copyfile(srcname, dstname)
            # XXX What about devices, sockets etc.?
        except (IOError, os.error), why:
            errors.append((srcname, dstname, str(why)))
        except CTError, err:
            errors.extend(err.errors)
    if errors:
        raise CTError(errors)

This code runs a little bit slower than native linux "cp -rf".

Comparing to shutil the gain for the local storage to tmfps is around 2x-3x and around than 6x for NFS to local storage.

After profiling I've noticed that shutil.copy does lots of fstat syscals which are pretty heavyweight. If one want to optimize further I would suggest to do a single fstat for src and reuse the values. Honestly I didn't go further as I got almost the same figures as native linux copy tool and optimizing for several hundrends of milliseconds wasn't my goal.

Bugg answered 24/1, 2015 at 20:5 Comment(13)
Not sure if this is specific to later versions of python (3.5+), but the sentinel in iter needs to be b'' in order to stop. (At least on OSX)Olvan
doesn't work with python 3.6 even after fixing 'except' syntaxSiderite
This is a python2 version. I haven't tested this with python3. Probably python3 native filetree copy is fast enough, one shall do a benchmark.Bugg
This is way faster than anything else I tried.Partisan
FYI I would suggest adding shutil.copystat(src, dst) after the file is written out to bring over metadata.Marymarya
@Bugg Just tested in py3.6, still 3x faster than shutil.copy2()!Marymarya
@Marymarya What did you changed in the code to make it work with python 3.5+ ?, Best regards!Compound
@Compound From what I remember I just changed this one line to the following: for x in iter(lambda: os.read(fin, BUFFER_SIZE), b""): (made the string into bytes by adding 'b')Marymarya
ah, because I get also a syntax error with the 'why' in the except line 'except (IOError, os.error), why:'Compound
To boost performance also see the pyfastcopy module which uses the system call sendfile(). This module works for Python 2 and 3. All you have to do is literally "import pyfastcopy" and shutils will automatically perform better. Like @Televise mentioned above, Python 3.8 will have sendfile() built into its impl.Noreen
I've tried to replace shutil.copyfile by the copyfile from the example under python 3.8.0, but it seems hanged in the for x in iter loop.Harwell
@VahidPazirandeh pyfastcopy does not work on Windows: The sendfile system call does not exist on Windows, so importing this module will have no effect.Harwell
For those trying to get the best performance, buffer size is VERY important zabkat.com/blog/buffered-disk-access.htmMarymarya
P
6

You could simply just use the OS you are doing the copy on, for Windows:

from subprocess import call
call(["xcopy", "c:\\file.txt", "n:\\folder\\", "/K/O/X"])

/K - Copies attributes. Typically, Xcopy resets read-only attributes
/O - Copies file ownership and ACL information.
/X - Copies file audit settings (implies /O).

Panzer answered 27/2, 2014 at 20:5 Comment(5)
Would "xcopy" on Windows work with a "regular" subprocess, such as: cmd = ['xcopy', source, dest, "/K/O/X"] subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)Hyaloplasm
That will work as well.Panzer
Invalid number of parameters errorArabesque
Note that the /O and /X flags require an elevated subprocess, else you will result in "Access denied"Marriott
This is a very fast option for single-file copies, but for anyone out there trying to thread large numbers of files it can run far slower (9x in a recent test of 4000 files). I was able to get better results modifying copy2 buffer size as in other answer.Marymarya
H
3
import sys
import subprocess

def copyWithSubprocess(cmd):        
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

cmd=None
if sys.platform.startswith("darwin"): cmd=['cp', source, dest]
elif sys.platform.startswith("win"): cmd=['xcopy', source, dest, '/K/O/X']

if cmd: copyWithSubprocess(cmd)
Hyaloplasm answered 27/2, 2014 at 21:42 Comment(1)
Economic on explanations, but this is a great answer.Fash
M
0

this is just a guess but ... your timing it wrong ... that is when you copy the file it opens the file and reads it all into memory so that when you paste you only create a file and dump your memory contents

in python

copied_file = open("some_file").read()

is the equivelent of the ctrl + c copy

then

with open("new_file","wb") as f:
     f.write(copied_file)

is the equivelent of the ctrl + v paste (so time that for equivelency ....)

if you want it to be more scalable to larger data (but its not going to be as fast as ctrl+v /ctrl+c

with open(infile,"rb") as fin,open(outfile,"wb") as fout:
     fout.writelines(iter(fin.readline,''))
Mariejeanne answered 27/2, 2014 at 19:54 Comment(8)
I believe you would be a good instructor, nice!Dilation
Good point. I should be more specific. Instead of right-click-copy and then paste: This schema: 1. Select files; 2. Drag files. 3 Drop files onto a destination folder.Hyaloplasm
thats a move then ... which is much different ... try shutil.move insteadMariejeanne
In win it could be a move. In osx it could be a copy.Hyaloplasm
This solution does not scale. As the files get large this becomes a less usable solution. You would need to make multiple system calls to the OS to still read portions of the file into memory as files get large.Wende
this was a question about timings ... and it was basically pointing out his operations were not equivelentMariejeanne
I find it hard to believe that if you CTRL + C a 100 gigabyte file in Windows it attempts to load that in memory right then and there...Budde
theres more than a grain of truth to that im sure ... perhaps i oversimplified itMariejeanne

© 2022 - 2024 — McMap. All rights reserved.