Get progress back from shutil file copy thread
B

6

25

I've got an application from which a file is copied from src to dst:

import shutil
from threading import Thread

t = Thread(target=shutil.copy, args=(src, dst))
t.start()

I wish to have the application query the progress of the copy every 5 seconds without locking up the application itself. Is this possible?

My intention is to set this progress to a QtGui.QLabel to give the user feedback on the file copy.

Can this be achieved when copying using a threaded shutil file copy?

Biff answered 30/4, 2015 at 12:13 Comment(2)
Hey I think I found your final solution: fredrikaverpil.github.io/2015/05/12/…. Good stuff!Pease
Link above is dead. Here is the new updated link: fredrikaverpil.github.io/blog/2015/05/12/…Aiguille
B
38

shutil.copy() doesn't offer any options to track the progress, no. At most you could monitor the size of the destination file (using os.* functions on the target filename).

The alternative would be to implement your own copy function. The implementation is really quite simple; shutil.copy() is basically a shutil.copyfile() plus shutil.copymode() call; shutil.copyfile() in turn delegates the real work to shutil.copyfileobj()* (links to the Python 3.8.2 source code).

Implementing your own shutil.copyfileobj() to include progress should be trivial; inject support for a callback function that informs your program each time another block has been copied:

import os
import shutil

def copyfileobj(fsrc, fdst, callback, length=0):
    try:
        # check for optimisation opportunity
        if "b" in fsrc.mode and "b" in fdst.mode and fsrc.readinto:
            return _copyfileobj_readinto(fsrc, fdst, callback, length)
    except AttributeError:
        # one or both file objects do not support a .mode or .readinto attribute
        pass

    if not length:
        length = shutil.COPY_BUFSIZE

    fsrc_read = fsrc.read
    fdst_write = fdst.write

    copied = 0
    while True:
        buf = fsrc_read(length)
        if not buf:
            break
        fdst_write(buf)
        copied += len(buf)
        callback(copied)

# differs from shutil.COPY_BUFSIZE on platforms != Windows
READINTO_BUFSIZE = 1024 * 1024

def _copyfileobj_readinto(fsrc, fdst, callback, length=0):
    """readinto()/memoryview() based variant of copyfileobj().
    *fsrc* must support readinto() method and both files must be
    open in binary mode.
    """
    fsrc_readinto = fsrc.readinto
    fdst_write = fdst.write

    if not length:
        try:
            file_size = os.stat(fsrc.fileno()).st_size
        except OSError:
            file_size = READINTO_BUFSIZE
        length = min(file_size, READINTO_BUFSIZE)

    copied = 0
    with memoryview(bytearray(length)) as mv:
        while True:
            n = fsrc_readinto(mv)
            if not n:
                break
            elif n < length:
                with mv[:n] as smv:
                    fdst_write(smv)
            else:
                fdst_write(mv)
            copied += n
            callback(copied)

and then, in the callback, compare the copied size with the file size.
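As a minimal sketch of such a callback, you could close over the total size and turn the running byte count into a percentage. The names make_progress_callback and report are hypothetical, chosen for illustration; they are not part of shutil:

```python
def make_progress_callback(total_size, report):
    """Return a callback(copied) that passes the percentage copied to report()."""
    def callback(copied):
        # An empty file has nothing to copy, so treat it as complete
        # instead of dividing by zero.
        percent = 100.0 if total_size == 0 else 100.0 * copied / total_size
        report(percent)
    return callback
```

You would then pass something like make_progress_callback(os.path.getsize(src), print) as the callback argument of the copyfileobj() function above; report can be any callable that accepts a float.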

Note that in the above implementation we look for the opportunity to use a different method for binary files, where you can use fileobj.readinto() and a memoryview object to avoid redundant data copying; see the original _copyfileobj_readinto() implementation for comparison.


* footnote to … delegates the real work to shutil.copyfileobj(): As of Python 3.8, on OS X and Linux the copyfile() implementation delegates file copying to OS-specific, optimised system calls (to fcopyfile() and sendfile(), respectively) but these calls have no hooks whatsoever to track progress, and so if you need to track progress you'd want to disable these delegation paths anyway. On Windows the code uses the aforementioned _copyfileobj_readinto() function.

Bijugate answered 30/4, 2015 at 12:24 Comment(8)
Thanks for this! I implemented this with the other necessary functions from shutil and a progress bar function to create a complete solution which I posted as another answer.Pease
Note this is no longer quite as relevant in Python 3.8 and above due to changes in shutilDispatch
@MattM: okay, all updated, including the Windows-specific optimisation made generic for OS X and Linux (as you can't track progress via the system calls).Bijugate
How does this compare to using the ignore= option of copytree, as mentioned here?Lunation
@Jay: that hook is called per copied file in the tree. The hook I introduced is called per block.Bijugate
@MartijnPieters So for best copy speeds on Python 3.8, the per-file hook is preferred, whereas for better info, this one is more accurate?Lunation
@Jay: depending on the platform, using the code in my answer to track per-block copying progress could well be slower, because you then can't use the OS-level APIs, yes.Bijugate
Recently I attempted to use shutil.copyfile on a Windows platform and was disgusted to find that the shutil buffer is too small to keep up with the copy rates of xcopy/robocopy. To make matters worse, all three copy routines (xcopy/robocopy/shutil.copyfile) on Windows begin by creating zero-padded files matching the size of the source file, which makes it hard to even monitor their progress from another thread/process. I ended up just writing my own file copy routine with dynamically calculated block sizes to balance optimal copy speed with user progress updates.Necrosis
P
13

I combined Martijn Pieters' answer with some progress bar code from this answer, plus modifications from this answer to make it work in PyCharm, which gives me the following. The function copy_with_progress was my goal.

import os
import shutil


def progress_percentage(perc, width=None):
    # This will only work for python 3.3+ due to use of
    # os.get_terminal_size the print function etc.

    FULL_BLOCK = '█'
    # this is a gradient of incompleteness
    INCOMPLETE_BLOCK_GRAD = ['░', '▒', '▓']

    assert(isinstance(perc, float))
    assert(0. <= perc <= 100.)
    # if width unset use full terminal
    if width is None:
        width = os.get_terminal_size().columns
    # progress bar is block_widget separator perc_widget : ####### 30%
    max_perc_widget = '[100.00%]' # 100% is max
    separator = ' '
    blocks_widget_width = width - len(separator) - len(max_perc_widget)
    assert(blocks_widget_width >= 10) # not very meaningful if not
    perc_per_block = 100.0/blocks_widget_width
    # epsilon is the sensitivity of rendering a gradient block
    epsilon = 1e-6
    # number of blocks that should be represented as complete
    full_blocks = int((perc + epsilon)/perc_per_block)
    # the rest are "incomplete"
    empty_blocks = blocks_widget_width - full_blocks

    # build blocks widget
    blocks_widget = ([FULL_BLOCK] * full_blocks)
    blocks_widget.extend([INCOMPLETE_BLOCK_GRAD[0]] * empty_blocks)
    # marginal case - remainder due to how granular our blocks are
    remainder = perc - full_blocks*perc_per_block
    # epsilon guards against rounding errors (otherwise the check would be != 0.)
    # shade the first empty block according to the size of the remainder
    if remainder > epsilon:
        grad_index = int((len(INCOMPLETE_BLOCK_GRAD) * remainder)/perc_per_block)
        blocks_widget[full_blocks] = INCOMPLETE_BLOCK_GRAD[grad_index]

    # build perc widget
    str_perc = '%.2f' % perc
    # -3 because the brackets and the percent sign are added by the format string
    perc_widget = '[%s%%]' % str_perc.ljust(len(max_perc_widget) - 3)

    # form progressbar
    progress_bar = '%s%s%s' % (''.join(blocks_widget), separator, perc_widget)
    # return progressbar as string
    return progress_bar


def copy_progress(copied, total):
    print('\r' + progress_percentage(100*copied/total, width=30), end='')


def copyfile(src, dst, *, follow_symlinks=True):
    """Copy data from src to dst.

    If follow_symlinks is not set and src is a symbolic link, a new
    symlink will be created instead of copying the file it points to.

    """
    if shutil._samefile(src, dst):
        raise shutil.SameFileError("{!r} and {!r} are the same file".format(src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if shutil.stat.S_ISFIFO(st.st_mode):
                raise shutil.SpecialFileError("`%s` is a named pipe" % fn)

    if not follow_symlinks and os.path.islink(src):
        os.symlink(os.readlink(src), dst)
    else:
        size = os.stat(src).st_size
        with open(src, 'rb') as fsrc:
            with open(dst, 'wb') as fdst:
                copyfileobj(fsrc, fdst, callback=copy_progress, total=size)
    return dst


def copyfileobj(fsrc, fdst, callback, total, length=16*1024):
    copied = 0
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
        copied += len(buf)
        callback(copied, total=total)


def copy_with_progress(src, dst, *, follow_symlinks=True):
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst, follow_symlinks=follow_symlinks)
    shutil.copymode(src, dst)
    return dst
Pease answered 25/1, 2018 at 19:15 Comment(0)
A
6

This might be a bit hacky but it works:

"""
Copying a file and checking its progress while it's copying.
"""

import os
import shutil
import threading
import time

src = r'<PATH/TO/SOURCE/FILE>'
des = r'<PATH/TO/DESTINATION/FILE>'


def checker(source_path, destination_path):
    """
    Compare 2 files till they're the same and print the progress.

    :type source_path: str
    :param source_path: path to the source file
    :type destination_path: str
    :param destination_path: path to the destination file
    """

    # Making sure the destination path exists
    while not os.path.exists(destination_path):
        print("not exists")
        time.sleep(.01)

    # Keep checking the file size till it's the same as source file
    while os.path.getsize(source_path) != os.path.getsize(destination_path):
        print("percentage", int((float(os.path.getsize(destination_path))/float(os.path.getsize(source_path))) * 100))
        time.sleep(.01)

    print("percentage", 100)


def copying_file(source_path, destination_path):
    """
    Copying a file

    :type source_path: str
    :param source_path: path to the file that needs to be copied
    :type destination_path: str
    :param destination_path: path to where the file is going to be copied
    :rtype: bool
    :return: True if the file copied successfully, False otherwise
    """
    print("Copying....")
    shutil.copyfile(source_path, destination_path)

    if os.path.exists(destination_path):
        print("Done....")
        return True

    print("Failed...")
    return False


t = threading.Thread(name='copying', target=copying_file, args=(src, des))
# Start the copying on a separate thread
t.start()
# Checking the status of destination file on a separate thread
b = threading.Thread(name='checking', target=checker, args=(src, des))
b.start()
Amaranthine answered 6/4, 2018 at 3:31 Comment(1)
Thanks, man. I had been waiting for this code for a long time, and not just for this project. There were a lot of projects in which I needed this code. Thank you so much.Nightlong
C
0

No, it can't be done this way, because shutil.copy doesn't have any means of providing progress.

But you can write your own copy function (or even fork the code from shutil--notice that it's one of the modules that includes a link to the source at the top, meaning it's meant to be as useful for sample code as for just using as-is). Your function can, e.g., take a progress callback function as an extra argument and call it after each buffer (or each N buffers, or each N bytes, or each N seconds). Something like:

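import os

def copy(src, dst, progress, length=1024 * 1024):
    bytes_total = os.path.getsize(src)
    bytes_so_far = 0
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
        # read fixed-size blocks until read() returns b'' at end of file
        for buf in iter(lambda: fsrc.read(length), b''):
            fdst.write(buf)
            bytes_so_far += len(buf)
            progress(bytes_so_far, bytes_total)
    progress(bytes_total, bytes_total)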

Now, that callback is still going to be called in the background thread, not the main thread. With most GUI frameworks, that means it can't directly touch any GUI widgets. But most GUI frameworks have a way to post a message to the main thread's event loop from a background thread, so just make the callback do that. With Qt you do this with signals and slots, exactly the same way you do within the main thread; there's lots of great tutorials out there if you don't know how.
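If you want to see the hand-off without any Qt machinery, here is a rough, framework-agnostic sketch that swaps signals and slots for a plain queue.Queue: the worker thread posts (copied, total) tuples, and the main thread drains the queue whenever it likes. The names copy_and_post, latest_progress and start_copy are made up for illustration.

```python
import os
import queue
import threading

def copy_and_post(src, dst, progress_queue, length=1024 * 1024):
    """Runs in the worker thread: copy src to dst, posting (copied, total)."""
    total = os.path.getsize(src)
    copied = 0
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
        while True:
            buf = fsrc.read(length)
            if not buf:
                break
            fdst.write(buf)
            copied += len(buf)
            # Only data crosses the thread boundary; no widget is touched here.
            progress_queue.put((copied, total))

def latest_progress(progress_queue):
    """Runs in the main thread: drain the queue, return the newest item or None."""
    latest = None
    while True:
        try:
            latest = progress_queue.get_nowait()
        except queue.Empty:
            return latest

def start_copy(src, dst):
    """Start the copy in a daemon thread and return (thread, queue)."""
    progress_queue = queue.Queue()
    worker = threading.Thread(target=copy_and_post,
                              args=(src, dst, progress_queue), daemon=True)
    worker.start()
    return worker, progress_queue
```

In a GUI you would presumably call latest_progress() from a timer on the main thread (every 5 seconds, say) and update the label there. That wiring is an assumption about your application and is not shown.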

Alternatively, you could do it the way you suggested: have the main thread signal the background thread (e.g., by posting on a queue.Queue or triggering an Event or Condition) and have your copy function check for that signal every time through the loop and respond. But that seems both more complicated and less responsive.
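For comparison, a rough sketch of that pull-style variant, assuming a threading.Event carries the request and a queue.Queue carries the reply. The names copy_on_request, report_requested and replies are made up for illustration.

```python
import queue
import threading

def copy_on_request(src, dst, report_requested, replies, length=64 * 1024):
    """Copy src to dst; post the bytes copied so far whenever asked."""
    copied = 0
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
        while True:
            buf = fsrc.read(length)
            if not buf:
                break
            fdst.write(buf)
            copied += len(buf)
            # Check once per block whether the main thread wants a report.
            if report_requested.is_set():
                report_requested.clear()
                replies.put(copied)
    # Always post the final count so the requester can detect completion.
    replies.put(copied)

# Usage sketch (src and dst are placeholder paths):
# report_requested = threading.Event()
# replies = queue.Queue()
# threading.Thread(target=copy_on_request,
#                  args=(src, dst, report_requested, replies)).start()
# report_requested.set()    # main thread asks for a report
# copied = replies.get()    # blocks until the copy loop answers
```

Note that the reply only arrives after the next block has been written, which is part of why this feels less responsive than having the worker push updates.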

One more thing: Qt has its own threading library, and you may want to use it instead of Python's native one, because you can attach a slot directly to a QThread object and make that your callback. I'm not sure, but Qt might even have its own file-copy-with-progress methods in there somewhere; they try to wrap up everything that might be at all different between platforms and vaguely related to GUIs.

Critic answered 30/4, 2015 at 12:20 Comment(0)
S
0

In addition to Martijn Pieters' excellent reply, if (like me, I'm an idiot) you need to figure out how to pass the actual callback into the copyfileobj() function, you can do it like this:

def myscopefunction():
    ### Inside wherever you want to call the copyfileobj() function, you can
    ### make a nested function like so:
    def progress(bytescopied):
        updateui(bytescopied) #update your progress bar or whatever

    #and then call it like this
    copyfileobj(source,destination,progress)
    ...
Spinet answered 24/8, 2017 at 15:36 Comment(0)
F
0

I also faced the same task. I looked for a simple solution and finally wrote my own, although I expect someone else has already come up with it.

It is based on reading the file in chunks of bytes and writing them to the destination file. No external libraries, no threading, no timers. Thanks to this answer: https://mcmap.net/q/86981/-lazy-method-for-reading-big-file-in-python

Here is the code:

import os 
import time
from shutil import copyfile 

file_in = r'C:\folder\file.mp4'
file_out = r'C:\folder\file_out.mp4'

def read_in_chunks(file_object, chunk_size=None):
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

# chunks method 
chunks_number = 100
start_time = time.time()
file_stats = os.stat(file_in) 
size_b = file_stats.st_size
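# at least 1 byte per chunk, otherwise read(0) returns b'' and nothing is copied
chunk_optimal_size = max(1, size_b // chunks_number)
# open the destination once in 'wb' mode so an existing file is overwritten, not appended to
with open(file_in, 'rb') as f, open(file_out, 'wb') as fout:
    for piece in read_in_chunks(f, chunk_optimal_size):
        fout.write(piece)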
print("--- {} seconds ---".format(time.time() - start_time))

# standard copyfile
start_time = time.time()
copyfile(file_in, file_out)
print("--- {} seconds ---".format(time.time() - start_time))

As you can see, I compared it with the standard shutil method. Here are the results for copying a 1 GB file:

--- 0.5481183528900146 seconds ---
--- 0.45009660720825195 seconds ---  

I needed this to drive a responsive QProgressBar in my PyQt5 widget, so in my case a difference of ~0.1 s is acceptable.

I'm still not 100% sure that it will work for all kinds of files, but in my case (images, archives, documents) everything is OK.

Full answered 12/6, 2023 at 12:51 Comment(0)
