I would like to highlight a few points with regard to the OP's question and the (currently accepted) answer by @Tim Roberts:
"shutil copies in chunks so you can copy files larger than memory". You can also copy a file in chunks using read()
—please
have a look at the short example below, as well as this and this answer for more
details—just like you can load the whole file into memory
using shutil.copyfileobj()
, by giving a negative length
value.
```python
with open(uploaded_file.filename, 'wb') as f:
    while contents := uploaded_file.file.read(1024 * 1024):  # adjust the chunk size as desired
        f.write(contents)
```
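To illustrate the effect of the `length` value, here is a small, self-contained sketch (the `CountingBytesIO` helper is purely illustrative, standing in for a real file object) showing that a negative `length` makes `copyfileobj()` read the entire source in one go, whereas a positive `length` reads it chunk by chunk:

```python
import io
import shutil

class CountingBytesIO(io.BytesIO):
    """BytesIO that counts how many read() calls were made (for demonstration only)."""
    def __init__(self, data):
        super().__init__(data)
        self.reads = 0

    def read(self, size=-1):
        self.reads += 1
        return super().read(size)

data = b"y" * (5 * 1024 * 1024)  # 5MB of sample data

# Negative length: the whole source is consumed by a single read(-1) call
src = CountingBytesIO(data)
shutil.copyfileobj(src, io.BytesIO(), length=-1)
print(src.reads)  # 2 (one read returning all the data, one final empty read)

# Positive length: the source is read in 1MB chunks
src2 = CountingBytesIO(data)
shutil.copyfileobj(src2, io.BytesIO(), length=1024 * 1024)
print(src2.reads)  # 6 (five 1MB chunks, plus one final empty read)
```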
Under the hood, `copyfileobj()` uses a very similar approach to the above, utilising the `read()` and `write()` methods of file objects; hence, it would make little difference if you used one over the other. The source code of `copyfileobj()` can be seen below. The default buffer size, i.e., `COPY_BUFSIZE` below, is set to 1MB (`1024 * 1024` bytes) if running on Windows, or 64KB (`64 * 1024` bytes) on other platforms (see here).
```python
def copyfileobj(fsrc, fdst, length=0):
    """copy data from file-like object fsrc to file-like object fdst"""
    if not length:
        length = COPY_BUFSIZE
    # Localize variable access to minimize overhead.
    fsrc_read = fsrc.read
    fdst_write = fdst.write
    while True:
        buf = fsrc_read(length)
        if not buf:
            break
        fdst_write(buf)
```
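To see that the two approaches really behave the same, here is a short, self-contained comparison (using in-memory `io.BytesIO` buffers instead of real files, purely for illustration) of a manual `read()` loop against `shutil.copyfileobj()` with the same chunk size:

```python
import io
import shutil

data = b"x" * (3 * 1024 * 1024 + 123)  # ~3MB of sample data

# Manual chunked copy using read()/write()
src1, dst1 = io.BytesIO(data), io.BytesIO()
while chunk := src1.read(1024 * 1024):  # 1MB chunks
    dst1.write(chunk)

# The same copy using shutil.copyfileobj() with the same chunk size
src2, dst2 = io.BytesIO(data), io.BytesIO()
shutil.copyfileobj(src2, dst2, length=1024 * 1024)

print(dst1.getvalue() == dst2.getvalue() == data)  # True
```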
"shutil
has routines to copy files by name so you don't have to open them at all..." Since OP seems to be using FastAPI
framework (which is actually
Starlette underneath), UploadFile
exposes an actual Python
SpooledTemporaryFile
(a file-like object) that you can get using the .file
attribute (source code can be found here). When FastAPI/Starlette creates a new instance of UploadFile
, it already creates the SpooledTemporaryFile
behind the scenes, which remains open. Hence, since you are dealing with a temporary
file that has no visible name in the file system—that would otherwise allow you to copy the contents without opening the file using shutil
—and which is already open, it would make no
difference using either read()
or copyfileobj()
.
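As an illustration (independent of FastAPI), the sketch below shows what working with a `SpooledTemporaryFile` looks like: it is an already-open, file-like object with no visible name on disk, so saving its contents has to go through its `read()`/`write()` interface, e.g., via `copyfileobj()`. The destination path is just an example:

```python
import os
import shutil
import tempfile

# A SpooledTemporaryFile, similar to what Starlette creates behind the scenes
spooled = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)
spooled.write(b"uploaded data")
spooled.seek(0)  # rewind before reading, just as you would before saving an upload

# There is no visible name on the file system to pass to shutil.copy();
# the copy must go through the file object's read()/write() interface instead:
target = os.path.join(tempfile.mkdtemp(), "saved_upload.bin")  # illustrative path
with open(target, "wb") as f:
    shutil.copyfileobj(spooled, f)

with open(target, "rb") as f:
    print(f.read())  # b'uploaded data'
```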
"it can preserve the permissions, ownership, and creation/modification/access timestamps." Even though this is about saving a file uploaded through a web framework—and hence, most of these metadata wouldn't be transfered along with the file—as per the documentation, the above statement is not entirely true:
> **Warning:** Even the higher-level file copying functions (`shutil.copy()`, `shutil.copy2()`) cannot copy all file metadata.
>
> On POSIX platforms, this means that file owner and group are lost, as well as ACLs. On Mac OS, the resource fork and other metadata are not used. This means that resources will be lost and file type and creator codes will not be correct. On Windows, file owners, ACLs and alternate data streams are not copied.
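As a quick, self-contained check of what `copy2()` does preserve (the file names below are just for illustration): modification timestamps survive the copy, while ownership and ACLs are subject to the platform caveats quoted above:

```python
import os
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "source.txt")
dst = os.path.join(tmpdir, "dest.txt")

with open(src, "w") as f:
    f.write("hello")

# Give the source a distinctive access/modification time
os.utime(src, (1_000_000_000, 1_000_000_000))

shutil.copy2(src, dst)  # copy2() also copies metadata (timestamps, mode bits)

print(int(os.stat(dst).st_mtime))  # 1000000000
```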
That being said, there is nothing wrong with using `copyfileobj()`. On the contrary, if you are dealing with large files and would like to avoid loading the entire file into memory (as you may not have enough RAM to accommodate all the data), and you would rather use `copyfileobj()` instead of a similar solution using the `read()` method (as described in Point 1 above), it is perfectly fine to use `shutil.copyfileobj(fsrc, fdst)`. Besides, since Python 3.8, `shutil` employs platform-specific, efficient copy operations where possible. You can change the default buffer size by adjusting the `length` argument of `copyfileobj()`.
Note
If `copyfileobj()` is used inside a FastAPI `def` (sync) endpoint, that is perfectly fine, as a normal `def` endpoint in FastAPI is run in an external threadpool that is then awaited, instead of being called directly (since that would block the server). On the other hand, `async def` endpoints run on the main (single) thread; thus, calling a method such as `copyfileobj()` that performs blocking I/O operations (as shown in the source code) would result in blocking the entire server (for more information on `def` vs `async def`, please have a look at this answer). Hence, if you are about to call `copyfileobj()` from within an `async def` endpoint, you should make sure to run this operation, as well as all other file operations, such as `open()` and `close()`, in a separate thread, to ensure that the main thread (where coroutines are run) does not get blocked. You can do that using Starlette's `run_in_threadpool()`, which is also used by FastAPI internally when you call the `async` methods of the `UploadFile` object, as shown here. For instance:
```python
await run_in_threadpool(shutil.copyfileobj, fsrc, fdst)
```
For more details and code examples, please have a look at this answer.
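If you would rather not call Starlette's helper directly, the standard library's `asyncio.to_thread()` (available since Python 3.9) follows the same pattern of offloading blocking I/O to a worker thread. A minimal, FastAPI-free sketch, using in-memory buffers in place of real upload/destination files:

```python
import asyncio
import io
import shutil

async def save_upload(fsrc, fdst):
    # Offload the blocking copy to a worker thread so the
    # event loop (where coroutines run) does not get blocked.
    await asyncio.to_thread(shutil.copyfileobj, fsrc, fdst)

src, dst = io.BytesIO(b"payload"), io.BytesIO()
asyncio.run(save_upload(src, dst))
print(dst.getvalue())  # b'payload'
```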