How to download a large file using FastAPI?

Asked 31/8, 2022 at 2:49 Answered 26/4, 2023 at 18:13

Solved python download fastapi pydantic starlette

I am trying to download a large file (.tar.gz) from a FastAPI backend. On the server side, I simply validate the file path and then use Starlette's FileResponse to return the whole file, just like what I've seen in many related questions on StackOverflow.

Server side:

return FileResponse(path=file_name, media_type='application/octet-stream', filename=file_name)

After that, I get the following error:

  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 149, in serialize_response
    return jsonable_encoder(response_content)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/encoders.py", line 130, in jsonable_encoder
    return ENCODERS_BY_TYPE[type(obj)](obj)
  File "pydantic/json.py", line 52, in pydantic.json.lambda
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I also tried using StreamingResponse, but got the same error. Any other ways to do it?

The StreamingResponse in my code:

@x.post("/download")
async def download(file_name=Body(), token: str | None = Header(default=None)):
    file_name = file_name["file_name"]
    # should be something like xx.tar
    def iterfile():
        with open(file_name,"rb") as f:
            yield from f
    return StreamingResponse(iterfile(),media_type='application/octet-stream')

OK, here is an update to this problem. I found that the error did not occur in this API, but in another API that forwards requests to it.

@("/")
def f():
    req = requests.post(url="/download")
    # req.content is raw bytes; returning it makes FastAPI try to JSON-encode it
    return req.content

Here, when the forwarded response was a StreamingResponse carrying a .tar file, returning its content led to what was probably an encoding problem.

When forwarding with requests, remember to set the same media type on the response you return. Here that is media_type='application/octet-stream'. With that, it works!
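The traceback in the question is consistent with this explanation. A .tar.gz file starts with the gzip magic bytes 0x1f 0x8b, and the traceback shows jsonable_encoder trying to decode the returned bytes as UTF-8. A minimal sketch using only the standard library reproduces the same failure at the same position. The payload and the helper name try_decode are made up for illustration.

```python
import gzip

# Hypothetical payload standing in for the .tar.gz body returned by requests.
payload = gzip.compress(b"example archive contents")

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
magic = payload[:2]


def try_decode(data: bytes) -> str:
    """Decode bytes as UTF-8, returning a description of the error if it fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"{exc.reason} at position {exc.start} (byte {data[exc.start]:#x})"


result = try_decode(payload)
print(result)
```

Returning a Response object, for example Response(content=req.content, media_type='application/octet-stream'), instead of the raw bytes skips the JSON encoding step, which matches the fix described above.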

Wilheminawilhide answered 31/8, 2022 at 2:49 Comment(5)

Does this answer your question? How to make a large file accessible to external APIs? – Terse 31/8, 2022 at 3:54

I checked this answer and used StreamingResponse. Since the file type varies, I did not set a specific media_type. The code is just return StreamingResponse(iterfile()), and I still got the error "No json object could be decoded" when downloading a tar file. – Wilheminawilhide 31/8, 2022 at 5:55

Did you try setting media_type='application/octet-stream' for the StreamingResponse to indicate that it's binary data? Do you have the example code that fails? – Wishbone 31/8, 2022 at 7:23

That is just something I put in the data body. The actual name is the absolute file path, like /opt/123.tar. I tried with some other files, like syslog or JSON files, and they worked. – Wilheminawilhide 1/9, 2022 at 3:15

I found that yield from f can use a large amount of CPU. How can I solve this? Maybe the chunk size is small and leads to a massive number of file operations? Can I increase the chunk size here? – Wilheminawilhide 14/9, 2022 at 6:8

If you find that yield from f is rather slow when using StreamingResponse with file-like objects, for instance:

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

some_file_path = 'large-video-file.mp4'
app = FastAPI()

@app.get('/')
def main():
    def iterfile():
        with open(some_file_path, mode='rb') as f:
            yield from f

    return StreamingResponse(iterfile(), media_type='video/mp4')

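you could instead create a generator that reads the file in chunks of a specified size. Iterating over a binary file object splits it at every newline byte, which tends to produce many small, irregular pieces, so fixed-size chunks speed up the process. Examples can be found below.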
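To see why the chunk size matters, here is a small standard-library comparison of the two iteration styles on in-memory binary data. The data, the 64 KiB chunk size, and the function names are assumptions chosen only for the demonstration, not a benchmark.

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KiB, chosen only for this demonstration

# Hypothetical stand-in for a binary file: every byte value, repeated up to 1 MiB.
data = bytes(range(256)) * 4096


def iter_lines(f):
    # Same as `yield from f`: splits wherever a b"\n" byte happens to occur.
    yield from f


def iter_chunks(f, chunk_size=CHUNK_SIZE):
    # Reads fixed-size blocks until read() returns b"" at end of file.
    while chunk := f.read(chunk_size):
        yield chunk


line_pieces = list(iter_lines(io.BytesIO(data)))
chunk_pieces = list(iter_chunks(io.BytesIO(data)))

print(len(line_pieces), "pieces when iterating by line")
print(len(chunk_pieces), "pieces when reading fixed-size chunks")
```

Both approaches yield the same bytes overall, but the line-based one hands the response far more pieces to send, each with its own overhead.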

Note that StreamingResponse can take either an async generator or a normal generator/iterator to stream the response body. If you use the standard open() method, which doesn't support async/await, you have to declare the generator function with normal def. Regardless, FastAPI/Starlette will still work asynchronously: it checks whether the generator you passed is asynchronous (as shown in the source code), and if it is not, it runs the generator in a separate thread, using iterate_in_threadpool, and awaits the result.
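The idea of running a blocking generator in a worker thread can be sketched with the standard library alone. This is an analogue built on asyncio.to_thread (Python 3.9+), not Starlette's own helper, and the names iterate_in_thread and blocking_chunks are hypothetical.

```python
import asyncio

_DONE = object()  # sentinel marking the end of the synchronous iterator


async def iterate_in_thread(iterable):
    """Expose a blocking iterator as an async generator.

    Each next() call runs in a worker thread, so the event loop stays free.
    """
    iterator = iter(iterable)
    while True:
        item = await asyncio.to_thread(next, iterator, _DONE)
        if item is _DONE:
            break
        yield item


def blocking_chunks():
    # Stand-in for a generator that reads a file with the regular open().
    yield b"first"
    yield b"second"


async def collect():
    return [chunk async for chunk in iterate_in_thread(blocking_chunks())]


chunks = asyncio.run(collect())
print(chunks)
```

With StreamingResponse you do not need to write this wrapper yourself; passing the normal generator is enough.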

You can set the Content-Disposition header in the response to indicate whether the content is expected to be displayed inline in the browser (if you are streaming, for example, an .mp4 video or an .mp3 audio file), or as an attachment that is downloaded and saved locally (using the specified filename).
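As a rough illustration, a small helper can build the header value for either case. The function name content_disposition and its parameters are hypothetical. Names that need escaping are emitted in the percent-encoded filename* form, which is one common convention rather than the only valid one.

```python
from urllib.parse import quote


def content_disposition(filename: str, inline: bool = False) -> str:
    """Build a Content-Disposition header value.

    Names that change under percent-encoding (non-ASCII characters, spaces)
    use the `filename*` form with UTF-8 percent-encoding.
    """
    disposition = "inline" if inline else "attachment"
    quoted = quote(filename)
    if quoted != filename:
        return f"{disposition}; filename*=utf-8''{quoted}"
    return f'{disposition}; filename="{filename}"'


headers = {"Content-Disposition": content_disposition("large_file.tar")}
print(headers)
```

The resulting dictionary can be passed as the headers argument of StreamingResponse, as in the options below.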

As for the media_type (also known as the MIME type), two MIME types serve as the main defaults (see Common MIME types):

text/plain is the default value for textual files. A textual file should be human-readable and must not contain binary data.

application/octet-stream is the default value for all other cases. An unknown file type should use this type.

For a file with a .tar extension, as shown in your question, you can also use a more specific type than application/octet-stream, namely application/x-tar. Otherwise, if the file is of unknown type, stick to application/octet-stream. See the linked documentation above for a list of common MIME types.
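When the file type varies, one option is to guess the media type from the extension and fall back to application/octet-stream. The sketch below uses the standard mimetypes module. The helper name guess_media_type and the choice to treat compressed files as generic binary are assumptions made for this example.

```python
import mimetypes


def guess_media_type(path: str) -> str:
    """Pick a media type from the file extension, defaulting to binary."""
    media_type, encoding = mimetypes.guess_type(path)
    # A non-None encoding (e.g. "gzip" for .tar.gz) means the bytes on disk are
    # compressed, so the inner type does not describe what is actually sent.
    if media_type is None or encoding is not None:
        return "application/octet-stream"
    return media_type


print(guess_media_type("large_file.tar"))
print(guess_media_type("/opt/123.tar.gz"))
print(guess_media_type("data.unknown-ext"))
```

The returned string can be passed directly as media_type to StreamingResponse or FileResponse.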

Option 1 - Using normal generator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

CHUNK_SIZE = 1024 * 1024  # = 1MB - adjust the chunk size as desired
some_file_path = 'large_file.tar'
app = FastAPI()

@app.get('/')
def main():
    def iterfile():
        with open(some_file_path, 'rb') as f:
            while chunk := f.read(CHUNK_SIZE):
                yield chunk

    headers = {'Content-Disposition': 'attachment; filename="large_file.tar"'}
    return StreamingResponse(iterfile(), headers=headers, media_type='application/x-tar')

Option 2 - Using `async` generator with `aiofiles`

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import aiofiles

CHUNK_SIZE = 1024 * 1024  # = 1MB - adjust the chunk size as desired
some_file_path = 'large_file.tar'
app = FastAPI()

@app.get('/')
async def main():
    async def iterfile():
        async with aiofiles.open(some_file_path, 'rb') as f:
            while chunk := await f.read(CHUNK_SIZE):
                yield chunk

    headers = {'Content-Disposition': 'attachment; filename="large_file.tar"'}
    return StreamingResponse(iterfile(), headers=headers, media_type='application/x-tar')

Terse answered 25/9, 2022 at 8:59 Comment(0)

I would use app.mount("/static", StaticFiles(directory="static"), name="static") to mount a static folder and put the big file into it, so that users can download the file directly through its link.

This way, you don't need any code to read the file and feed it to the user.

Woolworth answered 26/4, 2023 at 18:13 Comment(0)
