Flask large file download

Memory Error occurs when downloading a file from Flask. The size of the file is about 100 megabytes. How can I fix it?

Flask Download Code

return send_from_directory(s_trash_path, s_zip_name, mimetype='zip', as_attachment=True)

Error Traceback

[2018-07-21 16:11:22,328] ERROR in app: Exception on /ec-fileupload/download/select [POST]
Traceback (most recent call last):
  File "/home/venv_ec_fileupload/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/venv_ec_fileupload/lib/python3.6/site-packages/flask/app.py", line 1615, in full_dispatch_request
    return self.finalize_request(rv)
  File "/home/venv_ec_fileupload/lib/python3.6/site-packages/flask/app.py", line 1632, in finalize_request
    response = self.process_response(response)
  File "/home/venv_ec_fileupload/lib/python3.6/site-packages/flask/app.py", line 1856, in process_response
    response = handler(response)
  File "./app/__init__.py", line 170, in after_request
    s_data = resp.get_data()
  File "/home/venv_ec_fileupload/lib/python3.6/site-packages/werkzeug/wrappers.py", line 987, in get_data
    rv = b''.join(self.iter_encoded())
MemoryError
Wot answered 21/7, 2018 at 7:17 Comment(0)

Since your file is large and dynamically generated, I'd suggest not using send_from_directory() to send it.

Check out the Flask streaming documentation for how to stream files (sending small chunks of data instead of the full file at once): http://flask.pocoo.org/docs/1.0/patterns/streaming/

from flask import Response

@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            yield ','.join(row) + '\n'
    return Response(generate(), mimetype='text/csv')

The snippet above shows how to stream a CSV file with Flask.
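Since the question is about a zip file rather than a CSV, here is a hedged adaptation of the same streaming pattern to a binary file. The route, the `/tmp/trash` directory, and `stream_file` are illustrative stand-ins for the question's `s_trash_path`/`s_zip_name` setup, not the asker's actual code:

```python
import os

from flask import Flask, Response

app = Flask(__name__)

def stream_file(path, chunk_size=8192):
    # Yield the file in fixed-size binary chunks. A zip has no useful
    # newlines, so line iteration would buffer (nearly) the whole file;
    # fixed-size reads keep at most one chunk in memory.
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk

# Hypothetical route mirroring the question's download endpoint.
@app.route('/download/<name>')
def download_zip(name):
    path = os.path.join('/tmp/trash', name)  # assumed storage directory
    return Response(
        stream_file(path),
        mimetype='application/zip',
        headers={'Content-Disposition': f'attachment; filename={name}'},
    )
```

The response body is the generator itself, so Werkzeug writes each chunk to the socket as it is produced instead of materializing the file.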

However, if your file is static, Flask recommends letting nginx serve it in deployment.

Wapiti answered 22/7, 2018 at 11:51 Comment(2)
Thank you for your answer. I still get a memory error. Is my code wrong?

def generate():
    with open(s_path, 'rb') as r:
        for line in r:
            yield str(line)
response = Response(generate(), mimetype='application/zip')
response.headers['Content-Type'] = "application/octet-stream"
response.headers['Content-Disposition'] = "inline; filename=" + os.path.basename(s_path)
return response

Faenza
This way, the browser will not respond until all iterations are complete, which can take a very long time, and the gateway is likely to time out. Unscramble

If you serve binary files, you should not iterate through lines: a binary file effectively contains a single huge "line", so you still load the whole file into RAM at once.

The only proper way to read large files is via chunks:

CHUNK_SIZE = 8192

def read_file_chunks(path):
    # Yield the file in fixed-size chunks so that only one
    # chunk needs to be held in memory at a time.
    with open(path, 'rb') as fd:
        while True:
            buf = fd.read(CHUNK_SIZE)
            if not buf:
                break
            yield buf
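If preferred, the same reader can be written more compactly with a stdlib idiom; the behavior is identical, since iter() with a sentinel keeps calling the read until it returns b'' at end of file:

```python
from functools import partial

def read_file_chunks(path, chunk_size=8192):
    # iter(callable, sentinel) calls fd.read(chunk_size) repeatedly
    # and stops as soon as it returns b'' (end of file).
    with open(path, 'rb') as fd:
        yield from iter(partial(fd.read, chunk_size), b'')
```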

Then it's safe to wrap this chunk reader in stream_with_context, e.g. if you serve video files:

from flask import Response, stream_with_context
from werkzeug import exceptions as exc

@app.route('/videos/<name>')
def serve_video(name):
    fp = resource_path_for(name)  # app-specific helper resolving <name> to a Path
    if fp.exists():
        return Response(
            stream_with_context(read_file_chunks(fp)),
            headers={
                'Content-Disposition': f'attachment; filename={name}'
            }
        )
    else:
        raise exc.NotFound()

Under the hood, Flask's response machinery takes each chunk from the generator read_file_chunks(fp) and flushes it to the connection before pulling the next one. Once flushed, a chunk is no longer referenced and is reclaimed by the garbage collector, so only a handful of chunks sit in RAM at any given time.
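Note that the question's traceback shows the MemoryError being raised inside an after_request hook that calls resp.get_data(), which joins the entire body into one bytes object and defeats any streaming. A hedged sketch of a guard (the original hook's purpose is unknown, so its body here is purely illustrative):

```python
from flask import Flask

app = Flask(__name__)

@app.after_request
def after_request(resp):
    # get_data() concatenates every chunk of the body into a single
    # bytes object -- the exact call that raised MemoryError in the
    # question's traceback. Skip it for generator-backed responses.
    if not resp.is_streamed:
        s_data = resp.get_data()  # safe: body is already in memory
        # ... inspect or log s_data as the original hook presumably did ...
    return resp
```

Werkzeug's Response.is_streamed is True when the body is an iterable without a known length (e.g. a generator), which is exactly the case the hook must not buffer.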

Pact answered 27/7, 2019 at 22:29 Comment(0)

When using Gunicorn, I found a very good solution right here: gunicorn: how to resolve "WORKER TIMEOUT"? The default timeout is around 30 seconds, so every connection is closed once a worker has been silent for that long. I set the value to 3600, which allows up to an hour for each file download. On the command line, you can do that by simply adding

--timeout 3600
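Equivalently, the setting can live in a config file. Gunicorn config files are plain Python; the filename and the app module path below are assumptions for the sketch (loaded with gunicorn -c gunicorn.conf.py app:app):

```python
# gunicorn.conf.py -- assumed filename
# Seconds a worker may stay silent before being killed and restarted.
timeout = 3600
```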
Melodie answered 3/10 at 13:22 Comment(0)
