CSV export in Stream (from Django admin on Heroku)

We need to export a CSV file of model data from the Django admin, which runs on Heroku. We therefore created an admin action that builds the CSV and returns it in the response. This worked fine until our client started exporting huge data sets and we ran into the 30 second timeout of the web worker.
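For reference, the action looked roughly like this (a simplified sketch; JobEntity and its column names are stand-ins for our actual model):

import csv

from django.http import HttpResponse


def export_as_csv(modeladmin, request, queryset):
    # Build the whole CSV in one response body -- this is the variant
    # that runs into the 30 second timeout once the queryset gets large.
    response = HttpResponse(mimetype="text/csv")
    response["Content-Disposition"] = "attachment; filename=jobentity.csv"
    writer = csv.writer(response)
    writer.writerow(["id", "name", "status"])  # stand-in column names
    for obj in queryset:
        writer.writerow([obj.id, obj.name, obj.status])
    return response
export_as_csv.short_description = "Export selected entries as CSV"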

To circumvent this problem we thought about streaming the CSV to the client instead of building it in memory first and sending it in one piece. The trigger was this piece of information from the Heroku documentation:

Cedar supports long-polling and streaming responses. Your app has an initial 30 second window to respond with a single byte back to the client. After each byte sent (either received from the client or sent by your application) you reset a rolling 55 second window. If no data is sent during the 55 second window your connection will be terminated.

We therefore implemented something that looks like this to test it:

import cStringIO as StringIO
import csv
import time

from django.http import HttpResponse


def csv_view(request):  # renamed: calling it "csv" would shadow the csv module
    csvfile = StringIO.StringIO()
    csvwriter = csv.writer(csvfile)

    def read_and_flush():
        # Hand back everything written so far, then empty the buffer.
        csvfile.seek(0)
        data = csvfile.read()
        csvfile.seek(0)
        csvfile.truncate()
        return data

    def stream():
        for i in xrange(100000):
            csvwriter.writerow([i, "a", "b", "c"])
            time.sleep(1)  # just to make sure we cross the 30 second mark
            yield read_and_flush()

    response = HttpResponse(stream(), mimetype="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response
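As an aside, newer Django versions (1.5+, still in development at the time of writing) ship django.http.StreamingHttpResponse, which is designed for exactly this pattern and avoids the intermediate StringIO buffer entirely. A rough equivalent of the test view, assuming Django 1.5, would be:

import csv

from django.http import StreamingHttpResponse


class Echo(object):
    # Pseudo-buffer: csv.writer only ever calls write(), so instead of
    # storing the formatted row we hand it straight back to the caller.
    def write(self, value):
        return value


def csv_stream(request):
    writer = csv.writer(Echo())
    rows = ([i, "a", "b", "c"] for i in xrange(100000))
    response = StreamingHttpResponse(
        (writer.writerow(row) for row in rows),
        content_type="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response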

Back to the test above: the HTTP headers of the download look like this (from Firebug):

HTTP/1.1 200 OK
Cache-Control: max-age=0
Content-Disposition: attachment; filename=jobentity-job2.csv
Content-Type: text/csv
Date: Tue, 27 Nov 2012 13:56:42 GMT
Expires: Tue, 27 Nov 2012 13:56:41 GMT
Last-Modified: Tue, 27 Nov 2012 13:56:41 GMT
Server: gunicorn/0.14.6
Vary: Cookie
Transfer-Encoding: chunked
Connection: keep-alive

"Transfer-encoding: chunked" would indicate that Cedar is actually streaming the content chunkwise we guess.

The problem is that the CSV download is still interrupted after 30 seconds, with these lines in the Heroku log:

2012-11-27T13:00:24+00:00 app[web.1]: DEBUG: exporting tasks in csv-stream for job id: 56, 
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [2] [CRITICAL] WORKER TIMEOUT (pid:5)
2012-11-27T13:00:54+00:00 heroku[router]: at=info method=POST path=/admin/jobentity/ host=myapp.herokuapp.com fwd= dyno=web.1 queue=0 wait=0ms connect=2ms service=29480ms status=200 bytes=51092
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [2] [CRITICAL] WORKER TIMEOUT (pid:5)
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [12] [INFO] Booting worker with pid: 12

This should work conceptually, right? Is there anything we missed?

We really appreciate your help. Tom

Shiah answered 27/11, 2012 at 13:59 Comment(0)

I found the solution to the problem. It's not a Heroku timeout, because otherwise there would be an H12 error in the Heroku log (thanks to Caio of Heroku for pointing that out).

The problem was the default timeout of Gunicorn, which is 30 seconds. After adding --timeout 600 to the Gunicorn line in the Procfile the problem was gone.

The Procfile now looks like this:

web: gunicorn myapp.wsgi -b 0.0.0.0:$PORT --timeout 600
celeryd: python manage.py celeryd -E -B --loglevel=INFO
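Equivalently, the timeout can live in a Gunicorn config file instead of the Procfile (a sketch; gunicorn.conf.py is an assumed filename, loaded via gunicorn myapp.wsgi -c gunicorn.conf.py):

# gunicorn.conf.py
import os

# Bind to the port Heroku assigns via the environment.
bind = "0.0.0.0:%s" % os.environ.get("PORT", "8000")

# Seconds a worker may stay silent before Gunicorn kills and restarts it;
# the default of 30 is what aborted the streaming download above.
timeout = 600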
Shiah answered 30/11, 2012 at 8:39 Comment(0)

That's rather not a problem in your script but Heroku's default 30-second timeout for web requests. You could read this: https://devcenter.heroku.com/articles/request-timeout - and, according to that doc, move your CSV export to a background process.
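For example, the export could run in a Celery task that the web view merely enqueues (a rough sketch; export_job_csv, the JobEntity model and its fields are placeholders, and on Heroku the finished file would have to go to shared storage such as S3, since dyno filesystems are ephemeral and not shared between web and worker dynos):

import csv

from celery.task import task

from myapp.models import JobEntity  # placeholder model


@task
def export_job_csv(job_id):
    # Runs on the worker dyno, so the 30 second web timeout never applies;
    # the admin action just calls export_job_csv.delay(job_id) and returns.
    path = "/tmp/job-%s.csv" % job_id  # swap for an S3 upload in practice
    with open(path, "wb") as f:
        writer = csv.writer(f)
        for entity in JobEntity.objects.filter(job_id=job_id).iterator():
            writer.writerow([entity.pk, entity.name, entity.status])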

Multivalent answered 27/11, 2012 at 18:10 Comment(4)
But shouldn't the 30 second timeout window be extended because we stream the content instead of waiting until the CSV has been built in memory? Bytes are transmitted within that initial 30 second window, which should avoid the timeout according to this: "Cedar supports HTTP 1.1 features such as long-polling and streaming responses. An application has an initial 30 second window to respond with a single byte back to the client. However, each byte transmitted thereafter resets a rolling 55 second window."Shiah
Could it be that Django has an internal timeout when sending a response?Shiah
You have your web request running for more than 30 seconds - that's a fact, and Heroku has a default 30-second timeout for any web request in its HTTP server config. I suppose your attempts to emulate a keep-alive session won't be successful - you'd better consider moving your long file processing to a background process/daemon.Multivalent
So strange. I moved the csv export block into a celery background task, but the timeout still occurs: 2012-11-30T08:04:54+00:00 app[web.1]: 2012-11-30 08:04:54 [2] [CRITICAL] WORKER TIMEOUT (pid:5) 2012-11-30T08:04:54+00:00 heroku[router]: at=info method=POST path=/admin/jobentity/ host=myapp.herokuapp.com fwd=84.39.225.23 dyno=web.1 queue=0 wait=0ms connect=0ms service=29440ms status=200 bytes=410 Heroku Support says the timeout happens in the application itself because there is no H12 timeout in the router, but I can't find the place where it happens.Shiah
