Making a web interface to a script that takes 30 minutes to execute

I wrote a Python script to process some data from CSV files. The script takes between 3 and 30 minutes to complete, depending on the size of the CSV.

Now I want to put a web interface on this, so I can upload the CSV data files from anywhere. I wrote a basic HTTP POST upload page using Python's CGI module - but the script just times out after a while.

The script outputs HTTP headers at the start, then outputs bits of data as it iterates over every line of the CSV. As an example, this print statement would trigger roughly every 30 seconds.

# at the very top, with the 'import's
print "Content-type: text/html\n\n Processing ... <br />"

# the really long loop.
for currentRecord in csvRecords:
    count = count + 1
    print "On line " + str(count) + " <br />"

I assumed the browser would receive the headers and then wait, since it keeps receiving little bits of data. But what actually seems to happen is that it receives no data at all, and eventually fails with a 504 error when given a CSV with many lines.

Perhaps there's some caching happening somewhere? From the logs:

[Wed Jan 20 16:59:09 2010] [error] [client ::1] Script timed out before returning headers: datacruncher.py, referer: http://localhost/index.htm
[Wed Jan 20 17:04:09 2010] [warn] [client ::1] Timeout waiting for output from CGI script /Library/WebServer/CGI-Executables/datacruncher.py, referer: http://localhost/index.htm

What's the best way to resolve this, or, is it not appropriate to run such scripts in a browser?

Edit: This is a script for my own use - I normally intend to use it on my computer, but I thought a web-based interface could come in handy while travelling, or for example from a phone. Also, there's really nothing to download - the script will most probably e-mail a report off at the end.

Divot answered 20/1, 2010 at 11:47 Comment(7)
Do you think anyone on this earth is patient enough to wait 30 minutes for a web page to load in a browser, instead of downloading it as data?Wonky
This is a script for my own use - I normally intend to use it on my computer, but I thought a web-based interface could come in handy while travelling, or for example from a phone. Also, there's really nothing to download - the script will e-mail a report off at the end.Divot
@Pranab: "This is a script for my own use." Then what problem do you have? Why not just let it run? Why mess around? If it's for you -- and it only sends an email -- it's not even a web page is it? Why not just write a simple Python script?Bove
I wanted a web-facing method accessible from a browser, but the script would time-out in the browser.. until Wim's answer below.Divot
@Pranab: I'll ask again. Why web facing? Why browser? If it's for your own use, why add this complexity?Bove
@S.Lott: To use via a phone, laptop, from anywhere really - and all the other cliched benefits of a 'web app' :-P like avoiding setting up and maintaining a relatively complex script and dependencies thrice. As for the added complexity - well it's a good opportunity to learn about design, and to practice Python :)Divot
@Pranab: "maintaining a relatively complex script and dependencies thrice"? You can't just login to a server with putty and run it simply? I would think that "simple" has more benefits than "web app" with this specific application. Perhaps this is not the best problem to practice web application design with.Bove

I would separate the work like this:

  1. A web app URL that accepts the POSTed CSV file. The web app puts the CSV content into an offline queue, for instance a database table. The web app's response should be a unique ID for the queued item (use an auto-incremented ID column, for instance). The client must store this ID for part 3.

  2. A stand-alone service app that polls the queue for work and does the processing. Upon completion, it stores the results in another database table, using the unique ID as a key.

  3. A web app URL that can fetch processed results, http://server/getresults/uniqueid/. If the processing is finished (i.e. the unique ID is found in the results table), return the results. If not finished, the response should be a code that indicates this: a custom HTTP header, an HTTP status code, a response body of 'PENDING', or similar.
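The three steps above could be sketched like this, using SQLite as the queue (a minimal sketch under assumptions: the table layout, the `PENDING` marker, and the function names are all illustrative, not part of the answer):

```python
import sqlite3

def init_db(conn):
    # one table acts as the job queue, the other holds finished reports
    conn.execute("CREATE TABLE IF NOT EXISTS jobs ("
                 "id INTEGER PRIMARY KEY AUTOINCREMENT, csv TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS results ("
                 "job_id INTEGER PRIMARY KEY, report TEXT)")

def enqueue(conn, csv_text):
    # step 1: the upload handler queues the CSV and returns a unique ID
    cur = conn.execute("INSERT INTO jobs (csv) VALUES (?)", (csv_text,))
    conn.commit()
    return cur.lastrowid

def work_one(conn, process):
    # step 2: the worker picks one job without a result yet and processes it
    row = conn.execute("SELECT id, csv FROM jobs "
                       "WHERE id NOT IN (SELECT job_id FROM results) "
                       "LIMIT 1").fetchone()
    if row is None:
        return None
    job_id, csv_text = row
    conn.execute("INSERT INTO results (job_id, report) VALUES (?, ?)",
                 (job_id, process(csv_text)))
    conn.commit()
    return job_id

def get_results(conn, job_id):
    # step 3: return the report, or a marker the client can poll on
    row = conn.execute("SELECT report FROM results WHERE job_id = ?",
                       (job_id,)).fetchone()
    return row[0] if row else "PENDING"
```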

Enterprising answered 20/1, 2010 at 12:0 Comment(0)

I've had this situation before and I used cronjobs. The HTTP script would just write a job to be performed into a queue (a DB, or a file in a directory), and the cronjob would read and execute that job.
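The file-in-a-directory variant could look like this (a sketch under assumptions: the spool-directory layout and `.job`/`.out` naming are illustrative, not from the answer):

```python
import os

def submit_job(spool_dir, job_id, payload):
    # the HTTP script just drops the uploaded CSV into the spool directory
    with open(os.path.join(spool_dir, job_id + ".job"), "w") as f:
        f.write(payload)

def run_pending_jobs(spool_dir, done_dir, process):
    # the cronjob picks up each queued file, processes it, writes a result
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".job"):
            continue
        path = os.path.join(spool_dir, name)
        with open(path) as f:
            result = process(f.read())
        out_name = name.replace(".job", ".out")
        with open(os.path.join(done_dir, out_name), "w") as f:
            f.write(result)
        os.remove(path)  # done, so the next cron run skips it
```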

Ichang answered 20/1, 2010 at 11:52 Comment(3)
+1; the user could be notified through email when the job finishes, with a link to download itCello
You can also make the cronjob write its progress periodically to a file, and use ajax to read the value from the webserver and display it to the user in the browser.Ichang
If you need the job process to run as a different user, using a cronjob is certainly a good way. Else, it's usually just as easy to just start the processing thread directly from the CGI app.Bisulcate
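The progress-file idea from the comments above could be sketched like this (the file path and the "done/total" format are assumptions for illustration):

```python
import os

def write_progress(path, done, total):
    # worker side: record progress; write to a temp file and swap it in
    # atomically so the reader never sees a half-written value
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("%d/%d" % (done, total))
    os.replace(tmp, path)

def read_progress(path):
    # server side: the ajax endpoint serves this value to the browser
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return "0/0"  # job hasn't started yet
```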

You'll probably need a sys.stdout.flush(): the script isn't really writing anything to the webserver until it has accumulated a page buffer's worth of data - which doesn't happen before the timeout.
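In modern Python the flushing version of the question's loop might look like this (a sketch; the function name and the `out` parameter are mine, added so the output stream can be swapped out):

```python
import sys

def process(records, out=sys.stdout):
    # send the headers first and flush so the server sees them immediately,
    # before Apache's "timed out before returning headers" limit is hit
    out.write("Content-type: text/html\r\n\r\nProcessing ... <br />\n")
    out.flush()
    for count, record in enumerate(records, 1):
        out.write("On line %d <br />\n" % count)
        out.flush()  # push each chunk instead of waiting for a full buffer
```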

But the proper way to solve this is, as others suggested, to do the processing in a separate thread/process, and show the user an auto-refreshed page which shows the status, with a progress bar or some other fancy visual to keep them from being bored.

Bisulcate answered 20/1, 2010 at 11:59 Comment(1)
Adding sys.stdout.flush() right after a print statement in the loop seems to resolve the issue.Divot

See Randal Schwartz's Watching long processes through CGI. The article uses Perl, but the technique does not depend on the language.

Senary answered 20/1, 2010 at 12:3 Comment(0)

Very similar question here. I suggest spawning off the lengthy process and returning an ajax-based progress bar to the user. This way the user has the luxury of the web interface, and you have the luxury of no time-outs.

Iverson answered 20/1, 2010 at 12:6 Comment(0)

IMHO the best way would be to run an independent script which posts updates somewhere (flat file, database, etc...). I don't know how to fork an independent process from Python, so I can't give any code examples.
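For completeness - not the answerer's code - one way to launch an independent process from Python is the standard subprocess module; start_new_session detaches the child on POSIX so it outlives the CGI request:

```python
import subprocess
import sys

def spawn_worker(script_path, data_path):
    # launch the long-running script and return immediately without waiting;
    # its output is discarded so it can't block on a closed pipe
    return subprocess.Popen(
        [sys.executable, script_path, data_path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
```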

To show progress on a website, implement an ajax request to a page that reads those status updates and, for example, shows a nice progress bar.

Add something like setTimeout("refreshProgressBar[...]) or meta-refresh for auto-refresh.

Ilocano answered 20/1, 2010 at 11:54 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.