Tornado blocking asynchronous requests

Using Tornado, I have a GET request that takes a long time: it makes many requests to another web service and processes the data, and it can take minutes to complete. I don't want this to block the entire web server from responding to other requests, which it currently does.

As I understand it, Tornado is single-threaded and executes each request synchronously, even though it handles them asynchronously (I'm still confused on that bit). There are points in the long process where it could pause to let the server handle other requests (a possible solution?). I'm running it on Heroku with a single worker, so I'm not sure how that translates into spawning a new thread or using multiprocessing, neither of which I have experience with in Python.

Here is what I'm trying to do: the client makes the GET call to start the process, then polls another GET endpoint every 5 seconds to check the status and update the page with new information (long polling would also work, but I run into the same issue). The problem is that starting the long process blocks all new GET requests (or new long-polling sessions) until it completes.

Is there an easy way to kick off this long GET call without blocking the entire web server? Is there anything I can put in the code to say "pause, go handle pending requests, then continue on"?

I need to initiate a GET request on ProcessHandler and still be able to query StatusHandler while ProcessHandler is running.

Example:

class StatusHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self.render("status.html")

class ProcessHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self.updateStatus("0")
        result1 = self.function1()
        self.updateStatus("1")
        result2 = self.function2(result1)
        self.updateStatus("2")
        result3 = self.function3(result2)
        self.updateStatus("3")
        self.finish()
Johannajohannah answered 24/10, 2012 at 14:43 Comment(4)
Have you tried the tornado.gen module? tornadoweb.org/documentation/gen.html - Enduring
Did you remember to mark it as an asynchronous call? Add @asynchronous to your GET methods. - Televise
andyboot, yes, I have @asynchronous on my GET methods. - Johannajohannah
Don, I tried to wrap the functions in gen.Task but it still blocked the other GET requests. I've updated my post to give a better idea of what I'm trying to do. - Johannajohannah

Here's a complete sample Tornado app that uses the async HTTP client and gen.Task to keep things simple.

If you read more about gen.Task in the docs, you'll see that you can dispatch multiple requests at the same time. This uses the core idea of Tornado, where everything is non-blocking while still running in a single process.
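
As a rough illustration of dispatching several requests at once on a single thread, here is a minimal sketch. It uses the standard-library asyncio loop as a stand-in for Tornado's IOLoop (Tornado 5 and later run on top of asyncio), not the gen.Task API itself. The fake_fetch coroutine, its URLs, and its delays are hypothetical placeholders rather than real HTTP calls.

```python
import asyncio
import time


async def fake_fetch(url, delay):
    # Hypothetical stand-in for an async HTTP fetch: awaiting the sleep
    # hands control back to the event loop instead of blocking the thread.
    await asyncio.sleep(delay)
    return "%s done" % url


async def fetch_all():
    # Start both "requests" together and wait for both results.
    return await asyncio.gather(
        fake_fetch("http://example.com/a", 0.3),
        fake_fetch("http://example.com/b", 0.3),
    )


def timed_run():
    # Returns the results plus the wall-clock time the whole batch took.
    start = time.monotonic()
    results = asyncio.run(fetch_all())
    return results, time.monotonic() - start


if __name__ == "__main__":
    results, elapsed = timed_run()
    print(results, "in %.2fs" % elapsed)
```

Because both waits overlap, the batch should take roughly as long as the slowest request rather than the sum of the two.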

Update: I've added a Thread handler to demonstrate how you could dispatch work into a second thread and receive the callback() when it's done.

import os
import threading
import tornado.options
import tornado.ioloop
import tornado.httpserver
import tornado.httpclient
import tornado.web
from tornado import gen
from tornado.web import asynchronous

tornado.options.define('port', type=int, default=9000, help='server port number (default: 9000)')
tornado.options.define('debug', type=bool, default=False, help='run in debug mode with autoreload (default: False)')

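class Worker(threading.Thread):
    def __init__(self, callback=None, *args, **kwargs):
        super(Worker, self).__init__(*args, **kwargs)
        self.callback = callback
        # Capture the loop on the main thread, where the handler creates us.
        self.io_loop = tornado.ioloop.IOLoop.instance()

    def run(self):
        import time
        time.sleep(10)  # stand-in for the long-running, blocking work
        # IOLoop.add_callback is the thread-safe way to hand work back; it
        # runs the callback (and therefore self.finish) on the IOLoop thread.
        self.io_loop.add_callback(lambda: self.callback('DONE'))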

class Application(tornado.web.Application):
    def __init__(self):
        handlers = [
            (r"/", IndexHandler),
            (r"/thread", ThreadHandler),
        ]
        settings = dict(
            static_path = os.path.join(os.path.dirname(__file__), "static"),
            template_path = os.path.join(os.path.dirname(__file__), "templates"),
            debug = tornado.options.options.debug,
        )
        tornado.web.Application.__init__(self, handlers, **settings)

class IndexHandler(tornado.web.RequestHandler):
    client = tornado.httpclient.AsyncHTTPClient()

    @asynchronous
    @gen.engine
    def get(self):
        response = yield gen.Task(self.client.fetch, "http://google.com")

        self.finish("Google's homepage is %d bytes long" % len(response.body))

class ThreadHandler(tornado.web.RequestHandler):
    @asynchronous
    def get(self):
        Worker(self.worker_done).start()

    def worker_done(self, value):
        self.finish(value)

def main():
    tornado.options.parse_command_line()
    http_server = tornado.httpserver.HTTPServer(Application())
    http_server.listen(tornado.options.options.port)
    tornado.ioloop.IOLoop.instance().start()

if __name__ == "__main__":
    main()
Dramatic answered 24/10, 2012 at 15:4 Comment(10)
I wrapped my function in gen.Task but it still did the same thing. I created a get that had multiple response = gen.Task() calls. I don't need them executed at the same time; in fact they need to be serial, but any other GET requests are blocked while this GET request is in progress. - Johannajohannah
I've updated my example above. I tried wrapping all the functions with gen.Task() and everything worked, but it still blocked me from responding to queries on StatusHandler until it was finished. - Johannajohannah
In your example, is self.function1() a pure Python function that makes no calls to external services? The original assumption was that it called another service and you were blocked on that. - Dramatic
The function calls a class, which opens a urllib2.urlopen() and calls read(). So wrapping the entire function wouldn't do it, and I need to wrap the specific call inside that function? - Johannajohannah
That's it: you'll need to replace the urllib2.urlopen() calls with their Tornado AsyncHTTPClient equivalents. urlopen() blocks until data is received, while AsyncHTTPClient returns control to the IOLoop. - Dramatic
I think I got the Worker working, but I had to move the args around: self.args = args and args = {} before the super call. Otherwise it gave me an "AssertionError: group argument must be None for now". I plan to work on it tonight or tomorrow. I tried working with AsyncHTTPClient, but I can't get it to behave like urllib inside another class, or even in my Tornado class when trying to return the response from other functions. I tried removing yield, but had no luck with it. I'll try to play with it more tomorrow, but it looks like the Worker might do the job. - Johannajohannah
This is exactly what I wanted to achieve! Thank you very much, koblas! - Eran
For thread safety, shouldn't the callback be called via IOLoop.add_callback()? - Stat
@xingzhi.sg, I believe you are right: tornadoweb.org/en/stable/web.html#thread-safety-notes - Propinquity
Thanks for the solution. However, if /thread is loading and you open another tab for /thread, it will still block. Why? - Manoff
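
The point made in the comments above is that a blocking call such as urllib2.urlopen() stalls every other request unless it is moved off the loop's thread. It can be sketched roughly as follows. This uses the standard-library asyncio loop as a stand-in for Tornado's IOLoop, so it is an analogy rather than Tornado's own API. The names blocking_work and ticker are hypothetical, and time.sleep stands in for the blocking network read.

```python
import asyncio
import time


def blocking_work(seconds):
    # Hypothetical stand-in for a blocking call such as urlopen().read().
    time.sleep(seconds)
    return "DONE"


async def ticker(counter, interval):
    # Plays the role of other requests: it only advances when the loop is free.
    while True:
        await asyncio.sleep(interval)
        counter["ticks"] += 1


async def main(offload):
    counter = {"ticks": 0}
    loop = asyncio.get_running_loop()
    tick_task = asyncio.create_task(ticker(counter, 0.05))
    if offload:
        # Runs in a worker thread; the loop keeps serving the ticker, and
        # the result is delivered back on the loop's own thread.
        result = await loop.run_in_executor(None, blocking_work, 0.5)
    else:
        # Runs on the loop's own thread; nothing else can run meanwhile.
        result = blocking_work(0.5)
    tick_task.cancel()
    return result, counter["ticks"]


if __name__ == "__main__":
    print("offloaded:", asyncio.run(main(True)))
    print("inline:   ", asyncio.run(main(False)))
```

In the offloaded run the ticker keeps counting while the work is in progress. In the inline run it never gets a turn, which mirrors what the asker saw with StatusHandler.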

koblas's solution is great. Here is an alternative that makes use of tornado.gen:

import tornado.ioloop
import tornado.web
import tornado.gen
import tornado.concurrent
import time
from threading import Thread
from functools import wraps

def run_async(func):
  @wraps(func)
  def async_func(*args, **kwargs):
    func_hl = Thread(target = func, args = args, kwargs = kwargs)
    func_hl.start()
    return func_hl

  return async_func

@run_async
def sleeper(callback):
  # Runs in the worker thread started by run_async.
  i = 0
  while i <= 10:
    print(i)
    time.sleep(1)
    i += 1
  callback('DONE')


class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def get(self):
        response = yield tornado.gen.Task(sleeper)
        self.write(response)
        self.finish()

class OtherHandler(tornado.web.RequestHandler):
    def get(self):
        self.write('hello world')
        print('in other')
        self.finish()
Unchain answered 11/4, 2013 at 15:15 Comment(0)