Use Twisted's getPage as urlopen?

I would like to use Twisted's non-blocking getPage function within a webapp, but it feels quite complicated to use compared to urlopen.

This is an example of what I'm trying to achieve:

def web_request(request):
   response = urllib.urlopen('http://www.example.org')
   return HttpResponse(len(response.read()))

Is it so hard to have something similar with getPage?

Anabranch answered 27/4, 2010 at 10:48 Comment(0)

The thing to realize about non-blocking operations (which you seem to explicitly want) is that you can't really write sequential code with them. The operations don't block because they don't wait for a result. They start the operation and return control to your function. So, getPage doesn't return a file-like object you can read from like urllib.urlopen does. And even if it did, you couldn't read from it until the data was available (or it would block.) And so you can't call len() on it, since that needs to read all the data first (which would block.)

The way to deal with non-blocking operations in Twisted is through Deferreds, which are objects for managing callbacks. getPage returns a Deferred, which means "you will get this result later". You can't do anything with the result until you get it, so you add callbacks to the Deferred, and the Deferred will call these callbacks when the result is available. That callback can then do what you want it to:

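from twisted.web.client import getPage

def web_request(request):
    def callback(data):
        # Runs later, once the page body has arrived.
        return HttpResponse(len(data))
    d = getPage("http://www.example.org")
    d.addCallback(callback)
    # The caller receives a Deferred that will fire with the HttpResponse.
    return d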
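
To make the callback mechanics a bit more concrete, here is a minimal sketch of the idea behind a Deferred. It is a toy stand-in written for illustration, not Twisted's actual class, and ToyDeferred and fake_get_page are made-up names. It only shows that the function returns before any data exists and that the callbacks run when the result is delivered later:

```python
class ToyDeferred(object):
    """Toy stand-in for a Deferred; NOT Twisted's real implementation."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, fn):
        if self._fired:
            # Result already known: run the callback right away.
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return self

    def callback(self, result):
        # Called by whoever produced the result, once it is available.
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            # Each callback's return value feeds the next one.
            self._result = fn(self._result)
        self._callbacks = []


def fake_get_page(url):
    # Hypothetical helper: returns immediately, with no data yet.
    return ToyDeferred()


seen = []
d = fake_get_page("http://www.example.org")
d.addCallback(len)           # first callback: body -> length
d.addCallback(seen.append)   # second callback: record the length
assert seen == []            # nothing has run yet
d.callback("<html></html>")  # the "network" delivers the body later
print(seen)
```

Nothing is recorded in seen until callback() is called, which is why code written right after fake_get_page() can't use the result directly.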

An additional problem with your example is that your web_request function itself is blocking. What do you want to do while you wait for the result of getPage to become available? Do something else within web_request, or just wait? Or do you want to turn web_request itself non-blocking? If so, how do you want to produce the result? (The obvious choice in Twisted is to return another Deferred -- or even the same one as getPage returns, as in the example above. This may not always be appropriate if you're writing code in another framework, though.)

There is a way to write sequential code using Deferreds, although it's somewhat restrictive, harder to debug, and core Twisted people cry when you use it: twisted.internet.defer.inlineCallbacks. It uses the new generator feature in Python 2.5 where you can send data into a generator, and the code would look somewhat like this:

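from twisted.internet import defer
from twisted.web.client import getPage

@defer.inlineCallbacks
def web_request(request):
    data = yield getPage("http://www.example.org")
    # In Python 2 a generator can't return a value, so use returnValue.
    defer.returnValue(HttpResponse(len(data)))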

Like the example that explicitly returned the d Deferred, this'll only work if the caller expects web_request to be non-blocking -- the defer.inlineCallbacks decorator turns the generator into a function that returns a Deferred.
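
For anyone curious about the generator feature this relies on, here is a rough sketch of sending a value back into a generator at the point of its yield. The run_inline driver is a made-up toy, not how inlineCallbacks is actually implemented, and it "completes" each pending operation immediately instead of waiting for an event:

```python
def run_inline(gen_func):
    """Toy driver: NOT Twisted's inlineCallbacks, only the generator idea."""
    def wrapper(*args):
        gen = gen_func(*args)
        try:
            pending = next(gen)  # run the body up to the first yield
            while True:
                # Pretend the pending operation finished, and send its
                # result back in as the value of the yield expression.
                pending = gen.send(pending())
        except StopIteration:
            pass  # the generator body ran to completion
    return wrapper


results = []


@run_inline
def web_request(url):
    # Hypothetical fake fetch: a callable standing in for an
    # operation whose result is not available yet.
    data = yield (lambda: "body of " + url)
    results.append(len(data))


web_request("http://www.example.org")
print(results)
```

The body of web_request reads sequentially, but control leaves the function at the yield and only comes back when the driver sends the result in.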

Fluoridation answered 27/4, 2010 at 11:53 Comment(6)
@Thomas: thanks for the awesome reply!! :D Unfortunately the first example is returning Deferred object instead of the HttpResponse object!! - Anabranch
Your code didn't return the HttpResponse object either, but yes, this comes down to what I explained about the caller expecting web_request to be blocking. You can't change something from blocking to non-blocking without changing how it's called. - Fluoridation
I'm saying it's far from simple. - Fluoridation
@Thomas: what if I use a while loop with a timeout in there? :| - Anabranch
RadiantHex, you're going to have to accept that if you want non-blocking operations you can't use blocking code. - Cavatina
If you use a while loop, it's blocking again. Why would you want to use getPage in order to have a non-blocking operation just to make it blocking again? - Fluoridation

I posted a response to a similar question recently that provides the minimal amount of code required to get the contents from a URL using getPage. Here it is for completeness:

from twisted.web.client import getPage
from twisted.internet import reactor

url = 'http://aol.com'

def print_and_stop(output):
    # Print the page body (or the Failure), then shut the reactor down.
    print(output)
    if reactor.running:
        reactor.stop()

if __name__ == '__main__':
    print('fetching %s' % url)
    d = getPage(url)
    # addBoth so the reactor also stops if the request fails
    d.addBoth(print_and_stop)
    reactor.run()

Keep in mind that you'll probably need a more in-depth understanding of the reactor pattern used by Twisted to handle events (getPage firing being an event in this instance).
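
As a very rough mental model of the reactor pattern, here is a toy event loop. ToyReactor and its schedule method are made up for this sketch and are not Twisted's API; a real reactor also waits on sockets and timers instead of exiting when its queue is empty:

```python
from collections import deque


class ToyReactor(object):
    """Toy event loop for illustration; NOT Twisted's reactor."""

    def __init__(self):
        self._queue = deque()
        self.running = False

    def schedule(self, fn, *args):
        # Queue an event handler to be called from inside run().
        self._queue.append((fn, args))

    def stop(self):
        self.running = False

    def run(self):
        # Blocks the caller, dispatching queued events one at a time
        # until stop() is called or (in this toy) the queue is empty.
        self.running = True
        while self.running and self._queue:
            fn, args = self._queue.popleft()
            fn(*args)
        self.running = False


toy_reactor = ToyReactor()
log = []


def on_page(body):
    log.append(body)
    toy_reactor.stop()


toy_reactor.schedule(on_page, "<html>...</html>")
toy_reactor.schedule(log.append, "never runs: reactor stopped first")
toy_reactor.run()
print(log)
```

The point is that run() owns the program's control flow, and your code only executes as handlers that the loop calls.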

Potiche answered 27/4, 2010 at 16:47 Comment(0)