Asynchronous HTTP calls in Python

I need callback-style functionality in Python: I am sending a request to a web service multiple times, with a different parameter each time. I want these requests to happen concurrently instead of sequentially, so I want the calls to be made asynchronously.

It looks like asyncore is what I might want to use, but the examples I've seen of how it works all look like overkill, so I'm wondering if there's another path I should be going down. Any suggestions on modules or approaches? Ideally I'd like to use these in a procedural fashion instead of creating classes, but I may not be able to get around that.

Leitmotif answered 10/2, 2011 at 21:17 Comment(4)
Way overkill. All I need are simultaneous HTTP calls from within a script (I don't need to call a process from the command line, etc.). I simply need callback functionality, but I can't find how to do this in Python. Further research is leading me toward urllib2.Leitmotif
Overkill? Threads have nothing to do with calling processes from the command line.Colon
tippytop, yes, of course urllib2 for transport, but you still need to spawn the requests in parallel. You can do that with threading, multiprocessing, concurrent.futures, or an async I/O based solution.Gerladina
@Colon Because Python threads are terrible.Cuckoopint

The Twisted framework is just the ticket for that. But if you don't want to take that on, you might also use pycurl, a wrapper for libcurl, which has its own async event loop and supports callbacks.
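
For reference, here is a minimal sketch of what the pycurl route can look like, using libcurl's "multi" interface to drive several transfers on one event loop. This is illustrative only (not from the original answer), and the URLs are placeholders:

import pycurl
from io import BytesIO

urls = ["http://www.python.org/", "http://www.example.com/"]

multi = pycurl.CurlMulti()
requests = []
for url in urls:
    buf = BytesIO()
    handle = pycurl.Curl()
    handle.setopt(pycurl.URL, url)
    handle.setopt(pycurl.WRITEFUNCTION, buf.write)  # collect the body in memory
    multi.add_handle(handle)
    requests.append((url, handle, buf))

# Drive all transfers until every handle has finished.
num_active = len(requests)
while num_active:
    while True:
        ret, num_active = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break
    multi.select(1.0)  # wait for socket activity instead of busy-looping

for url, handle, buf in requests:
    print('%s is %d bytes' % (url, len(buf.getvalue())))
    multi.remove_handle(handle)
    handle.close()

In practice you would also check CurlMulti.info_read() for per-transfer errors; pycurl's bundled examples show the full pattern.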

Retrieve answered 10/2, 2011 at 21:55 Comment(2)
I ended up taking the pycurl approach back when I posted this (sorry for the late acceptance).Leitmotif
@tippytop Cool. You might also be interested in my simplifying wrapper on top of that. The pycopia.WWW.client module.Retrieve

Starting in Python 3.2, you can use concurrent.futures for launching parallel tasks.

Check out this ThreadPoolExecutor example:

http://docs.python.org/dev/library/concurrent.futures.html#threadpoolexecutor-example

It spawns threads to retrieve HTML and acts on responses as they are received.

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

The above example uses threading. There is also a similar ProcessPoolExecutor that uses a pool of processes, rather than threads:

http://docs.python.org/dev/library/concurrent.futures.html#processpoolexecutor-example

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# A process pool needs the entry point guarded by __main__
if __name__ == '__main__':
    # We can use a with statement to ensure processes are cleaned up promptly
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        # Start the load operations and mark each future with its URL
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data)))
Gerladina answered 10/2, 2011 at 23:32 Comment(0)

Do you know about eventlet? It lets you write what appears to be synchronous code, but have it operate asynchronously over the network.

Here's an example of a super minimal crawler (Python 2 syntax):

urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
     "https://wiki.secondlife.com/w/images/secondlife.jpg",
     "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):

  return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()

for body in pool.imap(fetch, urls):
  print "got body", len(body)
Kersten answered 11/2, 2011 at 18:54 Comment(0)

(Although this question is about server-side Python and was asked a while back, others might stumble on it while looking for a similar answer on the client side.)

For a client-side solution, you might want to take a look at the Async.js library, especially the "Control-Flow" section.

https://github.com/caolan/async#control-flow

By combining the "Parallel" with a "Waterfall" you can achieve your desired result.

WaterFall( Parallel(TaskA, TaskB, TaskC) -> PostParallelTask)

If you examine the example under Control-Flow - "Auto", you'll find an example of the above: https://github.com/caolan/async#autotasks-callback where "write_file" depends on "get_data" and "make_folder", and "email_link" depends on "write_file".

Please note that all of this happens on the client side (unless you're using Node.js on the server side).

For server-side Python, look at PyCURL @ https://github.com/pycurl/pycurl/blob/master/examples/basicfirst.py

By combining the linked example with pycurl, you can achieve the non-blocking, multi-threaded functionality.

Perform answered 6/2, 2014 at 20:57 Comment(1)
This is not a thread - it is a question. And this doesn't seem to answer it...Comfort
