How could I use requests in asyncio?
I want to run parallel HTTP request tasks in asyncio, but I find that python-requests blocks the event loop. I've found aiohttp, but it couldn't make an HTTP request through an HTTP proxy.

So I want to know if there's a way to do asynchronous http requests with the help of asyncio.

Langur answered 5/3, 2014 at 6:36 Comment(3)
If you are just sending requests you could use subprocess to parallelize your code.Amalekite
There is now an asyncio port of requests. github.com/rdbhost/yieldfromRequestsTernion
This question is also useful for cases where something indirectly relies on requests (like google-auth) and can't be trivially rewritten to use aiohttp.Consequent
To use requests (or any other blocking library) with asyncio, you can use BaseEventLoop.run_in_executor to run a function in another thread and yield from it to get the result. For example:

import asyncio
import requests

@asyncio.coroutine
def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = yield from future1
    response2 = yield from future2
    print(response1.text)
    print(response2.text)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

This will get both responses in parallel.

With Python 3.5 you can use the new async/await syntax:

import asyncio
import requests

async def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = await future1
    response2 = await future2
    print(response1.text)
    print(response2.text)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

See PEP 492 for more.
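
If you need more than a couple of URLs, the same pattern scales with asyncio.gather; a minimal sketch (the URL list is illustrative):

import asyncio
import requests

async def fetch_all(urls):
    loop = asyncio.get_event_loop()
    # each blocking requests.get call runs in the default thread pool executor
    futures = [loop.run_in_executor(None, requests.get, url) for url in urls]
    # gather waits on all futures concurrently and preserves their order
    return await asyncio.gather(*futures)

urls = ['http://www.google.com', 'http://www.google.co.uk']
loop = asyncio.get_event_loop()
responses = loop.run_until_complete(fetch_all(urls))
print([response.status_code for response in responses])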

Unknow answered 14/3, 2014 at 20:6 Comment(19)
I tried but got the exception SyntaxError: 'yield' outside functionLangur
You can't use 'yield' outside of a function because the 'yield' keyword will convert a function into a generator (so it needs to be done inside a function). I'll update the example to be more complete.Unknow
Can you explain how exactly this works? I don't understand how this doesn't block.Fransis
@scoarescoare According to the docs, run_in_executor() will use an Executor (by default a ThreadPoolExecutor) to run the methods in different threads (or subprocess if specified) and wait for the result. The advantage run_in_executor() has over just using an Executor is that it integrates nicely with asyncio.Unknow
@Unknow but if its running concurrently in another thread, isn't that defeating the point of asyncio?Fransis
@scoarescoare Not really, if you do it right. In this case, you're simply firing a call off and getting its return value. While run_in_executor() is 'blocking' (for instance, in a similar way to how asyncio's Streams will block, or how a socket might block), control will be yielded to another coroutine waiting in asyncio's event loop.Unknow
@Unknow Yeah, the part about it firing a call off and resuming execution makes sense. But if I understand correctly, requests.get will be executing in another thread. I believe one of the big pros of asyncio is the idea of keeping things single-threaded: not having to deal with shared memory, locking, etc. I think my confusion lies in the fact that your example uses both asyncio and concurrent.futures module.Fransis
@scoarescoare That's where the 'if you do it right' part comes in - the method you run in the executor should be self-contained ((mostly) like requests.get in the above example). That way you don't have to deal with shared memory, locking, etc., and the complex parts of your program are still single threaded thanks to asyncio.Unknow
@Unknow ok thanks for clearing it up! Really solidified it for me. Good point about Executor is that it integrates nicely with asyncio. It took me a while to realize these are completely different libraries.Fransis
@scoarescoare The main use case is for integrating with IO libraries that don't have support for asyncio. For instance, I'm doing some work with a truly ancient SOAP interface, and I'm using the suds-jurko library as the "least bad" solution. I'm trying to integrate it with an asyncio server, so I'm using run_in_executor to make the blocking suds calls in a way that looks asynchronous.Spearwort
Really cool that this works and so is so easy for legacy stuff, but should be emphasised this uses an OS threadpool and so doesn't scale up as a true asyncio oriented lib like aiohttp doesNiigata
Note that you may want to avoid using the hooks= keyword when calling Requests like that, as hook functions will likely break the self-contained requirement, depending on what you do in there.Khartoum
why does the line loop = asyncio.get_event_loop() appear twice?Rubeola
@Rubeola It doesn't need to appear twice. You can pass the module level "loop" into main, if you want (assuming you change the definition of "main" to take an argument).Unknow
run_in_executor doesn't allow passing kwargs to the target callback (loop.run_in_executor(None, requests.get, 'http://www.google.com', auth=blah) fails); this can be achieved with lambda or functools.partial as a proxy: loop.run_in_executor(None, lambda: requests.get('http://www.google.com', auth=blah)). See python.org/dev/peps/pep-3156/#callback-styleHydrocele
Why does main() need to be run in the event loop when the main() body is already running an event loop?Bartender
main() isn't running an event loop. It gets the event loop in order to run requests.get() asynchronously. If a function wants to await an async function, it needs to be running in the event loop.Unknow
@Unknow Why can't I simply write an async function do_get() that calls requests.get() and await that function? As far as I understand, the program will go do something else while it waits for do_get() to finish, and the only thing do_get() does is make the requests call. Won't that be async then?Caporal
@Caporal The problem is that requests.get() isn't async, so when do_get() would call requests.get(), the function would block. The event loop will only do something else when a function yields control (using await) which requests.get() doesn't. And since it's blocking, the only way to run something else concurrently is in another thread/process, which is what run_in_executor() does.Unknow
aiohttp can already be used with an HTTP proxy:

import asyncio
import aiohttp


@asyncio.coroutine
def do_request():
    proxy_url = 'http://localhost:8118'  # your proxy address
    response = yield from aiohttp.request(
        'GET', 'http://google.com',
        proxy=proxy_url,
    )
    return response

loop = asyncio.get_event_loop()
loop.run_until_complete(do_request())
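
On Python 3.5+ the same request reads more naturally with async/await and the session API; a sketch, assuming the same local proxy address:

import asyncio
import aiohttp

async def do_request():
    proxy_url = 'http://localhost:8118'  # your proxy address
    async with aiohttp.ClientSession() as session:
        async with session.get('http://google.com', proxy=proxy_url) as response:
            return await response.text()

loop = asyncio.get_event_loop()
loop.run_until_complete(do_request())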
Stinking answered 21/5, 2014 at 13:30 Comment(4)
What does the connector do here?Favien
It provides a connection through proxy serverStinking
This is a much better solution than using requests in a separate thread. Since it is truly async, it has lower overhead and lower memory usage.Retrorse
for python >=3.5 replace @asyncio.coroutine with "async" and "yield from" with "await"Lindesnes
The answers above are still using the old Python 3.4 style coroutines. Here is what you would write on Python 3.5+.

aiohttp supports HTTP proxies now:

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
            'http://python.org',
            'https://google.com',
            'http://yifei.me'
        ]
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url))
        htmls = await asyncio.gather(*tasks)
        for html in htmls:
            print(html[:100])

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
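
Since the original question involved a proxy: session.get accepts the same proxy keyword used in the earlier aiohttp answer, so fetch can be adapted like this (proxy address illustrative):

async def fetch(session, url):
    # same as above, but routed through a local HTTP proxy
    async with session.get(url, proxy='http://localhost:8118') as response:
        return await response.text()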

There is also the httpx library, which is a drop-in replacement for requests with async/await support. However, httpx is somewhat slower than aiohttp.

Another option is curl_cffi, which can impersonate browsers' JA3 and HTTP/2 fingerprints.
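
A sketch of async usage with curl_cffi, assuming its requests-style AsyncSession API and its impersonate parameter (check the project's documentation for the exact names supported by your version):

import asyncio
from curl_cffi.requests import AsyncSession

async def main():
    async with AsyncSession() as session:
        # impersonate mimics a real browser's TLS/HTTP2 fingerprint
        response = await session.get('https://example.com', impersonate='chrome')
        print(response.status_code)

asyncio.run(main())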

Singband answered 13/5, 2018 at 5:17 Comment(5)
could you elaborate with more urls? It does not make sense to have only one url when the question is about parallel http request.Lingua
Legend. Thank you! Works greatMairemaise
@Singband How can this code be modified to deliver, say, 10k URLs using 100 requests in parallel? The idea is to use all 100 slots simultaneously, not to wait for 100 to be delivered in order to start the next 100.Euphrosyne
@AntoanMilkov That's a different question that can not be answered in the comment area.Singband
@Singband You are right, here is the question: #56523543Euphrosyne
Requests does not currently support asyncio and there are no plans to provide such support. It's likely that you could implement a custom "Transport Adapter" (as discussed here) that knows how to use asyncio.

If I find myself with some time it's something I might actually look into, but I can't promise anything.
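
For reference, a minimal skeleton of such an adapter; the send/close methods are the BaseAdapter contract, while the asyncio bridging itself is left unimplemented (the class name is illustrative):

import requests
from requests.adapters import BaseAdapter

class AsyncioAdapter(BaseAdapter):
    def send(self, request, stream=False, timeout=None, verify=True,
             cert=None, proxies=None):
        # a real implementation would bridge requests' synchronous
        # interface to asyncio here; that bridging is the hard part
        raise NotImplementedError

    def close(self):
        pass

# adapters are attached to a Session with mount()
session = requests.Session()
session.mount('http://', AsyncioAdapter())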

Acidforming answered 5/3, 2014 at 10:56 Comment(1)
The link leads to a 404.Mckinleymckinney
There is a good example of async/await and threading in the article Easy parallel HTTP requests with Python and asyncio by Pimin Konstantin Kefaloukos:

To minimize the total completion time, we could increase the size of the thread pool to match the number of requests we have to make. Luckily, this is easy to do as we will see next. The code listing below is an example of how to make twenty asynchronous HTTP requests with a thread pool of twenty worker threads:

# Example 3: asynchronous requests with larger thread pool
import asyncio
import concurrent.futures
import requests

async def main():

    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:

        loop = asyncio.get_event_loop()
        futures = [
            loop.run_in_executor(
                executor, 
                requests.get, 
                'http://example.org/'
            )
            for i in range(20)
        ]
        for response in await asyncio.gather(*futures):
            pass


loop = asyncio.get_event_loop()
loop.run_until_complete(main())
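
Note that ThreadPoolExecutor queues submissions beyond max_workers, so there is no need to process URLs in batches: you can submit far more futures than workers at once. A sketch (the URL count is illustrative):

import asyncio
import concurrent.futures
import requests

async def main():
    urls = ['http://example.org/'] * 100  # many more URLs than workers
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        loop = asyncio.get_event_loop()
        # all 100 calls are submitted up front; the pool runs 20 at a
        # time and queues the rest, so no manual chunking is needed
        futures = [
            loop.run_in_executor(executor, requests.get, url)
            for url in urls
        ]
        responses = await asyncio.gather(*futures)
    print(len(responses))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())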
Swoosh answered 30/11, 2017 at 11:17 Comment(5)
The problem with this is that if I need to run 10000 requests in chunks of 20 workers, I have to wait for all 20 to finish in order to start the next 20, right? I cannot do for i in range(10000) because one request might fail or time out, right?Saucepan
Can you pls explain why do you need asyncio when you can do the same just using ThreadPoolExecutor?Philipson
@lya Rusin Based on what, do we set the number of max_workers? Does it have to do with number of CPUs and threads?Headliner
@AsafPinhassi if the rest of your script/program/service is asyncio, you'll want to use it "all the way". you'd probably be better off using aiohttp (or some other lib that supports asyncio)Simpkins
@Headliner it actually does not matter how many CPUs you have. The point of delegating this work to a thread (and the whole point of asyncio) is IO-bound operations. The thread will simply be idle ("waiting") for the response to be retrieved from the socket. asyncio makes it possible to handle many concurrent (not parallel!) requests with no threads at all (well, just one). However, requests does not support asyncio, so you need to create threads to get concurrency.Simpkins
Considering that aiohttp is a fully featured web framework, I'd suggest using something more lightweight like httpx (https://www.python-httpx.org/), which supports async requests. It has an almost identical API to requests:

>>> async with httpx.AsyncClient() as client:
...     r = await client.get('https://www.example.com/')
...
>>> r
<Response [200 OK]>
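
To run several requests concurrently, as the question asks, the client combines with asyncio.gather; a minimal sketch (URLs illustrative):

import asyncio
import httpx

async def main():
    urls = ['https://www.example.com/', 'https://www.python.org/']
    async with httpx.AsyncClient() as client:
        # the requests share one client and run concurrently
        responses = await asyncio.gather(*(client.get(url) for url in urls))
    for response in responses:
        print(response.status_code)

asyncio.run(main())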
Cittern answered 1/10, 2021 at 17:16 Comment(1)
There is a nice article covering this topic blog.jonlu.ca/posts/async-python-httpCittern
python-requests does not natively support asyncio yet. Going with a library that natively supports asyncio, like httpx, would be the most beneficial approach.

However, if your use case relies heavily on python-requests, you can wrap the sync calls with asyncio.to_thread (available since Python 3.9) and asyncio.gather, and follow the usual asyncio programming patterns.

import asyncio
import requests

async def main():
    # run the blocking call in a worker thread and await the result
    res = await asyncio.gather(asyncio.to_thread(requests.get, "YOUR_URL"))

if __name__ == "__main__":
    asyncio.run(main())

For concurrency/parallelization of the network requests:

import asyncio
import requests

urls = ["URL_1", "URL_2"]

async def make_request(url: str):
    # run the blocking requests.get in a worker thread
    response = await asyncio.to_thread(requests.get, url)
    return response

async def main():
    responses = await asyncio.gather(*(make_request(url) for url in urls))
    for response in responses:
        print(response)

if __name__ == "__main__":
    asyncio.run(main())
Oleary answered 16/11, 2023 at 14:18 Comment(0)
DISCLAIMER: The following code creates a separate thread for each function call.

This might be useful in some cases, as it is simpler to use. But be aware that it is not truly async; it gives the illusion of async by using multiple threads, even though the decorator suggests otherwise.

To make any function non-blocking, copy the decorator and use it to decorate any function, passing a callback function as the parameter. The callback will receive the data returned from the decorated function.

import asyncio
import requests


def run_async(callback):
    def inner(func):
        def wrapper(*args, **kwargs):
            def __exec():
                out = func(*args, **kwargs)
                callback(out)
                return out

            return asyncio.get_event_loop().run_in_executor(None, __exec)

        return wrapper

    return inner


def _callback(*args):
    print(args)


# A callback function must be provided; it runs after the decorated function completes
@run_async(_callback)
def get(url):
    return requests.get(url)


get("https://google.com")
print("Non blocking code ran !!")
Grijalva answered 28/12, 2020 at 16:20 Comment(0)
