How do I get Python to send as many concurrent HTTP requests as possible?
Asked Answered
S

5

6

I'm trying to send HTTPS requests as quickly as possible. I know this would have to be done with concurrent requests, since my goal is 150 to 500+ requests a second. I've searched everywhere, but I can't find an answer for Python 3.11+ or one that doesn't give me errors. I'm trying to avoid aiohttp, as the rigmarole of setting it up was a pain and it didn't even work.

The input should be an array of URLs and the output an array of HTML strings.

Simonsimona answered 24/11, 2022 at 23:56 Comment(7)
Unrelated: I've tried the identical thing in PHP, using multi-cURL, with some success; I was able to average 50/sec. However, as time went on the speed would slow down exponentially: after 30 minutes it would go from 50/sec to <0.1/sec. This Python script will be running for literal weeks, as well.Simonsimona
What's the RTT between your host and the requested host? Take a look at concurrent.futures.ThreadPoolExecutor and concurrent.futures.ProcessPoolExecutor. They're easy to use and a good place to start with concurrency. Prefer threads over processes since this is an I/O-bound task but be aware that you'll probably need multiple processes running multiple threads to hit your throughput target.Veracruz
Take a look at this answer, which achieved 750 packets/sec. It's packets and sockets rather than HTTPS, but it may help you come up with a solution.Veracruz
@MichaelRuth Thank you, I looked into ThreadPoolExecutor and that seems to have worked; see my answer. My target site is getting about 250/sec, averaging 30Mbps. Is there a way to fix the bottleneck via code so it may potentially go up to 500Mbps?Simonsimona
Profile the code and see where the bottleneck is. After that, try adding processes. For a dedicated host running only this application, cpu cores - 1 processes is a good place to start, with each process running your ThreadPoolExecutor code (see the sketch after these comments). If you have 500/30 + 1 = 17.667 ~ 18 cores, and your network can handle the load, you could get close to 500Mbps. These are back-of-the-envelope calculations though, and not many folks have 18 cores to work with. Your best bet is to move this app to a cloud provider that can scale.Veracruz
@MichaelRuth Thank you for the insight. I'll look into this for sure.Simonsimona
Future readers might find this answer helpful as well.Penalty
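
Building on the processes-plus-threads suggestion in the comments above, here is a minimal sketch; the process and thread counts, the chunking, and the URL list are illustrative assumptions, not measured values:

import concurrent.futures
import requests

URLS = ["https://example.com"] * 5000  # placeholder workload

def fetch(url: str) -> bytes:
    return requests.get(url).content

def run_chunk(chunk: list[str]) -> list[bytes]:
    # each process runs its own thread pool for the I/O-bound requests
    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
        return list(pool.map(fetch, chunk))

def main() -> list[bytes]:
    n_procs = 4  # e.g. cpu cores - 1
    chunks = [URLS[i::n_procs] for i in range(n_procs)]
    with concurrent.futures.ProcessPoolExecutor(max_workers=n_procs) as procs:
        results = list(procs.map(run_chunk, chunks))
    return [item for chunk in results for item in chunk]

if __name__ == "__main__":
    responses = main()
    print(f"{len(responses)} responses collected")
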
S
4

This works, getting around 250+ requests a second, and it does work on Windows 10. concurrent.futures is part of the standard library; you may have to pip install requests.

import time
import requests
import concurrent.futures

start = time.time()  # time before the requests are sent

urls = ["https://example.com"] * 5000  # input array of URLs/IPs (5000 test sites)
responses = []  # output: the content of each request, as a string, in an array

def send(url):
    # list.append is thread-safe, so the worker threads can share `responses`
    responses.append(requests.get(url).content)

with concurrent.futures.ThreadPoolExecutor(max_workers=10000) as executor:
    futures = [executor.submit(send, url) for url in urls]
    # leaving the `with` block waits for every submitted request to finish

end = time.time()  # time after everything finishes
print(f"{round(len(urls) / (end - start), 0)}/sec")  # average requests per second

Output: 286.0/sec

Note: if your code needs to process each result as soon as its request completes, replace the middle part with this:

# here send() returns the content instead of appending it,
# so each result can be handled as soon as its request finishes
def send(url):
    return requests.get(url).content

with concurrent.futures.ThreadPoolExecutor(max_workers=10000) as executor:
    futures = [executor.submit(send, url) for url in urls]
    for future in concurrent.futures.as_completed(futures):
        responses.append(future.result())

This is a modified version of what this site showed in an example.

The secret sauce is max_workers=10000. Otherwise, it would average about 80/sec, although setting it beyond 1000 didn't give any further boost in speed.
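
One commonly suggested tweak, not part of the original answer and offered only as a sketch: reuse a requests.Session per thread (via threading.local) so TCP/TLS connections are kept alive instead of being re-established for every request.

import threading
import requests
import concurrent.futures

thread_local = threading.local()

def get_session() -> requests.Session:
    # one Session per thread, so its TCP/TLS connections are reused
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session

def send(url):
    return get_session().get(url).content

urls = ["https://example.com"] * 5000
with concurrent.futures.ThreadPoolExecutor(max_workers=1000) as executor:
    responses = list(executor.map(send, urls))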

Simonsimona answered 27/11, 2022 at 9:44 Comment(0)
T
8

It's quite unfortunate that you couldn't set up aiohttp properly, because it is one of the most efficient ways to do asynchronous requests in Python.

Setup is not that hard:

import asyncio
import aiohttp
from time import perf_counter


def urls(n_reqs: int):
    for _ in range(n_reqs):
        yield "https://python.org"

async def get(session: aiohttp.ClientSession, url: str):
    async with session.get(url) as response:
        _ = await response.text()
             
async def main(n_reqs: int):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[get(session, url) for url in urls(n_reqs)]
        )


if __name__ == "__main__":
    n_reqs = 10_000
    
    start = perf_counter()
    asyncio.run(main(n_reqs))
    end = perf_counter()
    
    print(f"{n_reqs / (end - start)} req/s")

You basically need to create a single ClientSession, which you then reuse to send the GET requests. The requests are made concurrently with asyncio.gather(). You could also use the newer asyncio.TaskGroup:

async def main(n_reqs: int):
    async with aiohttp.ClientSession() as session:
        async with asyncio.TaskGroup() as group:
            for url in urls(n_reqs):
                group.create_task(get(session, url))

This easily achieves 500+ requests per second on my 7+ year old dual-core computer. Contrary to what other answers suggest, this solution does not require spawning thousands of threads, which are expensive.

You may improve the speed even more by using a custom connector in order to allow more concurrent connections (the default is 100) in a single session:

async def main(n_reqs: int):
    connector = aiohttp.TCPConnector(limit=0)
    async with aiohttp.ClientSession(connector=connector) as session:
        ...
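
Since the question asked for the HTML of each page as an array of strings, and asyncio.gather() returns results in the same order as its inputs, a small variant of the get()/main() shown earlier (a sketch, not part of the original answer) can collect them:

async def get(session: aiohttp.ClientSession, url: str) -> str:
    async with session.get(url) as response:
        return await response.text()

async def main(n_reqs: int) -> list[str]:
    async with aiohttp.ClientSession() as session:
        # gather() preserves the order of the input URLs
        return await asyncio.gather(
            *[get(session, url) for url in urls(n_reqs)]
        )

htmls = asyncio.run(main(1_000))  # list of HTML strings, one per URL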

Tartan answered 2/12, 2022 at 16:36 Comment(1)
I'm flabbergasted. That worked amazingly. In my earlier testing, I would have had to download all the things mentioned in this video just to get aiohttp or asyncio to work. But your code just worked instantly! Thank you!!Simonsimona
G
0

Hope this helps. This was originally written for the related question "What is the fastest way to send 10000 http requests".

I observed 15,000 requests in 10 s, using Wireshark to capture on localhost; I saved the packets to CSV and counted only the packets that contained GET.

FILE: a.py

from treq import get
from twisted.internet import reactor

def done(response):
    if response.code == 200:
        # chain another request as soon as the previous one succeeds
        get("http://localhost:3000").addCallback(done)

get("http://localhost:3000").addCallback(done)

reactor.callLater(10, reactor.stop)  # stop the reactor after 10 seconds
reactor.run()

Run the test like this:

pip3 install treq
python3 a.py  # code from above

Set up the test website like this (create app.js, shown below, before running node app.js); mine was on port 3000:

mkdir myapp
cd myapp
npm init
npm install express
node app.js

FILE: app.js

const express = require('express')
const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
})

OUTPUT

grep GET wireshark.csv  | head
"5","0.000418","::1","::1","HTTP","139","GET / HTTP/1.1 "
"13","0.002334","::1","::1","HTTP","139","GET / HTTP/1.1 "
"17","0.003236","::1","::1","HTTP","139","GET / HTTP/1.1 "
"21","0.004018","::1","::1","HTTP","139","GET / HTTP/1.1 "
"25","0.004803","::1","::1","HTTP","139","GET / HTTP/1.1 "

grep GET wireshark.csv  | tail
"62145","9.994184","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62149","9.995102","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62153","9.995860","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62157","9.996616","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62161","9.997307","::1","::1","HTTP","139","GET / HTTP/1.1 "

Gifford answered 25/11, 2022 at 0:58 Comment(2)
When I try to install treq, everything goes fine until I get this install error: Building wheel for twisted-iocpsupport (pyproject.toml) ... error: subprocess-exited-with-error. Any way I can fix this?Simonsimona
Maybe this answer helps: "installation failed building wheel for twisted in windows 10 python 3"; otherwise you can try the Twisted community links. Unfortunately, I don't use Windows, so I'm limited in how I can help.Gifford
H
0

I have created a package called unparallel that fits your use case. You can use it as follows:

import asyncio

from unparallel import up


async def main():
    urls = [
        "https://www.google.com/",
        "https://www.youtube.com/",
        "https://www.facebook.com/",
        "https://www.wikipedia.org/"
    ]

    # Do GET requests and return the content for all URLs
    responses = await up(urls, response_fn=lambda x: x.text)

    # Iterate over the responses and print the content
    for url, content in zip(urls[:10], responses):
        print(url, content[:100])


if __name__ == "__main__":
    asyncio.run(main())

Here is the output I get if I run the above:

❯ python docs/examples/multiple_websites.py 
Making async requests: 100%|█████████████████| 4/4 [00:00<00:00,  9.19it/s]
https://www.google.com/: '<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de-AT"><head><meta cont'
https://www.youtube.com/: '<!DOCTYPE html><html style="font-size: 10px;font-family: Roboto, Arial, sans-serif;" lang="de-DE" da'
https://www.facebook.com/: '<!DOCTYPE html>\n<html lang="de" id="facebook" class="no_js">\n<head><meta charset="utf-8" /><meta nam'
https://www.wikipedia.org/: '<!DOCTYPE html>\n<html lang="en" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>Wikipedia</title'

You can check the docs to find out more about how to parametrize up().

Hooghly answered 25/3 at 15:6 Comment(0)
C
0

httpx.AsyncClient can do efficient async requests: it keeps a connection pool, so requests can be sent in parallel, and connections are persistent, so they can be reused.

httpx.Client and httpx.AsyncClient are also used by the OpenAI Python SDK for its ChatGPT client.

I use it just like ab:

ab -c 300 -n 10000 http://127.0.0.1:8000/
python ab.py -c 300 -n 10000 http://127.0.0.1:8000/

install

pip install httpx -U

ab.py

import httpx
import time
import argparse
import asyncio

# split requests list into batches
def batches(*, data: list, batch_size: int = 30):
    """Yield successive batch_size-sized chunks from data."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

# creating a client is expensive, so we use only one client
async def do_tasks_one_client(urls, batch_size = 30):
    async with httpx.AsyncClient() as client:
        for batch in batches(data=urls, batch_size=batch_size):
            tasks = [client.get(url) for url in batch]
            result = await asyncio.gather(*tasks)

def main(url: str, numbers:int, batch_size = 30):
    urls = [url] * numbers
    asyncio.run(do_tasks_one_client(urls=urls, batch_size=batch_size))


if __name__ == "__main__":
    
    parser = argparse.ArgumentParser()
    parser.add_argument('-n', '--number', type=int, default=100)
    parser.add_argument('-c', '--connection', type=int, default=30)
    parser.add_argument('url', type=str)
    args = parser.parse_args()
    
    start = time.time()
    main(args.url, numbers=args.number, batch_size=args.connection)
    end = time.time()

    print("Took {} seconds".format(end - start))
    
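An alternative to fixed batches, not from the original answer and offered only as a sketch: cap in-flight requests with an asyncio.Semaphore, so one slow response does not hold back the rest of its batch.

import asyncio
import httpx

async def fetch_all(urls: list[str], concurrency: int = 30) -> list[str]:
    semaphore = asyncio.Semaphore(concurrency)  # cap on in-flight requests
    async with httpx.AsyncClient() as client:
        async def fetch(url: str) -> str:
            async with semaphore:
                response = await client.get(url)
                return response.text
        # results come back in the same order as the input URLs
        return await asyncio.gather(*(fetch(url) for url in urls))

# e.g. responses = asyncio.run(fetch_all(["http://127.0.0.1:8000/"] * 10000, concurrency=300))
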
Cystocele answered 27/7 at 13:24 Comment(0)
