Python Requests vs PyCurl Performance

How does the Requests library compare with the PyCurl performance wise?

My understanding is that Requests is a Python wrapper around urllib, whereas PyCurl is a Python wrapper around libcurl, which is native code, so PyCurl should get better performance, but I'm not sure by how much.

I can't find any benchmarks comparing the two.

Irresponsible answered 17/3, 2013 at 14:41 Comment(0)

I wrote you a full benchmark, using a trivial Flask application backed by Gunicorn/meinheld + nginx (for performance and HTTPS), and measuring how long it takes to complete 10,000 requests. Tests were run in AWS on a pair of unloaded c4.large instances, and the server instance was not CPU-limited.

TL;DR summary: if you're doing a lot of networking, use PyCurl; otherwise use requests. PyCurl finishes small requests 2x-3x as fast as requests until you hit the bandwidth limit with large requests (around 520 Mbit/s, or 65 MB/s, here), and uses 3x to 10x less CPU power. These figures compare cases where the connection-pooling behavior is the same; by default, PyCurl uses connection reuse and DNS caching, whereas requests does not, so a naive requests implementation will be 10x as slow.
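To make the pooling point concrete, here is a minimal sketch (not the benchmark code itself; the URL and request count are placeholders) of the difference between a naive requests loop and a reused pycurl handle:

    # Naive requests: a new connection (and DNS lookup) for every request.
    import requests

    URL = "http://localhost:8000/ping"   # hypothetical test endpoint
    for _ in range(10_000):
        requests.get(URL)

    # pycurl: reusing one handle lets libcurl keep its connection and DNS cache warm.
    from io import BytesIO
    import pycurl

    curl = pycurl.Curl()                  # one handle, reused for every request
    for _ in range(10_000):
        body = BytesIO()
        curl.setopt(pycurl.URL, URL)
        curl.setopt(pycurl.WRITEDATA, body)
        curl.perform()                    # connection and DNS cache are reused
        payload = body.getvalue()         # read the response out, as the other clients must
    curl.close()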

[Charts: combined RPS chart; CPU time by request size (detailed)]

[Charts: HTTP-only throughput; HTTP-only RPS]

[Charts: HTTP & HTTPS throughput; HTTP & HTTPS RPS. Note that log-log axes are used for these graphs only, due to the orders of magnitude involved.]

  • pycurl takes about 73 CPU-microseconds to issue a request when reusing a connection
  • requests takes about 526 CPU-microseconds to issue a request when reusing a connection
  • pycurl takes about 165 CPU-microseconds to open a new connection and issue a request (no connection reuse), of which ~92 microseconds go to opening the connection
  • requests takes about 1078 CPU-microseconds to open a new connection and issue a request (no connection reuse), of which ~552 microseconds go to opening the connection
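These are CPU-time figures rather than wall-clock times. A rough sketch of how per-request CPU time can be estimated (not the benchmark's actual harness; the URL and request count are placeholders) is to divide process CPU time over a large loop:

    # Rough estimate of CPU-microseconds per request in the connection-reuse case.
    import time
    import requests

    URL = "http://localhost:8000/ping"    # hypothetical test endpoint
    N = 10_000

    session = requests.Session()           # reuse connections, like the reused pycurl handle
    start = time.process_time()            # CPU time of this process, not wall-clock time
    for _ in range(N):
        session.get(URL)
    cpu_us = (time.process_time() - start) / N * 1e6
    print(f"~{cpu_us:.0f} CPU-microseconds per request")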

Full results are in the link, along with the benchmark methodology and system configuration.

Caveats: although I've taken pains to ensure the results are collected in a scientific way, this only tests one system type and one operating system, and only a limited subset of performance options (HTTPS options especially).

Bourgeon answered 2/10, 2015 at 3:1 Comment(13)
Your benchmark is nice, but localhost has no network-layer overhead whatsoever. If you could cap the data-transfer speed at realistic network speeds, use realistic response sizes (pong is not realistic), and include a mix of content-encoding modes (with and without compression), and then produce timings based on that, you'd have benchmark data with actual meaning. – Coenurus
I also note that you moved the setup for pycurl out of the loop (setting the URL and writedata target should arguably be part of the loop), and that you don't read out the cStringIO buffer; the non-pycurl tests all have to produce the response as a Python string object. – Coenurus
@MartijnPieters The lack of network overhead is intentional; the intent here is to test the client in isolation. The URL is pluggable, so you can test against a real, live server of your choice (by default it doesn't, because I don't want to hammer someone's system). Key note: the later pycurl test reads out the response body via body.getvalue(), and performance is very similar. PRs are welcome for the code if you can suggest improvements. – Bourgeon
@MartijnPieters I did try testing with external servers, but with this many connection requests it triggers DoS-prevention measures, unfortunately. If you've got notions on how to avoid that, be my guest. – Bourgeon
I was talking about using a network-interface throttle (see some sample applications that achieve this) plus some real-world data loads, to see how much of a difference pycurl makes in different scenarios. – Coenurus
@MartijnPieters Please, if you know a good way to do this, submit a PR! I wanted to get something out there to get actual numbers, but I didn't have the time to invest in designing a full framework. As it stands, the benchmark is definitely open to improvement and enhancement, and I'd welcome any contributions! – Bourgeon
I'm sorry, I don't have the time right now either, nor do I have a network conditioner ready to go. – Coenurus
This is not a good benchmark for using Requests: it creates a new connection with every single request. You should be using a Session. – Imaginary
Okay, I cleaned up the benchmarks, with connection reuse: Requests: 4.47s. Urllib3: 2.9s. PyCurl: 0.639351s. – Imaginary
@KennethReitz Yeah, it's a fairly rough benchmark, and if you've got improvements ready to go, I'd welcome a PR (and can rerun on the original system for an apples-to-apples comparison)! We really should have benchmark coverage with and without connection reuse for all cases, since one might be issuing requests to many different servers or a string of requests to the same one. Based on your figures, I think we're still not wrong in saying pycurl is between 3x and 10x faster with the same connection behavior. – Bourgeon
@KennethReitz I've merged your PR, integrated it against a rework of the command-line execution and test format, and am investigating whether the (now) anomalously bad pycurl performance is real or the result of a bad implementation. – Bourgeon
@KennethReitz Thank you; fancy graphs including your PR are now available on GitHub (along with a much-refined test script and a Docker image). – Bourgeon
@Martijn_Pieters You may want to take a look again; I've updated with a benchmark with full network overheads in AWS. – Bourgeon
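To make the "use a session" advice from the comments above concrete, here is a minimal sketch (placeholder URL, not the actual benchmark script) of what connection reuse looks like for requests and urllib3, comparable to reusing a single pycurl handle:

    # Connection reuse for requests and urllib3; pycurl gets it by reusing one handle.
    import requests
    import urllib3

    URL = "http://localhost:8000/ping"    # hypothetical test endpoint

    session = requests.Session()           # pools connections per host
    for _ in range(10_000):
        session.get(URL).content

    pool = urllib3.PoolManager()           # urllib3's own connection pool
    for _ in range(10_000):
        pool.request("GET", URL).data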

First and foremost, requests is built on top of the urllib3 library; the stdlib urllib and urllib2 libraries are not used at all.

There is little point in comparing requests with pycurl on performance. pycurl may use C code for its work, but like all network programming, your execution speed depends largely on the network that separates your machine from the target server. Moreover, the target server could be slow to respond.

In the end, requests has a far friendlier API to work with, and you'll find that you're more productive using it.
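To illustrate the API difference, here is the same GET done both ways (the URL is a placeholder and is assumed to return JSON):

    import json
    from io import BytesIO

    import pycurl
    import requests

    URL = "https://example.com/api?q=python"   # hypothetical JSON endpoint

    # requests: timeout, redirects and decoding handled in two lines
    r = requests.get(URL, timeout=10)
    data = r.json()

    # pycurl: set each option yourself and decode the body by hand
    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, URL)
    c.setopt(pycurl.WRITEDATA, buf)
    c.setopt(pycurl.TIMEOUT, 10)
    c.setopt(pycurl.FOLLOWLOCATION, True)
    c.perform()
    c.close()
    data = json.loads(buf.getvalue())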

Coenurus answered 17/3, 2013 at 14:59 Comment(5)
I agree that for most applications the clean API of requests matters most; but for network-intensive applications, there's no excuse not to use pycurl. The overhead may matter (especially within a data center). – Bourgeon
@BobMcGee: if the network speeds are so high that the overhead is going to matter, you should not be using Python for the whole application anymore. – Coenurus
@Martijn_Pieters Disagree: Python performance isn't that bad, and in general it's pretty easy to delegate the performance-sensitive bits to native libraries (of which pycurl is a perfect example). Dropbox can make it work, and yum internally uses pycurl (since a lot of its work is simply network fetches, which need to be as fast as possible). – Bourgeon
@BobMcGee: yes, for specialist codebases like yum it can be worth the pain of dealing with the pycurl API; for the vast majority of URL-processing needs, however, the trade-off lies heavily in favour of requests. In other words, most projects will not need to go through the pain of using pycurl; in my opinion you need to be pretty network-heavy before it is worth giving up the requests API; the difference in ease of development is huge. – Coenurus
@MartijnPieters: Totally agree with that! Requests should be the default go-to unless network performance is critical (or you need low-level curl functionality). To complete the picture, we now have a benchmark that people can use to test for themselves. – Bourgeon

It seems there is a new kid on the block: pycurl-requests, a requests-style interface for pycurl.

Thank you for the benchmark; it was nice. I like curl, and it seems to be able to do a bit more than HTTP.

https://github.com/dcoles/pycurl-requests
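Going by the project's description, it is meant as a drop-in replacement, so usage would look roughly like this (a sketch that assumes the package is installed and mirrors the requests API as advertised):

    # Assumed usage of pycurl-requests as a requests-style wrapper over libcurl.
    import pycurl_requests as requests

    r = requests.get("https://example.com")   # placeholder URL
    print(r.status_code, r.headers.get("Content-Type"))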

Ochone answered 6/4, 2020 at 20:11 Comment(0)

Focusing on size:

  1. On my MacBook Air with 8 GB of RAM and a 512 GB SSD, for a 100 MB file coming in at 3 kilobytes a second (over the internet and Wi-Fi), pycurl, curl, and the requests library's get function (regardless of chunking or streaming) are pretty much the same.

  2. On a smaller quad-core Intel Linux box with 4 GB of RAM, over localhost (from Apache on the same box), for a 1 GB file, curl and pycurl are 2.5x faster than the 'requests' library. And for requests, chunking and streaming together give a 10% boost (chunk sizes above 50,000); see the streaming sketch below.
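For reference, this is roughly what "chunking and streaming" means for requests (a sketch with a placeholder URL and a chunk size in the range mentioned above):

    # Streamed download with requests, read in ~64 KB chunks.
    import requests

    URL = "http://localhost/bigfile.bin"   # hypothetical 1 GB file served by local Apache

    with requests.get(URL, stream=True) as r:
        r.raise_for_status()
        with open("bigfile.bin", "wb") as f:
            for chunk in r.iter_content(chunk_size=64 * 1024):
                f.write(chunk)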

I thought I was going to have to swap requests out for pycurl, but that turns out not to be necessary, as the application I'm making isn't going to have the client and server that close together.

Legroom answered 30/7, 2017 at 1:59 Comment(0)
