Reusing the same curl handle. Big performance increase?
In a PHP script I am doing a lot of different curl GET requests (a hundred) to different URLs.

Will reusing the same handle from curl_init improve performance, or is it negligible compared to the response time of the requests?

I am asking because in the current architecture it would not be easy to keep the same handle.

Civics answered 24/9, 2010 at 12:21 Comment(3)
Have you looked into curl_multi_init? – Bracing
Yes, but I need to do synchronous curl requests. – Civics
Be careful using this! See the WARNING in my answer below. – Diane
It depends on whether the URLs are on the same servers or not. If they are, requests to the same server will reuse the connection, provided the same curl handle is reused; see CURLOPT_FORBID_REUSE.

If the URLs are sometimes on the same server, you should sort the URLs, as the default connection cache is limited to ten or twenty connections.

If they are on different servers, there is no speed advantage to using the same handle.

With curl_multi_exec you can connect to different servers at the same time (in parallel). Even then you need some queuing to avoid using thousands of simultaneous connections; a sketch of that idea follows below.
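
A minimal sketch of that queuing idea, assuming a plain curl_multi loop with a concurrency cap (the function name fetch_all and the variable $maxConcurrent are illustrative, not from the original answer):

// Sketch: fetch many URLs in parallel, but keep at most
// $maxConcurrent transfers active at any one time.
function fetch_all(array $urls, int $maxConcurrent = 10): array
{
    $mh = curl_multi_init();
    $queue = $urls;
    $inFlight = 0;
    $results = [];

    $addOne = function () use (&$queue, &$inFlight, $mh) {
        if (!$queue) {
            return;
        }
        $url = array_shift($queue);
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PRIVATE, $url); // remember which URL this handle fetches
        curl_multi_add_handle($mh, $ch);
        $inFlight++;
    };

    // Prime the pool up to the concurrency cap.
    for ($i = 0; $i < $maxConcurrent; $i++) {
        $addOne();
    }

    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);

        // Collect finished transfers and refill from the queue.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $inFlight--;
            $addOne();
        }
    } while ($inFlight > 0);

    curl_multi_close($mh);
    return $results;
}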

Holzer answered 19/1, 2011 at 15:10 Comment(3)
This answer is ambiguous. It doesn't explicitly answer the user's question: does reusing the same curl handle improve performance? And the statement "concurrent requests to the same server will reuse the connection" can be read as assuming the same curl handle is used, or not. If not, it would be better to state explicitly that the connection is reused no matter whether the same curl handle is reused or not. – Infest
Agree with @JohnnyWong. – Ethical
A more accurate first sentence would be: "It depends on whether the URLs are on the same servers or not. If they are, requests to the same server will reuse the connection, **if the same curl handle is reused**; see CURLOPT_FORBID_REUSE." – Infest
Crossposted from Should I close cURL or not? because I think it's relevant here too.

I tried benchmarking curl, using a new handle for each request versus reusing the same handle, with the following code:

ob_start(); //Trying to avoid setting as many curl options as possible
$start_time = microtime(true);
for ($i = 0; $i < 100; ++$i) {
    $rand = rand();
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://www.google.com/?rand=" . $rand);
    curl_exec($ch);
    curl_close($ch);
}
$end_time = microtime(true);
ob_end_clean();
echo 'Curl without handle reuse: ' . ($end_time - $start_time) . '<br>';

ob_start(); //Trying to avoid setting as many curl options as possible
$start_time = microtime(true);
$ch = curl_init();
for ($i = 0; $i < 100; ++$i) {
    $rand = rand();
    curl_setopt($ch, CURLOPT_URL, "http://www.google.com/?rand=" . $rand);
    curl_exec($ch);
}
curl_close($ch);
$end_time = microtime(true);
ob_end_clean();
echo 'Curl with handle reuse: ' . ($end_time - $start_time) . '<br>';

and got the following results:

Curl without handle reuse: 8.5690529346466
Curl with handle reuse: 5.3703031539917

So reusing the same handle actually provides a substantial performance increase when connecting to the same server multiple times. I tried connecting to different servers:

$url_arr = array(
    'http://www.google.com/',
    'http://www.bing.com/',
    'http://www.yahoo.com/',
    'http://www.slashdot.org/',
    'http://www.stackoverflow.com/',
    'http://github.com/',
    'http://www.harvard.edu/',
    'http://www.gamefaqs.com/',
    'http://www.mangaupdates.com/',
    'http://www.cnn.com/'
);
ob_start(); //Trying to avoid setting as many curl options as possible
$start_time = microtime(true);
foreach ($url_arr as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
    curl_close($ch);
}
$end_time = microtime(true);
ob_end_clean();
echo 'Curl without handle reuse: ' . ($end_time - $start_time) . '<br>';

ob_start(); //Trying to avoid setting as many curl options as possible
$start_time = microtime(true);
$ch = curl_init();
foreach ($url_arr as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
}
curl_close($ch);
$end_time = microtime(true);
ob_end_clean();
echo 'Curl with handle reuse: ' . ($end_time - $start_time) . '<br>';

And got the following result:

Curl without handle reuse: 3.7672290802002
Curl with handle reuse: 3.0146431922913

Still quite a substantial performance increase.

Carolynecarolynn answered 4/8, 2013 at 20:24 Comment(3)
Out of curiosity, what is the rand() call doing in the second test? Seems like that could introduce a substantial difference between the benchmarks being compared. – Warden
@Warden Good point. It's not needed in the second test. But since the second test is only 10 iterations and we're dealing with times in the seconds, its impact isn't substantial. – Carolynecarolynn
As this post is fairly old, I'd like to add that reusing the handle when dealing with SSL connections can be even more performant, as you don't need an SSL handshake for each request. – Catalectic
I have a similar scenario where I post data to a server. The data is chunked into requests of ~100 lines each, so it produces a lot of requests. In a benchmark run I compared two approaches for 12,614 lines (127 requests needed), plus authentication and another housekeeping request (129 requests in total).

The requests go over a network to a server in the same country, not on-site. They are secured by TLS 1.2 (the handshake will also take its toll, but given that HTTPS is becoming more and more of a default choice, this might even make it more similar to your scenario).

With cURL reuse: one $curlHandle that is curl_init()'ed once, and then only modified with CURLOPT_URL and CURLOPT_POSTFIELDS

Run  1: ~42.92s
Run  3: ~41.52s
Run  4: ~53.17s
Run  5: ~53.93s
Run  6: ~55.51s
Run 11: ~53.59s
Run 12: ~53.76s
Avg: 50.63s / Std. Dev: 5.8s
TCP-Conversations / SSL Handshakes: 5 (Wireshark)

Without cURL reuse: one curl_init per request

Run  2: ~57.67s
Run  7: ~62.13s
Run  8: ~71.59s
Run  9: ~70.70s
Run 10: ~59.12s
Avg: 64.24s / Std. Dev: 6.5s
TCP-Conversations / SSL Handshakes: 129 (Wireshark)

It isn't the largest of datasets, but one can say that all of the "reused" runs were faster than all of the "init" runs. The average times show a difference of almost 14 seconds.
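
For illustration, a minimal sketch of the reuse pattern described above ($endpoint and $chunks are hypothetical placeholders, not from the original benchmark):

// Sketch: one handle, re-posting successive chunks to the same endpoint.
$endpoint = 'https://api.example.com/import'; // hypothetical URL
$chunks = [/* ~100-line payloads */];         // hypothetical data

$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL, $endpoint);
curl_setopt($curlHandle, CURLOPT_POST, true);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, true);

foreach ($chunks as $chunk) {
    // Only the payload changes between requests; libcurl keeps the
    // TCP/TLS connection alive, so the handshake happens only once.
    curl_setopt($curlHandle, CURLOPT_POSTFIELDS, $chunk);
    $response = curl_exec($curlHandle);
    // ... check $response / curl_error($curlHandle) per chunk ...
}

curl_close($curlHandle);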

Sannyasi answered 13/9, 2015 at 22:6 Comment(1)
Very interesting. – Ultramicrochemistry
It depends on how many requests you will be making. The overhead of closing and reopening a handle each time is negligible, but when doing a thousand? It could add a few seconds or more.

I believe curl_multi_init would be the fastest method.

The whole thing depends on how many requests you need to make.

Advert answered 24/9, 2010 at 12:24 Comment(1)
I cannot use curl_multi_init because my curl requests need to be synchronous. I will have a hundred requests each time. – Civics
Although this question is answered correctly, I would like to add a WARNING: do NOT reuse the curl handle for POST or PUT requests, because the reset is not always done fully.

I just had the following issue, which resulted in corrupt data in my database. :-(

Due to some corrupted ASCII codes in some records, the POST body remained empty, and my script did not check for that :-( (I will fix this, of course). The curl handle still carried the POST body from the previous record and just passed that on. No error was returned.

This would not have happened if curl had been initialized for each request. In that case there would not have been any pre-loaded data available, and the server would have responded with an empty-request error.

So my advice, better safe than fast: always use a new curl instance, except when only fetching external data.

UPDATE: I just found out that I did not use the PHP function curl_reset(). According to the manual, that resets everything; see the sketch below. For now I prefer to use curl_close() and curl_init() ;-)
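
For reference, a minimal sketch of what resetting a reused handle with curl_reset() could look like ($records and the URL are hypothetical placeholders):

$ch = curl_init();

foreach ($records as $record) {
    // Wipe all options from the previous request, so no stale
    // CURLOPT_POSTFIELDS can leak into this one.
    curl_reset($ch);
    curl_setopt($ch, CURLOPT_URL, 'https://api.example.com/records'); // hypothetical
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $record);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
}

curl_close($ch);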

I hope I explained it well enough; please ask if it is not clear! Greetz

Diane answered 26/4, 2021 at 12:14 Comment(0)
Check this out too:

// Requires the pecl_http extension, which provides HttpRequestPool.
try {
    $pool = new HttpRequestPool(
        new HttpRequest($q1),
        new HttpRequest($qn)
    );
    $pool->send();

    foreach ($pool as $request) {
        $out[] = $request->getResponseBody();
    }
} catch (HttpException $e) {
    echo $e;
}

Reconnoiter answered 24/9, 2010 at 12:28 Comment(2)
I don't see the point of your answer in relation to my question... Could you be more precise? – Civics
Well, it's a different approach to the problem. If you need to make tons of curl GET requests, you can use the HttpRequestPool of PHP, which has been designed exactly for this purpose: pecl.php.net/package/pecl_http – Reconnoiter
