How can I use cURL to open multiple URLs simultaneously with PHP?

Here is my current code:

    $SQL = mysql_query("SELECT url FROM urls") or die(mysql_error()); //Query the urls table
    while ($resultSet = mysql_fetch_array($SQL)) { //Put all the urls into one variable
        // Now for some cURL to run it.
        $ch = curl_init($resultSet['url']); //load the urls
        curl_setopt($ch, CURLOPT_TIMEOUT, 2); //No need to wait for it to load. Execute it and go.
        curl_exec($ch); //Execute
        curl_close($ch); //Close it off
    } //While loop

I'm relatively new to cURL. By relatively new, I mean this is my first time using it. Currently it loads one URL for two seconds, then the next one for two seconds, then the next. However, I want it to load ALL of them at the same time. I'm sure it's possible, I'm just unsure how. If someone could point me in the right direction, I'd appreciate it.

Thulium answered 22/4, 2010 at 16:36 Comment(1)
Do you need to do anything with the results curl loads? - Prandial

You set up each cURL handle in the same way, then add them to a curl_multi_ handle. The functions to look at are the curl_multi_* functions documented here. In my experience, though, there were issues with trying to load too many URLs at once (though I can't find my notes on it at the moment), so the last time I used curl_multi_, I set it up to do batches of 5 URLs at a time.
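As a rough illustration of the batching idea only (no requests are made here), the URL list can be split into groups of 5 before any handles are created. The helper name `url_batches()` and the example.com URLs are made up for this sketch; `array_chunk()` is a standard PHP function.

```php
<?php
// Hypothetical helper (not from the original answer): split a list of URLs
// into batches of at most $size entries, dropping exact duplicates first.
function url_batches(array $urls, int $size = 5): array
{
    // array_values() re-indexes after array_unique(); max() guards against a size below 1
    return array_chunk(array_values(array_unique($urls)), max(1, $size));
}

// Placeholder URLs, purely for illustration
$urls = array();
for ($n = 1; $n <= 12; $n++) {
    $urls[] = "http://example.com/job/$n";
}

$batches = url_batches($urls);

// 12 URLs in batches of 5 give three batches holding 5, 5 and 2 URLs
foreach ($batches as $index => $batch) {
    echo "Batch $index has " . count($batch) . " URLs\n";
}
```

Each batch could then be handed to the curl_multi_ functions in turn, waiting for one batch to finish before starting the next.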

edit: Here is a reduced version of the code I have using curl_multi_:

edit: Slightly rewritten and lots of added comments, which hopefully will help.

// -- create all the individual cURL handles and set their options
$curl_handles = array();
foreach ($urls as $url) {
    $curl_handles[$url] = curl_init();
    curl_setopt($curl_handles[$url], CURLOPT_URL, $url);
    // set other curl options here
}

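// -- start going through the cURL handles and running them
// BLOCK_SIZE is the number of handles to run simultaneously (5 matches the batch size mentioned above)
define('BLOCK_SIZE', 5);
$curl_multi_handle = curl_multi_init();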

$i = 0; // count where we are in the list so we can break up the runs into smaller blocks
$block = array(); // to accumulate the curl_handles for each group we'll run simultaneously

foreach ($curl_handles as $a_curl_handle) {
    $i++; // increment the position-counter

    // add the handle to the curl_multi_handle and to our tracking "block"
    curl_multi_add_handle($curl_multi_handle, $a_curl_handle);
    $block[] = $a_curl_handle;

    // -- check to see if we've got a "full block" to run or if we're at the end of our list of handles
    if (($i % BLOCK_SIZE == 0) or ($i == count($curl_handles))) {
        // -- run the block

        $running = NULL;
        do {
            // track the previous loop's number of handles still running so we can tell if it changes
            $running_before = $running;

            // run the block or check on the running block and get the number of sites still running in $running
            curl_multi_exec($curl_multi_handle, $running);

            // if the number of sites still running changed, print out a message with the number of sites that are still running.
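            if ($running != $running_before) {
                echo("Waiting for $running sites to finish...\n");
            }

            // wait for activity on a handle (up to 1 second) rather than spinning in a tight loop
            if ($running > 0) {
                curl_multi_select($curl_multi_handle, 1.0);
            }
        } while ($running > 0);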

        // -- once the number still running is 0, curl_multi_ is done, so check the results
        foreach ($block as $handle) {
            // HTTP response code
            $code = curl_getinfo($handle,  CURLINFO_HTTP_CODE);

            // cURL error number
            $curl_errno = curl_errno($handle);

            // cURL error message
            $curl_error = curl_error($handle);

            // output if there was an error
            if ($curl_error) {
                echo("    *** cURL error: ($curl_errno) $curl_error\n");
            }

            // remove the (used) handle from the curl_multi_handle
            curl_multi_remove_handle($curl_multi_handle, $handle);
        }

        // reset the block to empty, since we've run its curl_handles
        $block = array();
    }
}

// close the curl_multi_handle once we're done
curl_multi_close($curl_multi_handle);

Given that you don't need anything back from the URLs, you probably don't need a lot of what's there, but this is how I chunked the requests into blocks of BLOCK_SIZE, waited for each block to run before moving on, and caught errors from cURL.
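If only the errors matter, a single block could be run with something like the sketch below. It is not the code from my project: `run_block()` is a hypothetical name, the 2-second timeout is borrowed from the question, and it reads per-transfer result codes with `curl_multi_info_read()` instead of calling `curl_errno()` on each handle. Treat it as a starting point under those assumptions.

```php
<?php
// Hypothetical helper (an assumption, not the original answer's code):
// runs one block of URLs in parallel and returns an array mapping each
// URL to a cURL error message ('' when the transfer succeeded).
function run_block(array $urls, int $timeoutSeconds = 2): array
{
    $multi = curl_multi_init();
    $handles = array();
    $errors = array();

    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body out of the output
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
        $errors[$url] = '';
    }

    // Drive all transfers; curl_multi_select() sleeps until there is activity
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    // Collect the result code of every finished transfer
    while (($info = curl_multi_info_read($multi)) !== false) {
        $url = array_search($info['handle'], $handles, true);
        if ($url !== false && $info['result'] !== CURLE_OK) {
            $errors[$url] = curl_strerror($info['result']);
        }
    }

    // Detach the handles; curl_close() has no effect since PHP 8.0, so it is
    // omitted and the handles are freed when $handles goes out of scope
    foreach ($handles as $ch) {
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $errors;
}
```

Calling `run_block($some_urls)` once per block and printing the non-empty entries of the returned array would roughly match the error output above. Duplicate URLs are collapsed because the URL is used as the array key.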

Paramour answered 22/4, 2010 at 16:41 Comment(7)
Well, all I'm going to have it do is load each URL (the URLs it will be loading are blank pages; accessing them only starts a script and makes it run for a preset amount of time) and not save or output any data. Do you think it will cause any problems in this case? - Thulium
My guess is that it won't be a problem in that case, but I don't know for sure. If it fails to run or errors out when you try to load all of them at once, you could put a counter in your while loop and, whenever counter % batch_size == 0 inside the loop, run the batch and clear it. - Paramour
Woah. Hate to bother you with this, but could you please comment some stuff in that code so I can see what everything does exactly? - Thulium
No problem (and my fault for not documenting the code when I originally wrote it). If any of it is still unclear, please let me know (and tell me which parts). - Paramour
Very nice, thank you. Just one more question, if you don't mind helping some more: how could I implement my current while loop into this? - Thulium
Your current while loop should be an almost-direct replacement for the first foreach loop. That's where each curl_handle is created and has its options set. (My code was used on an array of URLs, $urls, generated from an XML file, but pulling it from a database and using the while loop you have ought to work pretty much the same way.) - Paramour
Thank you, you've been a lot of help. - Thulium
