PHP cURL, read remote file and write contents to local file
C

6

19

I want to connect to a remote file and write the output from the remote file to a local file. This is my function:

function get_remote_file_to_cache()
{

    $the_site="http://facebook.com";

    $curl = curl_init();
    $fp = fopen("cache/temp_file.txt", "w");
    curl_setopt ($curl, CURLOPT_URL, $the_site);
    curl_setopt($curl, CURLOPT_FILE, $fp);

    curl_setopt($curl,  CURLOPT_RETURNTRANSFER, TRUE);

    curl_exec ($curl);

    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    if($httpCode == 404) {
        touch('cache/404_err.txt');
    }else
    {
        touch('cache/'.rand(0, 99999).'--all_good.txt');
    }

    curl_close ($curl);
}

It creates the two files in the "cache" directory, but it does not write any data into "temp_file.txt". Why is that?

Containment answered 1/11, 2011 at 13:55 Comment(1)
I don't think you can set CURLOPT_FILE and CURLOPT_RETURNTRANSFER in the same operation. – Pseudonym
S
11

You need to explicitly write to the file using fwrite, passing it the file handle you created earlier:

$contents = curl_exec($curl);
$httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);

if ( $httpCode == 404 ) {
    ...
} else {
    fwrite($fp, $contents);
}

curl_close($curl);
fclose($fp);
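A minimal, self-contained sketch of this buffer-then-write approach, making a single request and writing the returned string. The helper name fetch_to_file_buffered() is hypothetical, and because the whole body is held in memory as one string, this is only reasonable for small responses.

```php
<?php
// Buffer-then-write sketch: the whole body is held in memory as a string,
// so only use this for small responses. fetch_to_file_buffered() is hypothetical.
function fetch_to_file_buffered(string $url, string $path): bool
{
    $curl = curl_init($url);
    // Make curl_exec() return the body instead of printing it.
    // CURLOPT_FILE is deliberately not set here.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);

    $contents = curl_exec($curl);                        // one request only
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE); // 0 if no HTTP response
    unset($curl);                                        // release the handle

    // curl_exec() returns false on transport errors; also reject HTTP 4xx/5xx.
    if ($contents === false || $httpCode >= 400) {
        return false;
    }
    return file_put_contents($path, $contents) !== false;
}
```

Usage would look like fetch_to_file_buffered('https://facebook.com/', 'cache/temp_file.txt'), which returns true once the body has been written.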
Shockley answered 1/11, 2011 at 13:57 Comment(3)
You'll run into memory limitations with large files. Check the response by doublehelix; it's safer. – Dropwort
@JonGauthier This does not solve the problem when you have a memory limit, want to avoid loading the whole file into memory, and just want to dump it to a local file. – Gustafson
Please go and vote for the only actually correct answer on this page: https://mcmap.net/q/628173/-php-curl-read-remote-file-and-write-contents-to-local-file In the code the OP posted, the cURL code is correct except that the RETURNTRANSFER option is set after the FILE option. In that case cURL ignores the FILE option and the downloaded file is returned as the response. That is why all the other answers about using fwrite seem like working solutions: they start from the failure of the FILE option and work with the file in the response (which is also why they have to deal with memory errors). – Glove
F
28

Actually, using fwrite is only partially correct. To avoid running out of memory with large files (exceeding PHP's maximum memory limit), you'll need to set up a callback function to write to the file.

NOTE: I would recommend creating a class specifically to handle file downloads and file handles etc. rather than EVER using a global variable, but for the purposes of this example, the following shows how to get things up and running.

So, do the following:

# set up a global file pointer
$GlobalFileHandle = null;

function saveRemoteFile($url, $filename) {
  global $GlobalFileHandle;

  set_time_limit(0);

  # Open the file for writing...
  $GlobalFileHandle = fopen($filename, 'w+');

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_HEADER, 0);
  curl_setopt($ch, CURLOPT_USERAGENT, "MY+USER+AGENT"); //Make this valid if possible
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); # optional
  curl_setopt($ch, CURLOPT_TIMEOUT, 0); # optional: 0 = unlimited, 3600 = 1 hour
  curl_setopt($ch, CURLOPT_VERBOSE, false); # Set to true to see all the innards

  # Only if you need to bypass SSL certificate validation (insecure):
  # curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
  # curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

  # Assign a callback function to the CURL Write-Function.
  # CURLOPT_FILE / CURLOPT_RETURNTRANSFER are not needed: only the last write option set takes effect.
  curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'curlWriteFile');

  # Execute the download - note we DO NOT put the result into a variable!
  curl_exec($ch);

  # Close CURL
  curl_close($ch);

  # Close the file pointer
  fclose($GlobalFileHandle);
}

function curlWriteFile($cp, $data) {
  global $GlobalFileHandle;
  # Must return the number of bytes handled, otherwise cURL aborts the transfer
  $len = fwrite($GlobalFileHandle, $data);
  return $len;
}
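Since the note above recommends avoiding globals, here is a sketch of the same write-callback technique with the file handle captured by a closure instead (PHP 7+ syntax assumed). The helper name download_with_callback() is hypothetical.

```php
<?php
// Write-callback download without a global: the closure captures $fp.
// download_with_callback() is a hypothetical helper name.
function download_with_callback(string $url, string $path): bool
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        return false;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true); // HTTP >= 400 makes curl_exec() fail
    // Called for every chunk received. Returning anything other than the
    // chunk's length tells cURL to abort the transfer.
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($handle, string $data) use ($fp): int {
        $written = fwrite($fp, $data);
        return $written === false ? 0 : $written;
    });

    $ok = curl_exec($ch); // true on success, false on failure
    unset($ch);           // release the handle before closing the file
    fclose($fp);
    return $ok !== false;
}
```

Only one chunk is in memory at a time, so memory use does not grow with the size of the download.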

You can also create a progress callback (CURLOPT_PROGRESSFUNCTION) to show how much you've downloaded and how fast; however, that's another example, as it can be complicated when outputting to the CLI.
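A rough sketch of such a progress callback, layered on CURLOPT_FILE (which already streams to disk). download_with_progress() and the $report callable are hypothetical names, and the CLI reporter only illustrates the usual trick of rewriting one line with "\r".

```php
<?php
// Progress reporting on top of CURLOPT_FILE. Names here are hypothetical.
function download_with_progress(string $url, string $path, callable $report): bool
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        return false;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_NOPROGRESS, false); // progress callbacks are off by default
    curl_setopt($ch, CURLOPT_PROGRESSFUNCTION,
        function ($handle, $dlTotal, $dlNow, $ulTotal, $ulNow) use ($report): int {
            $report($dlNow, $dlTotal); // $dlTotal is 0 while the size is unknown
            return 0;                  // a non-zero return aborts the transfer
        });

    $ok = curl_exec($ch);
    unset($ch);
    fclose($fp);
    return $ok !== false;
}

// Example CLI reporter: overwrite a single line instead of printing many.
$cliReport = function ($now, $total): void {
    $pct = $total > 0 ? sprintf('%5.1f%%', 100 * $now / $total) : '  ?  ';
    fwrite(STDERR, "\r{$pct} {$now} bytes");
};
```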

Essentially, this takes each block of data as it is downloaded and writes it to the file immediately, rather than downloading the ENTIRE file into memory first.

It's a much safer way of doing it! Of course, you must make sure the URL is correct (convert spaces to %20, etc.) and that the local file is writable.
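Both checks mentioned here can be done up front. This is a small sketch with two hypothetical helpers; encode_path_segments() assumes the path is not already percent-encoded (otherwise "%20" would be encoded again).

```php
<?php
// Pre-flight helpers; both names are hypothetical.
function encode_path_segments(string $path): string
{
    // rawurlencode() encodes a space as %20 (urlencode() would produce '+').
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

function can_write_to(string $file): bool
{
    // An existing file must itself be writable; a new file needs a writable directory.
    return file_exists($file) ? is_writable($file) : is_writable(dirname($file));
}

$url = 'https://example.com' . encode_path_segments('/files/my report (final).pdf');
echo $url, "\n"; // https://example.com/files/my%20report%20%28final%29.pdf
```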

Cheers, James.

Forcier answered 18/6, 2014 at 5:24 Comment(3)
In modern PHP, can this be made more compact with: "curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($cp, $data) use ($fp) { return fwrite($fp, $data); });" (where "$GlobalFileHandle" becomes "$fp")? It seems to be working for me, but I want to check the behaviour is the same. – Orthogonal
You don't need the callback when you specify CURLOPT_FILE. I just tried it. It writes directly into the file, without reading the whole content into memory first. – Jeannettejeannie
Try pointing it at ipv4.download.thinkbroadband.com/1GB.zip and you'll hit the hard memory limit: "Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 65015808 bytes)". Keep in mind that the default per-process memory limit differs between environments, and also between Windows and Linux. – Forcier
C
18

Let's try sending a GET request to http://facebook.com:

$ curl -v http://facebook.com
* Rebuilt URL to: http://facebook.com/
* Hostname was NOT found in DNS cache
*   Trying 69.171.230.5...
* Connected to facebook.com (69.171.230.5) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: facebook.com
> Accept: */*
> 
< HTTP/1.1 302 Found
< Location: https://facebook.com/
< Vary: Accept-Encoding
< Content-Type: text/html
< Date: Thu, 03 Sep 2015 16:26:34 GMT
< Connection: keep-alive
< Content-Length: 0
< 
* Connection #0 to host facebook.com left intact

What happened? Facebook answered with 302 Found and a Location header, redirecting us from http://facebook.com to the secure https://facebook.com/. Now look at the response body length:

Content-Length: 0

It means that zero bytes will be written to temp_file.txt. This is why the file stays empty.

Your solution is absolutely correct:

$fp = fopen('file.txt', 'w');
curl_setopt($handle, CURLOPT_FILE, $fp);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

All you need to do is change the URL to https://facebook.com/.
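Instead of hard-coding https://, cURL can also follow the redirect itself. Here is a sketch that streams to disk with CURLOPT_FILE, follows Location headers, and reports the final URL and the number of body bytes, so an empty body like the Content-Length: 0 response above shows up as 'bytes' => 0. The helper name download_and_report() is hypothetical.

```php
<?php
// Stream to disk, follow redirects, report what happened.
// download_and_report() is a hypothetical helper name.
function download_and_report(string $url, string $path): array
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // write the body straight to $fp
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 3xx + Location
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // ...but not endlessly

    $ok = curl_exec($ch);
    $report = [
        'ok'            => $ok !== false,
        'status'        => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'effective_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), // after redirects
        'bytes'         => (int) curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD),
    ];
    unset($ch);
    fclose($fp);
    return $report;
}
```

For example, download_and_report('http://facebook.com', 'cache/temp_file.txt') should then end up on the https:// URL, and a result with 'bytes' => 0 signals that nothing was written.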

Regarding other answers:

  • @JonGauthier: No, there is no need to use fwrite() after curl_exec()
  • @doublehelix: No, you don't need CURLOPT_WRITEFUNCTION for such a simple operation as copying contents to a file.
  • @ScottSaunders: touch() creates an empty file if it doesn't exist. I think that was the OP's intention.

Seriously, three answers and every single one is invalid?

Channel answered 19/2, 2015 at 13:17 Comment(3)
You are right, it's as simple as that. Just remember to create "file.txt" in advance, and set its permissions (e.g. 777). – Dyewood
Do not set permissions to 777; it's a security risk to give all permissions to everyone. Try not to use CURLOPT_RETURNTRANSFER with CURLOPT_FILE, as Andre said. I was getting a 302 response code, so I used CURLOPT_FOLLOWLOCATION with CURLOPT_FILE only, and now there are no empty files: the data gets written to the file. – Rivkarivkah
I had to remove curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); for this to work. – Tacket
P
5

In your question you have

    curl_setopt($curl, CURLOPT_FILE, $fp);

    curl_setopt($curl,  CURLOPT_RETURNTRANSFER, TRUE);

but from PHP's curl_setopt documentation notes...

It appears that setting CURLOPT_FILE before setting CURLOPT_RETURNTRANSFER doesn't work, presumably because CURLOPT_FILE depends on CURLOPT_RETURNTRANSFER being set.

So do this:

<?php
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FILE, $fp);
?>

not this:

<?php
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
?>

...stating "CURLOPT_FILE depends on CURLOPT_RETURNTRANSFER being set".

Reference: https://www.php.net/manual/en/function.curl-setopt.php#99082
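To see the effect of the order directly, here is a small offline experiment. It is a sketch that assumes the PHP cURL build accepts file:// URLs (no open_basedir restriction), and fetch_with_order() is a hypothetical name. It confirms the practical rule in this answer; it also suggests that whichever of the two options is set last decides where the body goes, which is why leaving out CURLOPT_RETURNTRANSFER entirely works as well.

```php
<?php
// Offline experiment on option order, using a local file:// URL.
// fetch_with_order() is a hypothetical helper name.
function fetch_with_order(string $url, string $path, bool $fileLast): array
{
    $fp = fopen($path, 'wb');
    $ch = curl_init($url);
    if ($fileLast) {
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FILE, $fp);            // order recommended here
    } else {
        curl_setopt($ch, CURLOPT_FILE, $fp);            // order used in the question
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    }
    $result = curl_exec($ch);
    unset($ch);
    fclose($fp);
    clearstatcache();
    return ['returned' => $result, 'file_size' => filesize($path)];
}

$src = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($src, 'payload');
$dst = tempnam(sys_get_temp_dir(), 'dst');

$questionOrder = fetch_with_order('file://' . $src, $dst, false); // body returned, file empty
$answerOrder   = fetch_with_order('file://' . $src, $dst, true);  // true returned, file filled
var_dump($questionOrder, $answerOrder);
```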

Palish answered 28/6, 2020 at 2:55 Comment(1)
Note that this is true even when using curl_setopt_array - you must list CURLOPT_RETURNTRANSFER before CURLOPT_FILE in the array. – Glove
H
4

To avoid running out of memory:

I was confronted with this problem as well. It sounds silly, but the solution is to set CURLOPT_RETURNTRANSFER before CURLOPT_FILE!

It seems CURLOPT_FILE depends on CURLOPT_RETURNTRANSFER.

$curl = curl_init();
$fp = fopen("cache/temp_file.txt", "w+");
curl_setopt($curl,  CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_FILE, $fp);
curl_setopt($curl, CURLOPT_URL, $url);
curl_exec ($curl);
curl_close($curl);
fclose($fp);
Halfhearted answered 10/6, 2019 at 16:27 Comment(1)
You don't need CURLOPT_RETURNTRANSFER at all. CURLOPT_RETURNTRANSFER makes curl_exec() return the response as a single string; CURLOPT_FILE changes that behaviour so that, instead of collecting the response into a single string, it writes to the file as it goes. This is why it works to have CURLOPT_FILE after CURLOPT_RETURNTRANSFER... but in fact you don't need CURLOPT_RETURNTRANSFER at all. – Execrate
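Following that comment, here is a sketch of the question's function using CURLOPT_FILE only. The $url and $cacheDir parameters and the boolean return value are assumptions added for this sketch, and CURLOPT_FOLLOWLOCATION is included because of the http-to-https redirect discussed in another answer.

```php
<?php
// The question's function with CURLOPT_FILE only (no CURLOPT_RETURNTRANSFER).
// The $url/$cacheDir parameters and the bool return are assumptions.
function get_remote_file_to_cache(string $url, string $cacheDir): bool
{
    $fp = fopen($cacheDir . '/temp_file.txt', 'wb');
    if ($fp === false) {
        return false;
    }

    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_FILE, $fp);            // stream the body to disk
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // e.g. http:// -> https://
    $ok = curl_exec($curl);                           // true/false, not the body
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    unset($curl);
    fclose($fp);

    if ($ok === false || $httpCode == 404) {
        touch($cacheDir . '/404_err.txt');            // empty marker file, as in the question
        return false;
    }
    touch($cacheDir . '/' . rand(0, 99999) . '--all_good.txt');
    return true;
}
```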
M
2

The touch() function doesn't do anything to the contents of the file. It just updates the modification time. Look at the file_put_contents() function.
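A quick way to see the difference, using a fresh temporary directory (the directory and file names are hypothetical):

```php
<?php
// Contrast touch() and file_put_contents() in a fresh temporary directory.
$dir = sys_get_temp_dir() . '/touch_demo_' . uniqid();
mkdir($dir);

// touch() creates the file if it is missing and sets its timestamps; it writes no data.
touch("$dir/marker.txt");
clearstatcache();
$markerSize = filesize("$dir/marker.txt");
echo $markerSize, "\n";                        // 0

// file_put_contents() writes the data and returns the number of bytes written.
$written = file_put_contents("$dir/data.txt", "hello");
echo $written, "\n";                           // 5
echo file_get_contents("$dir/data.txt"), "\n"; // hello
```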

Madeleinemadelena answered 1/11, 2011 at 13:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.