How can one check to see if a remote file exists using PHP?

24

99

The best I could find, an if fclose fopen type thing, makes the page load really slowly.

Basically what I'm trying to do is the following: I have a list of websites, and I want to display their favicons next to them. However, if a site doesn't have one, I'd like to replace it with another image rather than display a broken image.

Vidovic answered 11/6, 2009 at 15:52 Comment(4)
I think you can use cURL and check its return codes. But if speed is the problem, just do it offline and cache the results. - Singles
Yes, but I would still recommend using an offline script (run from cron) that parses the list of websites, checks whether they have favicons, and caches that data for the frontend. If you don't or can't use cron, at least cache the results for every new URL you check. - Singles
To replace a broken image with a placeholder in the browser, consider a client-side solution using the image's onerror handler, e.g. a solution using jQuery. - Latticework
Possible duplicate of PHP: How to check if image file exists? - Turnpike
151

You can instruct curl to use the HTTP HEAD method via CURLOPT_NOBODY.

More or less:

$ch = curl_init("http://www.example.com/favicon.ico");

curl_setopt($ch, CURLOPT_NOBODY, true);
curl_exec($ch);
$retcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
// $retcode >= 400 -> not found, $retcode = 200, found.
curl_close($ch);

Either way, you only save the cost of the HTTP transfer, not the cost of establishing and closing the TCP connection. And since favicons are small, you might not see much improvement.

Caching the result locally seems a good idea if it turns out to be too slow. A HEAD response normally carries the file's modification time in its Last-Modified header. You can do as browsers do and read it with CURLINFO_FILETIME (set CURLOPT_FILETIME first so that cURL records it). In your cache you can store URL => [ favicon, timestamp ]. You can then compare the timestamps and reload the favicon only when the remote one is newer.
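
The timestamp cache described above might be sketched as follows. This is only an illustration: the function names and the cache layout are assumptions, and it relies on the server sending a Last-Modified header (cURL reports -1 when it does not).

```php
<?php
// Hypothetical helper: ask the server for the file's modification time via HEAD.
// Returns a Unix timestamp, or -1 if the request failed or no time was reported.
function remoteFileTime(string $url): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);   // HEAD request, no body
    curl_setopt($ch, CURLOPT_FILETIME, true); // ask cURL to record Last-Modified
    $ok = curl_exec($ch);
    $time = ($ok === false) ? -1 : (int) curl_getinfo($ch, CURLINFO_FILETIME);
    curl_close($ch);
    return $time;
}

// Hypothetical helper: decide whether a cached entry should be re-downloaded.
// $cache is assumed to map URL => ['favicon' => ..., 'timestamp' => int].
function cacheIsStale(array $cache, string $url, int $remoteTime): bool
{
    if (!isset($cache[$url])) {
        return true;  // never fetched before
    }
    if ($remoteTime < 0) {
        return false; // remote time unknown: keep the cached copy
    }
    return $remoteTime > $cache[$url]['timestamp'];
}
```

A caller would pass the result of remoteFileTime($url) to cacheIsStale() and download the icon again only when it returns true.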

Bedder answered 11/6, 2009 at 16:8 Comment(4)
Just a note: the request fails on all 4xx codes, so the check should be >=, not just >. - Cant
Some sites block access if you don't provide a user agent string, so I suggest following this guide to add CURLOPT_USERAGENT in addition to CURLOPT_NOBODY: davidwalsh.name/set-user-agent-php-curl-spoof - Component
@Lyth 3XX retcodes aren't an error, but a redirection. Those should be handled either manually or using CURLOPT_FOLLOWLOCATION. - Bedder
Use curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); as well to make sure the same code works for URLs starting with HTTPS! - Commentator
67

As Pies says, you can use cURL. You can get cURL to give you only the headers, and not the body, which might make it faster. A bad domain could still take a while because you will be waiting for the request to time out; you can change the timeout length using cURL options.

Here is an example:

function remoteFileExists($url) {
    $curl = curl_init($url);

    //don't fetch the actual page, you only want to check the connection is ok
    curl_setopt($curl, CURLOPT_NOBODY, true);

    //do request
    $result = curl_exec($curl);

    $ret = false;

    //if request did not fail
    if ($result !== false) {
        //if request was ok, check response code
        $statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);  

        if ($statusCode == 200) {
            $ret = true;   
        }
    }

    curl_close($curl);

    return $ret;
}

$exists = remoteFileExists('http://stackoverflow.com/favicon.ico');
if ($exists) {
    echo 'file exists';
} else {
    echo 'file does not exist';   
}
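
The timeout mentioned above can be set with CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT. Below is a rough sketch of remoteFileExists() with both options; the function name and the 3 and 5 second defaults are illustrative assumptions, not values from the answer.

```php
<?php
// Hypothetical variant of remoteFileExists() with explicit timeouts.
function remoteFileExistsWithTimeout(string $url, int $connectSeconds = 3, int $totalSeconds = 5): bool
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_NOBODY, true);                    // HEAD request only
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $connectSeconds); // max time to connect
    curl_setopt($curl, CURLOPT_TIMEOUT, $totalSeconds);          // max time for the whole request
    $result = curl_exec($curl);
    // A failed request (timeout, refused connection, DNS error) counts as "not found".
    $status = ($result === false) ? 0 : (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    return $status === 200;
}
```

With short timeouts, a dead domain costs a few seconds at most instead of stalling the page.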
Ferric answered 11/6, 2009 at 16:12 Comment(1)
remoteFileExists('stackoverflow.com/') will also return true, but it's just a page link. This function does not check whether the content type of the link is a file. - Reichsmark
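
One way to address the concern in this comment is to inspect the Content-Type reported for the HEAD response as well. The sketch below is an assumption-laden illustration (both function names are made up here), and it trusts whatever type the server declares.

```php
<?php
// Hypothetical helper: true when a Content-Type value names an image type,
// e.g. "image/x-icon" or "image/png".
function isImageContentType(?string $contentType): bool
{
    return $contentType !== null && stripos(ltrim($contentType), 'image/') === 0;
}

// Hypothetical variant of remoteFileExists() that also requires an image Content-Type.
function remoteImageExists(string $url): bool
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_NOBODY, true); // HEAD request only
    $ok = curl_exec($curl);
    $status = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
    $type = curl_getinfo($curl, CURLINFO_CONTENT_TYPE); // not a string if the header is absent
    curl_close($curl);
    return $ok !== false
        && $status === 200
        && isImageContentType(is_string($type) ? $type : null);
}
```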
40

CoolGoose's solution is good but this is faster for large files (as it only tries to read 1 byte):

if (false === file_get_contents("http://example.com/path/to/image",0,null,0,1)) {
    $image = $default_image;
}
Christa answered 22/1, 2010 at 23:0 Comment(4)
+1. What are the drawbacks of this solution compared with the cURL one? - Idette
You can just use fopen: if the request return code is 404, fopen returns false. - Glossy
This is really slow and did not work for me (meaning it still displayed a broken image if the file path was not correct). - Figge
This approach doesn't work if the server redirects whenever an image or file doesn't exist. This happens when a site uses mod_rewrite or some other set of rules for how requests should be handled. - Rosen
31

This is not an answer to your original question, but a better way of doing what you're trying to do:

Instead of actually trying to get the site's favicon directly (which is a royal pain given it could be /favicon.png, /favicon.ico, /favicon.gif, or even /path/to/favicon.png), use Google:

<img src="http://www.google.com/s2/favicons?domain=[domain]">

Done.
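
If the list of sites lives in PHP, the image URL can be built with a small helper. This is a sketch only: the function name is hypothetical, and it assumes the service keeps accepting a bare host name in the domain parameter as shown above.

```php
<?php
// Hypothetical helper: build the favicon service URL for a site.
// Accepts either a bare domain ("example.com") or a full URL.
function googleFaviconUrl(string $siteUrlOrDomain): string
{
    // parse_url() yields no host for a bare domain, so fall back to the input.
    $host = parse_url($siteUrlOrDomain, PHP_URL_HOST) ?: $siteUrlOrDomain;
    return 'http://www.google.com/s2/favicons?domain=' . rawurlencode($host);
}
```

When printing the result into an img tag, pass it through htmlspecialchars() first.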

Chaechaeronea answered 16/5, 2010 at 9:10 Comment(1)
The syntax is a bit confusing, so here is one example: <img src="google.com/s2/favicons?domain=stackoverflow.com"> - Upon
21

A complete function based on the most-voted answer:

function remote_file_exists($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); # handles 301/2 redirects
    curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $httpCode == 200; // true on 200, false otherwise
}

You can use it like this:

if(remote_file_exists($url))
{
    //file exists, do something
}
Bendicty answered 19/5, 2016 at 16:31 Comment(4)
Oh! I've been away for the last couple of days but the beginning of the month was almost 24/7. Thank you for letting me know! - Bendicty
This doesn't work if the server doesn't respond with any HTTP code (or cURL doesn't catch it), which happens to me quite often, e.g. in the case of images. - Plow
What if the URL is redirected to another URL or to the https version? In that case this cURL code won't be able to do the job. The best way is to get the header information and search for the case-insensitive string "200 ok". - Marucci
@Infoconic You can add curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);. I've updated the answer to handle 302 redirects. - Bendicty
20

If you are dealing with images, use getimagesize. Unlike file_exists, this built-in function supports remote files. It returns an array containing the image information (width, height, type, etc.), or false on failure. All you have to do is check the first element of the array (the width). Use print_r to output the contents of the array.

$imageArray = getimagesize("http://www.example.com/image.jpg");
if ($imageArray !== false && $imageArray[0]) // getimagesize() returns false on failure
{
    echo "it's an image and here is the image's info<br>";
    print_r($imageArray);
}
else
{
    echo "invalid image";
}
Bracci answered 4/5, 2012 at 9:22 Comment(3)
This results in a 404 warning when the remote resource is not available. For the time being, I handled it by suppressing the error with @ in front of getimagesize, but I feel guilty about this hack. - Latticework
In my case this was the best approach, because I get redirected whenever an image or file doesn't exist. I agree that suppressing errors with @ is a no-go, but in this case it was necessary. - Rosen
I figured out that we could also use exif_imagetype, and it's much faster: https://mcmap.net/q/112390/-how-can-one-check-to-see-if-a-remote-file-exists-using-php - Centromere
9

PHP's built-in functions may not work for checking a URL if the allow_url_fopen setting is turned off for security reasons. cURL is a better option, as we would not need to change our code at a later stage. Below is the code I used to verify a valid URL:

// The wrapper name isValidUrl is illustrative; the original snippet was the body only.
function isValidUrl($url) {
    $url = str_replace(' ', '%20', $url);
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $httpcode >= 200 && $httpcode < 300; // any 2xx status counts as valid
}

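Note the CURLOPT_SSL_VERIFYPEER option: setting it to false turns off certificate verification, so HTTPS URLs are accepted even when their certificate cannot be validated. This weakens security, so leave verification enabled wherever the server's CA bundle allows it.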

Commentator answered 23/4, 2014 at 10:2 Comment(0)
8
if (false === file_get_contents("http://example.com/path/to/image")) {
    $image = $default_image;
}

Should work ;)

Batsheva answered 11/6, 2009 at 16:2 Comment(1)
Add @ before the function call. - Birthday
8

This can be done by obtaining the HTTP status code (404 = not found), which is possible with file_get_contents by making use of context options. The following code takes redirects into account and returns the status code of the final destination:

$url = 'http://example.com/';
$code = FALSE;

$options['http'] = array(
    'method' => "HEAD",
    'ignore_errors' => 1
);

$body = file_get_contents($url, NULL, stream_context_create($options));

foreach($http_response_header as $header)
    sscanf($header, 'HTTP/%*d.%*d %d', $code);

echo "Status code: $code";

If you don't want to follow redirects, you can do it similarly:

$url = 'http://example.com/';
$code = FALSE;

$options['http'] = array(
    'method' => "HEAD",
    'ignore_errors' => 1,
    'max_redirects' => 0
);

$body = file_get_contents($url, NULL, stream_context_create($options));

sscanf($http_response_header[0], 'HTTP/%*d.%*d %d', $code);

echo "Status code: $code";

Some of the functions, options and variables in use are explained in more detail in a blog post I've written: HEAD first with PHP Streams.
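
The header-scanning step from the first variant can also be written as a standalone function, which makes it easy to try without a network request. This is a sketch using preg_match instead of sscanf; the function name is an assumption.

```php
<?php
// Hypothetical helper: return the last HTTP status code found in a header list
// such as $http_response_header, or null if no status line is present.
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $header) {
        // Status lines look like "HTTP/1.1 301 Moved Permanently".
        if (preg_match('~^HTTP/\d+(?:\.\d+)?\s+(\d{3})~', $header, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}
```

When redirects are followed, the header list contains one status line per hop, so the last match is the status of the final destination.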

Gelsenkirchen answered 17/9, 2011 at 16:43 Comment(6)
Related: PHP: get_headers set temporary stream_context - Gelsenkirchen
Related: What is the best way to check if a URL exists in PHP? (Dec 14 '10) - Gelsenkirchen
For more on PHP's $http_response_header see php.net/manual/en/reserved.variables.httpresponseheader.php. - Already
The second variant worked for me, and compared to the default file_get_contents call (no custom stream_context) it was 50% faster, i.e. from 3.4 s to 1.7 s for a request. - Rosen
@ErikČerpnjak: If there is "no custom" stream_context, it's the default one. You can get the options from the default context and see how they differ from your custom context. This should give you some insight into why the timings differ. See php.net/stream-context-get-default and php.net/stream-context-get-options - Gelsenkirchen
@hakre: Exactly my point. The default stream_context fetches the whole page, if I am not mistaken, but in your example only the head is fetched, and I think that is where the difference lies. What I wanted to say is that the second option did the job for me, as I don't want redirects to happen, and it is quicker than the default stream_context options. - Rosen
8

To check for the existence of images, exif_imagetype should be preferred over getimagesize, as it is much faster.

To suppress the E_NOTICE, just prepend the error control operator (@).

if (@exif_imagetype($filename)) {
  // Image exists
}

As a bonus, with the returned value (IMAGETYPE_XXX) from exif_imagetype we could also get the mime-type or file-extension with image_type_to_mime_type / image_type_to_extension.
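
The bonus lookup mentioned above could look roughly like this. The helper name is hypothetical; the commented usage assumes the exif extension is loaded and allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: describe an IMAGETYPE_* constant as returned by exif_imagetype().
function describeImageType(int $type): array
{
    return [
        'mime'      => image_type_to_mime_type($type),
        'extension' => image_type_to_extension($type), // includes the leading dot
    ];
}

// Usage with a remote file:
// $type = @exif_imagetype('http://www.example.com/image.jpg');
// if ($type !== false) { print_r(describeImageType($type)); }
```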

Centromere answered 10/7, 2016 at 18:47 Comment(0)
4

A radical solution would be to display the favicons as background images in a div above your default icon. That way, all overhead would be placed on the client while still not displaying broken images (missing background images are ignored in all browsers AFAIK).

Juju answered 11/6, 2009 at 16:17 Comment(1)
+1. If you're not checking multiple locations for the favicon (favicon.ico, favicon.gif, favicon.png), this seems to be the best solution. - Casandracasanova
4

You could use the following:

$file = 'http://mysite.co.za/images/favicon.ico';
$file_exists = (@fopen($file, "r")) ? true : false;

This worked for me when checking whether an image exists at a URL.

Salmon answered 24/5, 2016 at 7:9 Comment(0)
3
function remote_file_exists($url) {
    // get_headers() returns false on failure; the first entry is the status line
    $headers = @get_headers($url);
    return $headers !== false
        && (bool) preg_match('~HTTP/1\.\d\s+200\s+OK~', $headers[0]);
}

$ff = "http://www.emeditor.com/pub/emed32_11.0.5.exe";
if (remote_file_exists($ff)) {
    echo "file exists!";
} else {
    echo "file does not exist!";
}
Tourcoing answered 9/2, 2012 at 1:22 Comment(0)
3

This works for me to check if a remote file exists in PHP:

$url = 'https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico';
$header_response = @get_headers($url, 1);

// get_headers() returns false when the request itself fails
if ($header_response === false || strpos($header_response[0], "404") !== false) {
    echo 'File does NOT exist';
} else {
    echo 'File exists';
}
Stroll answered 21/12, 2017 at 8:3 Comment(0)
2

You can use:

$url = getimagesize("http://www.flickr.com/photos/27505599@N07/2564389539/");

if (!is_array($url))
{
   $default_image = "…/directoryFolder/junal.jpg";
}
Kennykeno answered 8/6, 2012 at 4:36 Comment(0)
2

If you're using the Laravel framework or the Guzzle package, there is also a much simpler way using the Guzzle client. It also works when links are redirected:

$client = new \GuzzleHttp\Client(['allow_redirects' => ['track_redirects' => true]]);
try {
    $response = $client->request('GET', 'your/url');
    if ($response->getStatusCode() != 200) {
        // not exists
    }
} catch (\GuzzleHttp\Exception\GuzzleException $e) {
    // not exists
}

More in the documentation: https://docs.guzzlephp.org/en/latest/faq.html#how-can-i-track-redirected-requests

Abbatial answered 3/8, 2021 at 6:16 Comment(0)
1

You should issue HEAD requests, not GET requests, because you don't need the URI contents at all. As Pies said above, you should check the status code (in the 200-299 range, and you may optionally follow 3xx redirects).

The answers to this question contain many code examples that may be helpful: PHP / Curl: HEAD Request takes a long time on some sites

Landan answered 11/6, 2009 at 16:10 Comment(0)
1

There's an even more sophisticated alternative. You can do all the checking client-side using a jQuery trick.

$('a[href^="http://"]').filter(function(){
     return this.hostname && this.hostname !== location.hostname;
}).each(function() {
    var link = jQuery(this);
    var faviconURL =
      link.attr('href').replace(/^(http:\/\/[^\/]+).*$/, '$1')+'/favicon.ico';
    var faviconIMG = jQuery('<img src="favicon.png" alt="" />')['appendTo'](link);
    var extImg = new Image();
    extImg.src = faviconURL;
    if (extImg.complete)
      faviconIMG.attr('src', faviconURL);
    else
      extImg.onload = function() { faviconIMG.attr('src', faviconURL); };
});

From http://snipplr.com/view/18782/add-a-favicon-near-external-links-with-jquery/ (the original blog is presently down)

Linders answered 11/9, 2009 at 20:25 Comment(0)
1

All the answers here that use get_headers() are doing a GET request. It's much faster and cheaper to do just a HEAD request.

To make sure that get_headers() does a HEAD request instead of a GET, you should add this:

stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD'
        )
    )
);

So, to check if a file exists, your code would look something like this:

stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD'
        )
    )
);
$headers = get_headers('http://website.com/dir/file.jpg', 1);
$file_found = stristr($headers[0], '200');

$file_found will be either false or the matched part of the status line (a truthy string), because stristr() returns a string rather than a boolean.
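
Note that stream_context_set_default() changes the default for every later stream call in the script. As of PHP 7.1, get_headers() also accepts a context as its third argument, so the HEAD method can be limited to a single call. A minimal sketch, assuming allow_url_fopen is enabled (the function name and the timeout value are illustrative):

```php
<?php
// Hypothetical helper: HEAD-only existence check that leaves the default
// stream context untouched. Assumes PHP 7.1+ and allow_url_fopen = On.
function headRequestFinds(string $url, float $timeoutSeconds = 5.0): bool
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'HEAD',
            'timeout' => $timeoutSeconds, // read timeout in seconds
        ],
    ]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return false; // DNS failure, refused connection, timeout, ...
    }
    // With redirects followed, the last status line describes the final response.
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('~^HTTP/\S+\s+(\d{3})~', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status === 200;
}
```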

Belfry answered 28/2, 2015 at 13:1 Comment(0)
1

If the file is not hosted externally, you can translate the remote URL to an absolute path on your web server. That way you don't have to call cURL, file_get_contents, etc.

function remoteFileExists($url) {

    $root = realpath($_SERVER["DOCUMENT_ROOT"]);
    $urlParts = parse_url( $url );

    if ( !isset( $urlParts['path'] ) )
        return false;

    if ( is_file( $root . $urlParts['path'] ) )
        return true;
    else
        return false;

}

remoteFileExists( 'https://www.yourdomain.com/path/to/remote/image.png' );

Note: your web server must populate DOCUMENT_ROOT for this function to work.

Tayib answered 1/7, 2019 at 7:42 Comment(0)
0

I don't know whether is_file() is any faster when the file does not exist remotely, but you could give it a shot.

$favIcon = 'default FavIcon';
if(is_file($remotePath)) {
   $favIcon = file_get_contents($remotePath);
}
London answered 11/6, 2009 at 16:9 Comment(3)
From the docs: "As of PHP 5.0.0, this function can also be used with some URL wrappers. Refer to Supported Protocols and Wrappers to determine which wrappers support stat() family of functionality." - London
Do you mean this could work if you register a stream wrapper? Edit your answer to show a working example and I'll remove my downvote (and upvote you if I can). But for the moment, I tested is_file from the PHP CLI with a remote file, and got false. - Bride
No working example: var_dump(is_file('http://cdn.sstatic.net/stackoverflow/img/sprites.png')); bool(false) - Bride
0

If you're using the Symfony framework, there is also a much simpler way using the HttpClientInterface:

private function remoteFileExists(string $url, HttpClientInterface $client): bool {
    $response = $client->request(
        'GET',
        $url //e.g. http://example.com/file.txt
    );

    return $response->getStatusCode() == 200;
}

The docs for the HttpClient are also very good and maybe worth looking into if you need a more specific approach: https://symfony.com/doc/current/http_client.html

Vascular answered 19/8, 2020 at 9:12 Comment(0)
-1

You can use the Symfony Filesystem component:

use Symfony\Component\Filesystem\Filesystem;
use Symfony\Component\Filesystem\Exception\IOExceptionInterface;

and check:

$fileSystem = new Filesystem();
if ($fileSystem->exists('path_to_file')==true) {...

Binate answered 19/2, 2019 at 16:13 Comment(0)
-1

Please check this URL. I believe it will help you. They provide two ways to overcome this with a bit of explanation.

Try this one.

// Remote file url
$remoteFile = 'https://www.example.com/files/project.zip';

// Initialize cURL
$ch = curl_init($remoteFile);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_exec($ch);
$responseCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// Check the response code
if($responseCode == 200){
    echo 'File exists';
}else{
    echo 'File not found';
}

or visit the URL

https://www.codexworld.com/how-to/check-if-remote-file-exists-url-php/#:~:text=The%20file_exists()%20function%20in,a%20remote%20server%20using%20PHP.

Fatty answered 25/1, 2022 at 8:24 Comment(1)
An almost identical cURL-based solution was provided in other answers. Can you add details on how this answer differs from the others? - Gherardo
