What is the best way to check if a URL exists in PHP?

What is the best way to check that a URL exists and that the response is not a 404?

Cosmos answered 14/12, 2010 at 8:30 Comment(0)

You can use get_headers($url)

Example 2 from Manual:

<?php
// By default get_headers uses a GET request to fetch the headers. If you
// want to send a HEAD request instead, you can do so using a stream context:
stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD'
        )
    )
);
print_r(get_headers('http://example.com'));

// gives
Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Date: Sat, 29 May 2004 12:28:14 GMT
    [2] => Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    [3] => Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
    [4] => ETag: "3f80f-1b6-3e1cb03b"
    [5] => Accept-Ranges: bytes
    [6] => Content-Length: 438
    [7] => Connection: close
    [8] => Content-Type: text/html
)

The first array element contains the HTTP status line (for example, HTTP/1.1 200 OK). You have to parse the status code out of it.
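Parsing the status line could look something like the sketch below. The helper name statusCodeFromLine is made up for illustration and is not part of PHP; it only assumes the status line has the usual "HTTP/version code reason" shape.

```php
<?php
// Hypothetical helper (not a PHP built-in): extract the numeric status code
// from a status line such as "HTTP/1.1 200 OK".
// Returns null if the string does not look like a status line.
function statusCodeFromLine(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

var_dump(statusCodeFromLine('HTTP/1.1 200 OK'));        // int(200)
var_dump(statusCodeFromLine('HTTP/1.1 404 Not Found')); // int(404)
var_dump(statusCodeFromLine('Date: Sat, 29 May 2004')); // NULL
```

Comparing the parsed integer is less fragile than searching the whole line for the substring "200".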

Note that the get_headers function in the example will issue an HTTP HEAD request, which means it will not fetch the body of the URL. This is more efficient than using a GET request which will also return the body.

Also note that once you set a default context, any subsequent call that uses the default http stream context will issue HEAD requests too. So make sure to reset the default context to GET when you are done.
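One way to avoid touching the default context at all is to pass a one-off context to get_headers(), which accepts a context as its third argument since PHP 7.1. The following is only a sketch: the helper names headContext and headHeaders and the 5 second timeout are assumptions made for the example.

```php
<?php
// Hypothetical helper: build a one-off HEAD context instead of changing
// the default stream context.
function headContext(float $timeoutSeconds = 5.0)
{
    return stream_context_create(array(
        'http' => array(
            'method'  => 'HEAD',
            'timeout' => $timeoutSeconds, // assumed value, tune as needed
        ),
    ));
}

// Hypothetical wrapper: the third argument of get_headers() requires PHP 7.1+.
// Returns the header array, or false if the request fails.
function headHeaders(string $url)
{
    return get_headers($url, false, headContext());
}

// Usage (performs a network request):
// print_r(headHeaders('http://example.com'));
```

Because the context is passed per call, other code that relies on the default http context keeps issuing GET requests.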

PHP also provides the variable $http_response_header

The $http_response_header array is similar to the get_headers() function. When using the HTTP wrapper, $http_response_header will be populated with the HTTP response headers. $http_response_header will be created in the local scope.

If you want to download the content of a remote resource, you don't want to do two requests (one to see if the resource exists and one to fetch it), but just one. In that case, use something like file_get_contents to fetch the content and then inspect the headers from the variable.
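A rough sketch of that single-request approach follows. The names lastStatusCode and fetchWithStatus are hypothetical, and the use of the ignore_errors context option (so that the body is returned even for 4xx/5xx responses) is a choice made for this example rather than something the answer prescribes.

```php
<?php
// Hypothetical helper: return the status code of the last status line in a
// header list. After redirects the list holds one status line per hop.
function lastStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $header) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $header, $m) === 1) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical wrapper: fetch the body and the final status in one request.
function fetchWithStatus(string $url): array
{
    // ignore_errors makes the HTTP wrapper return the body even on 4xx/5xx.
    $context = stream_context_create(array('http' => array('ignore_errors' => true)));
    $body = file_get_contents($url, false, $context);
    // The HTTP wrapper creates $http_response_header in this local scope;
    // it stays unset if no response was received at all.
    $headers = isset($http_response_header) ? $http_response_header : array();
    return array('body' => $body, 'status' => lastStatusCode($headers));
}

// Usage (performs a network request):
// $result = fetchWithStatus('http://example.com');
// if ($result['status'] === 200) { echo strlen($result['body']); }
```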

Colner answered 14/12, 2010 at 8:32 Comment(6)
Related: How can one check to see if a remote file exists using PHP? (via: HEAD first with PHP Streams) - Uzzia
Add a @ character at the beginning to suppress the PHP warning for when the URL being tested does not exist. That way you can throw your own custom exception. - Inexperienced
@FranciscoLuz I consider using error suppression a no-go and favor using proper error handlers. - Colner
Hi @Colner you marked another question a duplicate of this one, which I see that you have the top answer for, but it is not a duplicate. Would you mind undoing that? Thank you for your answer, it is very helpful, but I needed to know both the answer to this question AND to the other question and they are not the same. - Soracco
@Soracco Care to link me to the question in question? - Colner
@Gordon: you rock. - Arianism
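public function isLink($url)
{
    // Reject strings that are not syntactically valid URLs.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // get_headers() returns false if the request fails.
    $headers = get_headers($url);
    if ($headers === false) {
        return false;
    }
    // Match the status code itself, not any "200" elsewhere in the status line.
    return preg_match('#^HTTP/\S+\s+200\b#', $headers[0]) === 1;
}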
Rosa answered 22/6, 2015 at 13:57 Comment(0)

@Gordon - Here is a more complete library routine based on your answer. It includes some preliminary checking for URL validity, some more error handling, and parsing of the returned headers. It also follows any redirect chains for a reasonable number of steps.

class cLib {
    static $lasterror = 'No error set yet';
    /**
     * @brief See whether a URL is valid, i.e. a page can be successfully retrieved from it without error
     * @param string $url The URL to be checked
     * @param int $nredirects The number of redirects checked so far
     * @return boolean True if OK, false if the URL cannot be fetched
     */
    static function checkUrl($url, $nredirects = 0) {
        // First, see if the URL is sensible
        if (filter_var($url, FILTER_VALIDATE_URL) === false) {
            self::$lasterror = sprintf('URL "%s" did not validate', $url);
            return false;
        }
        // Now try to fetch it
        $headers = @get_headers($url);
        if ($headers == false) {
            $error = error_get_last();
            self::$lasterror = sprintf('URL "%s" could not be read: %s', $url, $error['message']);
            return false;
        }
        $status = $headers[0];
        $rbits = explode(' ', $status);
        if (count($rbits) < 2) {
            self::$lasterror = sprintf('Cannot parse status "%s" from URL "%s"', $status, $url);
            return false;
        }
        if (in_array($rbits[1], array(301, 302, 303, 307, 308))) {
            // This URL has been redirected. Follow the redirection chain
            foreach ($headers as $header) {
                // Header names are case-insensitive, so match "Location:" with stripos
                if (stripos($header, 'Location:') === 0) {
                    if (++$nredirects > 10) {
                        self::$lasterror = sprintf('URL "%s" was redirected over 10 times: abandoned check', $url);
                        return false;
                    }
                    return self::checkUrl(trim(substr($header, strlen('Location:'))), $nredirects);
                }
            }
            self::$lasterror = sprintf('URL "%s" was redirected but location could not be identified', $url);
            return false;
        } 
        if ($rbits[1] != 200) {
            self::$lasterror = sprintf('URL "%s" returned status "%s"', $url, $status);
            return false;
        }
        return true;
    }
}

With apologies to @FranciscoLuz - if you're expecting errors based on user input, the "@ and error_get_last" method seems perfectly sensible to me - I don't see that there's anything more proper about using set_error_handler.

BTW, not sure if I should have done this as an edit to @Gordon's answer rather than as a separate answer. Can someone advise?

Politesse answered 27/3, 2014 at 13:1 Comment(0)

I'm using this function because it also validates the URL and prepends the protocol if it is missing.

$theUrl = 'google.com';

function isValidURL($url) { 
    $urlRegex = '@(http(s)?)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])@';
    if(preg_match($urlRegex, $url)){
        return preg_replace($urlRegex, "http$2://$4", $url);
    } else {
        return false;
    }
}

var_dump(isValidURL($theUrl));
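As a possible alternative to the regular expression, the same "prepend the protocol, then validate" idea can be sketched with filter_var(). The function name normalizeUrl and the default scheme of http are assumptions made for this example. Like the function above, it only checks syntax and makes no request.

```php
<?php
// Hypothetical helper: prepend a default scheme when none is present, then
// validate the result with filter_var(). Returns the URL string or false.
function normalizeUrl(string $url, string $defaultScheme = 'http')
{
    $url = trim($url);
    if ($url === '') {
        return false;
    }
    // A scheme is a letter followed by letters, digits, "+", "-" or ".", then "://".
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url) !== 1) {
        $url = $defaultScheme . '://' . $url;
    }
    return filter_var($url, FILTER_VALIDATE_URL);
}

var_dump(normalizeUrl('google.com'));          // string(17) "http://google.com"
var_dump(normalizeUrl('https://example.com')); // string(19) "https://example.com"
```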
Arcane answered 20/6, 2019 at 10:22 Comment(0)

Here is a script I wrote to check whether a URL exists. It could be improved by analyzing the error returns more finely. For now it treats a URL as nonexistent only when cURL reports "could not resolve host" or "404 not found"; any other failure still counts as existing.

function URL_EXIST($pUrl)
{
    $etat = true;
    $ch = curl_init($pUrl);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    if (curl_exec($ch) === false)
    {
        $mes = strtolower(curl_error($ch));
        $cdt_wrong = preg_match('#could not resolve host#',$mes);
        $cdt_wrong |= preg_match('#404 not found#',$mes);
        if($cdt_wrong==true)
        {
            $etat = false;
        }
    }
    curl_close($ch);

    return $etat;
}

It works well on the examples I have tried.
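Matching on the text of curl_error() depends on the exact wording cURL uses, which may differ between versions. A variant that reads the numeric error code and HTTP status instead might look like the sketch below. The function names, the redirect and timeout limits, and the rule "2xx or 3xx means the URL exists" are all assumptions for illustration. Unlike the script above, it treats every transport error as "does not exist".

```php
<?php
// Hypothetical helper: decide whether a finished transfer counts as "exists".
// $errno is the value of curl_errno(), $httpCode that of CURLINFO_HTTP_CODE.
function transferMeansExists(int $errno, int $httpCode): bool
{
    // Any transport-level error (DNS failure, timeout, ...) counts as missing.
    if ($errno !== 0) {
        return false;
    }
    // Assumption: treat 2xx and 3xx responses as "exists".
    return $httpCode >= 200 && $httpCode < 400;
}

// Hypothetical wrapper around the cURL extension.
function urlExists(string $url): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // send a HEAD request
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);        // assumed limit
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed limit, in seconds
    curl_exec($ch);
    return transferMeansExists(curl_errno($ch), (int) curl_getinfo($ch, CURLINFO_HTTP_CODE));
}

// Usage (performs a network request):
// var_dump(urlExists('http://example.com'));
```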

Walkin answered 14/1, 2022 at 17:23 Comment(0)
