Remote file size without downloading file
M

15

97

Is there a way to get the size of a remote file http://my_url/my_file.txt without downloading the file?

Mauve answered 8/4, 2010 at 18:55 Comment(1)
Leaving this as a comment, not an answer, as I realize the question is about PHP, not WordPress... but if you happen to be working in WP, try this: wp_get_http_headers( $file )->offsetGet( 'content-length' )Evacuate
I
111

Found something about this here:

Here's the best way (that I've found) to get the size of a remote file. Note that HEAD requests don't get the actual body of the response; they just retrieve the headers. So making a HEAD request to a resource that is 100MB will take the same amount of time as a HEAD request to a resource that is 1KB.

<?php
/**
 * Returns the size of a file without downloading it, or -1 if the file
 * size could not be determined.
 *
 * @param $url - The location of the remote file to download. Cannot
 * be null or empty.
 *
 * @return The size of the file referenced by $url, or -1 if the size
 * could not be determined.
 */
function curl_get_file_size( $url ) {
  // Assume failure.
  $result = -1;

  $curl = curl_init( $url );

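  // Issue a HEAD request and follow any redirects.
  curl_setopt( $curl, CURLOPT_NOBODY, true );
  curl_setopt( $curl, CURLOPT_HEADER, true );
  curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
  curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
  // Placeholder user agent string; substitute your own.
  curl_setopt( $curl, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; file-size-check)" );
  // Bound the request so a slow server cannot hold the process open.
  curl_setopt( $curl, CURLOPT_TIMEOUT, 10 );

  $data = curl_exec( $curl );
  curl_close( $curl );

  if( $data ) {
    $content_length = -1;
    $status = 0;

    // With redirects followed, $data holds one header block per hop;
    // only the last block describes the final resource.
    $blocks = preg_split( "/\r?\n\r?\n/", trim( $data ) );
    $last = end( $blocks );

    // Matches HTTP/1.0, HTTP/1.1 and HTTP/2 status lines.
    if( preg_match( "/^HTTP\/[\d.]+ (\d\d\d)/", $last, $matches ) ) {
      $status = (int)$matches[1];
    }

    // Header names are case-insensitive.
    if( preg_match( "/^Content-Length: *(\d+)/mi", $last, $matches ) ) {
      $content_length = (int)$matches[1];
    }

    // http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
    if( $status == 200 ) {
      $result = $content_length;
    }
  }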

  return $result;
}
?>

Usage:

$file_size = curl_get_file_size( "https://mcmap.net/q/217097/-remote-file-size-without-downloading-file" );
Ib answered 8/4, 2010 at 18:58 Comment(14)
i was reading that earlier, wasn't sure if content-length meant the length or file sizeMauve
well if the request returns a file, the request size is the file sizeRetributive
But keep in mind that there can be responses without Content-length.Untried
Wouldn't it be better to use curl_getinfo, like @macki suggests?Intransigent
I always get 'unknown' as the response?!Nadene
@Svish, yes, because that approach actually works. The approach presented here fails on redirected URLs, since it grabs the first Content-Length which is not (necessarily?) the final Content-Length. In my experience.Catabolism
This did not work for me as get_user_agent_string() was not defined. Removing the entire line made the whole thing work.Goliath
this fails when tested with: http://www.dailymotion.com/rss/user/dialhainaut/ see SO: #36761877Heine
This answer omits CURLOPT_TIMEOUT. It is possible for the target server to exploit this. As a sidenote to anyone using the above function, if $url is provided by user input, make sure that it is a valid URL before passing it to cURL.Lilongwe
If the server does not support HEAD, it will return 405.Glossary
Like @Goliath I got an error message for get_user_agent_string(), which was likely a local function left out of the code. It works when the line is commented out, but maybe instead of the function use $_SERVER['HTTP_USER_AGENT']Conversationalist
After disabling selinux this worked, but only with images and PDFs. It doesn't give a result for mp4.Spital
after $data = curl_exec( $curl ); use $info = curl_getinfo($curl); and gather $result = $info['download_content_length']; simple !Jospeh
Transfer-Encoding: chunked does not require the presence of Content-Length so in this scenario, this will failAthelstan
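Several of the comments above ask for the URL to be validated before it reaches cURL. A minimal sketch of such a check follows, assuming only http and https URLs should be probed. The helper name is_probeable_url() is hypothetical and does not appear in the answer above.

```php
<?php
/**
 * Returns true if $url looks like an absolute http(s) URL.
 * Illustrative helper; the name is an assumption, not from the answer above.
 */
function is_probeable_url($url) {
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    // Restrict to HTTP(S) so input such as file:// never reaches cURL.
    return $scheme === 'http' || $scheme === 'https';
}
```

A caller would run this check first and return -1 for anything rejected. Setting CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT on the handle then limits how long a slow server can keep the request open.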
M
69

Try this code

function retrieve_remote_file_size($url){
     $ch = curl_init($url);

     curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
     curl_setopt($ch, CURLOPT_HEADER, TRUE);
     curl_setopt($ch, CURLOPT_NOBODY, TRUE);

     $data = curl_exec($ch);
     $size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);

     curl_close($ch);
     return $size;
}
Mutism answered 16/11, 2011 at 22:9 Comment(8)
If this doesn’t work for you, you might want to add curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);.Stoneman
Doesn't work for me for an image. I do have CURLOPT_FOLLOWLOCATION set to true.Coray
@Nadene add this parameter. curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);Aaron
@Davinder Kumar: thanks so much, adding your code makes the above code work.Lollard
You're welcome! @TrungLeNguyenNhatAaron
What does ch stand for? Channel?Selfopinionated
Works like a charm for me, to retrieve image and video sizes from the Google Photos API, using the file baseUrl as the parameter for this function. (Function shared to answer a related question here.)Procreant
@Selfopinionated - I'd assume ch is short for "curl handle" but there's no reason you couldn't use a different variable name in the 7 places it is used in this function.Procreant
M
32

As mentioned a couple of times, the way to go is to retrieve the information from the response header's Content-Length field.

However, you should note that

  • the server you're probing does not necessarily implement the HEAD method(!)
  • there's absolutely no need to manually craft a HEAD request (which, again, might not even be supported) using fopen or the like, or even to invoke the curl library, when PHP has get_headers() (remember: K.I.S.S.)

Use of get_headers() follows the K.I.S.S. principle and works even if the server you're probing does not support the HEAD request.

So, here's my version (gimmick: returns human-readable formatted size ;-)):

Gist: https://gist.github.com/eyecatchup/f26300ffd7e50a92bc4d (curl and get_headers version)
get_headers()-Version:

<?php     
/**
 *  Get the file size of any remote resource (using get_headers()), 
 *  either in bytes or - default - as human-readable formatted string.
 *
 *  @author  Stephan Schmitz <[email protected]>
 *  @license MIT <http://eyecatchup.mit-license.org/>
 *  @url     <https://gist.github.com/eyecatchup/f26300ffd7e50a92bc4d>
 *
 *  @param   string   $url          Takes the remote object's URL.
 *  @param   boolean  $formatSize   Whether to return size in bytes or formatted.
 *  @param   boolean  $useHead      Whether to use HEAD requests. If false, uses GET.
 *  @return  string                 Returns human-readable formatted size
 *                                  or size in bytes (default: formatted).
 */
function getRemoteFilesize($url, $formatSize = true, $useHead = true)
{
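    // Use a per-call context so the process-wide default is not changed
    // (the context argument of get_headers() requires PHP 7.1+).
    $method  = (false !== $useHead) ? 'HEAD' : 'GET';
    $context = stream_context_create(array('http' => array('method' => $method)));
    $head = get_headers($url, true, $context);
    if (false === $head) {
        return -1; // request failed
    }
    $head = array_change_key_case($head);
    // content-length of download (in bytes), read from Content-Length: field;
    // after redirects the field is an array, so take the last value
    $clen = isset($head['content-length']) ? $head['content-length'] : 0;
    if (is_array($clen)) {
        $clen = end($clen);
    }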

    // cannot retrieve file size, return "-1"
    if (!$clen) {
        return -1;
    }

    if (!$formatSize) {
        return $clen; // return size in bytes
    }

    $size = $clen;
    switch (true) { // each case below is a boolean test
        case $clen < 1024:
            $size = $clen .' B'; break;
        case $clen < 1048576:
            $size = round($clen / 1024, 2) .' KiB'; break;
        case $clen < 1073741824:
            $size = round($clen / 1048576, 2) . ' MiB'; break;
        case $clen < 1099511627776:
            $size = round($clen / 1073741824, 2) . ' GiB'; break;
    }

    return $size; // return formatted size
}

Usage:

$url = 'http://download.tuxfamily.org/notepadplus/6.6.9/npp.6.6.9.Installer.exe';
echo getRemoteFilesize($url); // echoes "7.51 MiB"

Additional note: The Content-Length header is optional. Thus, as a general solution it isn't bulletproof!


Mashhad answered 17/9, 2014 at 21:2 Comment(3)
This should be the accepted answer. True, Content-Length is optional, but it's the only way to get the file size without downloading it - and get_headers is the best way to get content-length.Commute
Be aware that this will change the preference for the request method to HEAD inside all subsequent HTTP requests for this PHP process. Use stream_context_create to create a separate context to use for the call to get_headers (7.1+).Squashy
just adding, that if your URL or DOCUMENT file name has spaces in it, this will return a -1Crocidolite
C
22

PHP's get_headers() function works for me to read the Content-Length:

$headers = get_headers('http://example.com/image.jpg', 1);
$filesize = $headers['Content-Length'];

For more detail, see the PHP manual entry for get_headers().

Chokeberry answered 20/4, 2017 at 12:59 Comment(1)
For me (with nginx) the header was Content-LengthRenascence
L
15

Sure. Make a headers-only request and look for the Content-Length header.
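As a rough sketch of the "look for the Content-Length header" step, the function below reads the value out of raw response headers. It assumes the headers were already fetched (for example with cURL and CURLOPT_HEADER) and that redirects may have produced more than one header block. The name content_length_from_headers() is made up for illustration.

```php
<?php
/**
 * Extracts Content-Length from raw response headers.
 * Returns the length in bytes, or -1 if the header is absent.
 * Hypothetical helper name, for illustration only.
 */
function content_length_from_headers($rawHeaders) {
    // When redirects were followed there is one header block per hop;
    // only the last block describes the final resource.
    $blocks = preg_split('/\r?\n\r?\n/', trim($rawHeaders));
    $last   = end($blocks);
    // Header names are case-insensitive (HTTP/2 responses use lowercase).
    if (preg_match('/^Content-Length:[ \t]*(\d+)[ \t]*\r?$/mi', $last, $m)) {
        return (int) $m[1];
    }
    return -1;
}
```

Returning -1 when the header is missing covers responses sent with Transfer-Encoding: chunked, which need not carry a Content-Length at all.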

Laritalariviere answered 8/4, 2010 at 18:56 Comment(0)
L
9

Best one-line solution:

echo array_change_key_case(get_headers("http://.../file.txt",1))['content-length'];

PHP is too delicious

function urlsize($url):int{
   return array_change_key_case(get_headers($url,1))['content-length'];
}

echo urlsize("http://.../file.txt");
Leitmotif answered 19/12, 2017 at 10:17 Comment(0)
F
8

I'm not sure, but couldn't you use the get_headers function for this?

$url     = 'http://example.com/dir/file.txt';
$headers = get_headers($url, true);

if ( isset($headers['Content-Length']) ) {
   $size = 'file size:' . $headers['Content-Length'];
}
else {
   $size = 'file size: unknown';
}

echo $size;
Frustule answered 25/5, 2014 at 10:25 Comment(1)
With this example, it is possible for the target server at $url to exploit get_headers into keeping the connection open until the PHP process times out (by returning the headers very slowly, while not slowly enough to let the connection go stale). Since total PHP processes might be limited by FPM, this can allow a type of slow loris attack when multiple "users" access your get_headers script simultaneously.Lilongwe
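One partial mitigation for the slow-response concern raised in the comment is to pass get_headers() its own stream context with a timeout. This is a minimal sketch; the function name head_request_context() and the 5-second default are assumptions.

```php
<?php
/**
 * Builds a per-call stream context for get_headers().
 * The function name and the 5-second default are assumptions.
 */
function head_request_context($timeoutSeconds = 5) {
    return stream_context_create(array(
        'http' => array(
            'method'  => 'HEAD',           // ask for headers only
            'timeout' => $timeoutSeconds,  // read timeout in seconds
        ),
    ));
}

// Usage (performs a network request; the context argument needs PHP 7.1+):
// $headers = get_headers('http://example.com/dir/file.txt', true, head_request_context());
```

The http timeout option is a read timeout, so a server that keeps trickling data may still hold the connection open for longer. Where a hard cap on total time matters, cURL's CURLOPT_TIMEOUT is the more direct tool.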
R
3

The simplest and most efficient implementation:

function remote_filesize($url, $fallback_to_download = false)
{
    static $regex = '/^Content-Length: *+\K\d++$/im';
    if (!$fp = @fopen($url, 'rb')) {
        return false;
    }
    if (isset($http_response_header) && preg_match($regex, implode("\n", $http_response_header), $matches)) {
        return (int)$matches[0];
    }
    if (!$fallback_to_download) {
        return false;
    }
    return strlen(stream_get_contents($fp));
}
Rouleau answered 5/5, 2014 at 7:49 Comment(7)
OP indicated "without downloading the file." This method loads the file into memory from the remote server (eg: downloading). Even with fast connections between servers, this can easily time out or take way too long on large files. Note: You never closed $fp which is not in global scopeCristal
This function does NOT download the body, as far as possible, if the response contains a Content-Length header. And explicitly closing $fp is NOT NECESSARY; it is released automatically when it goes out of scope. php.net/manual/en/language.types.resource.phpRouleau
You can easily confirm the above using nc -l localhost 8080Rouleau
Actually, most of the *close functions are not necessary in modern PHP. They exist for two historical reasons: implementation restrictions and mimicking the C language.Rouleau
Headers are unreliable and fallback download goes against OP. Finally, if you open a file, simply close it. Garbage collectors are no excuse for lazy developers saving a single line of code.Cristal
Your opinion about GC is valid when a variable's scope is unclear from its context. In this example, however, the scope is clearly enclosed, so the resource MUST be released automatically and immediately. Closing it explicitly therefore serves no purpose other than following your own strict coding rule.Rouleau
BTW, the modern OOP-style stream interface SplFileObject has no method for closing internal resource handle. Procedural closing is inconsistent with modern interface at least about streamings.Rouleau
O
2

Since this question is already tagged "php" and "curl", I'm assuming you know how to use Curl in PHP.

If you set curl_setopt($ch, CURLOPT_NOBODY, TRUE) on your cURL handle $ch, then you will make a HEAD request and can probably check the "Content-Length" header of the response, which will consist only of headers.

Outdo answered 8/4, 2010 at 18:59 Comment(0)
B
2

Try the function below to get the remote file size:

function remote_file_size($url){
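    $head = "";
    // Initialise values that are only assigned when the matching header is present.
    $size = 0;
    $status = "";
    $newurl = "";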
    $url_p = parse_url($url);

    $host = $url_p["host"];
    if(!preg_match("/[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*/",$host)){

        $ip=gethostbyname($host);
        if(!preg_match("/[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*/",$ip)){

            return -1;
        }
    }
    if(isset($url_p["port"]))
    $port = intval($url_p["port"]);
    else
    $port    =    80;

    if(!$port) $port=80;
    $path = $url_p["path"];

    $fp = fsockopen($host, $port, $errno, $errstr, 20);
    if(!$fp) {
        return false;
        } else {
        fputs($fp, "HEAD "  . $url  . " HTTP/1.1\r\n");
        fputs($fp, "HOST: " . $host . "\r\n");
        fputs($fp, "User-Agent: http://www.example.com/my_application\r\n");
        fputs($fp, "Connection: close\r\n\r\n");
        $headers = "";
        while (!feof($fp)) {
            $headers .= fgets ($fp, 128);
            }
        }
    fclose ($fp);

    $return = -2;
    $arr_headers = explode("\n", $headers);
    foreach($arr_headers as $header) {

        $s1 = "HTTP/1.1";
        $s2 = "Content-Length: ";
        $s3 = "Location: ";

        // trim() strips the trailing \r left over from splitting on \n
        if(substr(strtolower ($header), 0, strlen($s1)) == strtolower($s1)) $status = trim(substr($header, strlen($s1)));
        if(substr(strtolower ($header), 0, strlen($s2)) == strtolower($s2)) $size   = trim(substr($header, strlen($s2)));
        if(substr(strtolower ($header), 0, strlen($s3)) == strtolower($s3)) $newurl = trim(substr($header, strlen($s3)));
    }

    if(intval($size) > 0) {
        $return=intval($size);
    } else {
        $return=$status;
    }

    if (intval($status)==302 && strlen($newurl) > 0) {

        $return = remote_file_size($newurl);
    }
    return $return;
}
Bibliomancy answered 2/1, 2013 at 13:39 Comment(1)
This is the only one that worked for me on Ubuntu Linux apache server. I did have to init $size and $status at beginning of function, otherwise worked as is.Interpreter
G
2

Here is another approach that will work with servers that do not support HEAD requests.

It uses cURL to make a request for the content with an HTTP range header asking for the first byte of the file.

If the server supports range requests (most media servers will) then it will receive the response with the size of the resource.

If the server does not response with a byte range, it will look for a content-length header to determine the length.

If the size is found in a range or content-length header, the transfer is aborted. If the size is not found and the function starts reading the response body, the transfer is aborted.

This could be a supplementary approach if a HEAD request results in a 405 method not supported response.

/**
 * Try to determine the size of a remote file by making an HTTP request for
 * a byte range, or look for the content-length header in the response.
 * The function aborts the transfer as soon as the size is found, or if no
 * length headers are returned, it aborts the transfer.
 *
 * @return int|null null if size could not be determined, or length of content
 */
function getRemoteFileSize($url)
{
    $ch = curl_init($url);

    $headers = array(
        'Range: bytes=0-1',
        'Connection: close',
    );

    $in_headers = true;
    $size       = null;

    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2450.0 Iron/46.0.2450.0');
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_VERBOSE, 0); // set to 1 to debug
    curl_setopt($ch, CURLOPT_STDERR, fopen('php://output', 'w'));

    curl_setopt($ch, CURLOPT_HEADERFUNCTION, function($curl, $line) use (&$in_headers, &$size) {
        $length = strlen($line);

        if (trim($line) == '') {
            $in_headers = false;
        }

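        if (strpos($line, ':') === false) {
            // status line or blank line: nothing to split
            return $length;
        }
        list($header, $content) = explode(':', $line, 2);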
        $header = strtolower(trim($header));

        if ($header == 'content-range') {
            // found a content-range header
            list($rng, $s) = explode('/', $content, 2);
            $size = (int)$s;
            return 0; // aborts transfer
        } else if ($header == 'content-length' && 206 != curl_getinfo($curl, CURLINFO_HTTP_CODE)) {
            // found content-length header and this is not a 206 Partial Content response (range response)
            $size = (int)$content;
            return 0;
        } else {
            // continue
            return $length;
        }
    });

    // capture $in_headers by reference so the change made in the header callback is visible here
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function($curl, $data) use (&$in_headers) {
        if (!$in_headers) {
            // shouldn't be here unless we couldn't determine file size
            // abort transfer
            return 0;
        }

        // write function is also called when reading headers
        return strlen($data);
    });

    $result = curl_exec($ch);
    $info   = curl_getinfo($ch);

    return $size;
}

Usage:

$size = getRemoteFileSize('http://example.com/video.mp4');
if ($size === null) {
    echo "Could not determine file size from headers.";
} else {
    echo "File size is {$size} bytes.";
}
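
The Content-Range parsing at the heart of this approach can also be sketched on its own. The helper below is hypothetical and not part of the function above. It assumes a header value of the form "bytes 0-1/7874560" and treats an unknown total ("*") as undetermined.

```php
<?php
/**
 * Returns the total size from a Content-Range header value such as
 * "bytes 0-1/7874560", or null if the total is missing or unknown ("*").
 * Hypothetical helper, not part of the function above.
 */
function total_from_content_range($value) {
    // Accepts "bytes <first>-<last>/<total>" and "bytes */<total>".
    if (preg_match('~^\s*bytes\s+(?:\d+-\d+|\*)/(\d+)\s*$~i', $value, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

Returning null rather than 0 for an unknown total keeps "size not reported" distinct from an empty file.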
Giaimo answered 10/11, 2015 at 17:42 Comment(3)
Your answer really helped me. Always returns the answer. Even if Content-Length is not available.Halvorson
Hi, thanks for looking and commenting. I'm really glad you found it helpful!Giaimo
This worked for me after I disabled selinux. Tested on remote image, PDF and mp4. mp4 gives a result but "22" is not the correct file size.Spital
S
2

If you are using Laravel 7 or later:

use Illuminate\Support\Facades\Http;

Http::head($url)->header('Content-Length');
Steady answered 19/8, 2021 at 21:7 Comment(0)
L
1

Most answers here either use cURL or rely on reading headers. But in certain situations you can use a much easier solution. Consider the note in the filesize() documentation on PHP.net. You'll find a tip there saying: "As of PHP 5.0.0, this function can also be used with some URL wrappers. Refer to Supported Protocols and Wrappers to determine which wrappers support stat() family of functionality".

So, if your server and PHP are properly configured, you can simply call filesize() with the full URL of the remote file whose size you want, and let PHP do all the magic.
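
Which wrappers qualify matters here. The PHP manual lists stat() support for ftp:// (and ftps://) but not for http:// or https://, so filesize() will not help with the URL in the question. A small sketch of a scheme check follows; the helper name is illustrative.

```php
<?php
/**
 * Returns true if filesize() can be expected to work for the URL's scheme.
 * The list follows the "Supports stat()" entries in the PHP manual for the
 * ftp:// and file:// wrappers; http:// and https:// are documented as not
 * supporting stat(). The helper name is illustrative.
 */
function scheme_supports_filesize($url) {
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, array('ftp', 'ftps', 'file'), true);
}

// Usage (network access required for ftp://):
// $url  = 'ftp://example.com/pub/file.txt';
// $size = scheme_supports_filesize($url) ? @filesize($url) : false;
```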

Lifelike answered 17/9, 2013 at 9:54 Comment(0)
S
1

Try this. I use it and get good results.

function getRemoteFilesize($url)
{
    $file_headers = @get_headers($url, 1);
    if ($size = getSize($file_headers)) {
        return $size;
    } elseif($file_headers[0] == "HTTP/1.1 302 Found"){
        if (isset($file_headers["Location"])) {
            $url = $file_headers["Location"][0];
            if (strpos($url, "/_as/") !== false) {
                $url = substr($url, 0, strpos($url, "/_as/"));
            }
            $file_headers = @get_headers($url, 1);
            return getSize($file_headers);
        }
    }
    return false;
}

function getSize($file_headers){

    if (!$file_headers || $file_headers[0] == "HTTP/1.1 404 Not Found" || $file_headers[0] == "HTTP/1.0 404 Not Found") {
        return false;
    } elseif ($file_headers[0] == "HTTP/1.0 200 OK" || $file_headers[0] == "HTTP/1.1 200 OK") {

        $clen=(isset($file_headers['Content-Length']))?$file_headers['Content-Length']:false;
        $size = $clen;
        if($clen) {
            switch (true) { // each case below is a boolean test
                case $clen < 1024:
                    $size = $clen . ' B';
                    break;
                case $clen < 1048576:
                    $size = round($clen / 1024, 2) . ' KiB';
                    break;
                case $clen < 1073741824:
                    $size = round($clen / 1048576, 2) . ' MiB';
                    break;
                case $clen < 1099511627776:
                    $size = round($clen / 1073741824, 2) . ' GiB';
                    break;
            }
        }
        return $size;

    }
    return false;
}

Now test it like this:

echo getRemoteFilesize('http://mandasoy.com/wp-content/themes/spacious/images/plain.png').PHP_EOL;
echo getRemoteFilesize('http://bookfi.net/dl/201893/e96818').PHP_EOL;
echo getRemoteFilesize('https://mcmap.net/q/218886/-downloading-files-as-attachment-filesize-incorrect').PHP_EOL;

Results:

24.82 KiB

912 KiB

101.85 KiB

Schalles answered 11/6, 2018 at 5:39 Comment(0)
A
1

To cover the HTTP/2 request, the function provided here https://mcmap.net/q/217097/-remote-file-size-without-downloading-file needs to be changed a bit:

<?php
/**
 * Returns the size of a file without downloading it, or -1 if the file
 * size could not be determined.
 *
 * @param $url - The location of the remote file to download. Cannot
 * be null or empty.
 *
 * @return The size of the file referenced by $url, or -1 if the size
 * could not be determined.
 */
function curl_get_file_size( $url ) {
  // Assume failure.
  $result = -1;

  $curl = curl_init( $url );

  // Issue a HEAD request and follow any redirects.
  curl_setopt( $curl, CURLOPT_NOBODY, true );
  curl_setopt( $curl, CURLOPT_HEADER, true );
  curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
  curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
  // Placeholder user agent string; substitute your own.
  curl_setopt( $curl, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; file-size-check)" );

  $data = curl_exec( $curl );
  curl_close( $curl );

  if( $data ) {
    $content_length = "unknown";
    $status = "unknown";

    if( preg_match( "/^HTTP\/1\.[01] (\d\d\d)/", $data, $matches ) ) {
      $status = (int)$matches[1];
    } elseif( preg_match( "/^HTTP\/2 (\d\d\d)/", $data, $matches ) ) {
      $status = (int)$matches[1];
    }

    if( preg_match( "/Content-Length: (\d+)/", $data, $matches ) ) {
      $content_length = (int)$matches[1];
    } elseif( preg_match( "/content-length: (\d+)/", $data, $matches ) ) {
        $content_length = (int)$matches[1];
    }

    // http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
    if( $status == 200 || ($status > 300 && $status <= 308) ) {
      $result = $content_length;
    }
  }

  return $result;
}
?>
Appealing answered 29/4, 2020 at 11:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.