PHP x86 How to get filesize of > 2 GB file without external program?
A

13

26

I need to get the file size of a file over 2 GB in size. (testing on 4.6 GB file). Is there any way to do this without an external program?

Current status:

  • filesize(), stat() and fseek() fail
  • fread() and feof() work

It is possible to get the file size by reading the entire file, but that is extremely slow:

$size = (float) 0;
$chunksize = 1024 * 1024;
while (!feof($fp)) {
    // count the bytes actually read; the final chunk is usually shorter
    $size += (float) strlen(fread($fp, $chunksize));
}
return $size;

I know how to get it on 64-bit platforms (using fseek($fp, 0, SEEK_END) and ftell()), but I need a solution for 32-bit platforms.
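
For reference, the 64-bit approach mentioned above can be sketched roughly as follows. The function name `fileSizeBySeek` is made up for illustration, and the result is only trustworthy where PHP integers are 64-bit; on 32-bit builds it runs into exactly the overflow this question is about.

```php
<?php
// Hypothetical helper: returns the size in bytes, or false on failure.
// Assumption: PHP integers are 64-bit on the machine running this.
function fileSizeBySeek(string $path)
{
    $fp = @fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    // fseek() returns 0 on success and -1 on failure
    if (fseek($fp, 0, SEEK_END) !== 0) {
        fclose($fp);
        return false;
    }
    $size = ftell($fp); // the position at end of file equals the size in bytes
    fclose($fp);
    return $size;
}
```

No data is read, so the cost does not depend on the file size.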


Solution: I've started an open-source project for this.

Big File Tools

Big File Tools is a collection of hacks that are needed to manipulate files over 2 GB in PHP (even on 32-bit systems).

Ant answered 31/3, 2011 at 14:28 Comment(8)
Well, if you can't do it in C code on x86, then it's pretty much unsolvable from within PHP. It's a systemic limitation that you won't overcome with your constraints. - Johnnyjohnnycake
Yep, the problem is the integer max value, I know. What about float? - Otten
Float becomes a bit inexact at some point. I don't know at which point on PHP x86. It would work better if you manually managed the upper and lower 24 bits of the result: if ($size >= 0x1000000) { $upper += 1; $size -= 0x1000000; }. Your file-reading approach certainly works, but it is not practical. Sadly, PHP's fseek(SEEK_CUR) interface does not return the amount skipped, else it would be easier. - Johnnyjohnnycake
I reopened this issue because I think there may be some less dirty solutions. Why do I think so? Because there is a function disk_free_space() (returns float) and it works with really big numbers without issues. - Otten
The float type has an inherent loss of precision. Period. Look up a good reference on computing and numerical storage if you want to know why this is so. disk_free_space() DOES have skew errors on large numbers; however, due to its nature, it's not possible to be 100% precise anyway. Individual filesystem implementations, cluster sizes, etc., may affect the ACTUAL usable space. So disk_free_space() suffers from the inescapable float skew, but it doesn't NEED to be accurate at that level. File sizes are exact numbers, no error tolerance. Screw up the file size, and you will lose data. - Crispy
OK, now I understand where the problem is! Thank you very much; I will post an update this afternoon. - Otten
Originally you did not want to use something external. So you started an external project to do it. Lol... It reminds me of xkcd.com/927. BTW, did you take the time to file a bug report with PHP so they might consider adding native support for everyone to use? - Henry
Hi, I meant an external executable. External executables usually complicate things: development (no longer multiplatform), workflow (needs setup on dev machines) and deployment (needs properly set-up permissions, open_basedir, etc.). So starting a PHP project that lets you get the exact file size without an external dependency (on most systems) is, I think, a good-enough solution. There is no reason to report the current behaviour as a bug, as it is the expected behaviour of 32-bit integers (x86 platform). See more on the project wiki: github.com/jkuchar/BigFileTools - Otten
A
8

I've started a project called Big File Tools. It is proven to work on Linux, Mac and Windows (even 32-bit variants). It provides byte-precise results even for huge files (>4 GB). Internally it uses brick/math, an arbitrary-precision arithmetic library.

Install it using Composer:

composer require jkuchar/bigfiletools

and use it:

<?php
$file = BigFileTools\BigFileTools::createDefault()->getFile(__FILE__);
echo $file->getSize() . " bytes\n";

The result is a BigInteger, so you can do further arithmetic with it:

$sizeInBytes = $file->getSize();
$sizeInMegabytes = $sizeInBytes->toBigDecimal()->dividedBy(1024*1024, 2, \Brick\Math\RoundingMode::HALF_DOWN);    
echo "Size is $sizeInMegabytes megabytes\n";

Big File Tools internally uses drivers to reliably determine the exact file size on all platforms. Here is the list of available drivers (updated 2016-02-05):

| Driver           | Time (s) ↓          | Runtime requirements | Platform             |
| ---------------- | ------------------- | -------------------- | -------------------- |
| CurlDriver       | 0.00045299530029297 | CURL extension       | -                    |
| NativeSeekDriver | 0.00052094459533691 | -                    | -                    |
| ComDriver        | 0.0031449794769287  | COM+.NET extension   | Windows only         |
| ExecDriver       | 0.042937040328979   | exec() enabled       | Windows, Linux, OS X |
| NativeRead       | 2.7670161724091     | -                    | -                    |

You can use BigFileTools with any of these drivers; by default, the fastest available one is chosen (BigFileTools::createDefault()):

 use BigFileTools\BigFileTools;
 use BigFileTools\Driver;
 $bigFileTools = new BigFileTools(new Driver\CurlDriver());
Ant answered 5/2, 2016 at 21:8 Comment(1)
Congratulations! Great project! I reported an issue, github.com/jkuchar/BigFileTools/issues/26, about PHP 7.3 and 7.4; older versions work fine. - Windage
C
23

Here's one possible method:

It first attempts to use a platform-appropriate shell command (Windows shell substitution modifiers or *nix/Mac stat command). If that fails, it tries COM (if on Windows), and finally falls back to filesize().

/*
 * This software may be modified and distributed under the terms
 * of the MIT license.
 */

function filesize64($file)
{
    static $iswin;
    if (!isset($iswin)) {
        $iswin = (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN');
    }

    static $exec_works;
    if (!isset($exec_works)) {
        $exec_works = (function_exists('exec') && !ini_get('safe_mode') && @exec('echo EXEC') == 'EXEC');
    }

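    // try a shell command
    if ($exec_works) {
        // escapeshellarg() quotes the path so spaces and shell metacharacters are safe
        $arg = escapeshellarg($file);
        $cmd = ($iswin) ? "for %F in ($arg) do @echo %~zF" : "stat -c%s $arg";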
        @exec($cmd, $output);
        if (is_array($output) && ctype_digit($size = trim(implode("\n", $output)))) {
            return $size;
        }
    }

    // try the Windows COM interface
    if ($iswin && class_exists("COM")) {
        try {
            $fsobj = new COM('Scripting.FileSystemObject');
            $f = $fsobj->GetFile( realpath($file) );
            $size = $f->Size;
        } catch (Exception $e) {
            $size = null;
        }
        if (ctype_digit((string) $size)) { // COM may return a number, not a string
            return $size;
        }
    }

    // if all else fails
    return filesize($file);
}
Crispy answered 31/3, 2011 at 15:28 Comment(9)
Wow, this is exactly what I was searching for. ;) I will make a few improvements and then post it here (support for >2 GB files without exec and COM support). - Otten
How can I mark more than one answer as the solution? - Otten
You can only tick a single answer. I'm curious, do you have a problem with this answer that you keep looking for alternatives? - Crispy
No, it is super! But it is only part of the solution, because the question was how to get the file size without an external program... (exec is not allowed on all web servers) - Otten
If it is only possible to tick one answer, I will tick yours, because it is the best answer here (but for a different question ;)). - Otten
Ahh, my bad, I didn't see that in your original post. - Crispy
In the end I used another solution (several solutions in one package) #5501951 - Otten
This is wrong and will break your app in subtle ways if you use it. The problem is that on 32-bit systems filesize() returns a negative number if it overflows... until the size is greater than 4 GB, at which point it returns a positive number again, but one that is way too small. So the code above will, for example, return a size under 2 gigabytes for a file between 4 and 6 gigabytes in size on a 32-bit system. - Overburden
@MarkMaunder - Overflow bug is fixed. - Crispy
T
4
<?php
  ######################################################################
  # Human size for files smaller or bigger than 2 GB on 32 bit Systems #
  # size.php - 1.1 - 17.01.2012 - Alessandro Marinuzzi - www.alecos.it #
  ######################################################################
  function showsize($file) {
    if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
      if (class_exists("COM")) {
        $fsobj = new COM('Scripting.FileSystemObject');
        $f = $fsobj->GetFile(realpath($file));
        $file = $f->Size;
      } else {
        $file = trim(exec("for %F in (\"" . $file . "\") do @echo %~zF"));
      }
    } elseif (PHP_OS == 'Darwin') {
      $file = trim(shell_exec("stat -f %z " . escapeshellarg($file)));
    } elseif ((PHP_OS == 'Linux') || (PHP_OS == 'FreeBSD') || (PHP_OS == 'Unix') || (PHP_OS == 'SunOS')) {
      $file = trim(shell_exec("stat -c%s " . escapeshellarg($file)));
    } else {
      $file = filesize($file);
    }
    if ($file < 1024) {
      echo $file . ' Byte';
    } elseif ($file < 1048576) {
      echo round($file / 1024, 2) . ' KB';
    } elseif ($file < 1073741824) {
      echo round($file / 1048576, 2) . ' MB';
    } elseif ($file < 1099511627776) {
      echo round($file / 1073741824, 2) . ' GB';
    } elseif ($file < 1125899906842624) {
      echo round($file / 1099511627776, 2) . ' TB';
    } elseif ($file < 1152921504606846976) {
      echo round($file / 1125899906842624, 2) . ' PB';
    } elseif ($file < 1180591620717411303424) {
      echo round($file / 1152921504606846976, 2) . ' EB';
    } elseif ($file < 1208925819614629174706176) {
      echo round($file / 1180591620717411303424, 2) . ' ZB';
    } else {
      echo round($file / 1208925819614629174706176, 2) . ' YB';
    }
  }
?>

Use as follows:

<?php include("php/size.php"); ?>

And where you want:

<?php showsize("files/VeryBigFile.rar"); ?>

If you want to improve it, you are welcome!

Tetrachord answered 16/1, 2012 at 14:10 Comment(2)
You should never ever use that; OS-specific scripting without piping stdout/in/err is a big no-no. - Elfreda
for %F in (\"" . $file . "\") do @echo %~zF: what does this do? - Glioma
I
4
$file_size=sprintf("%u",filesize($working_dir."\\".$file));

This works for me on a Windows Box.

I was looking through the bug log here: https://bugs.php.net/bug.php?id=63618 and found this solution.

Iene answered 31/12, 2015 at 5:40 Comment(1)
This is wrong, because it still uses a 32-bit integer, so it prints the correct value only up to 4 GB on a 32-bit system. - Otten
R
2

I found a nice slim solution, for Linux/Unix only, to get the file size of large files with 32-bit PHP.

$file = "/path/to/my/file.tar.gz";
$filesize = exec("stat -c %s " . escapeshellarg($file));

You should handle $filesize as a string. Casting it to int yields PHP_INT_MAX if the file size is larger than PHP_INT_MAX.

Even though the size is handled as a string, the following human-readable formatting works:

formatBytes($filesize);

function formatBytes($size, $precision = 2) {
    $base = log($size) / log(1024);
    $suffixes = array('', 'k', 'M', 'G', 'T');
    return round(pow(1024, $base - floor($base)), $precision) . $suffixes[floor($base)];
}

so my output for a file larger than 4 GB is:

4.46G
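
Because the size is kept as a string of decimal digits, comparing it against a limit needs some care, since an integer cast would clamp at PHP_INT_MAX as noted above. A minimal sketch of a digit-string comparison, assuming both inputs are non-negative decimal strings (the helper name `compareSizeStrings` is hypothetical, not part of PHP):

```php
<?php
// Hypothetical helper: compares two non-negative decimal digit strings.
// Returns -1, 0 or 1 without ever converting the values to int.
function compareSizeStrings(string $a, string $b): int
{
    // strip leading zeros so that string length reflects magnitude
    $a = ltrim($a, '0');
    $b = ltrim($b, '0');
    if (strlen($a) !== strlen($b)) {
        return strlen($a) < strlen($b) ? -1 : 1;
    }
    // equal length: byte-wise order of digits equals numeric order
    return strcmp($a, $b) <=> 0;
}

// Example: is a 4,600,000,000-byte file larger than the 32-bit integer limit?
var_dump(compareSizeStrings('4600000000', '2147483647') === 1);
```

If the bcmath extension is available, bccomp() does the same job; the sketch only avoids that dependency.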
Ripplet answered 27/3, 2015 at 8:40 Comment(2)
This solution uses exec(), which is not always allowed. - Otten
Won't this break on files with spaces in their name? You're creating a flat string with args and the filename only separated by spaces. stat itself is a nice utility, but it's part of GNU coreutils, not even specified by POSIX. - Colicroot
W
1

Well, the easiest way to do that is to simply add the wrap-around value to your number. On an x86 platform, where a long is 32 bits, that means adding 2^32:

if ($size < 0) $size = pow(2, 32) + $size;

Example: Big_File.exe is 3.30 GB (3,554,287,616 bytes); your function returns -740679680, so you add 2^32 (4294967296) and get 3554287616.

You get a negative number because the system reserves one bit for the sign, so the largest positive value is 2^31 - 1 (2,147,483,647, just under 2 GB). When a value passes this maximum it doesn't stop but simply carries into the sign bit, and your number is now negative. In simpler words, when you exceed the maximum positive number you wrap around to the most negative number, so 2147483647 + 1 = -2147483648. Further addition goes towards zero and then again towards the maximum.

As you can see, it is like a circle, with the highest and lowest numbers closing the loop.

A 32-bit value can take 2^32 = 4294967296 (4 GB) distinct values, so as long as your number is lower than that, this simple trick will always work. For higher numbers you must know how many times you have passed the looping point, multiply that by 2^32 and add it to your result:

$size = pow(2, 32) * $loops_count + $size;

Of course, with the basic PHP functions this is quite hard to do, because no function will tell you how many times it has passed the looping point, so this won't work for files over 4 GB.
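
To make the loop concrete, here is a small simulation of the truncation, done in floats so it behaves the same on 32-bit and 64-bit PHP. The function name `simulateInt32Wrap` is invented for this illustration; it models the arithmetic described above and is not how filesize() is implemented.

```php
<?php
// Hypothetical helper: returns the value a signed 32-bit integer would hold
// after a true byte count is truncated to its low 32 bits.
function simulateInt32Wrap(float $trueSize): float
{
    $wrapped = fmod($trueSize, 4294967296.0); // keep the low 32 bits
    if ($wrapped >= 2147483648.0) {
        $wrapped -= 4294967296.0;             // sign bit set: reads as negative
    }
    return $wrapped;
}

// The Big_File.exe example: 3,554,287,616 bytes wraps to a negative value.
var_dump(simulateInt32Wrap(3554287616.0));

// Two different true sizes produce the same wrapped value,
// which is why the loop count cannot be recovered afterwards.
var_dump(simulateInt32Wrap(41385984.0) === simulateInt32Wrap(4336353280.0));
```

The second check shows the core problem: a reported 41385984 could be a 39 MB file or a file of more than 4 GB.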

Welby answered 9/1, 2016 at 18:52 Comment(1)
This is quite a theoretical answer. And as you said, this works only for files under 4 GB. Look into the BigFileTools implementation. There are a few hacks with big numbers implemented (and they work for any file size). - Otten
P
0

One option would be to seek to the 2 GB mark and then read the length from there...

function getTrueFileSize($filename) {
    $size = filesize($filename);
    if ($size === false) {
        $fp = fopen($filename, 'r');
        if (!$fp) {
            return false;
        }
        $offset = PHP_INT_MAX - 1;
        $size = (float) $offset;
        // fseek() returns 0 on success and -1 on failure
        if (fseek($fp, $offset) !== 0) {
            fclose($fp);
            return false;
        }
        $chunksize = 8192;
        while (!feof($fp)) {
            $size += strlen(fread($fp, $chunksize));
        }
        fclose($fp);
    } elseif ($size < 0) {
        // Handle overflowed integer...
        $size = sprintf("%u", $size);
    }
    return $size;
}

So basically that seeks to the largest positive signed integer representable in PHP (2 GB on a 32-bit system), and then reads from there on in 8 KB blocks (which should be a fair trade-off between memory efficiency and disk transfer efficiency).

Also note that I'm not adding $chunksize to $size. The reason is that fread() may return fewer bytes than $chunksize, for a number of reasons. So instead, use strlen() to determine the length of the string actually read.

Perinephrium answered 31/3, 2011 at 15:4 Comment(6)
I think yes, this looks like a solution. Small bug to be fixed: on Windows, filesize() returns the overflowed file size. So we must use fseek($fp, 0, SEEK_END) === -1 instead of $size === false. - Otten
@Honza: Not really, since === false is still different from overflowed. So to fix the overflow, just do return sprintf('%u', $size) to force it back to unsigned... - Perinephrium
I think you are right, but only partly, because when I call filesize() on a 4.6 GB file it returns int(41385984). So really the only solution is fseek... - Otten
It really doesn't work on Windows, because it overflows twice. ;) For a 4.6 GB file it reports 39 MB. ;) - Otten
There is a bug in this code. If the file is smaller than 4 GB you get nonsense. - Otten
filesize() is not reliable; it returns nonsense (not false). - Otten
B
0

If you have an FTP server you could use fsockopen:

$socket = fsockopen($hostName, 21);
$t = fgets($socket, 128);
fwrite($socket, "USER $myLogin\r\n");
$t = fgets($socket, 128);
fwrite($socket, "PASS $myPass\r\n");
$t = fgets($socket, 128);
fwrite($socket, "SIZE $fileName\r\n");
$t = fgets($socket, 128);
$fileSize=floatval(str_replace("213 ","",$t));
echo $fileSize;
fwrite($socket, "QUIT\r\n");
fclose($socket); 

(Found as a comment on the ftp_size page)

Bab answered 31/3, 2011 at 15:9 Comment(4)
Thank you for your solution. Yes, this is also a way, but it is not generally usable. I need the code to be reusable, because it is used as an add-on for Nette Framework. - Otten
True. I'd use this as a fallback if the system doesn't let you use exec. - Bab
OK, and how do you convert a file path to an FTP URL? ;) - Otten
Depends on your server. $hostName would be $_SERVER['HTTP_HOST'] in many cases. $fileName could be different, depending on the FTP root. WordPress can use an FTP server for updates. - Bab
L
0

You may want to add some alternatives to the function you use, such as calling system commands like "dir" / "ls" and getting the information from there. These are subject to security restrictions, of course, which you can check for, falling back to the slow method only as a last resort.

Layout answered 31/3, 2011 at 15:18 Comment(0)
D
0

When IEEE 754 doubles are used (as on the vast majority of systems), every integer up to 2^53 (about 9 × 10^15, i.e. file sizes up to roughly 8 PiB) is represented exactly, so sizes in that range survive ordinary addition and subtraction without loss of precision.
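
A quick way to see where doubles stop being exact for whole numbers: every integer up to 2^53 has an exact double representation, and above that gaps appear. The snippet below is only a demonstration of that boundary, assuming PHP floats are IEEE 754 doubles (the variable name `$limit` is arbitrary).

```php
<?php
// 2^53: up to here, every whole number is exactly representable as a double
$limit = 9007199254740992.0;

// Just below the limit, adding a single byte is still visible...
var_dump(($limit - 2.0) + 1.0 === $limit - 1.0);

// ...but at the limit, the added byte is rounded away.
var_dump($limit + 1.0 === $limit);
```

Both comparisons evaluate to true, so a byte counter kept in a float is exact well beyond any realistic file size, as long as it is built from exact integer steps.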

Dumond answered 1/4, 2011 at 14:16 Comment(2)
And is there any way to know whether it is safe to use float? - Otten
Actually, depending on the system, it may be less than that. PHP for instance (the language in question) only gives a precision of "roughly 14 digits", but possibly as low as 8 digits. For any system that actually uses the file size for anything other than cosmetic purposes, data loss will ensue. - Crispy
O
0

You can't reliably get the size of a file on a 32 bit system by checking if filesize() returns negative, as some answers suggest. This is because if a file is between 4 and 6 gigs on a 32 bit system filesize will report a positive number, then negative from 6 to 8 then positive from 8 to 10 and so on. It loops, in a manner of speaking.

So you're stuck using an external command that works reliably on your 32 bit system.

However, one very useful tool is the ability to check if the file size is bigger than a certain size and you can do this reliably on even very big files.

The following seeks to 50 megs and tries to read one byte. It is very fast on my low spec test machine and works reliably even when the size is much greater than 2 gigs.

You can use this to check whether a file is larger than 2147483647 bytes (the maximum int on 32-bit systems) by changing the offset, and then handle the file differently or have your app issue a warning.

function isTooBig($file){
        $fh = @fopen($file, 'r');
        if(! $fh){ return false; }
        $offset = 50 * 1024 * 1024; //50 megs
        $tooBig = false;
        if(fseek($fh, $offset, SEEK_SET) === 0){
                if(strlen(fread($fh, 1)) === 1){
                        $tooBig = true;
                }
        } //Otherwise we couldn't seek there so it must be smaller

        fclose($fh);
        return $tooBig;
}
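
The same probe can be written with the threshold as a parameter. This is a sketch rather than a tested replacement: the name `isLargerThan` is hypothetical, and whether a seek close to the 2 GB boundary succeeds on a given 32-bit build is an assumption that should be checked on the target system.

```php
<?php
// Hypothetical generalization of the probe above: returns true if the file
// holds more than $offset bytes. $offset must fit in a PHP integer.
function isLargerThan(string $file, int $offset): bool
{
    $fh = @fopen($file, 'rb');
    if ($fh === false) {
        return false;
    }
    $larger = false;
    // a readable byte at position $offset means the size is at least $offset + 1
    if (fseek($fh, $offset, SEEK_SET) === 0) {
        $byte = fread($fh, 1);
        $larger = ($byte !== false && strlen($byte) === 1);
    }
    fclose($fh);
    return $larger;
}
```

Calling it with PHP_INT_MAX as the offset asks whether the size exceeds what a native integer can hold, which is the case where filesize() stops being trustworthy.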
Overburden answered 8/7, 2012 at 16:22 Comment(1)
Yes, this is part of my solution... #5501951 - Otten
P
0

The code below works for any file size on any PHP version, OS, web server or platform, as long as the file is reachable over HTTP.

// http head request to local file to get file size
$opts = array('http'=>array('method'=>'HEAD'));
$context = stream_context_create($opts);

// change the URL below to the URL of your file. DO NOT change it to a file path.
// you MUST use a http:// URL for your file for a http request to work
// SECURITY - you must add a .htaccess rule which denies all requests for this database file except those coming from local ip 127.0.0.1.
// $tmp will contain 0 bytes, since its a HEAD request only, so no data actually downloaded, we only want file size
$tmp= file_get_contents('http://127.0.0.1/pages-articles.xml.bz2', false, $context);

$tmp=$http_response_header;
foreach($tmp as $rcd) if( stripos(trim($rcd),"Content-Length:")===0 )  $size= floatval(trim(str_ireplace("Content-Length:","",$rcd)));
echo "File size = $size bytes";

// example output
File size = 10082006833 bytes
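
The header parsing can also be done in a way that keeps the size as a digit string instead of a float, which sidesteps any question of float precision. A minimal sketch, where the function name `contentLengthFromHeaders` is invented for illustration and the input is assumed to be a list of raw header lines such as $http_response_header:

```php
<?php
// Hypothetical helper: extracts Content-Length from a list of raw header
// lines and returns it as a string of digits, or null if it is absent.
function contentLengthFromHeaders(array $headers): ?string
{
    $length = null;
    foreach ($headers as $header) {
        // header names are case-insensitive
        if (stripos($header, 'Content-Length:') === 0) {
            $value = trim(substr($header, strlen('Content-Length:')));
            if (ctype_digit($value)) {
                $length = $value; // keep the last match (relevant after redirects)
            }
        }
    }
    return $length;
}
```

After the HEAD request above, it would be called as contentLengthFromHeaders($http_response_header).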
Pasahow answered 17/8, 2013 at 11:29 Comment(1)
OK, this is a kind of solution; however, I need something that accepts a file path and returns the file size. Because this solution uses .htaccess, it is web-server dependent; when moved to IIS, it creates a security issue. I've found a hack with curl that allows doing the same thing with a local file (so no HTTP request and no need to set up the environment or URL translation; it's part of my solution). - Otten
V
-1

I wrote a function that returns the exact file size and is quite fast:

function file_get_size($file) {
    //open file
    $fh = fopen($file, "r"); 
    //declare some variables
    $size = "0";
    $char = "";
    //set file pointer to 0; I'm a little bit paranoid, you can remove this
    fseek($fh, 0, SEEK_SET);
    //set multiplicator to zero
    $count = 0;
    while (true) {
        //jump 1 MB forward in file
        fseek($fh, 1048576, SEEK_CUR);
        //check if we actually left the file
        if (($char = fgetc($fh)) !== false) {
            //if not, go on
            $count ++;
        } else {
            //else jump back where we were before leaving and exit loop
            fseek($fh, -1048576, SEEK_CUR);
            break;
        }
    }
    //we could make $count jumps, so the file is at least $count * 1.000001 MB large
    //1048577 because we jump 1 MB and fgetc goes 1 B forward too
    $size = bcmul("1048577", $count);
    //now count the last few bytes; they're always less than 1048576 so it's quite fast
    $fine = 0;
    while(false !== ($char = fgetc($fh))) {
        $fine ++;
    }
    //and add them
    $size = bcadd($size, $fine);
    fclose($fh);
    return $size;
}
Viafore answered 24/7, 2013 at 22:49 Comment(2)
On which OS have you tested this? - Otten
Raspbian (Debian) on Raspberry Pi (not the fastest machine) - Viafore
