Fastest way to calculate the size of a file opened inside the code (PHP)
L

4

8

I know there are quite a few built-in functions in PHP for getting the size of a file, among them filesize, stat, and ftell.

My question is about ftell, which is quite interesting: it returns the current position of the file pointer as an integer.

Is it possible to get the size of the file using ftell? If so, how?

Scenario:

  1. System (code) opens an existing file with mode "a" to append content.
  2. File pointer points to the end of the file.
  3. System writes the new content to the file.
  4. System uses ftell to calculate the size of the file.
Listlessness answered 13/7, 2011 at 5:58 Comment(0)
C
16

fstat determines the file size without any acrobatics:

$f = fopen('file', 'r+');
$stat = fstat($f);
$size = $stat['size'];

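ftell cannot be used when the file has been opened with the append ("a") flag, because its result is undefined for append-only streams. With other modes, ftell only reports the file size after you have moved to the end of the file with fseek($f, 0, SEEK_END).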
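A minimal sketch of the fseek/ftell approach next to fstat, assuming a seekable handle opened in a non-append mode. The helper name size_via_seek and the temporary file are hypothetical and only there for illustration:

```php
<?php
// Hypothetical helper (not part of the original answer): returns the size of
// an already opened, seekable stream and restores the previous position.
function size_via_seek($handle): int
{
    $current = ftell($handle);                 // remember the current offset
    if ($current === false || fseek($handle, 0, SEEK_END) !== 0) {
        throw new RuntimeException('Stream is not seekable');
    }
    $size = ftell($handle);                    // offset at the end equals the size in bytes
    fseek($handle, $current, SEEK_SET);        // go back to where we were
    if ($size === false) {
        throw new RuntimeException('ftell failed');
    }
    return $size;
}

// Illustration with a throwaway file of 11 bytes.
$path = tempnam(sys_get_temp_dir(), 'size');
file_put_contents($path, 'hello world');

$f = fopen($path, 'r+');
fseek($f, 5);                                  // pretend we were in the middle of the file
$sizeFromSeek  = size_via_seek($f);
$sizeFromStat  = fstat($f)['size'];
$positionAfter = ftell($f);
fclose($f);
unlink($path);

echo "$sizeFromSeek $sizeFromStat $positionAfter\n"; // prints "11 11 5"
```

Both approaches report the same number here; the seek-based one has to save and restore the offset if the surrounding code still relies on it.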

Carolus answered 13/7, 2011 at 6:1 Comment(1)
Thank you, the info on fseek did the trick. I used fseek in conjunction with ftell and it was the fastest. - Listlessness
G
2

ftell() can tell you how many bytes are supposed to be in the file, but not how many actually are. Sparse files take up less space on disk than the value seeking to the end and telling will return.
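The difference between logical size and space actually allocated can be sketched as follows. This assumes a POSIX system, where the 'blocks' entry of fstat counts 512-byte units (on Windows it is -1), and a filesystem that leaves a hole when ftruncate extends a file; neither assumption comes from the answer itself:

```php
<?php
// Throwaway file used only for this illustration.
$path = tempnam(sys_get_temp_dir(), 'sparse');
$f = fopen($path, 'r+');

// Extend the file to 1 MiB without writing any data. Many filesystems
// store this as a hole, so little or no disk space is allocated.
ftruncate($f, 1024 * 1024);

fseek($f, 0, SEEK_END);
$logical = ftell($f);          // logical size: 1048576 bytes
$stat    = fstat($f);

// Allocated size, if the platform reports it.
$allocated = $stat['blocks'] >= 0 ? $stat['blocks'] * 512 : null;

fclose($f);
unlink($path);

echo "logical size:   $logical bytes\n";
echo 'allocated size: ' . ($allocated === null ? 'unknown' : "$allocated bytes") . "\n";
```

On a filesystem with sparse-file support the allocated figure is typically far below the logical one, while ftell and fstat's 'size' both report the full 1 MiB.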

Gideon answered 13/7, 2011 at 6:2 Comment(1)
Compressed files and inline files also take up less space on disk. - Sashasashay
C
0

I wrote a benchmark to add some data to this topic. To rule out arguments that some kind of PHP or stat cache is skewing the numbers, the test files are unique and created by a separate process.

This is a new benchmark, written to leave no doubt.

The tests ignore the time spent in fopen and fclose, since the question asks for the fastest way to calculate the size of an already opened file. Each test is run with 200 files.

The code that creates the files in a separate process is in the first comment on this answer.

<?php
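// Minimal timer: new() records a timestamp, stop() closes a measured interval,
// and dif() sums the lengths of all recorded intervals in nanoseconds.
class timeIt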
{
    static private $times   = [];
    static function new()
    {
        self::$times[] = hrtime(true);
    }
    static function stop()
    {
        self::$times[] = -1;
    }
    static function dif()
    {
        $dif    = 0;
        $sum    = 0;
        $i      = count(self::$times) - 1;

        if (self::$times[$i] === -1)
            unset(self::$times[$i--]);
        
        for ($i = count(self::$times) - 1; $i > 0; --$i) {
            if (self::$times[$i - 1] === -1) {
                $sum    += $dif;
                $dif    = 0;
                --$i;
                continue;
            }
            $dif    += self::$times[$i] - self::$times[$i - 1];
        }
        return $sum + $dif;
    }
    static function printNReset()
    {
        echo "diffTime:" . self::dif() . "\n\n";
        self::reset();
    }
    static function reset()
    {
        self::$times    = [];
    }
}
function fseek_size_from_current($handle)
{
    $current  = ftell($handle);
    fseek($handle, 0, SEEK_END);
    $size   = ftell($handle);
    fseek($handle, $current);
    
    return $size;
}
function fseek_size_from_start($handle)
{
    fseek($handle, 0, SEEK_END);
    $size   = ftell($handle);
    fseek($handle, 0);
    
    return $size;
}

function uniqueProcessId()
{
    return (string) hrtime(true);
}

function getUniqueForeignProcessFiles($quantity, $size)
{
    $returnedFilenames   = $filenames = [];
    while ($quantity--){
        $filename   = uniqueProcessId();
        $filenames[$filename]   = $size;
        $returnedFilenames[]    = __DIR__ . DIRECTORY_SEPARATOR . $filename;
    }

    $data       = base64_encode(json_encode($filenames));
    $foreignCgi = __DIR__ . DIRECTORY_SEPARATOR . "createFileByNames.php";
    $command    = "php $foreignCgi $data";
    if (shell_exec($command) !== 'ok')
        die("An error occurred");

    return $returnedFilenames;
}
const FILESIZE  = 20 * 1024 * 1024;

foreach(getUniqueForeignProcessFiles(200, FILESIZE) as $filename){
    $handle = fopen($filename, 'r');
    timeIt::new();
    $size   = fstat($handle)['size'];
    timeIt::new();
    timeIt::stop();
    fclose($handle);
    unlink($filename);
}
echo "**fstat**\n";
timeIt::printNReset();

foreach(getUniqueForeignProcessFiles(200, FILESIZE) as $filename){
    $handle = fopen($filename, 'r');
    timeIt::new();
    $size   = fseek_size_from_start($handle);
    timeIt::new();
    timeIt::stop();
    fclose($handle);
    unlink($filename);
}
echo "**fseek with static/defined**\n";
timeIt::printNReset();


foreach(getUniqueForeignProcessFiles(200, FILESIZE) as $filename){
    $handle = fopen($filename, 'r');
    timeIt::new();
    $size   = fseek_size_from_current($handle);
    timeIt::new();
    timeIt::stop();
    fclose($handle);
    unlink($filename);
}
echo "**fseek with current offset**\n";
timeIt::printNReset();


foreach(getUniqueForeignProcessFiles(200, FILESIZE) as $filename){
    $handle = fopen($filename, 'r');
    timeIt::new();
    $size   = filesize($filename);
    timeIt::new();
    timeIt::stop();
    fclose($handle);
    unlink($filename);
}
echo "**filesize after fopen**\n";
timeIt::printNReset();

foreach(getUniqueForeignProcessFiles(200, FILESIZE) as $filename){
    timeIt::new();
    $size   = filesize($filename);
    timeIt::new();
    timeIt::stop();
    unlink($filename);
}
echo "**filesize no fopen**\n";
timeIt::printNReset();

Results with 20MB files, times in nanoseconds (total over 200 files):

fstat diffTime:2745700

fseek with static/defined diffTime:1267400

fseek with current offset diffTime:983500

filesize after fopen diffTime:283052500

filesize no fopen diffTime:4259203800

Results with 1MB files, times in nanoseconds (total over 200 files):

fstat diffTime:1490400

fseek with static/defined diffTime:706800

fseek with current offset diffTime:837900

filesize after fopen diffTime:22763300

filesize no fopen diffTime:216512800

This answer previously contained another benchmark, whose code I removed to keep the answer cleaner. That benchmark used files created by the same process, and its conclusion was:

ftell + fseek takes half the time of fstat['size'], even when wrapped in another function and calling both functions twice. fstat is slower because it returns a lot more information than just the file size, so if you need that other information in your code, for example to check for changes, just stick to fstat.

The current benchmark supports that conclusion: fseek + ftell is roughly 1.8x to 2.8x faster than fstat for files of 1-20MB.
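As a quick check, the speedup factors can be recomputed from the totals reported above. This is only arithmetic on the published numbers; the array keys are labels chosen for this sketch:

```php
<?php
// Totals in nanoseconds, copied from the results above (200 files per test).
$results = [
    '20MB' => ['fstat' => 2745700, 'seek_from_start' => 1267400, 'seek_from_current' => 983500],
    '1MB'  => ['fstat' => 1490400, 'seek_from_start' => 706800,  'seek_from_current' => 837900],
];

$speedups = [];
foreach ($results as $label => $times) {
    foreach (['seek_from_start', 'seek_from_current'] as $variant) {
        // How many times faster than fstat the fseek + ftell variant was.
        $speedups[$label][$variant] = round($times['fstat'] / $times[$variant], 2);
        printf("%-5s %-18s %.2fx\n", $label, $variant, $speedups[$label][$variant]);
    }
}
// 20MB: 2.17x and 2.79x; 1MB: 2.11x and 1.78x
```

Divided by 200, the fstat totals correspond to a per-call cost in the range of a few microseconds, so the absolute difference per call is small.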

Feel free to run your benchmarks and share your results.

Camembert answered 14/1, 2020 at 17:16 Comment(1)
<?php $files = json_decode(base64_decode($argv[1])); function createFilled($fileName, $mbyte, $byte) { $mbyte = $mbyte * 1024 * 1024; file_put_contents(__DIR__ . DIRECTORY_SEPARATOR . $fileName, str_repeat('a', $mbyte + $byte)); } foreach($files as $filename => $size) createFilled($filename, 0, $size); echo 'ok'; - Camembert
L
-1

Thanks @Phihag, with your info on fseek along with ftell I am able to calculate the size in a much better way. See the code here: http://pastebin.com/7XCqu0WR

<?php
$fp = fopen("/tmp/temp.rock", "a+");

fwrite($fp, "This is the contents");

echo "Time taken to calculate the size by filesize function: ";
$t = microtime(true);
$ts1 = filesize("/tmp/temp.rock") . "\n";
echo microtime(true) - $t . "\n";

echo "Time taken to calculate the size by fstat function:";
$t = microtime(true);
$stat = fstat($fp);      // fstat returns an array, so it must not be concatenated with a string
$size = $stat["size"];
echo microtime(true) - $t . "\n";

echo "Time taken to calculate the size by fseek and ftell function: ";
$t = microtime(true);
fseek($fp, 0, SEEK_END);
$ts2 = ftell($fp) . "\n";
echo microtime(true) - $t . "\n";

fclose($fp);

/**
OUTPUT:

Time taken to calculate the size by filesize function:2.4080276489258E-5
Time taken to calculate the size by fstat function:2.9802322387695E-5
Time taken to calculate the size by fseek and ftell function:1.2874603271484E-5

*/
?>
Listlessness answered 13/7, 2011 at 9:11 Comment(6)
@S Rakesh Note that this benchmark is totally wrong: Firstly, you measure something barely measurable. If at all, you should measure 100K+ runs. 10^-5s is larger than the resolution of some timers. Also, you don't count the time the fopen takes. (By the way, what I suggested was not filesize, but fstat, which operates on handles, too), so you're comparing apples to oranges (accessing a handle vs. a path in the filesystem) anyway. Additionally, you don't take caching into account. Naturally, by the time you call fseek, the file size has already been determined and cached. - Carolus
@Phihag I agree the timers are long, but I am just trying to figure out the best way to calculate the size of a file that is already open. I did run a benchmark of 100 requests at a concurrency level of 100, and I still find ftell is the best. - Listlessness
@S Rakesh Best does not have to equal fastest (if you were out for speed, you wouldn't write PHP code). Note that you're still comparing apples to oranges in the form of filesize to fstat/ftell. Also note that the results depend on general computer and caching setup. For example, if your CPU caches are already warmed up and contain the inode structure, filesize (which does have to find the inode in the first place) will be magnitudes slower. On the other hand, if you are in a setup where the file's inode (and possibly those of the containing directories) has to be read from an HDD, ... - Carolus
... filesize will be faster than fopen+fseek+ftell. All in all, since you're writing PHP anyway, I'd suggest you go for clarity of code, and performance only if it matters. - Carolus
I have checked on a bigger set and the results were very similar: filesize - 43.9655s / fstat - 44.6673s / ftell - 44.5511s - Reseat
I don't understand why some people are asking to measure the fopen time... Why are you asking to include the fopen time? The question states clearly that the file is already opened. - Rockfish
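The objections raised in these comments (too few runs, cached stat data, fopen excluded) can be addressed with a loop along the following lines. This is a sketch under assumptions, not a definitive benchmark: the iteration count, the 4 KiB test file, and the labels are arbitrary choices, hrtime needs PHP 7.3 or later, and the clearstatcache call is counted in the filesize timing:

```php
<?php
const ITERATIONS = 100000;

// Throwaway test file; its size is an arbitrary choice for this sketch.
$path = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($path, str_repeat('a', 4096));
$handle = fopen($path, 'r');   // opened once, outside the timed loops

$candidates = [
    'fstat' => function () use ($handle) {
        return fstat($handle)['size'];
    },
    'fseek+ftell' => function () use ($handle) {
        fseek($handle, 0, SEEK_END);
        return ftell($handle);
    },
    'filesize' => function () use ($path) {
        // Drop PHP's stat cache so every call really asks the filesystem.
        clearstatcache(true, $path);
        return filesize($path);
    },
];

$sizes = [];
foreach ($candidates as $name => $fn) {
    $start = hrtime(true);                       // nanoseconds, monotonic
    for ($i = 0; $i < ITERATIONS; ++$i) {
        $sizes[$name] = $fn();
    }
    $elapsed = hrtime(true) - $start;
    printf("%-12s %8.1f ns/call\n", $name, $elapsed / ITERATIONS);
}

fclose($handle);
unlink($path);
```

Because the same file is queried on every iteration, operating-system caches stay warm; measuring cold-cache behaviour requires fresh files per call, as the benchmark in the answer above does.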
