Using a loop of fgets() calls is a fine solution and the most straightforward to write, however:
Even though the file is read internally using a buffer of 8192 bytes, your code still has to call that function once for each line.
It's technically possible for a single line to be bigger than the available memory if you're reading a binary file.
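For reference, the fgets() approach discussed above presumably looks something like the sketch below. The function name getLinesWithFgets is made up for illustration and is not taken from any other answer; note that it makes one fgets() call per line, and each call has to hold an entire line in memory.

```php
<?php
// Hypothetical baseline for comparison: one fgets() call per line.
function getLinesWithFgets($file)
{
    $f = fopen($file, 'rb');
    $lines = 0;

    // fgets() returns false once there is nothing left to read.
    while (fgets($f) !== false) {
        ++$lines;
    }

    fclose($f);
    return $lines;
}
```

Because fgets() also returns a final line that has no trailing newline, this sketch counts such a line as well.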
The code below reads the file in chunks of 8 kB and counts the number of newlines within each chunk.
    function getLines($file)
    {
        $f = fopen($file, 'rb');
        $lines = 0;

        while (!feof($f)) {
            $lines += substr_count(fread($f, 8192), "\n");
        }

        fclose($f);

        return $lines;
    }
If the average length of each line is at most 4kB, you will already start saving on function calls, and those can add up when you process big files.
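To get a rough feel for the savings, the sketch below compares the number of read calls each approach needs. The helper name estimateReadCalls is hypothetical, and the figures are only an estimate: the chunked loop may issue one extra fread() call before feof() reports the end of the file.

```php
<?php
// Hypothetical helper: rough number of read calls for each approach.
function estimateReadCalls($fileSize, $lineCount, $chunkSize = 8192)
{
    return [
        // The fgets() loop needs about one call per line.
        'fgets' => $lineCount,
        // The chunked loop needs about one call per chunk.
        'fread' => (int) ceil($fileSize / $chunkSize),
    ];
}

// Assumption: "1GB" is taken as 1 GiB (1073741824 bytes); the line count is the
// one reported in the benchmark.
print_r(estimateReadCalls(1073741824, 3550388));
```

Under that assumption this works out to 3550388 fgets() calls versus 131072 fread() calls, which is where the difference in function-call overhead comes from.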
Benchmark
I ran a test with a 1GB file; here are the results:
+------------+-------------+------------------+---------+
|            | This answer | Dominic's answer | wc -l   |
+------------+-------------+------------------+---------+
| Lines      | 3550388     | 3550389          | 3550388 |
+------------+-------------+------------------+---------+
| Runtime    | 1.055       | 4.297            | 0.587   |
+------------+-------------+------------------+---------+
Time is measured in seconds of real (wall-clock) time.
True line count
The above works well and returns the same results as wc -l. However, if the file ends without a trailing newline, the line count will be off by one. If you care about this particular scenario, you can make it more accurate with this logic:
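    function getLines($file)
    {
        $f = fopen($file, 'rb');
        $lines = 0;
        $last = '';

        while (!feof($f)) {
            $buffer = fread($f, 8192);
            if ($buffer === false || $buffer === '') {
                // Nothing more to read; keep $last from the previous chunk.
                break;
            }
            $lines += substr_count($buffer, "\n");
            // Remember the final byte seen so far.
            $last = substr($buffer, -1);
        }

        fclose($f);

        // A non-empty file that does not end in a newline has one more, unterminated line.
        if ($last !== '' && $last !== "\n") {
            ++$lines;
        }

        return $lines;
    }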
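One way to check the trailing-newline behaviour is to run a counter against a few tiny temporary files. The sketch below is only an illustration: checkLineCounter is a hypothetical helper, and the three test cases are assumptions chosen to cover the scenario described above.

```php
<?php
// Hypothetical helper: runs a line-counting callable against small temporary files.
function checkLineCounter(callable $counter)
{
    $cases = [
        'empty file'          => "",
        'ends with newline'   => "a\nb\n",
        'no trailing newline' => "a\nb",
    ];

    $results = [];
    foreach ($cases as $label => $contents) {
        // Write each case to its own temporary file, count, then clean up.
        $path = tempnam(sys_get_temp_dir(), 'lines');
        file_put_contents($path, $contents);
        $results[$label] = $counter($path);
        unlink($path);
    }

    return $results;
}
```

Calling print_r(checkLineCounter('getLines')) prints the count for each case. A counter that handles the missing trailing newline is expected to report 0, 2 and 2, while a counter that only counts newline characters reports 0, 2 and 1.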