Is there a way in Matlab to determine the number of lines in a file without looping through each line?

Obviously one could loop through a file using fgetl or similar function and increment a counter, but is there a way to determine the number of lines in a file without doing such a loop?

Gide answered 29/8, 2012 at 11:6 Comment(3)
In Linux, it's just wc -l <your_file> :)Subversion
See also #8540771Wiggins
One could use dbtype and then parse the output.Acey
M
35

I like to use the following code for exactly this task

fid = fopen('someTextFile.txt', 'rb');
%# Get file size.
fseek(fid, 0, 'eof');
fileSize = ftell(fid);
frewind(fid);
%# Read the whole file.
data = fread(fid, fileSize, 'uint8');
%# Count number of line-feeds and increase by one.
numLines = sum(data == 10) + 1;
fclose(fid);

It is pretty fast if you have enough memory to read the whole file at once. It should work for both Windows- and Linux-style line endings.
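The + 1 above assumes that the last line is not terminated by a line feed. Here is a minimal sketch of a variant that adds one only when the file is non-empty and its last byte is not a line feed. The function name countLinesWholeFile is hypothetical and not part of the original answer.

```matlab
function numLines = countLinesWholeFile(fname)
%# Hypothetical helper (not from the original answer): counts lines by
%# reading the whole file as bytes.
    fid = fopen(fname, 'rb');
    assert(fid ~= -1, 'Could not read: %s', fname);
    closer = onCleanup(@() fclose(fid));
    %# Read every byte; '*uint8' keeps the data as uint8 to save memory.
    data = fread(fid, Inf, '*uint8');
    %# One line per line feed (byte 10) ...
    numLines = sum(data == 10);
    %# ... plus one for a final line that lacks a trailing line feed.
    if ~isempty(data) && data(end) ~= 10
        numLines = numLines + 1;
    end
end
```

With this convention a file ending in a line feed and the same file without it report the same count, which matches what an fgetl loop returns. wc -l counts only the line-feed characters, so it can still differ by one for a file without a trailing line feed.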

Edit: I measured the performance of the answers provided so far. Here is the result for determining the number of lines of a text file containing 1 million double values (one value per line). Average of 10 tries.

 Author           Mean time +- standard deviation (s)
------------------------------------------------------
 Rody Oldenhuis      0.3189 +- 0.0314
 Edric (2)           0.3282 +- 0.0248
 Mehrwolf            0.4075 +- 0.0178
 Jonas               1.0813 +- 0.0665
 Edric (1)          26.8825 +- 0.6790

So the fastest approaches are the ones using Perl and the ones reading the file as binary data. I would not be surprised if Perl internally also reads large blocks of the file at once instead of looping through it line by line (just a guess; I do not know anything about Perl).

Using a simple fgetl() loop (Edric (1)) is roughly 25 to 85 times slower than the other approaches, going by the mean times in the table.

Edit 2: Included Edric's 2nd approach, which is much faster and on-par with the Perl solution, I'd say.
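For anyone who wants to repeat such a comparison, here is a rough sketch of a timing helper. It is not the script used for the table above. The function name timeLineCounter and its parameters are hypothetical, and it uses a plain tic/toc loop.

```matlab
function [meanTime, stdTime] = timeLineCounter(counter, fname, numTries)
%# Hypothetical benchmark helper; not the script used for the table above.
%# counter is a function handle that takes a file name, e.g. @countLines.
    times = zeros(numTries, 1);
    for k = 1:numTries
        t0 = tic;
        counter(fname);          %# result is discarded, only the time matters
        times(k) = toc(t0);
    end
    meanTime = mean(times);
    stdTime  = std(times);
end
```

A call such as [m, s] = timeLineCounter(@countLines, 'someTextFile.txt', 10) would then give the mean and standard deviation over 10 tries. Operating-system file caching can make the first try slower than the rest, so a warm-up call before timing may be worthwhile.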

Mullens answered 29/8, 2012 at 11:22 Comment(2)
Thanks! Although all were good answers, I'm picking Mehrwolf's as the accepted one, since he compares all the other answers. I will probably actually use Edric's 2nd answer because I prefer to keep everything inside Matlab.Gide
It should be noted that Edric (2) will be off by one in the event that the final line is not terminated with \n. For example, countLines('countLines.m') returns 8 when there are 9 lines in the file. While your increase by 1 accounts for this in most cases, it returns a result inconsistent with the system command (on Windows, at least, can't test Linux) when the final line is blank. See this gist for a MCVE of both cases.Tangleberry
H
16

I think a loop is in fact the best - all other options so far suggested either rely on external programs (need to error-check; need str2num; harder to debug / run cross-platform etc.) or read the whole file in one go. Loops aren't so bad. Here's my variant

function count = countLines(fname)
  fh = fopen(fname, 'rt');
  assert(fh ~= -1, 'Could not read: %s', fname);
  x = onCleanup(@() fclose(fh));
  count = 0;
  while ischar(fgetl(fh))
    count = count + 1;
  end
end

EDIT: Jonas rightly points out that the above loop is really slow. Here's a faster version.

function count = countLines(fname)
  fh = fopen(fname, 'rt');
  assert(fh ~= -1, 'Could not read: %s', fname);
  x = onCleanup(@() fclose(fh));
  count = 0;
  while ~feof(fh)
    %# Read a block of up to 16384 characters and count its line feeds.
    count = count + sum( fread( fh, 16384, 'char' ) == char(10) );
  end
end

It's still not as fast as wc -l, but it's not a disaster either.
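Because the faster version counts line-feed characters only, a final line without a trailing line feed is not counted. Here is a minimal sketch of a block-wise variant that remembers the last byte read and adds one in that case. The name countLinesChunked is hypothetical, and unlike the code above it opens the file in binary mode.

```matlab
function count = countLinesChunked(fname)
%# Hypothetical variant: counts line feeds in blocks of 16384 bytes and
%# also counts a final line that has no trailing line feed.
    fh = fopen(fname, 'rb');
    assert(fh ~= -1, 'Could not read: %s', fname);
    closer = onCleanup(@() fclose(fh));
    count = 0;
    lastByte = uint8(10);   %# so that an empty file yields 0 lines
    while true
        block = fread(fh, 16384, '*uint8');
        if isempty(block)
            break;          %# nothing left to read
        end
        count = count + sum(block == 10);
        lastByte = block(end);
    end
    %# The file did not end with a line feed: count the unterminated line.
    if lastByte ~= 10
        count = count + 1;
    end
end
```

Memory use stays bounded by the block size, so this should also be usable for files that do not fit into memory at once.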

Haftarah answered 29/8, 2012 at 12:0 Comment(3)
The problem with the loop is that you need to access the file at each iteration. File access is notoriously slow in Matlab; doing it many times in a loop is going to hurt.Phelloderm
It should be noted that the second method will be off by one in the event that the final line is not terminated with \n. For example, countLines('countLines.m') returns 8 when there are 9 lines in the file.Tangleberry
See this gist for a MCVETangleberry
S
12

I found a nice trick here:

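if (isunix) %# Linux, mac
    %# Redirect the file into wc so that only the count is printed, not the filename.
    [status, result] = system( ['wc -l < ', 'your_file'] );
    assert(status == 0, 'wc failed: %s', result);
    numlines = str2double( strtrim(result) );

elseif (ispc) %# Windows
    numlines = str2double( strtrim( perl('countlines.pl', 'your_file') ) );

else
    error('...');

end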

where 'countlines.pl' is a perl script, containing

while (<>) {};
print $.,"\n";
Subversion answered 29/8, 2012 at 11:14 Comment(1)
system( ['wc -l ', 'your_file'] ) will also output the filename into result. This can be avoided by using system( ['wc -l <', 'your_file'] ).Seam
P
4

You can read the entire file at once, and then count how many lines you've read.

fid = fopen('yourFile.ext');
allText = textscan(fid,'%s','delimiter','\n');
numberOfLines = length(allText{1});
fclose(fid);
Phelloderm answered 29/8, 2012 at 11:15 Comment(2)
This could give memory issues for large files, since allText will have to contain, well, all text in the file.Subversion
@RodyOldenhuis: Yes, memory is certainly an issue. How much memory does your Perl solution require? Does it read the file line by line, in chunks, or as a whole?Mullens
S
0

I would recommend using an external tool for this. For example an app called cloc, which you can download here for free.

On Linux you then simply type cloc <directory_path> and get

YourPC$ cloc <directory_path>
      87 text files.
      81 unique files.                              
      23 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.19 s (311.7 files/s, 51946.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
MATLAB                          59           1009           1074           4993
HTML                             1              0              0             23
-------------------------------------------------------------------------------
SUM:                            60           1009           1074           5016
-------------------------------------------------------------------------------

They also claim it should work on Windows.

Sheffield answered 14/4, 2016 at 8:57 Comment(0)
E
0

The issue with the miscounting of lines in Edric’s answer can be solved with this.

function count = countlines(fname)
    fid = fopen(fname, 'r');
    assert(fid ~= -1, 'Could not read: %s', fname);
    x = onCleanup(@() fclose(fid));
    count = 0;
    % while ~feof(fid)
    %     count = count + sum( fread( fid, 16384, 'char' ) == char(10) );
    % end
    while ~feof(fid)
        [~] = fgetl(fid);
        count = count + 1;
    end
end
Elisabethelisabethville answered 25/5, 2023 at 20:43 Comment(0)
