Get Large File Size in C
S

6

14

Before anyone complains of "duplicate": I've been checking SO quite thoroughly, but there seems to be no clean answer yet, although the question looks quite simple.

I'm looking for portable C code that can report the size of a file, even if the file is larger than 4GB.

The usual method (fseek, ftell) works fine as long as the file remains < 2GB, because ftell returns a long, which is only 32 bits wide on many platforms. It's fairly well supported everywhere, so I'm trying to find something equivalent.

Unfortunately, the updated methods (fseeko, ftello) are not supported by all compilers. For example, MinGW lacks them (and so, obviously, does MSVC). Furthermore, some comments make me believe that the new return type (off_t) does not necessarily support sizes > 2GB; it may depend on some external parameters, to be checked.

The unambiguous methods (fseeko64, ftello64) are not supported by MSVC. MS provides its own equivalents, _fseeki64 and _ftelli64. This is already bad, but it gets worse: some Linux configurations seem to support these functions poorly at run time. For example, on my Debian Squeeze on PowerPC, using GCC 4.4, a "filesize" function built on fseeko64 always returns 0 (while it works fine on Ubuntu64). MinGW seems to return random garbage above 2GB.

Well, I'm a bit clueless as far as portability is concerned. And if I need to write a bunch of #if/#else blocks, then why not go straight to the OS- and compiler-specific methods in the first place, such as GetFileSize() for MSVC?
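
For illustration, the #if/#else approach described above could be sketched as follows. This is a minimal sketch, not a tested portable solution: the names portable_fseek64, portable_ftell64 and file_size_seek are made up for this example, and it assumes that _fseeki64/_ftelli64 exist on the Windows toolchain in use and that defining _FILE_OFFSET_BITS as 64 yields a 64-bit off_t elsewhere (which the question itself reports is not always reliable).

```c
/* Ask for a 64-bit off_t on 32-bit POSIX systems; must precede all includes. */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>

#ifdef _WIN32
  /* Assumption: the toolchain (MSVC, MinGW-w64) provides these variants. */
  #define portable_fseek64(f, off, whence) _fseeki64((f), (off), (whence))
  #define portable_ftell64(f)              _ftelli64(f)
#else
  #include <sys/types.h>
  #define portable_fseek64(f, off, whence) fseeko((f), (off_t)(off), (whence))
  #define portable_ftell64(f)              ftello(f)
#endif

/* Hypothetical helper: returns the file size in bytes, or -1 on error. */
long long file_size_seek(const char *path)
{
    FILE *f = fopen(path, "rb");
    long long size = -1;

    if (f == NULL)
        return -1;

    /* Seek to the end, then read back the position as the size. */
    if (portable_fseek64(f, 0, SEEK_END) == 0)
        size = (long long)portable_ftell64(f);

    fclose(f);
    return size;
}
```

Returning long long rather than off_t keeps the caller's code identical on both branches, at the cost of one cast.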

Scriptorium answered 26/1, 2012 at 23:15 Comment(1)
Well, what is your definition of "portable"? There are many systems that can't even open files. Even more that cannot open files over 4 GB in size. – Kummerbund
D
10

You said it: there's no portable method; if I were you I'd just go with GetFileSize on Windows and stat on POSIX.
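
A rough sketch of that split might look like the code below. Note the assumptions: it uses GetFileSizeEx rather than GetFileSize, because the Ex variant returns the full 64-bit size in one LARGE_INTEGER, and the function name file_size_native is invented for this example. On POSIX it relies on _FILE_OFFSET_BITS being 64 so that st_size can hold values above 2GB on 32-bit systems.

```c
/* 64-bit off_t on 32-bit POSIX systems; must precede all includes. */
#define _FILE_OFFSET_BITS 64

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: returns the file size in bytes, or -1 on error. */
long long file_size_native(const char *path)
{
    LARGE_INTEGER size;
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;

    if (!GetFileSizeEx(h, &size))
        size.QuadPart = -1;

    CloseHandle(h);
    return (long long)size.QuadPart;
}
#else
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the file size in bytes, or -1 on error. */
long long file_size_native(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    return (long long)st.st_size;
}
#endif
```

Both branches read the size from filesystem metadata, so the cost does not depend on how large the file is.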

Dufresne answered 26/1, 2012 at 23:19 Comment(2)
You could use _stat64 on Windows to keep the code sorta the same. – Hermy
@sixlettervariables: correct, although I don't know if every compiler on Windows implements it (while GetFileSize is part of the Windows API, so it should always be available). – Dufresne
L
9

You should be able to use stat64 on Linux and _stat64 on Windows to get file size information for files over 2 GB, and both functions are very similar in usage. You can also use a #define to make the stat64 name available on Windows too:

/* _WIN32 is predefined by both MSVC and MinGW; __WIN32__ is not defined by MSVC */
#ifdef _WIN32
#define stat64 _stat64
#endif

However, although this should work, note that the _stat family of functions on Windows is really just a wrapper around other functions, so it adds some resource and time overhead.

Libriform answered 26/1, 2012 at 23:27 Comment(0)
C
6
int ch;
FILE *f = fopen("file_to_analyse", "rb");
/* error checking omitted for brevity */
unsigned long long filesize = 0; /* or unsigned long for C89 compatibility */
while ((ch = fgetc(f)) != EOF) filesize++;
fclose(f);
/* error checking omitted for brevity */
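
For completeness, here is a sketch of the same standard-only idea with the error checks filled in and fread used to read in chunks instead of one character at a time. The function name count_file_bytes and the 64 KB buffer size are arbitrary choices for this example. It still reads the entire file, so it remains far slower than querying the filesystem.

```c
#include <stdio.h>

/*
 * Hypothetical helper: counts the bytes in a file using only ISO C stdio.
 * Stores the count in *out_size and returns 0, or returns -1 on error.
 */
int count_file_bytes(const char *path, unsigned long long *out_size)
{
    unsigned char buf[65536];          /* arbitrary chunk size */
    unsigned long long total = 0;
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;

    /* fread returns 0 at end of file or on error; ferror tells them apart. */
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += n;

    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    *out_size = total;
    return 0;
}
```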
Clough answered 26/1, 2012 at 23:19 Comment(6)
Ok, it's the only standard-compliant way, but I hope you are being sarcastic: reading a whole file, possibly 2+ GB big, one character at a time just to know its size (which on current filesystems is simply an attribute of the file) is plain stupid... – Dufresne
Oh, no, no, no... please tell me you're kidding. On the other hand, the question is about a portable way, not about an efficient one. This is a portable way indeed. – Moonshine
It's event-driven, which is the reason why it's so fast. – Glyph
Why is this so bad? How else would you count all the bytes? You would have to iterate over them and actually count them to find out, right? – Static
@Static Because the filesystem counts bytes when they are written, and then stores the value away. That's how it knows where EOF is. Reading massive files in their entirety to determine size is slow; reading a pre-calculated field stored in the filesystem is fast. – Decasyllabic
@Static Do you have to read every page of a book to find out how many pages it has? And if you did, would you ask why that's so bad? – Schipperke
R
3

I have implemented and tested the following:

#if __WIN32__
#define stat64 _stat64
#endif

It compiles and works with the MinGW64 gcc 4.8.1 compiler and with gcc 4.6.3 on Linux.

On OS X, no redefinition of stat is required.

For the lstat and fstat functions, I expect similar #define macros to work.

Rann answered 2/9, 2013 at 13:36 Comment(0)
S
1
#include <sys/stat.h>

/* Returns the size of the file in bytes, or -1 if stat() fails. */
off_t fsize(const char *filename) {
    struct stat st;

    if (stat(filename, &st) == 0)
        return st.st_size;

    return -1;
}
Sori answered 26/1, 2012 at 23:21 Comment(0)
F
1

What about using lseek() (or _lseek()) with SEEK_END? It returns the resulting offset, which for an offset of 0 from SEEK_END is the file size.

Under Linux, _FILE_OFFSET_BITS needs to be defined as 64 for lseek() to return 64-bit values (which should be the default anyhow).
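
As a rough sketch of this idea (not the exact code tested by the answerer): the helper name file_size_lseek and the three wrapper macros are invented for this example, and on Windows it uses _lseeki64 instead of _lseek, since _lseek returns a long and so cannot report offsets above 2GB.

```c
/* 64-bit off_t on 32-bit POSIX systems; must precede all includes. */
#define _FILE_OFFSET_BITS 64

#include <stdio.h>   /* SEEK_END */
#include <fcntl.h>   /* open flags */

#ifdef _WIN32
  #include <io.h>
  #define OPEN_RDONLY(p)  _open((p), _O_RDONLY | _O_BINARY)
  #define SEEK_TO_END(fd) _lseeki64((fd), 0, SEEK_END)
  #define CLOSE_FD(fd)    _close(fd)
#else
  #include <sys/types.h>
  #include <unistd.h>
  #define OPEN_RDONLY(p)  open((p), O_RDONLY)
  #define SEEK_TO_END(fd) lseek((fd), 0, SEEK_END)
  #define CLOSE_FD(fd)    close(fd)
#endif

/* Hypothetical helper: returns the file size in bytes, or -1 on error. */
long long file_size_lseek(const char *path)
{
    long long size;
    int fd = OPEN_RDONLY(path);

    if (fd < 0)
        return -1;

    /* Seeking 0 bytes from the end yields the end offset, i.e. the size. */
    size = (long long)SEEK_TO_END(fd);

    CLOSE_FD(fd);
    return size;
}
```

Both lseek and _lseeki64 return -1 on failure, so the error value passes straight through to the caller.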

Feil answered 27/1, 2012 at 11:19 Comment(2)
I've not tried it yet. It seems lseek() might have the same sort of problem as fseeko(): the type used (off_t) may or may not support values above 2GB, depending on some external configuration. – Scriptorium
@Attract: I tested this under 32/64-bit Linux using gcc and under 32-bit Windows Vista using VC10. – Feil
