How can I get a file's size in C? [duplicate]

7

470

How can I find out the size of a file I opened with an application written in C?

I would like to know the size because I want to put the content of the loaded file into a string, which I allocate using malloc().

Just writing malloc(10000*sizeof(char)); is IMHO a bad idea.

Zomba answered 26/10, 2008 at 20:54 Comment(9)
Note that sizeof(char) is 1, by definition.Selfcontradiction
Ya, but some esoteric platform's compiler might define char as 2 bytes - then the program allocates more than is necessary. One can never be too sure.Dellinger
@George an "esoteric platform's compiler" where sizeof(char) != 1 is not a true C compiler. Even if a character is 32 bits, it will still return 1.Atlante
@George: The C (and C++) standard guarantees that sizeof(char)==1. See e.g. parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.1Olvera
I actually prefer malloc(x*sizeof(char)); to malloc(x); when allocating x characters. Yes, they always compile to the same thing, but I like consistency with other memory allocations.Salaidh
I would hope the optimizer can figure this out and do the right thing, thus using sizeof is safer and equivalentRandellrandene
@Ben: writing more than you need is not safer, it can be more dangerous. More code presents a greater surface for bugs to infect. If you really want safer, then use p = malloc(N * sizeof (*p)) - don't hardcode the type where the compiler can't check it for you.Myra
You can use fstat with fileno if you have FILE*: fstat(fileno(f), &stat)Psychosis
It's worth remembering that the C standard redefines the word byte to mean a char, so it's best to just avoid talking about bytes in a C context at all. (Try octets instead. AFAIK the standard hasn't changed those.)Escribe
A
641

You need to seek to the end of the file and then ask for the position:

fseek(fp, 0L, SEEK_END);
sz = ftell(fp);

You can then seek back, e.g.:

fseek(fp, 0L, SEEK_SET);

or (if seeking to go to the beginning)

rewind(fp);
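
The two steps above can be combined with basic error checking, roughly as in the sketch below. The helper name file_size_seek is hypothetical (it is not part of the answer), and the sketch assumes a seekable stream opened in binary mode ("rb") on an implementation that meaningfully supports SEEK_END, which the C standard does not require for binary streams.

```c
#include <stdio.h>

/* Hypothetical helper, not from the original answer.
 * Returns the size in bytes of the file behind fp, or -1L on failure.
 * Assumes fp is seekable and was opened in binary mode.
 * Side effect: the stream is rewound to the beginning. */
long file_size_seek(FILE *fp)
{
    long size;

    if (fseek(fp, 0L, SEEK_END) != 0)   /* move to the end of the file */
        return -1L;
    size = ftell(fp);                   /* offset at the end; -1L on error */
    rewind(fp);                         /* go back to the beginning */
    return size;
}
```

Because the result is a long, this is limited to files smaller than 2 GB wherever long is 32 bits wide.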
Abruzzi answered 26/10, 2008 at 20:57 Comment(22)
@camh - Thanks man. This comment solved a problem I had with a file sizing algorithm. For the record, one opens a file in binary mode by putting a 'b' at the end of fopen's mode string.Cestode
LOL, yeah right, Windows inherited this stupid text/binary mode nonsense from DOS. This is easily forgotten nowadays. Actually the POSIX standard even mandates that any POSIX system must be able to cope with the "b" flag in fopen calls (to be compatible with the C standard!), but on the same hand it mandates, that the implementation must ignore it entirely, since this flag has no effect on POSIX systems (those don't know any such thing as a text mode and always open in binary mode).Brightness
Yo uh, use rewind before people forget what it meansWindsor
Returns a signed int, so limited to 2 GB. But on the plus side your file could be negative 2 billion bytes long, and they are prepared for that.Eudoca
length = lseek(fd, 0, SEEK_END)+1;Ilailaire
From fseek documentation "Library implementations are allowed to not meaningfully support SEEK_END (therefore, code using it has no real standard portability)."Deceitful
>2GB prob could be avoided using fseeko and ftello. If possible edit the answer.!!Rupee
@MikaHaarahiltunen At least if you are working on a POSIX system, that is definitely not the case. (And I wouldn't trust cplusplus.com at all)Encratia
fseek returns the file pointer offset, so you don't need to use ftell. Just say "sz = fseek(fp, 0L, SEEK_END);".Freeness
THIS IS NOT PORTABLE. DON'T USE THIS. IT'S NOT POSIX COMPLIANTKramer
@RobWalker : securecoding.cert.org/confluence/display/c/…Torino
@Brightness because Windows was first a DOS extension instead of a real operating systemToupee
Note fseek(fp, 0L, SEEK_END); on a binary stream is not strictly-conforming, portable C code. Per footnote 268 of the C standard: "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream...", and ftell() on a text stream won't work: "For a text stream, its file position indicator contains unspecified information ... not necessarily a meaningful measure of the number of characters written or read."Marji
wiki.sei.cmu.edu/confluence/display/c/…Pfaff
@MichealJohnson did you mean lseek() instead of fseek()? The man page I'm referencing says, "Upon successful completion, fgetpos(), fseek(), fsetpos() return 0".Saga
@Saga You would appear to be correct. I'm assuming I got the two confused as the answer was using fseek. Indeed the manual page for fseek says that it will return 0, while lseek will return the offset. Also, referring back to the code where I have used this technique myself to determine the size of a file, I have indeed used lseek and not fseek.Freeness
@Saga Apart from the return value, the other main difference between these functions seems to be that one takes a FILE* file handle while the other takes an int file handle. This doesn't tend to matter on Linux (as long as you're consistent with which functions you use e.g. fopen vs open) but I've had issues with porting software to other platforms where the FILE* functions are supported but the int ones are not. I don't know the details behind this but I'm guessing one is a C standard and the other is a Linux (or POSIX? but I thought POSIX was supported on Windows) extension.Freeness
@MichealJohnson: yes, fopen() is in the Standard C Library and open came from POSIX. https://mcmap.net/q/81223/-c-fopen-vs-open does a good job of discussing the differences.Saga
@VolodymyrM.Lisivka Why do you add one to the value returned by lseek? I have tested it without adding one, and it still equal to the output of stat. Yeah, bowelling outdated comments is my hobby :)Plica
This answer is absolutely incorrect , and will silently break on esoteric platforms. See wiki.sei.cmu.edu/confluence/display/c/…Himself
@Eudoca Re: "Returns a signed int, so limited to 2 GB": Why does fseek have "long int offset" instead of "long long int offset"?.Decollate
For anyone seeing it here, you don't need ftell, lseek returns current position with one less syscallPrismatic
S
458

Using standard library:

Assuming that your implementation meaningfully supports SEEK_END:

fseek(f, 0, SEEK_END); // seek to end of file
size = ftell(f); // get current file pointer
fseek(f, 0, SEEK_SET); // seek back to beginning of file
// proceed with allocating memory and reading the file

Linux/POSIX:

You can use stat (if you know the filename), or fstat (if you have the file descriptor).

Here is an example for stat:

#include <sys/stat.h>
struct stat st;
stat(filename, &st);
size = st.st_size;
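
For reference, here is a sketch of the same stat call with the error check included. The function name file_size_stat and the choice of -1 as the failure value are assumptions made for illustration; only stat, st_size, strerror and errno are real interfaces.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, not from the original answer.
 * Returns the size of the named file in bytes, or (off_t)-1 if stat()
 * fails, in which case the reason is printed to stderr. */
off_t file_size_stat(const char *filename)
{
    struct stat st;

    if (stat(filename, &st) != 0) {
        /* st is not filled in on failure, so it must not be read */
        fprintf(stderr, "stat %s: %s\n", filename, strerror(errno));
        return (off_t)-1;
    }
    return st.st_size;
}
```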

Win32:

You can use GetFileSize or GetFileSizeEx.

Semination answered 26/10, 2008 at 20:59 Comment(15)
Please note that I have omitted error checking in the interest of clarity.Semination
You don't need the filename - you can use fstat for that.Extravascular
You need to point stat the address of the struct. The second line should be: stat(filename, &st);Towage
can also use rewind(f) to move file pointer back to start of fileRanzini
I have omitted error checking in the interest of -FATAL ERROR, EXITING.Piling
The second option is the only one that can show files sizes larger than 2GBEudoca
replace fseek and ftell with fseeko and ftello. Then it works for files greater than 2 GB as well..!!Rupee
You can get the file descriptor from a FILE* with fileno.Riyal
Nice POSIX solution. Go for that!Hold
Is it necessary to "free" struct stat st?Acetum
The second option is much better !Muck
IMHO, omitting error checking in this simple case makes the code dangerously incorrect. For instance, if the filename doesn't exist, stat() will fail, and it is by no means clear that the st structure's st_size field doesn't have stack garbage in it. At the very least, if stat() fails, the size delivered ought to be 0. And you can include the error check without even changing the line count of the example: size = (stat(filename, &st) == 0) ? st.st_size : 0;.Garton
Note: sys/stat.h is also available on Windows (at least for me on Visual Studio 2019).Horowitz
@Eudoca I cant see why is that, the field st_size is of type off_t which is the same as the returned value of seek, and it is signed 32 bit as far as I have seen. can you explain me this issue?Abortion
@Abortion because 2^31 = 2GB with the 32nd bit used for sign.Eudoca
G
131

If you have the file descriptor, fstat() fills in a stat structure which contains the file size.

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

// fd = fileno(f); //if you have a stream (e.g. from fopen), not a file descriptor.
struct stat buf;
fstat(fd, &buf);
off_t size = buf.st_size;
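
A sketch of the same idea wrapped in a function that starts from a FILE* and checks each call. The name file_size_fstat is hypothetical. The fflush call is an assumption about what the caller wants: it pushes pending buffered writes to the descriptor so they are counted, and it relies on POSIX semantics, since ISO C leaves fflush on an input stream undefined.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno(); assumes a POSIX system */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, not from the original answer.
 * Returns the size of the file behind the stream f, or (off_t)-1 on error. */
off_t file_size_fstat(FILE *f)
{
    struct stat buf;
    int fd;

    if (fflush(f) != 0)          /* write out data still held in the stdio buffer */
        return (off_t)-1;
    fd = fileno(f);              /* descriptor underlying the stream */
    if (fd == -1)
        return (off_t)-1;
    if (fstat(fd, &buf) != 0)
        return (off_t)-1;
    return buf.st_size;
}
```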
Gurney answered 26/10, 2008 at 21:17 Comment(7)
Add "fd = fileno(f);" if you have a stream (e.g. from fopen), not a file descriptor. Needs error checking.Medley
Of course it needs error checking - that would just complicate the example.Gurney
this is in my opinion the best real answer, and i think we all have our training wheels off for the most part in C, do we really need error checking and other unnecessary code in our examples, its bad enough M$DN does it in theirs, lets not follow suit, instead just say at the end 'make sure to add error checking' and be done with it.Sato
If you call this with fileno(), it may be inaccurate due to file caching. I'm not aware of a method to get a FILE's length without causing the buffer to flush.Lunalunacy
a LOT of the users of SO are students of C, not past masters. Therefore, the code given in the answers should show the error checking, so the student learns the right way to code.Code
there is the detail that (f)stat() returns the block allocation total bytes while fseek() / ftell() sequence returns the number of bytes before EOF is encountered.Code
@user3629249: stat gives you both numbers. st_size is the real length, with byte granularity. st_blocks is the number of 512-byte disk blocks used by the file (including extra blocks for metadata, attributes, and even block-lists or extent-lists for large files where the list of blocks or extents doesn't fit in the inode itself.) Whether the FS actually allocates in 512B blocks or not, that's the unit stat uses. (man7.org/linux/man-pages/man2/lstat.2.html). For most filesystems, st_size is accurate, but not on Linux /proc and /sysToffey
M
28

I ended up just making a short and sweet fsize function (note: no error checking):

long fsize(FILE *fp){
    long prev=ftell(fp); //ftell returns long, so don't truncate it to int
    fseek(fp, 0L, SEEK_END);
    long sz=ftell(fp);
    fseek(fp,prev,SEEK_SET); //go back to where we were
    return sz;
}

It's kind of silly that the standard C library doesn't have such a function, but I can see why it'd be difficult, as not every "file" has a size (for instance /dev/null).
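
If the file may be larger than 2 GB, a variant built on the POSIX functions fseeko and ftello might look like the sketch below. The name fsize_large is hypothetical, and the sketch assumes a POSIX system; on 32-bit glibc builds off_t is only 64 bits wide when compiling with -D_FILE_OFFSET_BITS=64.

```c
#define _POSIX_C_SOURCE 200809L  /* for fseeko()/ftello(); assumes POSIX */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical variant of fsize(), not from the original answer.
 * Returns the file size as an off_t, or (off_t)-1 on failure.
 * The stream position is restored before returning. */
off_t fsize_large(FILE *fp)
{
    off_t prev, sz;

    prev = ftello(fp);                     /* remember where we were */
    if (prev == (off_t)-1)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0)      /* jump to the end */
        return (off_t)-1;
    sz = ftello(fp);                       /* end offset == size in bytes */
    if (fseeko(fp, prev, SEEK_SET) != 0)   /* go back to where we were */
        return (off_t)-1;
    return sz;
}
```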

Maki answered 27/3, 2011 at 2:9 Comment(3)
Good point for restoring previous position indicator of the file stream.Hazen
ftell(fp) returns long. There is no need to shorten it to int and possibly lose information.Townscape
For anyone seeing it here, you don't need ftell, lseek returns current position with one less syscallPrismatic
L
19

How to use lseek/fseek/stat/fstat to get filesize ?

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

void
fseek_filesize(const char *filename)
{
    FILE *fp = NULL;
    long off;

    fp = fopen(filename, "r");
    if (fp == NULL)
    {
        printf("failed to fopen %s\n", filename);
        exit(EXIT_FAILURE);
    }

    if (fseek(fp, 0, SEEK_END) == -1)
    {
        printf("failed to fseek %s\n", filename);
        exit(EXIT_FAILURE);
    }

    off = ftell(fp);
    if (off == -1)
    {
        printf("failed to ftell %s\n", filename);
        exit(EXIT_FAILURE);
    }

    printf("[*] fseek_filesize - file: %s, size: %ld\n", filename, off);

    if (fclose(fp) != 0)
    {
        printf("failed to fclose %s\n", filename);
        exit(EXIT_FAILURE);
    }
}

void
fstat_filesize(const char *filename)
{
    int fd;
    struct stat statbuf;

    fd = open(filename, O_RDONLY, S_IRUSR | S_IRGRP);
    if (fd == -1)
    {
        printf("failed to open %s\n", filename);
        exit(EXIT_FAILURE);
    }

    if (fstat(fd, &statbuf) == -1)
    {
        printf("failed to fstat %s\n", filename);
        exit(EXIT_FAILURE);
    }

    printf("[*] fstat_filesize - file: %s, size: %lld\n", filename, (long long) statbuf.st_size);

    if (close(fd) == -1)
    {
        printf("failed to close %s\n", filename);
        exit(EXIT_FAILURE);
    }
}

void
stat_filesize(const char *filename)
{
    struct stat statbuf;

    if (stat(filename, &statbuf) == -1)
    {
        printf("failed to stat %s\n", filename);
        exit(EXIT_FAILURE);
    }

    printf("[*] stat_filesize - file: %s, size: %lld\n", filename, (long long) statbuf.st_size);

}

void
seek_filesize(const char *filename)
{
    int fd;
    off_t off;

    if (filename == NULL)
    {
        printf("invalid filename\n");
        exit(EXIT_FAILURE);
    }

    fd = open(filename, O_RDONLY, S_IRUSR | S_IRGRP);
    if (fd == -1)
    {
        printf("failed to open %s\n", filename);
        exit(EXIT_FAILURE);
    }

    off = lseek(fd, 0, SEEK_END);
    if (off == -1)
    {
        printf("failed to lseek %s\n", filename);
        exit(EXIT_FAILURE);
    }

    printf("[*] seek_filesize - file: %s, size: %lld\n", filename, (long long) off);

    if (close(fd) == -1)
    {
        printf("failed to close %s\n", filename);
        exit(EXIT_FAILURE);
    }
}

int
main(int argc, const char *argv[])
{
    int i;

    if (argc < 2)
    {
        printf("%s <file1> <file2>...\n", argv[0]);
        exit(0);
    }

    for(i = 1; i < argc; i++)
    {
        seek_filesize(argv[i]);
        stat_filesize(argv[i]);
        fstat_filesize(argv[i]);
        fseek_filesize(argv[i]);
    }

    return 0;
}
Lyssa answered 26/10, 2008 at 20:55 Comment(5)
or if(off == (-1L)) no need for (long)Tica
ftell returns a long, unfortunately. You need ftello to return an off_t. (Or apparently on Windows, _ftelli64(), because it seems they love to make it harder to write portable code.) See discussion on another answerToffey
fstat only makes sense if you already have an open file, or as part of the process of opening it. Your fstat_filesize isn't something you'd ever want to use in that form, only if you were going to actually keep that fd around and read from it or something. open/fstat/close has zero advantage over stat; I'd have written that function to take a FILE *fp (use fileno()) or int fd. I guess your functions aren't intended to be used as-is because they only printf the results instead of returning them, though.Toffey
Also, since you're not passing O_CREAT to open, the 3rd arg is unused. S_IRUSR | S_IRGRP is not meaningful there. If open was going to create the file, that would give it 0440 aka r--r----- permissions (which would stop anything else from opening and writing to it), but it won't without O_CREAT so the int open(const char *pathname, int flags); form of the prototype applies. man7.org/linux/man-pages/man2/open.2.htmlToffey
Other than the design of fstat_filesize, yeah this is a useful example of how to do error checking. Except you should fprintf(stderr, ... with your error messages. And in the functions using POSIX stat and friends, you should be using strerror as part of that to get an actual reason for the failure, like "no such file or directory" for ENOENT or "Permission Denied" for EPERM. That's much more useful and the standard way to report errors in Unix programs. (System call and file name is better than nothing, the user might not be thinking of permissions if you don't tell them.)Toffey
A
9

Have you considered not computing the file size and just growing the array if necessary? Here's an example (with error checking omitted):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 1024

/* Read the contents of a file into a buffer.  Return the size of the file
 * and set buf to point to a buffer allocated with malloc that contains
 * the file contents.
 */
int read_file(FILE *fp, char **buf)
{
  int n, np;            /* n: bytes read so far, np: buffer capacity */
  size_t r;
  char *b, *b2;

  n = 0;
  np = CHUNK;
  b = malloc(sizeof(char)*np);
  /* read into the free space after the data already stored */
  while ((r = fread(b + n, sizeof(char), CHUNK, fp)) > 0) {
    n += (int)r;
    if (np - n < CHUNK) {
      np *= 2;                      // buffer is too small, the next read could overflow!
      b2 = malloc(np*sizeof(char));
      memcpy(b2, b, n * sizeof(char));
      free(b);
      b = b2;
    }
  }
  *buf = b;
  return n;
}

This has the advantage of working even for streams in which it is impossible to get the file size (like stdin).
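
As a rough sketch of how the same growing-buffer idea could look with realloc and the error checks added: the names read_all and READ_CHUNK and the 0 / -1 return convention are assumptions made for illustration, not part of the answer.

```c
#include <stdio.h>
#include <stdlib.h>

#define READ_CHUNK 1024   /* hypothetical initial capacity */

/* Hypothetical helper, not from the original answer.
 * Reads everything remaining in fp into a malloc'd buffer.
 * On success stores the buffer in *buf and its length in *len and
 * returns 0; the caller must free(*buf). On failure returns -1 and
 * leaves *buf and *len untouched. */
int read_all(FILE *fp, char **buf, size_t *len)
{
    size_t cap = READ_CHUNK;   /* current buffer capacity */
    size_t n = 0;              /* bytes stored so far */
    size_t r;
    char *b = malloc(cap);

    if (b == NULL)
        return -1;
    /* always read into the free space after the data already stored */
    while ((r = fread(b + n, 1, cap - n, fp)) > 0) {
        n += r;
        if (n == cap) {                      /* buffer full: double it */
            char *tmp;
            if (cap > (size_t)-1 / 2) {      /* doubling would overflow */
                free(b);
                return -1;
            }
            tmp = realloc(b, cap * 2);
            if (tmp == NULL) {
                free(b);
                return -1;
            }
            b = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {                        /* stopped on error, not EOF */
        free(b);
        return -1;
    }
    *buf = b;
    *len = n;
    return 0;
}
```

Doubling the capacity keeps the total amount of copying linear in the file size, and using size_t for the sizes avoids the signed overflow that an int counter would hit near 2 GB.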

Argentic answered 29/10, 2009 at 13:38 Comment(5)
Maybe the realloc function could be used here instead of using an intermediate pointer and having to free().Kinlaw
This has the very real disadvantage of being O(n^2) ... the size of the thing you have to copy grows. OK for small files, TERRIBLE for big ones. If you have a 1k chunk and a 100M file, you end up copying (if I did my math right) roughly 1E17 bytes. That may be a pathological example, but it demonstrates why you should not do this.Excitability
Unless I am misreading, the size being stored into doubles each time. The run-time is therefore O(n) rather than O(n^2). This is the same allocation strategy that is typically used for std::vector and its ilk. Regardless, reallocations are still less efficient than querying the file size and reading all at once.Cypher
This is doubling on each reallocation. Any constant factor resize greater than one is sufficient to get the O(n) bound, literal doubling is maybe overkill, to scale by 1.75 e.g. use np += (np / 2) + (np / 4); - all integer, intermediate results don't overflow "early". I'd more likely use 1.5, but 1.75 shows the idea better. Of course watch out for overflow, and particularly any multiple of the previous size may overflow when the actual size doesn't. If your file size is (2^31)-1, this will probably attempt to allocate a buffer with -(2^31) rather than 2^31 bytes.Jehad
I should probably warn that np += (np / 2) + (np / 4) doesn't give an exact multiply by 1.75 - results can be too small because no carry propagates from bits that were truncated away - but it should be good enough for this purpose. For multiplying by 1.5, np += (np / 2); should be correct.Jehad
V
8

If you're on Linux, seriously consider just using the g_file_get_contents function from glib. It handles all the code for loading a file, allocating memory, and handling errors.

Volz answered 26/10, 2008 at 21:13 Comment(6)
If you're on Linux and want to have a dependency on glib, that is.Townie
Not that bad of a problem, as glib is used by both GTK and KDE applications now. It's also available on Mac OS X and Windows, but it's not nearly as standard there.Volz
But is not glib a c++ library? The question stipulated CVachell
@DaveAppleton: No, glib is very much a plain C library, not C++.Bituminous
@BenCombee glib's not on android, last I checked.Burdened
Neither GTK, nor KDE is on android by defaultHoover
