Are there cases where fseek/ftell can give the wrong file size?
N

4

5

In C or C++, the following can be used to return a file size:

const unsigned long long at_beg = (unsigned long long) ftell(filePtr);
fseek(filePtr, 0, SEEK_END);
const unsigned long long at_end = (unsigned long long) ftell(filePtr);
const unsigned long long length_in_bytes = at_end - at_beg;
fprintf(stdout, "file size: %llu\n", length_in_bytes);

Are there development environments, compilers, or OSes that can return the wrong file size from this code, based on padding or other situation-specific information? Were there changes in the C or C++ specification around 1999 that would have led to this code no longer working in certain cases?

For this question, please assume I am adding large file support by compiling with the flags -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1. Thanks.

Notation answered 3/2, 2012 at 20:46 Comment(7)
Good luck opening /proc/cpuinfo like that :-) Or in other words, "not every file is a file". (Unless you work for Bell Labs.) - Gloze
How about more conventional file or input streams? - Notation
There are definitely cases where the file size will not fit in a long. - Kindliness
Okay, I'll edit the question to make it an unsigned long long. - Notation
I'm happy to use stat, but I'm curious about answers to the specific question, as posed. I see fseek/ftell posted online as an answer to the question of how to find file sizes, and I'd like to know where this fails, so that I can make an informed decision about what to do to make this code as portable as possible (which will likely involve stat). Thanks! - Notation
For example, this is among the first search results on terms like ftell, file sizes, C and C++, and it makes no mention of situations where fseek/ftell fails: cplusplus.com/reference/clibrary/cstdio/ftell - Notation
Now, if you were trying to find a solution to this problem, and fseek/ftell is highlighted as among the first responses, as I've shown, then would you be curious to know what the gotchas are (perhaps by asking a programming forum like Stack Overflow)? Or would you use this code sight unseen? - Notation
P
7

It won't work on unseekable files like /proc/cpuinfo, /dev/stdin, or /dev/tty, or on pipes obtained with popen.

And it won't work reliably if the file is being written by another process at the same time.

Using the POSIX stat function is probably more efficient and more reliable. Of course, this function might not be available on non-POSIX systems.
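
As a minimal sketch of the stat approach, assuming a POSIX system and a stream that is already open: the helper name size_via_fstat is made up for illustration, and it uses fstat on the stream's descriptor rather than stat on a path. It only trusts st_size for regular files, and it does not count output that is still sitting unflushed in the stdio buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (not from the original answer): returns the size in
 * bytes of the regular file behind an open stream, or -1 if the size cannot
 * be determined this way. Unflushed buffered output is not counted. */
static long long size_via_fstat(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return -1;
    /* Reject pipes, terminals, and devices, where st_size is not a
     * usable byte count. */
    if (!S_ISREG(st.st_mode))
        return -1;
    return (long long)st.st_size;
}
```

A caller would typically fopen the file in binary mode, call size_via_fstat(fp), and treat -1 as "size unknown, read until EOF instead".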

Politicking answered 3/2, 2012 at 20:59 Comment(1)
stat is indeed the more straightforward way, but it seems to fail on exactly the cases you mention. In these cases there's nothing better than reading to the end (if there is one). - Valorize
B
4

The fseek and ftell functions are both defined by the ISO C language standard.

The following is from the latest public draft of the 2011 C standard, but the 1990, 1999, and 2011 ISO C standards are all very similar in this area, if not identical.

7.21.9.4:

The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

7.21.9.2:

The fseek function sets the file position indicator for the stream pointed to by stream. If a read or write error occurs, the error indicator for the stream is set and fseek fails.

For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset to the position specified by whence. The specified position is the beginning of the file if whence is SEEK_SET, the current value of the file position indicator if SEEK_CUR, or end-of-file if SEEK_END. A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.

Violating any of the "shall" clauses makes your program's behavior undefined.

So if the file was opened in binary mode, ftell gives you the number of characters from the beginning of the file -- but an fseek relative to the end of the file (SEEK_END) is not necessarily meaningful. This accommodates systems that store binary files in whole blocks and don't keep track of how much was written to the final block.

If the file was opened in text mode, you can seek to the beginning or end of the file with an offset of 0, or you can seek to a position given by an earlier call to ftell; fseek with any other arguments has undefined behavior. This accommodates systems where the number of characters read from a text file doesn't necessarily correspond to the number of bytes in the file. For example, on Windows reading a CR-LF pair ("\r\n") reads only one character, but advances 2 bytes in the file.
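
One way to see the distinction is to count the characters that can actually be read and compare that with whatever fseek/ftell reports. The following is only a sketch; count_chars_to_eof is a hypothetical name. On a text stream under Windows the count may come out smaller than the byte size because of CR-LF translation, while on Unix-like systems the two should agree. It also works on pipes and other unseekable streams, at the cost of consuming the input.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters that can be read from the
 * current position to end-of-file. Returns -1 on a read error.
 * The stream is left positioned at end-of-file. */
static long long count_chars_to_eof(FILE *fp)
{
    long long n = 0;
    int c;

    while ((c = getc(fp)) != EOF)
        n++;
    return ferror(fp) ? -1 : n;
}
```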

In practice, on Unix-like systems text and binary modes behave the same way, and the fseek/ftell method will work. I suspect it will work on Windows (my guess is that ftell will give the byte offset, which may not be the same as the number of times you could call getchar() in text mode).

Note also that ftell() returns a result of type long. On systems where long is 32 bits, this method can't work for files that are 2 GiB or larger.
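
Note that the -D_FILE_OFFSET_BITS=64 flag mentioned in the question widens off_t, not long, so it does not change what ftell can return. POSIX provides fseeko and ftello, which use off_t instead. A minimal sketch under that assumption follows (size_via_ftello is a made-up name; on Windows the rough counterparts are _fseeki64 and _ftelli64, which are not shown here):

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* must be defined before any system header */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: size of a seekable stream as an off_t.
 * Returns (off_t)-1 on failure. The original position is restored. */
static off_t size_via_ftello(FILE *fp)
{
    off_t saved = ftello(fp);
    if (saved == (off_t)-1)
        return (off_t)-1;              /* e.g. the stream is a pipe */
    if (fseeko(fp, 0, SEEK_END) != 0)
        return (off_t)-1;
    off_t end = ftello(fp);            /* (off_t)-1 if this call fails */
    if (fseeko(fp, saved, SEEK_SET) != 0)
        return (off_t)-1;
    return end;
}
```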

Since the fseek/ftell method is system-specific anyway, you might be better off using some system-specific method to get the size of a file, such as stat() on Unix-like systems.

On the other hand, fseek and ftell are likely to work as you expect on most systems you're likely to encounter. I'm sure there are systems where it won't work; sorry, but I don't have specifics.

If working on Linux and Windows is good enough, and you're not concerned with large files, then the fseek/ftell method is probably ok. Otherwise, you should consider using a system-specific method to determine the size of a file.

And keep in mind that anything that tells you the size of a file can only tell you its size at that moment. The file's size could change before you access it.
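
If the size is only wanted in order to allocate a buffer and read the file, one way to sidestep the race is to not ask for the size at all and simply read until EOF, growing the buffer as needed. This is a sketch, not a tuned implementation; read_all and the 4096-byte starting capacity are arbitrary choices made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads from the current position to end-of-file into
 * a malloc'd buffer. On success returns the buffer and stores the number of
 * bytes read in *len; returns NULL on read or allocation failure.
 * The caller frees the result. */
static unsigned char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0;
    unsigned char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    for (;;) {
        used += fread(buf + used, 1, cap - used, fp);
        if (used < cap)
            break;                      /* short read: EOF or error */
        if (cap > (size_t)-1 / 2) {     /* doubling would overflow */
            free(buf);
            return NULL;
        }
        unsigned char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```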

Belfry answered 4/2, 2012 at 3:25 Comment(0)
T
2
  1. Superficially, your code looks "OK" - I don't see any problem with it.

  2. No - there isn't any "C or C++ specification" that would affect fseek. There is a POSIX specification:

  3. If you want "file size", my first choice would probably be stat(). Here's the POSIX specification:

  4. If something's "going wrong" with your method, then my first guess would be "large file support".

    For example, many OS's had parallel fseek() and fseek64() APIs.

'Hope that helps .. PSM

Taker answered 3/2, 2012 at 20:58 Comment(2)
I believe that some functions of <stdio.h>, including fopen, fseek, and ftell, are probably part of the C99 language standard. - Politicking
@Basile Starynkevitch: you're absolutely correct. Thank you: open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf - Taker
L
1

POSIX defines the return value from ftell as "measured in bytes from the beginning of the file" (fseek itself only returns 0 on success or -1 on failure). Your at_beg will always be zero (assuming this is a newly opened file).

So, assuming that:

  1. the file is seekable
  2. there are no concurrency issues to be concerned about
  3. the file size is representable in the data type used by the fseek/ftell variant you choose

then your code should work on any POSIX-compliant system.
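
The first and third assumptions can at least be detected at run time by checking return values, which the code in the question does not do. Here is a sketch with a made-up helper name, size_via_ftell. On POSIX systems an unseekable stream typically makes ftell or fseek fail with errno set to ESPIPE, and an offset too large for long makes ftell fail with EOVERFLOW. Concurrent modification is not detected.

```c
#include <stdio.h>

/* Hypothetical helper: total size of a seekable binary stream via
 * fseek/ftell. Returns -1 on any failure. Limited to LONG_MAX bytes
 * because ftell returns long. The original position is restored. */
static long size_via_ftell(FILE *fp)
{
    long saved = ftell(fp);             /* -1L if the stream is not seekable */
    if (saved < 0)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    long end = ftell(fp);               /* -1L if the offset does not fit */
    if (fseek(fp, saved, SEEK_SET) != 0)
        return -1;
    return end;
}
```

Unlike the at_end - at_beg subtraction in the question, this returns the total size even when the stream is not positioned at the start.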

Lotus answered 3/2, 2012 at 21:44 Comment(1)
It will fail on "files" in /proc that are falsely reported as ordinary files by Linux, but in all other cases, it will work as expected. Using stat fails in many more cases, e.g. block devices (hard drives). - Bronco
