Why does fseek have "long int offset" instead of "long long int offset"?
C2x, 7.21.9.2 The fseek function:

Synopsis

#include <stdio.h>
int fseek(FILE *stream, long int offset, int whence);

Why does fseek have long int offset instead of long long int offset?

It seems that on operating systems with an LLP64 or ILP32 data model (e.g. Microsoft Windows), long is 32 bits, so the maximum offset of 2147483647 bytes (2 GiB) may be insufficient.

Note: POSIX's lseek has off_t offset, where off_t "isn't very rigorously defined".

Chalcography answered 7/2, 2022 at 15:4 Comment(9)
That's why every C library usually has 64-bit extensions to handle 64-bit offsets. MSVC, for example, has _fseeki64. Regarding lseek, Linux has lseek64, which uses the guaranteed 64-bit type off64_t.Hubble
It's an unfortunate historical precedent. Clearly (at least, with 20/20 hindsight) it would have been better to have defined fseek and ftell in terms of off_t, or something.Boole
We're stuck with these kludges and compromises, seemingly forever. Back in the early seventies, the original seek call gave way to lseek, as Unix learned how to deal with 32-bit (!) file sizes. Fast forward to today, and we've got this litany of stat64 and _fseeki64 and lseek64 calls. ("lseek64" is a particularly ghastly misnomer; it should clearly be "seek64" or "llseek".)Boole
I can see, some 10 years from now, people asking, "Why is it long long int (64-bit) and not long long long int (128-bit)?"Infidelity
@AdrianMole Hopefully we will switch to qubits before it happens.Perlman
@SteveSummit llseek might be confusing, as Linux already has _llseek, which splits a 64-bit offset into two 32-bit args. It might be ghastly, but given that we already have lseek, we probably want to keep lseek as part of the replacement name(s). When I'm looking at a code base and asking "Where are all the places seeking is done?", I'd like to be able to grep for lseek and get a match on either lseek or lseek64. On 64-bit systems lseek works by default. For 32-bit, we can do #define _LARGEFILE*_SOURCE and lseek works.Abney
@SteveSummit Hence the existence of fseeko and ftello in POSIX.1-2001.Biting
long long int was added in C99, but fseek was already defined to use long int offsets before C99.Biting
@AdrianMole long long int supports 9.22 EB (exabytes). Should be enough for the next 50 years I guess. Example: 1 hour of 512K (sic!) video takes ~400 TB. Not sure though about the 512K video.Chalcography

The C Standard was formalized in 1990, when most hard drives were smaller than 2 GB. The prototype for fseek() was already in broad use with a long offset, and 32 bits seemed large enough for all purposes, especially since the corresponding system call already used the same API. They did add fgetpos() and fsetpos() for exotic file systems where a simple long offset did not carry all the information needed for seeking, but kept the fpos_t type opaque.

After a few years, when 64-bit offsets became necessary, many operating systems added 64-bit versions of the system calls, and POSIX introduced fseeko() and ftello() to provide a high-level interface for larger offsets. These extensions are no longer necessary on 64-bit versions of common operating systems (Linux, macOS), but Microsoft decided to keep its long (more precisely, LONG) type at 32 bits, solidifying this issue and related ones, such as size_t being larger than unsigned long. This very unfortunate decision has plagued C developers on Win64 platforms ever since and forces them to use non-portable APIs for large files.

Changing the fseek and ftell prototypes would break compatibility with existing software, creating more problems than it solves, so it will not happen.

Some other historical shortcomings are even more surprising, such as the prototype for fgets:

char *fgets(char * restrict s, int n, FILE * restrict stream);

Why they used int instead of size_t is a mystery: back in 1990, int and size_t had the same size on most platforms, and it did not make sense to pass a negative value anyway. Again, this inconsistent API is here to stay.

Dioxide answered 7/2, 2022 at 17:0 Comment(11)
Nice point about fgets, although, anybody trying to read a line longer than 32767 characters (or, these days, 2147483647 characters!) probably has other problems, anyway. :-)Boole
@SteveSummit: text files with lines longer than 32K are commonplace: minified JS files, for example. System-generated XML files can easily break the 2GB barrier, a challenge for getline() users :)Dioxide
"plagues C developers on Win64 platforms ever since and forces them to use non portable APIs for large files." --> I suspect this is a deliberate choice.Coricoriaceous
@chux-ReinstateMonica: we are on the same page :)Dioxide
@Dioxide Files with "lines" longer than 32K are not, IMHO, text files, and no sane person (again, IMHO) reads or processes them a line at a time.Boole
@SteveSummit Agree about sane people, yet automated systems can and do create exceptionally long strings and it is those that stress code. Still I generally agree that any input larger than some N is more likely nefarious than good and deserves error handling rather than allow outside forces to compel code to handle huge strings.Coricoriaceous
@chux-ReinstateMonica Re: "a deliberate choice": thanks for the link, interesting!Chalcography
"plagues C developers on Win64 platforms ever since and forces them to use non portable APIs for large files": the same goes for programmers on 32-bit *nix; they all have to use other solutions.Lydie
@phuclv: except programmers that must deal with huge files on 32-bit unix know they are in a legacy world and can use standard POSIX functions. Modern unix systems do not have this issue.Dioxide
No, that's not a legacy world; there are still plenty of 32-bit MCUs running Linux, and they'll never disappear.Lydie
@phuclv: I agree, it is an embedded world where files rarely exceed 2GB, but it is a POSIX world with a standard solution.Dioxide

© 2022 - 2024 — McMap. All rights reserved.