Why does fopen take a string as its second argument?
It has always struck me as strange that the C function fopen() takes a const char * as the second argument. I would think it would be easier to both read your code and implement the library if there were bit masks defined in stdio.h, like IO_READ and such, so you could do things like:

FILE *myFile = fopen("file.txt", IO_READ | IO_WRITE);

Is there a programmatic reason for the way it actually is, or is it just historic? (i.e. 'That's just the way it is.')

Hisakohisbe answered 25/3, 2010 at 20:50 Comment(1)
I have always been bothered by this in the C libraryFarceur
One word: legacy. Unfortunately we have to live with it.

Just speculation: Maybe at the time a const char * seemed like the more flexible solution, because it is not limited in any way. A bit mask could only have 32 different values. Looks like a YAGNI to me now.

More speculation: Dudes were lazy and writing "rb" requires less typing than MASK_THIS | MASK_THAT :)

Bridgetbridgetown answered 25/3, 2010 at 20:58 Comment(12)
Why though? Masks seem the natural choice here, especially in the day fopen was designed.Ramey
@GMan: As I speculated in my answer, I'm wondering if programmers' time was seen as the more valuable resource. But I have no real proof.Penitence
@GMan: I'm not sure masks are necessarily a more natural choice than a string descriptor. Either are reasonable, defensible design options.Alterable
"unfortunately"? how often does it really annoy you?Markham
I'm going to check this answer, simply because it seems like the most likely explanation. I like Michael Burr's answer as well, but as he said, this is much less likely to have crossed Mike Lesk's mind I imagine. In retrospect, although this question does probably have one definitive answer, I can't really verify anything to make a correct selection! My mistake.Hisakohisbe
"A bit mask could only have 32 different values" -- When C was invented, a bit mask could only have 16 different values.Lobworm
@Windows programmer: No, C does not have a bit mask type. Any integral type will work (preferably unsigned), and unsigned long is a minimum of 32 bits.Belfry
When C was invented, the second parameter to open() had type int, int had 16 bits, unsigned didn't exist, and long didn't exist. The second parameter to open() was an int that was used as a bit mask. Since the int was being used as a bit mask, 16 bits could represent 16 different values.Lobworm
And note that back in the era when the standard I/O library was defined and described (Version 7 Unix and earlier), the open() system call only took two arguments. If open() failed because the file didn't exist, you had to call creat() with the name and mode to create the file.Bebel
Note too that const was not a keyword in C for more than a decade after the (pre-standard) Standard I/O library was written. When the C standard was written, it became possible to state that the C library functions would not modify the strings passed as arguments.Bebel
But the number of possible open flags can easily fit in even 16 bitsFarceur
@user16217248 But the number of possible open flags can easily fit in even 16 bits No they can not. There are 30 different options just for recfm there.Kimmie
I believe that one of the advantages of the character string instead of a simple bit-mask is that it allows for platform-specific extensions which are not bit-settings. Purely hypothetically:

FILE *fp = fopen("/dev/something-weird", "r+,bs=4096");

For this gizmo, the open() call needs to be told the block size, and different calls can use radically different sizes, etc. Granted, I/O has been organized pretty well now (such was not the case originally — devices were enormously diverse and the access mechanisms far from unified), so it seldom seems to be necessary. But the string-valued open mode argument allows for that extensibility far better.

On IBM's mainframe MVS o/s, the fopen() function does indeed take extra arguments along the general lines described here — as noted by Andrew Henle (thank you!). The manual page includes the example call (slightly reformatted):

FILE *fp = fopen("myfile2.dat", "rb+, lrecl=80, blksize=240, recfm=fb, type=record"); 

The underlying open() has to be augmented by the ioctl() (I/O control) or fcntl() (file control) calls, or by functions hiding them, to achieve similar effects.

Bebel answered 26/3, 2010 at 14:19 Comment(7)
Thanks Jonathan. That's another interesting advantage of strings I was not aware of.Hisakohisbe
See IBM's MVS fopen() documentation for actual examples of this.Kimmie
on Windows it can receive the encoding: fopen("newfile.txt", "rt+, ccs=encoding")Shrewmouse
Instead fopen could have been a variadic function that takes binary flags and optional implementation defined extra arguments.Farceur
@user16217248 Instead fopen could have been a variadic function ... Not with the current API. Unlike open(), where O_CREAT being set is the flag used to indicate the presence of the mode argument, there's no way to use the strings passed to fopen() to indicate the absence or presence of an extended argument of any type without extending the contents of these string beyond the existing standard values, and it would change the function prototype for everyone, with potentially unknown consequences for already-compiled code. And if you're going to do that all that...Kimmie
@AndrewHenle But could it not have been like that from the beginning?Farceur
@user16217248 — yes, it could have been like that from the beginning, but it wasn't. At the moment, history is not changeable, sadly (though there are plenty of people trying to revise our interpretation of history, and altering what's printed in books, etc.).Bebel
Dennis Ritchie (in 1993) wrote an article about the history of C, and how it evolved gradually from B. Some of the design decisions were motivated by avoiding source changes to existing code written in B or embryonic versions of C.

In particular, Lesk wrote a 'portable I/O package' [Lesk 72] that was later reworked to become the C `standard I/O' routines

The C preprocessor wasn't introduced until 1972/3, so Lesk's I/O package was written without it! (In very early not-yet-C, pointers fit in integers on the platforms being used, and it was totally normal to assign an implicit-int return value to a pointer.)

Many other changes occurred around 1972-3, but the most important was the introduction of the preprocessor, partly at the urging of Alan Snyder [Snyder 74]

Without #include and #define, an expression like IO_READ | IO_WRITE wasn't an option.

The options in 1972 for what fopen calls could look like in typical source, without CPP, were:

FILE *fp = fopen("file.txt", 1);       // magic constant integer literals
FILE *fp = fopen("file.txt", 'r');     // character literals
FILE *fp = fopen("file.txt", "r");     // string literals

Magic integer literals are obviously horrible, so unfortunately the most efficient option (which Unix later adopted for open(2)) was ruled out by the lack of a preprocessor.

A character literal is obviously not extensible; presumably that was obvious to API designers even back then. But it would have been sufficient (and more efficient) for early implementations of fopen: They only supported single-character strings, checking for *mode being r, w, or a. (See @Keith Thompson's answer.) Apparently r+ for read+write (without truncating) came later. (See fopen(3) for the modern version.)

C did have a character data type (added to B in 1971 as one of the first steps in producing embryonic C, so it was still new in 1972. Original B didn't have char, having been written for machines that pack multiple characters into a word, so char() was a function that indexed a string! See Ritchie's history article.)

Using a single-byte string is effectively passing a char by const-reference, with all the extra overhead of memory accesses because library functions can't inline. (And primitive compilers probably weren't inlining anything, even trivial functions (unlike fopen) in the same compilation unit where inlining would shrink total code size; modern-style tiny helper functions rely on modern compilers to inline them.)


PS: Steve Jessop's answer with the same quote inspired me to write this.

Possibly related: strcpy() return value. strcpy was probably written pretty early, too.

Beefcake answered 10/7, 2018 at 11:38 Comment(4)
Why can fseek use integer constants (SEEK_SET, SEEK_CUR, SEEK_END) but not fopen?Farceur
@user16217248: Good question; I'm curious whether it existed in Lesk's original code that became C stdio. If it was added later, that would be the obvious reason. Otherwise perhaps early code did at some point have to use magic constant integers before CPP existed. Or possibly it was used less and wasn't as painful to change.Beefcake
Anyways I think they should have changed the fopen interface as soon as the named constants became available before it was too late (it kind of is now)Farceur
@user16217248: Apparently they put a big emphasis on backwards compat with existing codebases even very very early, for code that was written before C was even standardized. Like x86, backwards compat was perhaps a reason for early success, but is now a burden whose design can't realistically be changed.Beefcake
I must say that I am grateful for it: I know to type "r" instead of IO_OPEN_FLAG_R (or was it IOFLAG_R, or SYSFLAGS_OPEN_RMODE, or whatever).

Polypropylene answered 25/3, 2010 at 21:9 Comment(8)
This is actually a really good point, it is a lot easier to remember the API.Bridgetbridgetown
I must say I disagree with you. I should be able to tell what a function call does by looking at its arguments. You can't do that with just an "r". What does "r" mean? You can't tell without reading docs on the function in question.Endodontics
@Tuomas Pelkonen: It's easier to remember to write, but it's a pain in the butt later when you read. Since code is read much more often than it is written, I'd optimize for the read case rather than the write case.Endodontics
I agree that code is read more often than it is written (I actually wrote a blog about this), but to me the readability is pretty good when there is an "r", but I have programmed with C for a long time...Bridgetbridgetown
@BillyONeil: Seems to me it's six of one, half a dozen of the other - I'd need to either know to use "r" or whichever enum corresponds to 'read-only'. I'd say "r" is somewhat easier to remember than some arcane enum name (which is pm100's point). But either way, you need to know the correct thing to pass. If the API were specified to take an enum there would be nothing to protect you from incorrectly passing an arbitrary and potentially meaningless or incorrect integer.Alterable
"What does "r" mean". What do you mean, what does "r" mean? It doesn't mean "rhubarb" or "reverse", or "randomly", I can tell you that. What does the "f" in fopen mean? Is there any serious danger that someone familiar with C could look at a call to fopen and think, "I wonder if this is opening a football in rabid-wombat mode"? Names of standard library functions and arguments can be terser than third-party libraries or code used only for one module, because programmers make an effort to learn standard libraries, they don't just trip over it one day unexpectedly.Deplore
You know to type "r", but if you mistype it and your code says "f" instead, your code will still compile even though it's wrong. If you use named constants instead, and you mistype IO_READ as IO_FEAD, you get a compile error to alert you to the problem.Sedillo
@Wyzard: That's a good point. [And plus, with flags, it would be so much easier to perpetuate the "cryptic-C programmer stereotype" with calls like fopen("x", 4); ;D]Hisakohisbe
I'd speculate that it's one or more of the following (unfortunately, I was unable to quickly find any kind of supporting references, so this'll probably remain speculation):

  1. Kernighan or Ritchie (or whoever came up with the interface for fopen()) just happened to like the idea of specifying the mode using a string instead of a bitmap
  2. They may have wanted the interface to be similar to yet noticeably different from the Unix open() system call interface, so it would be at once familiar yet not mistakenly compile with constants defined for Unix instead of by the C library

For example, let's say that the mythical C standard fopen() that took a bitmapped mode parameter used the identifier OPENMODE_READONLY to specify what today is specified by the mode string "r". Now, suppose someone made the following call in a program compiled on a Unix platform (and the header that defines O_RDONLY has been included):

fopen( "myfile", O_RDONLY);

There would be no compiler error, but unless OPENMODE_READONLY and O_RDONLY were defined to be the same bit you'd get unexpected behavior. Of course it would make sense for the C standard names to be defined the same as the Unix names, but maybe they wanted to preclude requiring this kind of coupling.

Then again, this might not have crossed their minds at all...

Alterable answered 25/3, 2010 at 21:56 Comment(0)
The earliest reference to fopen that I've found is in the first edition of Kernighan & Ritchie's "The C Programming Language" (K&R1), published in 1978.

It shows a sample implementation of fopen, which is presumably a simplified version of the code in the C standard library implementation of the time. Here's an abbreviated version of the code from the book:

FILE *fopen(name, mode)
register char *name, *mode;
{
    /* ... */
    if (*mode != 'r' && *mode != 'w' && *mode != 'a') {
        fprintf(stderr, "illegal mode %s opening %s\n",
            mode, name);
        exit(1);
    }
    /* ... */
}

Looking at the code, the mode was expected to be a 1-character string (no "rb", no distinction between text and binary). If you passed a longer string, any characters past the first were silently ignored. If you passed an invalid mode, the function would print an error message and terminate your program rather than returning a null pointer (I'm guessing the actual library version didn't do that). The book emphasized simple code over error checking.

It's hard to be certain, especially given that the book doesn't spend a lot of time explaining the mode parameter, but it looks like it was defined as a string just for convenience. A single character would have worked as well, but a string at least makes future expansion possible (something that the book doesn't mention).

Breakneck answered 15/7, 2015 at 18:58 Comment(0)
Dennis Ritchie has this to say, from http://cm.bell-labs.com/cm/cs/who/dmr/chist.html

In particular, Lesk wrote a 'portable I/O package' [Lesk 72] that was later reworked to become the C `standard I/O' routines

So I say ask Mike Lesk, post the result here as an answer to your own question, and earn stacks of points for it. Although you might want to make the question sound a bit less like criticism ;-)

Deplore answered 25/3, 2010 at 23:11 Comment(3)
There was no such datatype as const char * when Lesk wrote that package. No wait. Ambiguously the datatype might or might not have existed depending on how the implementation handled string constants, but there was no way for a C programmer to specify that datatype in a program.Lobworm
True, but then const didn't exist when strlen was invented, either, but I don't think we can conclude from this, that strlen probably originally took any parameter other than a string pointer ;-) It's just that the typical way of saying "a string" changed.Deplore
According to the same document, CPP didn't exist when Lesk wrote the library. That eliminates the open(2) style of flags ORed into an integer, so a string is one of the least-bad options. See my answer.Beefcake
The reason is simple: to allow the modes to be extended by the C implementation as it sees fit. An argument of type int would not allow that. The C99 Rationale (V5-10, §7.19.5.3, The fopen function) says, e.g., that

Other specifications for files, such as record length and block size, are not specified in the Standard due to their widely varying characteristics in different operating environments.

Changes to file access modes and buffer sizes may be specified using the setvbuf function (see §7.19.5.6).

An implementation may choose to allow additional file specifications as part of the mode string argument. For instance,

file1 = fopen(file1name, "wb,reclen=80");

might be a reasonable extension on a system that provides record-oriented binary files and allows a programmer to specify record length.

Similar text exists in the C89 Rationale 4.9.5.3

Naturally, if OR-ed enum flags were used, these kinds of extensions would not be possible.

One example of an fopen implementation using such parameters is on z/OS. An example there has the following excerpt:

   /* The following call opens:
             the file myfile2.dat,
             a binary file for reading and writing,
             whose record length is 80 bytes,
             and maximum length of a physical block is 240 bytes,
             fixed-length, blocked record format
             for sequential record I/O.
   */

   if ( (stream = fopen("myfile2.dat", "rb+, lrecl=80,\
      blksize=240, recfm=fb, type=record")) == NULL )
      printf("Could not open data file for read update\n");

Now, imagine if you had to squeeze all this information into one argument of type int!!

Coan answered 30/12, 2019 at 7:50 Comment(0)
As Tuomas Pelkonen says, it's legacy.

Personally, I wonder if some misguided saps conceived of it as being better due to fewer characters typed? In the olden days programmers' time was valued more highly than it is today, since it was less accessible and compilers weren't as great and all that.

This is just speculation, but I can see why some people would favor saving a few characters here and there (note the lack of verbosity in any of the standard library function names... I present string.h's "strstr" and "strchr" as probably the best examples of unnecessary brevity).

Penitence answered 25/3, 2010 at 21:3 Comment(5)
The lack of verbosity in the library names is because they wanted to support systems that only supported 6 significant characters in external names. Remember C was defined, long, long ago, and not all systems had great tool support.Alterable
Good point, though that doesn't explain fprintf and sprintf. I guess those could have been defined later, though, and I'll admit I'm too lazy to look up the history at the moment.Penitence
@Platinum: fprintf and sprintf work because they are distinct in the first 6 characters. Symbols were allowed to be longer than 6 characters, but had to be distinct if the characters after the 6th were dropped. So, I suppose they could have made them more human-readable as long as they started with 6 characters of potential gobbledygook.Alterable
@Martin, I could type as fast (or faster) on a teletype than I do today. Computers have gotten faster, I've gotten slower.Divot
@Michael Burr: The general rule was that names with external linkage had to be unique within the first six characters, not counting case differences. This was continued in C89, although the Rationale describes the decision as "most painful".Belfry
