Every C programmer knows there is no way to securely use gets unless standard input is connected to a trusted source. But why didn't the developers of C notice such a glaring mistake before it was made an official part of the C standard? And why did it take until C11 to remove it from the standard and replace it with a function that performs bounds-checking? I'm aware fgets is typically used in its place, but that has the annoying habit of keeping the \n at the end.
The answer is simply that C is a very old language, dating back to the early 1970s. The sort of security threats we take for granted today weren't on the horizon when the language was first developed.
For a long time, C was the in-house language at AT&T. It was difficult to find commercial compilers for C until the late 1970s. But when the UNIX operating system was rewritten in C, compilers became more readily available, and the language took off, especially after Kernighan and Ritchie's 1978 standard reference, The C Programming Language.
Despite its widespread and growing popularity, the language itself wasn't standardized until 1989. By that point, C was nearly 20 years old and there was a lot of installed C code. The standards committee was relatively conservative; it worked on the assumption that the standard would codify existing practices rather than require new ways of doing things. The buffer overflow vulnerability of gets() seemed trivial compared to the cost of declaring a large portion of the installed code base nonstandard.
The Morris Internet worm of 1988 did make clear the need for more secure coding practices, but even so, back in the late 1980s the Internet was still in its infancy. (If I remember correctly, an early 1990s Macintosh book by David Pogue answered the question of how to connect a Mac to the Internet with something to the effect of "Don't bother, the Internet isn't worth the effort".) One can hardly fault the standards committee for misjudging the exponential growth of the Internet and its attendant security threats.
When the standard was revised in 1999, matters had changed, of course. However, the committee was again cautious about invalidating existing code, and so chose to deprecate gets() rather than remove it altogether. It's debatable whether this was the right decision, but it wasn't obviously the wrong one.
Retaining gets() in the C11 standard would obviously have been the wrong decision, and the current standard very properly eliminates it. But your question rests on the assumption that this was "always already" the right thing to do, and from a historical perspective, that assumption seems questionable.
gets with something more fgets-like... or something that puts "Have you done the exercises from K&Rs \"The C Programming Language\", yet?" into the buffer. People might then notice that most solutions using stdin are misusing the console, and needn't be so complex if they're designed consistently from the beginning. Anyway, this answer seems like the best attempt to answer the actual question. – Oleaceous
C originally came from a time before internetworking of computers was widespread. In the context of the time, if you wrote a program in C that used gets(), and then complained that you crashed it by giving it an input that was too big, the response would just have been "well, don't do that then!". The entire concept of "untrusted input" was almost nonsense - the input was explicitly provided by the operator.
The C89 standard did not remove it because the standards committee was tasked primarily with codifying existing practice, and gets() was definitely part of existing practice by that point.
It was deprecated in C99, as a first step towards its removal, which then happened in C11 as you note.
Putting gets in the standard was controversial in the first place, but the Committee decided that gets was useful when the programmer does have adequate control over the input.
Here's the official explanation by the Committee.
Rationale for International Standard - Programming Languages C §7.19.7.7 The gets function:
Because gets does not check for buffer overrun, it is generally unsafe to use when its input is not under the programmer’s control. This has caused some to question whether it should appear in the Standard at all. The Committee decided that gets was useful and convenient in those special circumstances when the programmer does have adequate control over the input, and as longstanding existing practice, it needed a standard specification. In general, however, the preferred function is fgets (see §7.19.7.2).
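To make the Rationale's point concrete, here is a minimal sketch of why fgets can be bounded while gets cannot: fgets is told how large the buffer is, and gets never receives that information. The helper name bounded_read is hypothetical and only exists for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration only (not a standard function).
 * Reads at most size-1 characters from `in` into buf via fgets and
 * returns how many characters were stored.
 * gets(buf) takes no size argument, so it has no way to stop at the
 * end of buf; fgets stops after size-1 characters at the latest. */
size_t bounded_read(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL);

    /* fgets takes an int, so reject sizes it cannot represent. */
    if (size == 0 || size > INT_MAX)
        return 0;

    if (fgets(buf, (int)size, in) == NULL) {
        buf[0] = '\0';          /* EOF or read error: leave an empty string */
        return 0;
    }
    return strlen(buf);         /* never more than size-1 */
}
```

Called as bounded_read(stdin, buf, sizeof buf), an over-long input line is simply cut off at the buffer's capacity, and the rest stays in the stream for the next read.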
The mandate for the initial ANSI standard was to codify existing practice, not invent a new language.
That's made clear in the rationale documents:
The original X3J11 charter clearly mandated codifying common existing practice, and the C89 Committee held fast to precedent wherever that was clear and unambiguous. The vast majority of the language defined by C89 was precisely the same as defined in Appendix A of the first edition of The C Programming Language by Brian Kernighan and Dennis Ritchie, and as was implemented in almost all C translators of the time. (This document is hereinafter referred to as K&R.)
Hence, because gets was part of the language, it was made part of the standard. There are other unsafe things that are still there; practitioners are expected to know how to use their tools wisely.
And, if you're worried by the superfluous newline, it's easy enough to fix:
{
    size_t len = strlen (buffer);
    if ((len > 0) && (buffer[len-1] == '\n'))
        buffer[len-1] = '\0';
}
or the simpler:
buffer[strcspn (buffer, "\n")] = '\0';
You could even write your own fgets front end to do that for you, such as this one here, apparently written by one of the more intelligent and good looking members of SO :-)
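As a rough idea of what such a front end might look like (this is a sketch of my own, not the linked code), the function below wraps fgets, strips the newline, and reports lines that did not fit. The name read_line and the LINE_* status codes are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for this sketch; not part of any standard API. */
enum { LINE_OK = 0, LINE_EOF = 1, LINE_TOO_LONG = 2 };

/* Read one line from `in` into buf (capacity `size`), without the '\n'.
 * If the line does not fit, buf holds its first size-1 characters and
 * the remainder of the line is discarded. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size >= 2 && size <= INT_MAX);

    if (fgets(buf, (int)size, in) == NULL) {
        buf[0] = '\0';
        return LINE_EOF;            /* end of input (or a read error) */
    }

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {         /* whole line arrived: drop the newline */
        buf[len] = '\0';
        return LINE_OK;
    }

    /* No newline was stored. Peek at what follows to find out why. */
    int ch = getc(in);
    if (ch == '\n' || ch == EOF)
        return LINE_OK;             /* exact fit, or last line had no '\n' */

    /* Line was too long: throw away the rest so the next call starts clean. */
    while ((ch = getc(in)) != '\n' && ch != EOF)
        ;
    return LINE_TOO_LONG;
}
```

Discarding the tail of an over-long line is a design choice, not the only option; a caller that needs the full line could grow the buffer and keep reading instead.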
'\n' whose existence you don't care about: size_t len = strcspn(buffer, "\n"); buffer[len] = '\0'; – Oleaceous
Space and time constraints of early computing technology did not allow for the more practical safety practices that are commonplace today. Existing flawed routines were maintained for code compatibility reasons.
fgets placing a '\n' at the end of your lines wouldn't be so annoying if you knew what its purpose is. – Oleaceous