Why was gets part of the C standard in the first place?
N

5

7

Every C programmer knows there is no way to securely use gets unless standard input is connected to a trusted source. But why didn't the developers of C notice such a glaring mistake before it was made an official part of the C standard? And why did it take until C11 to remove it from the standard and replace it with a function that performs bounds-checking? I'm aware fgets is typically used in its place, but that has the annoying habit of keeping the \n at the end.

Niobium answered 2/8, 2013 at 1:34 Comment(4)
Probably performance was a higher priority than code being secure back in those days.Demoiselle
Because it was 1973. The goal was to create an easy-to-use language that made fast code on a small PDP-7.Diaphone
You could say the same thing about strcpy() or a lot of other functions.Araby
fgets placing a '\n' at the end of your lines wouldn't be so annoying, if you knew what its purpose is.Oleaceous
B
5

The answer is simply that C is a very old language, dating back to the early 1970s. The sort of security threats we take for granted today weren't on the horizon when the language was first developed.

For a long time, C was the in-house language at AT&T. It was difficult to find commercial compilers for C until the late 1970s. But when the UNIX operating system was rewritten in C, compilers became more readily available, and the language took off, especially after Kernighan and Ritchie's 1978 standard reference, The C Programming Language.

Despite its widespread and growing popularity, the language itself wasn't standardized until 1989. By that point, C was nearly 20 years old and there was a lot of installed C code. The standards committee was relatively conservative; it worked on the assumption that the standard would codify existing practices rather than require new ways of doing things. The buffer overflow vulnerability of gets() seemed trivial compared to the cost of declaring a large portion of the installed code base nonstandard.

The Morris internet worm of 1988 did make clear the need for more secure coding practices, but even so, back in the late 1980s the internet was still extremely nascent. (If I remember correctly, an early 1990s Macintosh book by David Pogue answered the question of how to connect a Mac to the Internet with something to the effect of "Don't bother, the Internet isn't worth the effort".) One can hardly fault the standards committee for misjudging the exponential growth of the Internet and its attendant security threats.

When the standard was revised in 1999, matters had changed, of course. However, the committee again chose to be cautious about invalidating existing code, and so it deprecated gets() rather than removing it altogether. It's debatable whether this was the right decision, but it wasn't obviously the wrong one.

Retaining gets() in the C11 standard would obviously have been the wrong decision, and the current standard very properly eliminates it. But your question rests on the assumption that this was "always already" the right thing to do, and from a historical perspective, that assumption seems questionable.

Bessette answered 2/8, 2013 at 1:58 Comment(2)
What they probably should have done is replace gets with something more fgets-like... or something that puts "Have you done the exercises from K&Rs \"The C Programming Language\", yet?" into the buffer. People might then notice that most solutions using stdin are misusing the console, and needn't be so complex if they're designed consistently from the beginning. Anyway, this answer seems like the best attempt to answer the actual question.Oleaceous
I read that when the Morris worm spread, X3J11 was almost finished with the C standard. What is frustrating, BTW, is that snprintf did not make it into C89, even though I think RMS proposed it to X3J11 in 1987.Naseberry
I
4

C originally came from a time before internetworking of computers was widespread. In the context of the time, if you wrote a program in C that used gets(), and then complained that you crashed it by giving it an input that was too big, the response would just have been "well, don't do that then!". The entire concept of "untrusted input" was almost nonsense - the input was explicitly provided by the operator.

The C89 standard did not remove it because the standards committee was tasked primarily with codifying existing practice, and gets() was definitely part of existing practice by that point.

It was deprecated in C99, as a first step towards its removal, which then happened in C11 as you note.

Impressionable answered 2/8, 2013 at 1:45 Comment(1)
Yep, remember that most terminals at the time could not even enter non-ASCII characters. And while there was redirection, stdin was not typically redirected to an untrusted source back then.Naseberry
C
3

Whether gets belonged in the standard was controversial in the first place, but the Committee decided that gets was useful when the programmer does have adequate control over the input.

Here's the official explanation by the Committee.

Rationale for International Standard - Programming Languages C §7.19.7.7 The gets function:

Because gets does not check for buffer overrun, it is generally unsafe to use when its input is not under the programmer’s control. This has caused some to question whether it should appear in the Standard at all. The Committee decided that gets was useful and convenient in those special circumstances when the programmer does have adequate control over the input, and as longstanding existing practice, it needed a standard specification. In general, however, the preferred function is fgets (see §7.19.7.2).
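
To make the contrast concrete, here is a minimal sketch (not part of the Rationale) of the bounded read that fgets provides. The helper name read_bounded is hypothetical and exists only for illustration; the point is that the caller passes the buffer size, which gets has no way to receive.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper for illustration: read at most size - 1 characters
 * of one line from `in` into buf. Unlike gets, fgets is told how large
 * the buffer is, so it cannot write past the end of buf.
 * Returns buf on success, or NULL on end of file or read error. */
char *read_bounded(char *buf, size_t size, FILE *in)
{
    assert(buf != NULL && size > 0 && in != NULL);

    /* fgets takes an int length; this sketch assumes size fits in an int. */
    return fgets(buf, (int)size, in);
}
```

With a 6-byte buffer and the input line "hello world", the first call stores "hello" plus the terminating null character and leaves the rest of the line in the stream for later calls, instead of overrunning the buffer.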

Carpology answered 4/8, 2013 at 1:19 Comment(2)
"[...], but the Committee decided that gets was useful when the programmer doesn't have adequate control over the input." Is that what you meant to say?Niobium
@flarn2006 Oh, that's a typo, I'll fix it in a minute. The quoted part below is right.Carpology
O
2

The mandate for the initial ANSI standard was to codify existing practice, not invent a new language.

That's made clear in the rationale documents:

The original X3J11 charter clearly mandated codifying common existing practice, and the C89 Committee held fast to precedent wherever that was clear and unambiguous. The vast majority of the language defined by C89 was precisely the same as defined in Appendix A of the first edition of The C Programming Language by Brian Kernighan and Dennis Ritchie, and as was implemented in almost all C translators of the time. (This document is hereinafter referred to as K&R.)

Hence, because gets was part of the language, it was made part of the standard. There are other unsafe things that are still there; practitioners are expected to know how to use their tools wisely.

And, if you're worried by the superfluous newline, it's easy enough to fix:

{
    size_t len = strlen (buffer);
    if ((len > 0) && (buffer[len-1] == '\n'))
        buffer[len-1] = '\0';
}

or the simpler:

buffer[strcspn (buffer, "\n")] = '\0';

You could even write your own fgets front end to do that for you, such as this one here, apparently written by one of the more intelligent and good looking members of SO :-)
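
The linked code is not reproduced here. As a rough sketch of what such a front end could look like, assuming you want the newline stripped and over-long lines reported rather than silently split (the name read_line and the LINE_* codes are made up for this illustration and are not the original code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
enum { LINE_OK = 0, LINE_EOF = 1, LINE_TOO_LONG = 2 };

/* Read one line from `in` into buf (capacity `size`), without the
 * trailing newline. If the line does not fit, the remainder is
 * discarded so that the next call starts on a fresh line. */
int read_line(char *buf, size_t size, FILE *in)
{
    assert(buf != NULL && size > 1 && in != NULL);

    /* This sketch assumes size fits in an int, as fgets requires. */
    if (fgets(buf, (int)size, in) == NULL)
        return LINE_EOF;                /* end of file or read error */

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';                /* whole line fit: strip newline */
        return LINE_OK;
    }

    /* No newline was stored. Either the text exactly filled the buffer,
     * the input ended without a final newline, or the line is too long.
     * Look at the next character to tell these cases apart. */
    int ch = getc(in);
    if (ch == '\n' || ch == EOF)
        return LINE_OK;

    /* Too long: drain the rest of the line. */
    while ((ch = getc(in)) != '\n' && ch != EOF)
        ;
    return LINE_TOO_LONG;
}
```

A caller would pass stdin as the stream, for example read_line(buffer, sizeof buffer, stdin), and check the return value before using the buffer. On LINE_TOO_LONG the buffer still holds the truncated start of the line.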

Oubre answered 2/8, 2013 at 1:43 Comment(2)
I suddenly feel as though one of the more intelligent and good looking members of SO is also one of the more conceited... If your brilliance doesn't yet know, you can more cleanly eliminate the '\n' whose existence you don't care about: size_t len = strcspn(buffer, "\n"); buffer[len] = '\0';Oleaceous
@undefinedbehaviour: touche :-) Nice code. I'll add a modified version of that to the answer.Oubre
A
0

Space and time constraints of early computing technology did not allow for the more practical safety practices that are commonplace today. Existing flawed routines were maintained for code compatibility reasons.

Araby answered 2/8, 2013 at 1:47 Comment(0)
