Disadvantages of scanf

I want to know the disadvantages of scanf().

In many sites, I have read that using scanf might cause buffer overflows. What is the reason for this? Are there any other drawbacks with scanf?

Spitfire answered 12/3, 2010 at 3:20 Comment(1)
See also A Beginners' Guide Away From scanf(). – Mcadams

The problems with scanf are (at a minimum):

  • using %s to get a string from the user, which leads to the possibility that the string may be longer than your buffer, causing overflow.
  • the possibility of a failed scan leaving your file pointer in an indeterminate location.

I very much prefer using fgets to read whole lines in so that you can limit the amount of data read. If you've got a 1K buffer, and you read a line into it with fgets you can tell if the line was too long by the fact there's no terminating newline character (last line of a file without a newline notwithstanding).

Then you can complain to the user, or allocate more space for the rest of the line (continuously if necessary until you have enough space). In either case, there's no risk of buffer overflow.

Once you've read the line in, you know that you're positioned at the next line so there's no problem there. You can then sscanf your string to your heart's content without having to save and restore the file pointer for re-reading.
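As a hypothetical illustration of that re-parsing freedom, the helper below (its name and the two accepted formats are assumptions for the example) tries one sscanf format and, if it fails, simply tries another on the same string. No stream position has to be saved or restored because nothing is re-read from the file.

```c
#include <stdio.h>

/* Hypothetical helper: accept a point written either as "x,y" or "x y".
 * The line is already in memory (e.g. read with fgets), so a failed
 * sscanf costs nothing: we just try the next format on the same string. */
static int parse_point(const char *line, int *x, int *y)
{
    if (sscanf(line, "%d ,%d", x, y) == 2)   /* "3,4" or "3 , 4" */
        return 1;
    if (sscanf(line, "%d %d", x, y) == 2)    /* "3 4" */
        return 1;
    return 0;                                /* neither format matched */
}
```

Note that *x may already have been written by a partially successful attempt, so callers should only trust the values when the function returns 1.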

Here's a snippet of code which I frequently use to ensure no buffer overflow when asking the user for information.

It could be easily adjusted to use a file other than standard input if necessary and you could also have it allocate its own buffer (and keep increasing it until it's big enough) before giving that back to the caller (although the caller would then be responsible for freeing it, of course).

#include <stdio.h>
#include <string.h>

#define OK         0
#define NO_INPUT   1
#define TOO_LONG   2
#define SMALL_BUFF 3
static int getLine (char *prmpt, char *buff, size_t sz) {
    int ch, extra;

    // Size zero or one cannot store enough, so don't even
    // try - we need space for at least newline and terminator.

    if (sz < 2)
        return SMALL_BUFF;

    // Output prompt.

    if (prmpt != NULL) {
        printf ("%s", prmpt);
        fflush (stdout);
    }

    // Get line with buffer overrun protection.

    if (fgets (buff, sz, stdin) == NULL)
        return NO_INPUT;

    // Catch possibility of `\0` in the input stream.

    size_t len = strlen(buff);
    if (len < 1)
        return NO_INPUT;

    // If it was too long, there'll be no newline. In that case, we flush
    // to end of line so that excess doesn't affect the next call.

    if (buff[len - 1] != '\n') {
        extra = 0;
        while (((ch = getchar()) != '\n') && (ch != EOF))
            extra = 1;
        return (extra == 1) ? TOO_LONG : OK;
    }

    // Otherwise remove newline and give string back to caller.
    buff[len - 1] = '\0';
    return OK;
}

And, a test driver for it:

// Test program for getLine().

int main (void) {
    int rc;
    char buff[10];

    rc = getLine ("Enter string> ", buff, sizeof(buff));
    if (rc == NO_INPUT) {
        // Extra NL since my system doesn't output that on EOF.
        printf ("\nNo input\n");
        return 1;
    }

    if (rc == TOO_LONG) {
        printf ("Input too long [%s]\n", buff);
        return 1;
    }

    printf ("OK [%s]\n", buff);

    return 0;
}

Finally, a test run to show it in action:

$ printf "\0" | ./tstprg     # Singular NUL in input stream.
Enter string>
No input

$ ./tstprg < /dev/null       # EOF in input stream.
Enter string>
No input

$ ./tstprg                   # A one-character string.
Enter string> a
OK [a]

$ ./tstprg                   # Longer string but still able to fit.
Enter string> hello
OK [hello]

$ ./tstprg                   # Too long for buffer.
Enter string> hello there
Input too long [hello the]

$ ./tstprg                   # Test limit of buffer.
Enter string> 123456789
OK [123456789]

$ ./tstprg                   # Test just over limit.
Enter string> 1234567890
Input too long [123456789]
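The "allocate its own buffer" variant mentioned above could look roughly like the sketch below. The name read_line_alloc, the initial capacity of 16 and the doubling growth policy are all assumptions for illustration; the caller must free the result.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: read one line of any length from fp, growing the buffer with
 * realloc whenever fgets fills it without reaching a newline.
 * Returns a malloc'd string without the trailing newline (caller frees),
 * or NULL on EOF before any input or on allocation failure. */
static char *read_line_alloc(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';          /* complete line: strip newline */
            return buf;
        }
        if (len + 1 < cap)                /* stopped early without '\n': EOF */
            return buf;
        char *bigger = realloc(buf, cap * 2);   /* buffer full: grow it */
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }
    if (len == 0) {                       /* nothing read at all */
        free(buf);
        return NULL;
    }
    return buf;                           /* last line had no newline */
}
```

As with getLine, an empty line comes back as "" while end-of-input comes back as NULL, so the two cases stay distinguishable.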
Mariettemarigold answered 12/3, 2010 at 3:24 Comment(7)
if (fgets (buff, sz, stdin) == NULL) return NO_INPUT; Why did you use NO_INPUT as return value? fgets returns NULL only on error. – Generative
@Fabio, not quite. It also returns null if end-of-file is reached before any input has been read. That's the case being caught here. Don't mistake NO_INPUT for empty input (pressing ENTER before anything else); the latter gives you an empty string with no NO_INPUT error code. – Mariettemarigold
The latest POSIX standard allows char *buf; scanf("%ms", &buf); which will allocate enough space for you with malloc (so it must be freed later), which would help prevent buffer overruns. – Slavery
What happens if we call getLine with 1 as the sz parameter? if (buff[strlen(buff)-1] != '\n') is where the problem occurs. Perhaps if (!sz) { return TOO_LONG; } if (buff[sz = strcspn(buff, "\n")] == '\n' || getchar() == '\n') { buff[sz] = '\0'; return OK; } unsigned char c; while (fread(&c, 1, 1, stdin) == 1 && c != '\n'); return TOO_LONG; which indeed won't overflow when you pass sz <= 1 and has the added benefit of removing the '\n' for you at zero overhead, though it should be noticed that your code could be enhanced by strategic use of scanf... – Lactic
@autistic, that's a good point, thanks for that. I have to admit I've never tried it with a buffer size of one simply because that cannot return any useful information. So I've just decided to catch that fast and make it an error condition. – Mariettemarigold
size_t lastPos = strlen(buff) - 1; if (buff[lastPos] != '\n') { is exploitable as UB by entering a null character first. – Nye
That's a good catch, @chux, I've added an extra check for that to treat it as "no input". Testing was done with printf "\0" | exeName to verify the original problem and the fix. I guess I never checked with an insane input scenario like that (but I damn well should have). Thanks for the heads up. – Mariettemarigold

Most of the answers so far seem to focus on the string-buffer overflow issue. In reality, the format specifiers used with the scanf functions support an explicit field width, which limits the maximum size of the input and prevents buffer overflow. This renders the popular accusations of string-buffer overflow dangers in scanf virtually baseless. Claiming that scanf is somehow analogous to gets in this respect is completely incorrect. There's a major qualitative difference between scanf and gets: scanf does provide the user with features that prevent string-buffer overflow, while gets doesn't.

One can argue that these scanf features are difficult to use, since the field width has to be embedded in the format string (there's no way to pass it through a variadic argument, as can be done in printf). That is actually true: scanf is rather poorly designed in that regard. Nevertheless, any claims that scanf is somehow hopelessly broken with regard to string-buffer-overflow safety are completely bogus and usually made by lazy programmers.

The real problem with scanf is of a completely different nature, even though it is also about overflow. When a scanf function is used to convert decimal representations of numbers into values of arithmetic types, it provides no protection from arithmetic overflow. If the value does not fit in the target type, the behavior is undefined. For this reason, the only proper way to perform such conversions with the C standard library is to use the functions from the strto... family (strtol, strtoul, strtod, and so on).
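A minimal sketch of the strto... approach, assuming the goal is to read an int from a string that has already been read (the helper name parse_int is hypothetical). strtol reports out-of-range input through errno == ERANGE instead of invoking undefined behavior, and its end pointer lets us reject empty input and trailing junk.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a decimal string to int, rejecting empty
 * input, trailing junk and out-of-range values.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
static int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)                         /* no digits at all */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                         /* too big for long, or for int */
    while (isspace((unsigned char)*end))  /* allow trailing whitespace */
        end++;
    if (*end != '\0')                     /* e.g. "12w4" */
        return 0;
    *out = (int)v;
    return 1;
}
```

The extra INT_MIN/INT_MAX check is needed because strtol only guards the range of long, which may be wider than int.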

So, to summarize the above, the problem with scanf is that it is difficult (albeit possible) to use properly and safely with string buffers. And it is impossible to use safely for arithmetic input. The latter is the real problem. The former is just an inconvenience.

P.S. The above is intended to apply to the entire family of scanf functions (including fscanf and sscanf). With scanf specifically, the obvious issue is that the very idea of using a strictly formatted function for reading potentially interactive input is rather questionable.

Tanya answered 12/3, 2010 at 6:45 Comment(4)
I just have to point out, it's not that you can't read arithmetic input safely, more that you can't do it correctly and robustly for dirty input. To me there's a huge difference between crashing my program and/or opening the OS to attack and simply getting a few wrong values when users try purposeful mischief. What do I care if they typed in 1431337.4044194872987 and got 4.0 instead? Either way they entered 4.0. (Sometimes it might matter, but how often?) – Stirling
Third paragraph: scanf will gladly read a value of >2^32, if encountered in the string, into a 32-bit integer and cause undefined behavior? – Shopping
@2501: Yes, precisely. At least that's what happens according to the language standard. – Tanya
"Claiming that scanf is somehow analogous to gets in the respect is completely incorrect." I get it, scanf at least does allow you to specify the maximum field size, but the ideological use of %s certainly has the same issues as gets, and as with many other dangerous yet useful tools in C, they're all easy to abuse. Even strtoul has its perils, so rather than suggesting that people stop using parts of C, can't we just jump to suggesting that people stop using all of C? – Lactic

From the comp.lang.c FAQ: Why does everyone say not to use scanf? What should I use instead?

scanf has a number of problems (see questions 12.17, 12.18a, and 12.19). Also, its %s format has the same problem that gets() has (see question 12.23): it’s hard to guarantee that the receiving buffer won’t overflow.

More generally, scanf is designed for relatively structured, formatted input (its name is in fact derived from “scan formatted”). If you pay attention, it will tell you whether it succeeded or failed, but it can tell you only approximately where it failed, and not at all how or why. You have very little opportunity to do any error recovery.

Yet interactive user input is the least structured input there is. A well-designed user interface will allow for the possibility of the user typing just about anything—not just letters or punctuation when digits were expected, but also more or fewer characters than were expected, or no characters at all (i.e., just the RETURN key), or premature EOF, or anything. It’s nearly impossible to deal gracefully with all of these potential problems when using scanf; it’s far easier to read entire lines (with fgets or the like), then interpret them, either using sscanf or some other techniques. (Functions like strtol, strtok, and atoi are often useful; see also questions 12.16 and 13.6.) If you do use any scanf variant, be sure to check the return value to make sure that the expected number of items were found. Also, if you use %s, be sure to guard against buffer overflow.
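A minimal sketch of the advice to check the return value, assuming a line has already been read with fgets and should contain two integers (the helper and its result codes are hypothetical). The count that sscanf returns is exactly the "approximately where it failed" information mentioned above: it tells you how far the scan got, but not why it stopped.

```c
#include <stdio.h>

/* Hypothetical result codes for a line expected to hold two integers. */
enum pair_result { PAIR_OK, PAIR_SECOND_BAD, PAIR_FIRST_BAD, PAIR_EMPTY };

static enum pair_result check_pair(const char *line, int *a, int *b)
{
    switch (sscanf(line, "%d %d", a, b)) {
    case 2:  return PAIR_OK;          /* both numbers converted */
    case 1:  return PAIR_SECOND_BAD;  /* stopped after the first number */
    case 0:  return PAIR_FIRST_BAD;   /* first item was not a number */
    default: return PAIR_EMPTY;       /* EOF: only whitespace, e.g. RETURN */
    }
}
```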

Note, by the way, that criticisms of scanf are not necessarily indictments of fscanf and sscanf. scanf reads from stdin, which is usually an interactive keyboard and is therefore the least constrained, leading to the most problems. When a data file has a known format, on the other hand, it may be appropriate to read it with fscanf. It’s perfectly appropriate to parse strings with sscanf (as long as the return value is checked), because it’s so easy to regain control, restart the scan, discard the input if it didn’t match, etc.
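For the "data file with a known format" case, a sketch along these lines is reasonable. The record layout (one "name score" pair per line, names of at most 31 characters) and the helper sum_scores are assumptions for illustration. Even here the return value is checked, so reading stops at the first record that does not match.

```c
#include <stdio.h>

/* Hypothetical helper: sum the scores in a file of "name score" records.
 * Returns the number of records read before EOF or the first bad record. */
static int sum_scores(FILE *fp, long *total)
{
    char name[32];
    int score;
    int records = 0;

    *total = 0;
    while (fscanf(fp, "%31s %d", name, &score) == 2) {  /* width guards name */
        *total += score;
        records++;
    }
    return records;
}
```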

References: K&R2 Sec. 7.4 p. 159

Astronavigation answered 12/3, 2010 at 6:39 Comment(0)

It is very hard to get scanf to do the thing you want. Sure, you can, but things like scanf("%s", buf); are as dangerous as gets(buf);, as everyone has said.

As an example, what paxdiablo is doing in his function to read can be done with something like:

scanf("%10[^\n]%*[^\n]", buf);
getchar();

The above will read a line, store the first 10 non-newline characters in buf, and then discard everything till (and including) a newline. So, paxdiablo's function could be written using scanf the following way:

#include <stdio.h>

enum read_status {
    OK,
    NO_INPUT,
    TOO_LONG
};

static int get_line(const char *prompt, char *buf, size_t sz)
{
    char fmt[40];
    int i;
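    int nscanned = 0;   /* %n is not reached when the line fits, so start at 0 */

    printf("%s", prompt);
    fflush(stdout);

    /* read at most sz-1 characters, discarding the rest of the line */
    sprintf(fmt, "%%%zu[^\n]%%*[^\n]%%n", sz-1);
    i = scanf(fmt, buf, &nscanned);
    if (i > 0) {
        getchar();              /* consume the newline (or hit EOF) */
        if ((size_t)nscanned >= sz) {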
            return TOO_LONG;
        } else {
            return OK;
        }
    } else {
        return NO_INPUT;
    }
}

int main(void)
{
    char buf[10+1];
    int rc;

    while ((rc = get_line("Enter string> ", buf, sizeof buf)) != NO_INPUT) {
        if (rc == TOO_LONG) {
            printf("Input too long: ");
        }
        printf("->%s<-\n", buf);
    }
    return 0;
}

One of the other problems with scanf is its behavior in case of overflow. For example, when reading an int:

int i;
scanf("%d", &i);

the above cannot be used safely if the input overflows int. Even in the first case, reading a string is much simpler to do with fgets than with scanf.

Mercurate answered 12/3, 2010 at 6:39 Comment(0)

Yes, you are right. There is a major security flaw in the scanf family (scanf, sscanf, fscanf, etc.), especially when reading a string, because they don't take the length of the buffer (into which they are reading) into account.

Example:

char buf[3];
sscanf("abcdef","%s",buf);

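Clearly, the buffer buf can hold at most 3 chars, that is, a string of at most 2 characters plus the null terminator. But sscanf will try to put "abcdef" (7 bytes including its terminator) into it, causing a buffer overflow.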
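For contrast, a bounded version of the same call, as a minimal sketch (the helper name copy_first_word is hypothetical). With a field width of 2, %s stops after two characters, and the width is one less than the array size so the terminator still fits.

```c
#include <stdio.h>

/* With a field width, %s stores at most that many characters plus '\0'.
 * For a 3-byte array the width must therefore be 2. */
static int copy_first_word(const char *src, char buf[3])
{
    return sscanf(src, "%2s", buf);   /* "abcdef" -> "ab", no overflow */
}
```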

Empiric answered 12/3, 2010 at 3:25 Comment(6)
You can provide "%10s" as the format specifier and it will read no more than 10 characters into the buffer. – Slavery
Sure, it's possible to use the API safely. It's also possible to use dynamite to clear dirt out of your garden safely. But I wouldn't recommend either, especially since there are safer alternatives. – Contrast
My dad used to use gelignite for clearing down trees on the farm. You just have to understand your tools and know the dangers. – Mariettemarigold
That buffer can only hold 2 chars, since you need to reserve one for the null terminator. – Fleur
@codaddict: The fact that someone doesn't use field width with scanf is the problem with that someone, not with scanf. It is completely irrelevant to the issue in question. This is C after all, not Java. – Tanya
The problem is that the field width in scanf() must be hardcoded in the conversion specifier; with printf(), you can use * in the conversion specifier and pass the length as an argument. But since * means something different in scanf(), that doesn't work, so you basically have to generate a new format for each read like Alok does in his example. It just adds more work and clutter; might as well use fgets() and be done with it. – Jernigan
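When the buffer size is a compile-time constant, one common workaround (a preprocessor idiom, not a standard library feature) is to let the preprocessor paste the width into the format string, so the array size and the field width cannot drift apart. The macro names NAME_LEN, STR and STR_ and the helper read_name are hypothetical.

```c
#include <stdio.h>

#define NAME_LEN 15
#define STR_(x) #x          /* stringify the argument as written */
#define STR(x) STR_(x)      /* expand the macro first, then stringify */

static int read_name(const char *src, char name[NAME_LEN + 1])
{
    /* "%" STR(NAME_LEN) "s" concatenates to the format "%15s" */
    return sscanf(src, "%" STR(NAME_LEN) "s", name);
}
```

This only helps for sizes known at compile time; for run-time sizes you still need to build the format with sprintf, as in Alok's example.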

The advantage of scanf is that, once you learn how to use the tool, as you should always do in C, it has immensely useful use cases. You can learn how to use scanf and friends by reading and understanding the manual. If you can't get through that manual without serious comprehension issues, this probably indicates that you don't know C very well.

scanf and friends suffered from unfortunate design choices that rendered it difficult (and occasionally impossible) to use correctly without reading the documentation, as other answers have shown. This occurs throughout C, unfortunately, so if I were to advise against using scanf then I would probably advise against using C.

One of the biggest disadvantages seems to be purely the reputation it's earned amongst the uninitiated; as with many useful features of C we should be well informed before we use it. The key is to realise that as with the rest of C, it seems succinct and idiomatic, but that can be subtly misleading. This is pervasive in C; it's easy for beginners to write code that they think makes sense and might even work for them initially, but doesn't make sense and can fail catastrophically.

For example, the uninitiated commonly expect that the %s delegate would cause a line to be read, and while that might seem intuitive it isn't necessarily true. It's more appropriate to describe the field read as a word. Reading the manual is strongly advised for every function.
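A small sketch of that word-versus-line difference (the helper word_and_line is hypothetical). %s skips leading whitespace and stops at the next whitespace character, while the scanset %[^\n] keeps going until a newline.

```c
#include <stdio.h>

/* For "hello world\n": %31s stores "hello", %31[^\n] stores "hello world".
 * Both widths are one less than the 32-byte arrays. */
static int word_and_line(const char *input, char word[32], char line[32])
{
    int w = sscanf(input, "%31s", word);      /* stops at the space */
    int l = sscanf(input, "%31[^\n]", line);  /* stops at the newline */
    return w == 1 && l == 1;
}
```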

What would any response to this question be without mentioning its lack of safety and risk of buffer overflows? As we've already covered, C isn't a safe language, and will allow us to cut corners, possibly to apply an optimisation at the expense of correctness or, more likely, because we're lazy programmers. Thus, when we know the system will never receive a string larger than a fixed number of bytes, we're given the ability to declare an array of that size and forgo bounds checking. I don't really see this as a downfall; it's an option. Again, reading the manual is strongly advised and would reveal this option to us.

Lazy programmers aren't the only ones stung by scanf. It's not uncommon to see people trying to read float or double values using %d, for example. They're usually mistaken in believing that the implementation will perform some kind of conversion behind the scenes, which would make sense because similar conversions happen throughout the rest of the language, but that's not the case here. As I said earlier, scanf and friends (and indeed the rest of C) are deceptive; they seem succinct and idiomatic but they aren't.
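To make the point concrete, a minimal sketch (the helper read_numbers is hypothetical): scanf performs no conversion between argument types, so each specifier must match the pointed-to type exactly. Unlike printf, where %f serves both float and double, scanf needs %f for float * and %lf for double *.

```c
#include <stdio.h>

/* Each conversion must match its pointer type exactly:
 * %d -> int *, %f -> float *, %lf -> double *. */
static int read_numbers(const char *src, int *i, float *f, double *d)
{
    return sscanf(src, "%d %f %lf", i, f, d);
}
```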

Inexperienced programmers aren't forced to consider the success of the operation. Suppose the user enters something entirely non-numeric when we've told scanf to read and convert a sequence of decimal digits using %d. The only way we can intercept such erroneous data is to check the return value, and how often do we bother checking the return value?

Much like fgets, when scanf and friends fail to read what they're told to read, the stream will be left in an unusual state:

  • In the case of fgets, if there isn't sufficient space to store a complete line, the unread remainder of the line might be erroneously treated as though it's a new line when it isn't.
  • In the case of scanf and friends, when a conversion fails as described above, the erroneous data is left unread on the stream and might be erroneously treated as though it's part of a different field.
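A minimal sketch of both points, using a hypothetical helper: when %d meets non-numeric input, the return value is 0 (the only way to notice the failure), and the offending characters are still the next thing in the stream.

```c
#include <stdio.h>

/* Attempt a %d conversion, store fscanf's return value in *ret,
 * and return the next character left in the stream afterwards. */
static int char_after_failed_int(FILE *fp, int *ret)
{
    int value;
    *ret = fscanf(fp, "%d", &value);   /* 0 on a matching failure */
    return getc(fp);                   /* the offending char is still there */
}
```

For a stream containing "abc\n", *ret is 0 and the returned character is 'a'.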

It's no easier to use scanf and friends than to use fgets. If we check for success by looking for a '\n' when we're using fgets or by inspecting the return value when we use scanf and friends, and we find that we've read an incomplete line using fgets or failed to read a field using scanf, then we're faced with the same reality: We're likely to discard input (usually up until and including the next newline)! Yuuuuuuck!

Unfortunately, scanf both simultaneously makes it hard (non-intuitive) and easy (fewest keystrokes) to discard input in this way. Faced with this reality of discarding user input, some have tried scanf("%*[^\n]%*c");, not realising that the %*[^\n] delegate will fail when it encounters nothing but a newline, and hence the newline will still be left on the stream.

With a slight adaptation, separating the two format delegates, we see some success: scanf("%*[^\n]"); getchar();. Try doing that with so few keystrokes using some other tool ;)
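The difference between the two discard idioms can be checked on a stream that starts with an empty line. This is a sketch; the helper name discard_line is hypothetical, and fscanf/getc are used so the behavior can be tested without stdin.

```c
#include <stdio.h>

/* Discard the rest of the current line, including its newline.
 * Splitting the directives matters: in "%*[^\n]%*c" the scanset fails
 * on an already-empty line, so %*c never runs and '\n' stays behind. */
static void discard_line(FILE *fp)
{
    fscanf(fp, "%*[^\n]");   /* may match nothing; that is fine here */
    getc(fp);                /* consume the '\n' (or get EOF) */
}
```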

Lactic answered 22/3, 2016 at 7:20 Comment(0)

Problems I have with the *scanf() family:

  • Potential for buffer overflow with %s and %[ conversion specifiers. Yes, you can specify a maximum field width, but unlike with printf(), you can't make it an argument in the scanf() call; it must be hardcoded in the conversion specifier.
  • Potential for arithmetic overflow with %d, %i, etc.
  • Limited ability to detect and reject badly formed input. For example, "12w4" is not a valid integer, but scanf("%d", &value); will successfully convert and assign 12 to value, leaving the "w4" stuck in the input stream to foul up a future read. Ideally the entire input string should be rejected, but scanf() doesn't give you an easy mechanism to do that.
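One way to get closer to rejecting "12w4" as a whole, sketched with a hypothetical helper: %n records how many characters have been consumed, so anything other than trailing whitespace after the number can be detected. This does not fix the arithmetic-overflow issue from the second bullet; strtol is still needed for that.

```c
#include <stdio.h>

/* Accept the line only if it is one integer and nothing else.
 * "%d %n": convert, skip trailing whitespace, record chars consumed. */
static int whole_int(const char *line, int *value)
{
    int used = 0;
    if (sscanf(line, "%d %n", value, &used) != 1)
        return 0;                  /* no number at the start */
    return line[used] == '\0';     /* reject leftovers such as "w4" */
}
```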

If you know your input is always going to be well-formed with fixed-length strings and numerical values that don't flirt with overflow, then scanf() is a great tool. If you're dealing with interactive input or input that isn't guaranteed to be well-formed, then use something else.

Jernigan answered 12/3, 2010 at 15:18 Comment(1)
What other sane alternatives are there for reading fixed-length strings and numerical values safely? – Rhamnaceous

Many answers here discuss the potential overflow issues of using scanf("%s", buf), but the latest POSIX specification more-or-less resolves this issue by providing an m assignment-allocation character that can be used in format specifiers for c, s, and [ formats. This will allow scanf to allocate as much memory as necessary with malloc (so it must be freed later with free).

An example of its use:

char *buf = NULL;
if (scanf("%ms", &buf) == 1) { // with 'm', scanf expects a pointer to pointer to char.

    // use buf

    free(buf); // buf was allocated by scanf with malloc
}

See here. The disadvantage of this approach is that it is a relatively recent addition to the POSIX specification and is not part of the C standard at all, so it remains rather unportable for now.

Slavery answered 3/10, 2014 at 1:20 Comment(0)

There is one big problem with scanf-like functions - the lack of any type safety. That is, you can code this:

int i;
scanf("%10s", &i);

Hell, even this is "fine":

scanf("%10s", i);

It's worse than printf-like functions, because scanf expects a pointer, so crashes are more likely.

Sure, there are some format-specifier checkers out there, but they are not perfect and, well, they are not part of the language or the standard library.

Eldoree answered 13/10, 2015 at 14:48 Comment(1)
This is more of a historical issue, as most modern compilers check that the types of the arguments match what is specified in the format string and produce warnings if they do not. However, I am sure there are still plenty that don't. – Virgenvirgie
