Is fgets() returning NULL with a short buffer compliant?
Asked Answered
F

3

16

In unit testing a function containing fgets(), came across an unexpected result when the buffer size n < 2. Obviously such a buffer size is foolish, but the test is exploring corner cases.

Simplified code:

#include <error.h>
#include <stdio.h>

void test_fgets(char * restrict s, int n) {
  FILE *stream = stdin;
  s[0] = 42;
  printf("< s:%p n:%d stream:%p\n", s, n, stream);
  char *retval = fgets(s, n, stream);
  printf("> errno:%d feof:%d ferror:%d retval:%p s[0]:%d\n\n",
    errno, feof(stream), ferror(stream), retval, s[0]);
}

int main(void) {
  char s[100];
  test_fgets(s, sizeof s);  // Entered "123\n" and works as expected
  test_fgets(s, 1);         // fgets() --> NULL, feof() --> 0, ferror() --> 0
  test_fgets(s, 0);         // Same as above
  return 0;
}

What is surprising is that fgets() returns NULL and neither feof() nor ferror() are 1.

The C spec, below, seems silent on this rare case.

Questions:

  • Is returning NULL without setting feof() nor ferror() compliant behavior?
  • Could a different result be compliant behavior?
  • Does it make a difference if n is 1 or less than 1?

Platform: gcc version 4.5.3 Target: i686-pc-cygwin

Here is an abstract from the C11 Standard, some emphasis mine:

7.21.7.2 The fgets function

The fgets function reads at most one less than the number of characters specified by n [...]

The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.

Related postings
How to use feof and ferror for fgets (minishell in C)
Trouble creating a shell in C (Seg-Fault and ferror)
fputs(), fgets(), ferror() questions and C++ equivalents
Return value of fgets()


[Edit] Comments on answers

@Shafik Yaghmour well presented the overall issue: since the C spec does not mention what to do when it does not read any data nor write any data to s when (n <= 0), it is Undefined Behavior. So any reasonable response should be acceptable such as return NULL, set no flags, leave buffer alone.

As to what should happen when n==1, @Oliver Matthews answer and @Matt McNabb comment indicate a C spec's lack of clarity considering a buffer of n == 1. The C spec seems to favor a buffer of n == 1 should return the buffer pointer with s[0] == '\0', but is not explicit enough.

Formyl answered 30/4, 2014 at 12:51 Comment(9)
It's probably worth noting that quoting the c11 spec is potentially wrong - although it probably hasn't changed in c11, gcc 4.5 does not support any of c11...Jurisdiction
Also, behavior in gcc 4.8 is different - (s,1) returns errno:0 feof:0 ferror:0 retval:0x7fff183c87a0 s[0]:0. (s,0) returns errno:0 feof:0 ferror:0 retval:(nil) s[0]:42Jurisdiction
@Oliver Matthews 1) True about potentially wrong quote - mine is shortend from a draft C11. 2) Yes, although quote is C11-ish, compilation was not done in C11. I think C99 - will check later.Formyl
@OliverMatthews, it's probably better to say the behavior is different in the glibc included in your more recent distribution that contains gcc 4.8. It can be very well the case that it was a bug in glibc that was fixed.Possie
For clarity, this behavior is entirely determined by glibc; the gcc version has nothing to do with it. (The complete list of <stdio.h> functions that gcc has special knowledge of is here: gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/… You can see that fgets is not in that list.)Terrorize
@Zack I originally thought it might be a builtin but then I did not see it in this list.Mcmath
It's intriguing to me that the signature of fgets() is char *fgets(char * restrict s, int n, FILE * restrict stream); (see also POSIX fgets()) with the size being an int rather than size_t. That means that n can be negative but I'm not sure what should happen if it is. There's no space for a null byte, so it shouldn't write anything. Returning NULL is reasonable. POSIX can set errno, but it does not mention EINVAL, which would be most appropriate.Baier
@JonathanLeffler I suspect the genesis of fgets() is old and so int n rather than size_t. IMO, the weaknesses inherit in the corners of fgets() are long overdue for replacement long get_a_line(size_t n, char *dest, FILE *stream) like function.Formyl
The fgets() function was part of the original standard I/O package from 7th Edition Unix (circa 1978), and that pre-dates size_t by about a decade. And the standard I/O library was added at about the same time that unsigned was added to C, so the library may have chosen not to use the new features because not all C compilers supported the new features.Baier
M
8

The behavior is different in newer releases of glibc, for n == 1, it returns s which indicates success, this is not an unreasonable reading of 7.19.7.2 The fgets function paragraph 2 which says (it is the same in both C99 and C11, emphasis mine):

char *fgets(char * restrict s, int n, FILE * restrict stream);

The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.

Not terribly useful but does not violate anything said in the standard, it will read at most 0 characters and null-terminate. So the results you are seeing looks like a bug that was fixed in later releases of glibc. It also clearly not an end of file nor a read error as covered in paragraph 3:

[...]If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.

As far as the final case where n == 0 this looks like it is simply undefined behavior. The draft C99 standard section 4. Conformance paragraph 2 says (emphasis mine):

If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.

The wording is the same in C11. It is impossible to read at most -1 characters and it is neither an end of file nor a read error. So we have no explicit definition of the behavior in this case. Looks like a defect but I cannot find any defect reports that cover this.

Mcmath answered 30/4, 2014 at 13:32 Comment(2)
while I agree with your conclusion that test_fgets(s, 0); is undefined behavior, I don't think it is impossible to read at most -1 characters. Reading no characters is the only way to read at most any negative number of characters. The subtle semantics question is this: can a negative number describe a number of characters? If no, the behavior is undefined for negative values, if yes there is another interesting corner case: test_fgets(s, INT_MIN);.Climber
@chqrlie: test_fgets(s, INT_MIN); looks like it would invoke undefined behavior by pointer arithmetic underflow. s is probably less than INT_MAX.Consolation
J
3

tl;dr: that version of glibc has a bug for n=1, the spec has (arguably) an ambiguity for n<1; but I think newer glibc's take the most sensible option.

So, the c99 spec is basically the same.

The behavior for test_fgets(s, 1) is wrong. glibc 2.19 gives the correct output (retval!=null, s[0]==null.

The behavior for test_fgets(s,0) is undefined, really. It wasn't successful (you can't read at most -1 characters), but it doesn't hit either of the two 'return null' criteria (EOF& 0 read; read error).

However, GCC's behavior is arguably correct (returning the pointer to the unchanged s would also be OK) - feof isn't set, because it hasn't hit eof; ferror isn't set because there wasn't a read error.

I suspect the logic in gcc (not got the source to hand) has an 'if n<=0 return null' near the top.

[edit:]

On reflection, I actually think that glibc's behavior for n=0 is the most correct response it could give:

  • No eof read, so feof()==0
  • No reads, so no read error could have happened, so ferror=0

Now as for the return value - fgets cannot have read -1 characters (it's impossible). If fgets returned the passed in pointer back, it would look like a successful call. - Ignoring this corner case, fgets commits to returning a null-terminated string. If it didn't in this case, you couldn't rely on it. But fgets will set the character after after the last character read into the array to null. given we read in -1 characters (apparantly) on this call, that would make it setting the 0th character to null?

So, the sanest choice is to return null (in my opinion).

Jurisdiction answered 30/4, 2014 at 13:31 Comment(2)
In n1256, it says "The fgets function returns s if successful". However it doesn't define what "successful" is. This seems like a defect in the Standard to me.Densimeter
Not convinced of that. I'd read 'successful' as able to perform the task it is defined to do (i.e. read up to n-1 characters from s and null terminate the result).Jurisdiction
C
3

The C Standard (C11 n1570 draft) specifies fgets() this way (some emphasis mine):

7.21.7.2 The fgets function

Synopsis

   #include <stdio.h>
   char *fgets(char * restrict s, int n,
               FILE * restrict stream);

Description

The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.

Returns

The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.

The phrase reads at most one less than the number of characters specified by n is not precise enough. A negative number cannot represent a number of characters*, but 0 does mean no characters. reading at most -1 characters does not seem possible, so the case of n <= 0 is not defined by the Standard, and as such has undefined behavior.

For n = 1, fgets is specified as reading at most 0 characters, which it should succeed at unless the stream is invalid or in an error condition. The phrase A null character is written immediately after the last character read into the array is ambiguous as no characters have been read into the array, but it makes sense to interpret this special case as meaning s[0] = '\0';. The specification for gets_s offers the same reading, with the same imprecision. Again the behavior is not explicitly defined, so it is undefined1.

The specification of snprintf is more precise, the case of n = 0 is explicitly specified, with useful semantics attached. Unfortunately, such semantics cannot be implemented for fgets:

7.21.6.5 The snprintf function

Synopsis

#include <stdio.h>
int snprintf(char * restrict s, size_t n,
     const char * restrict format, ...);

Description

The snprintf function is equivalent to fprintf, except that the output is written into an array (specified by argument s) rather than to a stream. If n is zero, nothing is written, and s may be a null pointer. Otherwise, output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array. If copying takes place between objects that overlap, the behavior is undefined.

The specification for get_s() also clarifies the case of n = 0 and makes it a runtime constraint violation:

K.3.5.4.1 The gets_s function

Synopsis

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);

Runtime-constraints

s shall not be a null pointer. n shall neither be equal to zero nor be greater than RSIZE_MAX. A new-line character, end-of-file, or read error shall occur within reading n-1 characters from stdin.

If there is a runtime-constraint violation, s[0] is set to the null character, and characters are read and discarded from stdin until a new-line character is read, or end-of-file or a read error occurs.

Description

The gets_s function reads at most one less than the number of characters specified by n from the stream pointed to by stdin, into the array pointed to by s. No additional characters are read after a new-line character (which is discarded) or after end-of-file. The discarded new-line character does not count towards number of characters read. A null character is written immediately after the last character read into the array.

If end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then s[0] is set to the null character, and the other elements of s take unspecified values.

Recommended practice

The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.

Returns

The gets_s function returns s if successful. If there was a runtime-constraint violation, or if end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then a null pointer is returned.

The C library you are testing seems to have a bug for this case, which was fixed un later versions of the glibc. Returning NULL should mean some kind of failure condition (the opposite of success): end-of-file or read-error. Other cases such as invalid stream or stream not open for reading are more or less explicitly described as undefined behavior.

The cases of n = 0 and n < 0 are not defined. Returning NULL is a sensible choice, but it would be useful to clarify the description of fgets() in the Standard to require n > 0 as is the case for gets_s.

Note that there is another specification issue for fgets: the type of the n argument should have been size_t instead of int, but this function was originally specified by the C authors before size_t was even invented, and kept unchanged in the first C Standard (C89). Changing it then was considered unacceptable because they were trying to standardize existing usage: the signature change would have created inconsistencies across C libraries and broken well written existing code that uses function pointers or unprototyped functions.


1The C Standard specifies in paragraph 2 of 4. Conformance that If a “shall” or “shall not” requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words “undefined behavior” or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe “behavior that is undefined”.

Climber answered 1/2, 2017 at 9:30 Comment(11)
"For n = 1, fgets is specified as reading at most 0 characters, which it should succeed at unless the stream is ... in an error condition." --> Unclear if you refer to the error indicator. Had a previous read set the error indicator, nothing specifies that a set indicator affects the return value for subsequent reads - only that "If a read error occurs during the operation ...". OTOH the error indicator has its squishy specification-ness too and probing that is perhaps another question.Formyl
@chux: what I am hinting at is this: the buffer has no room for any character, so no attempt will be made to read a byte from the stream, so no possibility for input failure. Yet if the error or end-of-file flags are set returning NULL would be consistent with the state of the stream. Otherwise returning s seems reasonable.Climber
@chux: If the stream is in a state such that any read would fail, I would not think that testing such condition and returning NULL before even looking at n should be construed as violating the Standard, but nor should looking at n, concluding that it's not necessary to actually read anything, and returning success.Maremma
@Maremma I agree with your result given a scenario of "in a state such that any (subsequent) read would fail". Yet having such an error state is not required - ferror() may simply report the accumulated error results of prior reads - a parity error as an example - not a state affecting future reads. The result of ferror() need not affect the next input. If NULL is returned due to an input error from a prior fgets(), the next fgets() may return a non-error free result. IAC, the C spec lacks details concerning behavior about this and even more so when n==1.Formyl
@chux: The C does not require that any means exist by which a stream might enter such a state, but allows for the possibility that streams may somehow enter such a state. Allowing an implementation to choose in arbitrary fashion whether to report success or failure when reading zero bytes is a tiny fraction of the freedom the Standard would have to allow for many implementations to be considered compliant (I/O failures might cause messages to pop up on the console or have other such effects).Maremma
@chux: the C Standard is a bit more precise about the end-of-file case: 7.21.7.1 The fgetc function [...] If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end- of-file indicator for the stream is set and the fgetc function returns EOF. Hence fgets() should definitely return NULL if the end-of-file indicator is set, even if n == 1.Climber
@Climber Agree completely about the greater specify concerning the end-of-file indicator. The error indicator and ferror() are the tricky part that my comments have been addressing.Formyl
It is not unspecified. The C standard specifically calls this category of behaviour undefined - i.e. omission of any defined behaviour.Eb
@AnttiHaapala: indeed as specified in paragraph 2 of 4. Conformance If a “shall” or “shall not” requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words “undefined behavior” or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe “behavior that is undefined”. I shall update the answer.Climber
If I wrote a C library, my function would abort the program ;-)Eb
@AnttiHaapala: in debug mode, why not, but in production, setting errno to EINVAL and returning NULL is a sensible choice.Climber

© 2022 - 2024 — McMap. All rights reserved.