*Might* an unsigned char be equal to EOF? [duplicate]

When using fgetc to read the next character of a stream, you usually check that end-of-file has not been reached with

if ((c = fgetc (stream)) != EOF)

where c is of type int. Then either end-of-file has been reached and the condition fails, or c is an unsigned char converted to int, which is expected to differ from EOF, since EOF is guaranteed to be negative. Fine... apparently.

But there is a small problem... Usually the char type has no more than 8 bits, while int must have at least 16 bits, so every unsigned char is representable as an int. Nevertheless, if char had 16 or 32 bits (I know, this is almost never the case in practice...), nothing rules out sizeof(int) == 1, so it would be (theoretically!) possible for fgetc (stream) to return EOF (or another negative value) even though end-of-file has not been reached...

Am I mistaken? Is there something in the C standard that prevents fgetc from returning EOF if end-of-file has not been reached? (If so, I could not find it!) Or is the if ((c = fgetc (stream)) != EOF) syntax not fully portable?...

EDIT: Indeed, this was a duplicate of Question #3860943. I did not find that question on my first search. Thanks for your help! :-)

Wolof answered 30/4, 2015 at 19:12 Comment(2)
I had already read it, but this does not answer my question...Rockfish
Multiple related questions: What platforms have something other than 8-bit char?, Exotic architectures the standards committees care about, and System where 1 byte != 8 bits for a few.Ofay
Z
2

If you are reading a stream that is plain ASCII only, there is no risk of receiving a char equal to EOF before the real end-of-file, because valid ASCII codes only go up to 127. It could happen when reading a binary file, though, if the result is stored in a char: a byte of value 255 (unsigned) corresponds to -1 as a signed char, and nothing prevents it from appearing in a binary file.

But about your specific question (whether there is something in the standard): not exactly... but notice that fgetc returns the character as an unsigned char converted to int, so on the usual platforms it will never be negative. The only risk would be if you explicitly or implicitly converted the return value down to signed char (for instance, if your c variable were a signed char).
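
To make the difference concrete, here is a minimal sketch that writes a single 0xFF byte to a temporary file and reads it back, once keeping the result in an int and once narrowing it to signed char. The helper names (roundtrip_byte, read_ff_as_int, read_ff_as_signed_char) are made up for illustration, and the sketch assumes an ordinary platform with 8-bit char; the narrowing conversion is implementation-defined, although it commonly yields -1.

```c
#include <stdio.h>

/* Hypothetical helper: write one byte to a temporary file, rewind,
   and return what fgetc reports for it (EOF if the setup fails). */
static int roundtrip_byte(unsigned char byte)
{
    FILE *f = tmpfile();
    int c = EOF;

    if (f == NULL)
        return EOF;
    if (fputc(byte, f) != EOF) {
        rewind(f);
        c = fgetc(f);
    }
    fclose(f);
    return c;
}

/* Result kept in an int: with 8-bit char this is 255, distinct from EOF. */
int read_ff_as_int(void)
{
    int c = roundtrip_byte(0xFF);
    return c;
}

/* Result narrowed to signed char: the conversion is implementation-defined
   and commonly gives -1, which then compares equal to a typical EOF. */
int read_ff_as_signed_char(void)
{
    signed char c = (signed char)roundtrip_byte(0xFF);
    return c;
}
```

This is why the variable receiving the result of fgetc should be an int rather than any char type.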

NOTE: as @Ulfalizer mentioned in the comments, there is one rare case in which you may need to worry: if sizeof(int) == 1, a character whose value exceeds INT_MAX may come back as a negative int, possibly equal to EOF, without the real end-of-file having been reached. Environments in which this happens are quite rare (since int must have at least 16 bits, they are platforms where char is 16 bits or wider, such as some DSPs). In such a case, the safe option would be to test feof() as @pmg suggested.

Zacharyzacherie answered 30/4, 2015 at 19:20 Comment(6)
Note that e.g. the test 0xFFFFFFFF == -1 is true for 32-bit ints though. The usual arithmetic conversions convert the -1 to an unsigned int.Noted
@Ulfalizer, I meant the other way around. If fgetc finds a 0xFF byte to read, it'll be promoted to 0x000000FF (thus a positive 255), not 0xFFFFFFFF, because it is promoted as unsigned char. See an example of fgetc implementation here: mirror.fsf.org/pmon2000/3.x/src/lib/libc/fgetc.cZacharyzacherie
But C doesn't limit itself to ASCII only.Nixon
@FabioCeconello: If char and int have the same size, then you might end up with e.g. a 0xFFFFFFFF char value though. I guess the standard might imply(ish) in a few places that the value should be representable as a signed int though. Converting from unsigned to signed is implementation-defined anyway.Noted
When the value does not fit in the signed type that is.Noted
My understanding of the standard is that the only case where the size of int and char is equal is when both are 1. Int may be bigger, but not char. See #2215945 but in those exotic places where sizeof(int)==1, you're right, because a 255 unsigned char will be back to a -1 1-byte int.Zacharyzacherie
N
2

I think you need to rely on stream error.

ch = fgetc(stream);
if (ferror(stream) && (ch == EOF)) /* end of file */;

From the standard

If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.


Edit for better version

ch = fgetc(stream);
if (ch == EOF) {
    if (ferror(stream)) /* error reading */;
    else if (feof(stream)) /* end of file */;
    else /* read valid character with value equal to EOF */;
}
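
The three-way check above can be wrapped in a small function. This is only a sketch: the enum read_status and the function read_one are hypothetical names, and it assumes the stream's error and end-of-file indicators were clear before the call (both are sticky, so a stale indicator would be misreported).

```c
#include <stdio.h>

/* Hypothetical result codes for a single read. */
enum read_status { READ_CHAR, READ_EOF, READ_ERROR };

/* Read one character from stream and report what happened.
   On READ_CHAR, *out holds the character as returned by fgetc. */
enum read_status read_one(FILE *stream, int *out)
{
    int ch = fgetc(stream);

    if (ch == EOF) {
        if (ferror(stream))
            return READ_ERROR;   /* read error */
        if (feof(stream))
            return READ_EOF;     /* real end-of-file */
        /* Neither indicator is set: ch is a valid character whose
           value happens to equal EOF (only possible on exotic platforms). */
    }
    *out = ch;
    return READ_CHAR;
}
```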
Niddering answered 30/4, 2015 at 19:25 Comment(3)
I don't think reaching eof is a read error.Laufer
Maybe not ... there's always feof()Niddering
The question actually is: is an implementation allowed to have ch == EOF be true when end-of-file has not been reached and there is no read error?Informer
G
2

You asked:

Is there something in the C standard that prevents fgetc from returning EOF if end-of-file has not been reached?

On the contrary, the standard explicitly allows EOF to be returned when an error occurs.

If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

In the footnotes, I see:

An end-of-file and a read error can be distinguished by use of the feof and ferror functions.

You also asked:

Or is the if ((c = fgetc (stream)) != EOF) syntax not fully portable?

On a theoretical platform where CHAR_BIT is at least 16 and sizeof(int) == 1, that won't be a valid way to check that end-of-file has been reached. For that, you'll have to resort to feof and ferror.

c = fgetc (stream);
if ( !feof(stream) && !ferror(stream) )
{
  // Got valid input in c.
}
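
As a rough illustration of the same test inside a loop, the sketch below counts the characters in a stream. The function name count_chars is hypothetical, and the variation of consulting feof and ferror only when fgetc returns EOF is my own shortcut, not something the standard requires; it simply avoids two extra calls per character on ordinary platforms.

```c
#include <stdio.h>

/* Hypothetical helper: count characters until end-of-file.
   Returns the count, or -1 if a read error occurred. */
long count_chars(FILE *stream)
{
    long n = 0;

    for (;;) {
        int c = fgetc(stream);

        /* Stop only when EOF is backed by one of the stream indicators;
           a valid character equal to EOF is still counted. */
        if (c == EOF && (feof(stream) || ferror(stream)))
            break;
        n++;
    }
    return ferror(stream) ? -1 : n;
}
```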
Grevera answered 30/4, 2015 at 19:46 Comment(1)
When sizeof(int) == 1, I like the if ( !feof(stream) && !ferror(stream) ) test. +1 for a portable general solution.Columbarium
I
1

I agree with your reading.

C Standard says (C11, 7.21.7.1 The fgetc function p3):

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

There is nothing in the Standard (assuming UCHAR_MAX > INT_MAX) that prevents fgetc in a hosted implementation from returning a value equal to EOF that indicates neither end-of-file nor an error condition.
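
If you would rather rule that case out than handle it, one possible approach (a sketch, assuming a C11 compiler; the function name eof_idiom_is_unambiguous is hypothetical) is to reject such platforms at compile time:

```c
#include <limits.h>
#include <stdio.h>

/* Compilation fails on a platform where some unsigned char value
   does not fit in int, i.e. where fgetc could return a valid
   character that compares equal to EOF. */
_Static_assert(UCHAR_MAX <= INT_MAX,
               "a valid character could compare equal to EOF here");

/* Runtime form of the same condition: returns 1 when comparing the
   result of fgetc against EOF is enough on its own. */
int eof_idiom_is_unambiguous(void)
{
    return UCHAR_MAX <= INT_MAX;
}
```

With that assertion in place, every value fgetc can return for a real character is non-negative, so it cannot collide with the negative EOF.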

Informer answered 30/4, 2015 at 20:10 Comment(0)
