fgetc, checking EOF

In the book Linux System Programming I have read some like this:

fgetc returns the character read as an unsigned char cast to an int or EOF on end of file or error. A common error using fgetc is:

char c;
if ((c = fgetc(fp)) != EOF) {...}

The right version of this code is:

int c;
if ((c = fgetc(fp)) != EOF) { printf("%c", (char)c); ... }

So why can't I cast the return value to char before comparing it with EOF? Why does the comparison have to be done as an int? Since EOF is defined as -1, isn't it normally converted to char anyway?
Are there platforms/compilers where it is not true?

Rundown answered 15/6, 2012 at 19:48 Comment(2)
possible duplicate of "while( !feof( file ) )" is always wrongOscar
@jww: this question is not using feof() so it is not a duplicate of "while (!feof(file)) is always wrong".Vanguard

You can't cast the return value to char because the return value could be EOF, and the EOF value is implementation-defined and unequal to any valid character code.

Usually it is -1, but you should not assume that.

Check this great answer from the C FAQ site:

Two failure modes are possible if, as in the fragment above, getchar's return value is assigned to a char.

  1. If type char is signed, and if EOF is defined (as is usual) as -1, the character with the decimal value 255 ('\377' or '\xff' in C) will be sign-extended and will compare equal to EOF, prematurely terminating the input (assuming 8-bit chars).

  2. If type char is unsigned, an actual EOF value will be truncated (by having its higher-order bits discarded, probably resulting in 255 or 0xff) and will not be recognized as EOF, resulting in effectively infinite input.
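
To make that concrete, here is a minimal sketch of the correct pattern (reading from stdin so the snippet is self-contained):

#include <stdio.h>

int main(void)
{
    int c;  /* int, not char: wide enough for every unsigned char value plus EOF */

    while ((c = fgetc(stdin)) != EOF) {
        putchar(c);  /* c is a genuine character here, safe to narrow */
    }
    return 0;
}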

Hope it helps!

Edited: (added the @FatalError comment to this answer; this is explained on the C FAQ site, but this phrasing looks clearer to me)

"If you cast it to char then EOF takes the same value as some valid character and hence becomes indistinguishable from that character. That alone should be enough justification to not make the result a char" @FatalError comment.

Eudoxia answered 15/6, 2012 at 19:49 Comment(8)
So what? I don't understand. After the cast, the char will be equal to -1, won't it?Rundown
@Rundown The standard actually does not mandate whether char alone is signed or unsigned. If you want a guaranteed signed char, you must use signed char.Resistor
@Resistor Hm, I thought that char is always signed. Thanks, I'll take a look at the standardRundown
@cacho I suspected it, but I didn't find any info about that. ThanksRundown
If you cast it to char then EOF takes the same value as some valid character and hence becomes indistinguishable from that character. That alone should be enough justification to not make the result a char.Leaseback
@Rundown correct. As cacho mentioned, often it is, but I believe the standard only requires it to be negative and doesn't specify that it's -1.Leaseback
@FatalError, I've added your comment to the answer, I hope that's not a problem...Eudoxia
@cacho Sure, no problem at all.Leaseback

There are two possibilities when you assign the value to a char before comparing with EOF:

  • char is a signed value. In that case, there is a legitimate character (often ÿ, LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF) that will be misinterpreted as EOF.
  • char is an unsigned value. In that case, EOF will be converted to 0xFF and then promoted to int as a positive value, which will never compare equal to EOF, which is a negative value.

Either way, the program is going to misbehave some of the time.
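
You can demonstrate the signed-char failure with a tiny test (a sketch that assumes an 8-bit signed char and EOF == -1, which is exactly the assumption portable code must not rely on):

#include <stdio.h>

int main(void)
{
    signed char c = '\xff';  /* a legitimate byte value read from a file */

    /* c holds -1 after the narrowing conversion, so it compares equal
       to EOF even though no end-of-file occurred */
    if (c == EOF)
        printf("0xFF is indistinguishable from EOF here\n");
    return 0;
}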

There is (or, more accurately, used to be) a chance of a compiler bug such that the assignment occurred correctly but the assigned value was not used for the comparison. That would lead to the code appearing to work OK even when it is not. Fortunately, that is unlikely to be a problem found in a modern compiler.

Vanguard answered 15/6, 2012 at 20:13 Comment(0)
