The function getchar returns an int which
- if the function is successful, represents the character code of the next character on the stream in the range of an unsigned char (i.e. a non-negative value), or
- if the function fails, represents the special value EOF (which is a negative value) to indicate failure.

The reason why getchar always returns
- a non-negative number for a valid character, and
- a negative number to indicate failure,
is that it must be possible to distinguish between a valid character and the special value EOF (which indicates failure and is not a valid character).

If you store the int return value of getchar in an unsigned char, then you will lose information and will no longer be able to distinguish between a valid character and the special value EOF.

On most platforms, EOF is defined as the value -1 and an unsigned char can represent the range 0 to 255. On these platforms, the following applies:

The function getchar can return an int value in the range -1 to 255, which is 257 possible values. The range 0 to 255 (which is 256 possible values) is used for valid character codes and the value -1 is used to indicate failure (EOF).

If you store the int return value of getchar in a variable of type unsigned char, then you will only have 256 instead of 257 possible values. The value -1 will be converted to the value 255. This means that the unsigned char variable is unable to represent the value EOF, and you will no longer be able to tell whether getchar returned the value EOF or the valid character code 255. The value 255 could mean either.

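The collision described above can be reproduced without reading any input. The following is only a minimal sketch, assuming a platform where EOF is -1 and unsigned char has 8 bits; store_in_unsigned_char is a hypothetical helper name, not part of the standard library.

```c
#include <stdio.h>   /* EOF */

/* Hypothetical helper: mimics storing getchar()'s int result
 * directly in an unsigned char. */
static unsigned char store_in_unsigned_char(int getchar_result)
{
    /* Conversion to an unsigned type wraps modulo UCHAR_MAX + 1,
     * so -1 becomes 255 when unsigned char has 8 bits (assumed). */
    return (unsigned char)getchar_result;
}
```

On such a platform, store_in_unsigned_char(EOF) and store_in_unsigned_char(255) yield the same value, so a later comparison against EOF can never succeed.
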
You will get a similar problem if you store the return value of getchar in a variable of type signed char, because a signed char is also only able to represent 256 different values, but you need to be able to represent 257 different values. Although a signed char has the advantage that it is able to represent the value EOF, you will still have the problem that you cannot distinguish EOF from a valid character, because the value -1 could mean either: it could mean EOF, or it could mean a valid character with the character code 255.

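The signed char case can be sketched the same way. Note the assumptions: converting an out-of-range value such as 255 to signed char is implementation-defined, and this sketch assumes the common behaviour (for example with GCC and Clang on 8-bit-char platforms) where 255 wraps to -1. store_in_signed_char is a hypothetical helper name.

```c
#include <stdio.h>   /* EOF */

/* Hypothetical helper: mimics storing getchar()'s int result
 * directly in a signed char. */
static signed char store_in_signed_char(int getchar_result)
{
    /* Out-of-range conversion is implementation-defined; on common
     * two's complement platforms 255 is assumed to become -1. */
    return (signed char)getchar_result;
}
```

Under those assumptions, store_in_signed_char(255) compares equal to EOF, so a loop testing the stored value against EOF would stop early at a byte with the character code 255.
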
For this reason, you should always first store the return value of getchar in a variable of type int. Only after determining that getchar did not return EOF is it safe to store the return value in a variable of type unsigned char or signed char, because you no longer need to distinguish valid characters from the special value EOF.

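A minimal sketch of this pattern follows. It uses getc on a caller-supplied stream so that it can be exercised on any FILE; getchar() is equivalent to getc(stdin). The function name count_byte and its parameters are assumptions made for illustration only.

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* FILE, getc, EOF */

/* Hypothetical example: counts how often the byte `wanted`
 * occurs on `stream` before end-of-file or a read error. */
static size_t count_byte(FILE *stream, unsigned char wanted)
{
    size_t count = 0;
    int c;  /* int, so EOF stays distinct from all valid character codes */

    while ((c = getc(stream)) != EOF) {
        /* c is known not to be EOF here, so narrowing is safe. */
        unsigned char ch = (unsigned char)c;

        if (ch == wanted)
            count++;
    }
    return count;
}
```

Calling count_byte(stdin, 0xFF) reads standard input to the end and counts bytes with the character code 255, which an unsigned char or signed char loop variable could not tell apart from EOF.
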
The same applies to storing the return value of getchar in a char. On some platforms, char is signed and has the same range as signed char, and on other platforms, char is unsigned and has the same range as unsigned char. The ISO C standard allows both.

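Which of the two applies on a given platform can be checked with CHAR_MIN from <limits.h>. This is only a small sketch; plain_char_is_signed is a hypothetical helper name.

```c
#include <limits.h>  /* CHAR_MIN */

/* Hypothetical helper: returns 1 if plain char is signed on this
 * platform, 0 if it is unsigned. */
static int plain_char_is_signed(void)
{
    /* CHAR_MIN is 0 when char is unsigned and negative when it is signed. */
    return CHAR_MIN < 0;
}
```

Either way, the result only tells you which of the two problems described above you would run into; storing the return value in an int first avoids both.
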