With thanks to those who responded, and having now read relevant portions of the C99 standard, I have come to agree with the somewhat surprising conclusion that storing an arbitrary non-EOF value returned by fgetc() as type char without loss of fidelity is not guaranteed to be possible. In large part, that arises from the possibility that char cannot represent as many distinct values as unsigned char.
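As a rough illustration of that last point, the sketch below compares the number of values in the numeric range of char with the number of values of unsigned char, using only the macros from <limits.h>. The function names are hypothetical, and the arithmetic assumes that char is narrower than unsigned long.

```c
#include <limits.h>

/* Number of distinct values an unsigned char can hold. */
unsigned long uchar_value_count(void)
{
    return (unsigned long) UCHAR_MAX + 1UL;
}

/* Number of values in the range CHAR_MIN..CHAR_MAX.
   Unsigned arithmetic is used so that a negative CHAR_MIN cannot cause
   signed overflow; this assumes char is narrower than unsigned long. */
unsigned long char_value_count(void)
{
    return (unsigned long) CHAR_MAX - (unsigned long) CHAR_MIN + 1UL;
}

/* Nonzero if every unsigned char value can have its own char value. */
int char_covers_all_bytes(void)
{
    return char_value_count() == uchar_value_count();
}
```

On a two's complement platform with 8-bit bytes both counts come to 256. A platform on which CHAR_MIN is -127 would report 255 values for char, which is the situation that makes a lossless conversion impossible.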
For their part, the stdio functions guarantee that if data are written to a (binary) stream and subsequently read back, then the read back data will compare equal to the original data. That turns out to have much narrower implications than I at first thought, but it does mean that fputs() must output a distinct value for each distinct char it successfully outputs, and that whatever conversion fgets() applies to store input bytes as type char must accurately reverse the conversion, if any, by which fputs() would produce the input byte as its output. As far as I can tell, however, fputs() and fgets() are permitted to fail on any input they don't like, so it is not certain that fputs() maps every possible char value to an unsigned char.
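The round-trip guarantee can be exercised with a small sketch like the following. The helper name and the 256-byte buffer are assumptions made for illustration, and the text passed in is assumed to be non-empty and to contain no newline, since fgets() stops after the first one.

```c
#include <stdio.h>
#include <string.h>

/* Writes text to a temporary binary stream with fputs(), reads it back
   with fgets(), and compares the result with the original.
   Returns 1 if they compare equal, 0 if they differ, and -1 if any step
   fails (which, as noted above, the functions are allowed to do). */
int fputs_fgets_round_trip(const char *text)
{
    char buf[256];                 /* hypothetical limit for this sketch */
    FILE *tmp;
    int result = -1;

    if (strlen(text) >= sizeof buf)
        return -1;

    tmp = tmpfile();               /* temporary file opened in "wb+" mode */
    if (tmp == NULL)
        return -1;

    if (fputs(text, tmp) != EOF) {
        rewind(tmp);               /* reposition before switching to input */
        if (fgets(buf, (int) sizeof buf, tmp) != NULL)
            result = (strcmp(buf, text) == 0);
    }

    fclose(tmp);
    return result;
}
```

Called with a string containing bytes above 0x7F, such as "caf\xc3\xa9", the function should never return 0. Whether it returns 1 or -1 for a particular char value is exactly the part that is left to the implementation.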
Moreover, although fputs() and fgets() operate as if by performing sequences of fputc() and fgetc() calls, respectively, it is not specified what conversions they might perform between char values in memory and the underlying unsigned char values on the stream. If a platform's fputs() uses standard integer conversion for that purpose, however, then the correct back-conversion is as I proposed:
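int c = fgetc(stream);  /* stream is an open FILE *; needs <stdio.h> and <limits.h> */
char buf;
/* c >= 0 excludes EOF, which is negative; bytes above CHAR_MAX map to negative char values */
if (c >= 0) buf = (char) ((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);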
That arises directly from the integer conversion rules, which specify that an integer value is converted to an unsigned type by repeatedly adding or subtracting <target type>_MAX + 1 until the result is in the range of the target type; the constraints on the representation of integer types support the same conclusion. Its correctness for that purpose does not depend on the specific representation of char values or on whether char is treated as signed or unsigned.
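To make the arithmetic concrete: assuming an 8-bit signed char, the byte 200 maps to 200 - 256 = -56, and converting -56 back to unsigned char gives -56 + 256 = 200. The sketch below wraps both directions in hypothetical helper functions and counts the byte values that fail to round-trip. It assumes that UCHAR_MAX is smaller than INT_MAX.

```c
#include <limits.h>

/* Back-conversion: maps a non-EOF fgetc() result (0..UCHAR_MAX) to char
   by undoing the standard conversion to unsigned char. */
char byte_to_char(int c)
{
    return (char) ((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);
}

/* Forward conversion: the standard conversion of a char value to
   unsigned char, which reduces it modulo UCHAR_MAX + 1. */
int char_to_byte(char ch)
{
    return (unsigned char) ch;
}

/* Counts the byte values b for which converting to char and back
   does not reproduce b. */
int count_round_trip_failures(void)
{
    int b;
    int failures = 0;

    for (b = 0; b <= UCHAR_MAX; b++) {
        if (char_to_byte(byte_to_char(b)) != b)
            failures++;
    }
    return failures;
}
```

A count of 0 means every byte has a char value that converts back to it. That is the expected result wherever char can represent as many values as unsigned char.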
However, if char cannot represent as many distinct values as unsigned char, or if there are char values that fputs() refuses to output (e.g. negative ones), then there are possible values of c that could not have resulted from a char conversion in the first place. No back-conversion argument is applicable to such bytes, and there may not even be a meaningful sense of char values corresponding to them. In any case, whether the given conversion is the correct reverse-conversion for data written by fputs() seems to be implementation-defined. It is certainly implementation-defined whether buf = (char) c will have the same effect, though it does on very many systems.
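Whether the plain cast agrees with the explicit conversion on a given platform can be checked empirically. The following is only a sketch with a hypothetical function name, and again assumes that UCHAR_MAX is smaller than INT_MAX.

```c
#include <limits.h>

/* Counts the byte values for which the explicit back-conversion and a
   plain cast to char produce different results on this platform. */
int count_cast_disagreements(void)
{
    int c;
    int disagreements = 0;

    for (c = 0; c <= UCHAR_MAX; c++) {
        char explicit_conv =
            (char) ((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);
        /* Implementation-defined result when c > CHAR_MAX. */
        char plain_cast = (char) c;

        if (explicit_conv != plain_cast)
            disagreements++;
    }
    return disagreements;
}
```

A result of 0 says only that the two agree on the platform at hand; the standard does not promise it elsewhere.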
Overall, I am struck by just how many details of C I/O behavior are implementation defined. That was an eye-opener for me.