What are all the reasons `fgetc()` might return `EOF`?

Certainly fgetc() returns EOF when end-of-file or an input error occurs.
Is that all and does that mean no more data is available?

FILE *inf = ...;
int ch;
while ((ch = fgetc(inf)) != EOF) {
  ;
}
if (feof(inf)) puts("End-of-file");
else if (ferror(inf)) puts("Error");
else puts("???");

Is testing with feof(), ferror() sufficient?

Note: EOF here is a macro that evaluates to some negative int, often -1. It is not a synonym for end-of-file.

I have found some questions and more that are close to this issue, yet none that enumerate all possibilities.

Agnostic answered 6/1, 2022 at 23:56 Comment(3)
I do look forward to other answers (not just my own in this "answer my own question").Agnostic
Also, my interpretation is that feof() is the complement of ferror() when reading eof, but this may be a flawed assumption.Laminitis
@Laminitis John Bollinger answer to a related question may be useful.Agnostic
A
6

Is that all and does that mean no more data is available?

No, there are more ways to get EOF.
An EOF return does not necessarily mean there is no more data; it depends.

The C standard lists three cases where fgetc() returns EOF.

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF. C17dr § 7.21.7.1 3

Recall each stream, like stdin, has an end-of-file indicator and error indicator.

  • stream just encountered the end-of-file

    (Most common) An attempt has been made to get more data, but there was none.

  • end-of-file indicator for the stream is set

    The stream first examines its end-of-file indicator. If it sees that the indicator is set, it returns EOF. No attempt is made to see if more data exists. Some types of streams will report EOF, but data will have arrived after the prior EOF report. Until the end-of-file indicator is cleared as with clearerr(), the return remains EOF. Example 1. Example 2.

  • Input error

    The stream error indicator is not examined. Yet the function failed to read data for some reason other than end-of-file. A common example is fgetc(stdout). Often input errors are persistent. Some are not. More data may be available. The common strategy is to end the input.

      // Example where ferror() is true, yet fgetc() does not return EOF
      FILE *inf = stdin;
      printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
      printf("fputc():%d\n", fputc('?', inf));  // EOF reported
      printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
      printf("fgetc():%d\n", fgetc(inf));  // User typed in `A`, 'A' reported
      printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
    

    Output

    end-of-file:0 error:0
    fputc():-1
    end-of-file:0 error:1
    fgetc():65
    end-of-file:0 error:1
    

    When ferror() is true, it does not mean the error just occurred, only that one occurred at some point in the past.
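The second case (the indicator is already set, so no read is attempted) can be checked with a small experiment. This is a minimal sketch, not part of the original answer: the helper name `eof_sticky_until_clearerr` and the scratch file path passed to it are made up for illustration, and it assumes the platform lets a second stream append to a file that the reader still has open.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if fgetc() kept returning EOF until
   clearerr() was called, 0 if it behaved otherwise, -1 on setup failure. */
int eof_sticky_until_clearerr(const char *path) {
    FILE *w = fopen(path, "wb");
    if (w == NULL) return -1;
    fputc('A', w);
    fclose(w);

    FILE *r = fopen(path, "rb");
    if (r == NULL) return -1;
    int first = fgetc(r);    /* expected: 'A' */
    int second = fgetc(r);   /* expected: EOF, end-of-file indicator now set */

    /* More data arrives after the reader has already seen end-of-file. */
    FILE *a = fopen(path, "ab");
    if (a == NULL) { fclose(r); return -1; }
    fputc('B', a);
    fclose(a);

    int third = fgetc(r);    /* indicator still set: EOF without looking */
    int indicator = (feof(r) != 0);
    clearerr(r);             /* clears both the end-of-file and error indicators */
    int fourth = fgetc(r);   /* a real read is attempted again: 'B' */

    fclose(r);
    remove(path);
    return first == 'A' && second == EOF && third == EOF
           && indicator && fourth == 'B';
}
```

On a library that follows the quoted wording, a call such as `eof_sticky_until_clearerr("sticky_eof_demo.tmp")` should return 1. Comments elsewhere in this thread mention C libraries that retry the read instead; there the helper would return 0.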

Other cases

  • Apparent EOF due to improperly saving as char

    fgetc() returns an int: either a value in the unsigned char range or EOF, a negative value.
    When fgetc() reads character code 255, yet saves that as a char on a system where char is signed, that commonly results in the char having the same value as EOF, yet end-of-file did not occur.

        FILE *f = fopen("t", "w");
        fputc(EOF & 255, f);
        fclose(f);
        f = fopen("t", "r");
        char ch = fgetc(f); // Should be int ch
        printf ("%d %d\n", ch == EOF, ch);
        printf("end-of-file:%d error:%d\n", feof(f), ferror(f));
        fclose(f);
    

    Output

    1 -1  // ch == EOF !
    end-of-file:0 error:0
    
  • Systems where UCHAR_MAX == UINT_MAX. Rare.

    (I have only come across this in some older graphics processors, still something C allows.) In that case, fgetc() may read an unsigned char outside the int range and so convert it to EOF on the function return. Thus fgetc() is returning a character code that happens to equal EOF. This is mostly an oddity in C history. A way to mostly handle this is:

      while ((ch = fgetc(inf)) != EOF || (!feof(inf) && !ferror(inf))) {
        ;
      }
    

    Such pedantic code is rarely needed.

  • Undefined behavior

    Of course when UB occurs, anything is possible.

          FILE * f = fopen("Some_non_existent_file", "r");
          // Should have tested f == NULL here
          printf("%d\n", fgetc(f) == EOF); // Result may be 1
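The char pitfall and the unchecked fopen() above are both avoided by a loop of the following shape. This is a minimal sketch under stated assumptions: `count_roundtrip` is a hypothetical helper invented for this illustration, and it uses tmpfile() only to have something to read back.

```c
#include <stdio.h>

/* Hypothetical helper: writes len bytes to a temporary file, then counts how
   many fgetc() delivers before EOF. Returns -1 on setup or read failure. */
long count_roundtrip(const unsigned char *data, size_t len) {
    FILE *f = tmpfile();          /* binary update mode */
    if (f == NULL) return -1;     /* test before use: fgetc(NULL) is UB */
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    rewind(f);

    long n = 0;
    int ch;                       /* int keeps byte 255 distinct from EOF */
    while ((ch = fgetc(f)) != EOF) {
        n++;
    }
    if (ferror(f)) n = -1;        /* stopped by a read error, not end-of-file */
    fclose(f);
    return n;
}
```

Called with the three bytes 'a', 255, 'b', the helper should count all three, because the byte 255 comes back as the int value 255 rather than as EOF.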
    

A robust way to handle the return from fgetc().

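#include <limits.h>  // UCHAR_MAX, INT_MAX
#include <stdio.h>

FILE *inf = ...;
if (inf) {  // Add test
  int ch; // USE int !

  // Pedantic considerations, usually can be ignored
  #if UCHAR_MAX > INT_MAX
    clearerr(inf); // Clear history of prior flags
    // Keep going when EOF is returned but neither indicator is set:
    // that EOF was a character code, not end-of-file or an error.
    while ((ch = fgetc(inf)) != EOF || (!feof(inf) && !ferror(inf))) {
      ;
    }
  #else
    while ((ch = fgetc(inf)) != EOF) {
      ;
    }
  #endif

  if (feof(inf)) puts("End-of-file");
  else puts("Error");
}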

If code needs to look for data after end-of-file or error, call clearerr() and repeat the if() block.

Agnostic answered 6/1, 2022 at 23:56 Comment(9)
Note: [f]printf() might reset errno.Natter
@Natter True that "The value of errno may be set to nonzero by a library function call whether or not there is an error, provided the use of errno is not documented in the description of the function ....", but how does that affect things here? errno is not the file error indicator.Agnostic
I suggest adding this (my favorite): reading byte 0x1a on Windows in text mode sets the EOF flag.Sequin
@Sequin Useful to add that "reading byte 0x1a on Windows in text mode sets the EOF flag" as an answer, but I suspect that is a compiler issue, not an OS one.Agnostic
I think it's the case on both MinGW and MSVC, so 'on windows' was a shorthand. :)Sequin
@Sequin With gcc on windows, the favorite is not there. Perhaps I misunderstand your thoughts?Agnostic
Hmm, it works at least in Wine. I think it should work on actual Windows too, but I won't be able to test until Monday.Sequin
@Sequin Any testing results?Agnostic
@chux-ReinstateMonica I completely forgot. :| Tested it on both MSYS2 GCC (MINGW64 environment) and MSVC, on Win 10. Was able to reproduce on both.Sequin
F
5

Another case where EOF doesn't necessarily mean 'no more data' was (rather than 'is') reading magnetic tapes. You could have multiple files on a single tape, with the end of each marked with EOF. When you encountered EOF, you used clearerr(fp) to reset the EOF and error states on the file stream, and you could then continue reading the next file on the tape. However, magnetic tapes have (for the most part) gone the way of the dodo, so this barely counts any more.

Fonda answered 7/1, 2022 at 1:54 Comment(6)
This happens on a terminal too -- if you indicate an EOF (by hitting a system-specific key combo, usually ctrl-D or ctrl-Z) a program reading from the terminal will get an EOF. If that program then uses clearerr, it can read more from the terminal.Fabron
@ChrisDodd — that depends on which platform you're on, and/or which version of glibc you are using. You're right that on some (probably many) Linux systems, typing the EOF indicator (usually control-D) does not permanently set the EOF indicator. On most other Unix systems, once you indicate EOF on the terminal, you continue to get EOF until you clear the error and EOF indicator with clearerr(). RHEL 7.4 suffers from what I regard as 'this bug'; macOS (all versions I've used), does not.Fonda
right, but if you do call clearerr, you can then continue to read from the terminalFabron
If you try this code: #include <stdio.h> int main(void) { for (int i = 0; i < 3; i++) { int c; while ((c = getchar()) != EOF) putchar(c); puts("EOF received"); } return 0; } on your systems, you'll find that many (most) non-Linux systems take the first EOF and it remains in effect (so you get three lines saying "EOF received" immediately), but many Linux systems accept multiple lots of input with the EOF indicator typed between different sets of input.Fonda
Yes, if you call clearerr(), you can continue reading from the terminal on all systems that I know of.Fonda
Magnetic tapes are very much still used in commercial settings when there's a need to archive a huge amount of data and not access it frequently: tape wins over hard disk in terms of price per GB of the storage media alone. That said, you would probably be using some vendor-provided software to access the tape these days, rather than accessing the device directly from C, so you probably wouldn't run into this issue there anymore. See: en.wikipedia.org/wiki/Linear_Tape-Open Flinch
S
2

Here's one obscure reason:

On Windows, reading byte 0x1A in text mode causes EOF.

By "Windows" I mean both MSVC and MinGW (so it's probably a quirk of Microsoft's CRT). This doesn't happen on Cygwin.
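A way to try this on a given toolchain is sketched below. The helper name `bytes_before_eof` and the scratch file path passed to it are assumptions made for this illustration; the file is always written in binary mode so that the 0x1A byte is stored as-is, and only the read mode varies.

```c
#include <stdio.h>

/* Hypothetical helper: stores the bytes 'A', 0x1A, 'B' in path, reopens the
   file with the given mode ("r" or "rb"), and returns how many bytes fgetc()
   delivers before EOF. Returns -1 if the file cannot be opened. */
long bytes_before_eof(const char *path, const char *mode) {
    FILE *w = fopen(path, "wb");   /* binary mode: bytes stored unchanged */
    if (w == NULL) return -1;
    fputs("A\032B", w);            /* \032 is octal for 0x1A (Ctrl-Z) */
    fclose(w);

    FILE *r = fopen(path, mode);
    if (r == NULL) return -1;
    long n = 0;
    int ch;
    while ((ch = fgetc(r)) != EOF) {
        n++;
    }
    fclose(r);
    remove(path);
    return n;
}
```

With "rb" the result should be 3 everywhere. With "r", the behavior described above would make the count 1 on MSVC and MinGW, while systems that do not treat 0x1A specially in text mode would still report 3.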

Sequin answered 7/1, 2022 at 17:0 Comment(0)
