fread is signalling EOF prematurely with a binary file

I'm a newcomer to C. I'm attempting to write my own version of base64: the program takes input from stdin and writes its base64 equivalent to stdout. While testing it against a binary file, I noticed that fread on stdin seemed to return a short count before actually reaching EOF.
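
The question does not show base64_encode. Below is a minimal sketch of a function that fits the call in main further down. It assumes the function returns the number of characters written, returns 0 when the output buffer is too small, and uses the standard alphabet with '=' padding. These are assumptions, not the asker's code.

```c
#include <stddef.h>

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical implementation matching the call in main.
 * Encodes in_len bytes from in into out (no NUL terminator is added).
 * Returns the number of characters written, or 0 if out_cap is too small. */
size_t base64_encode(const unsigned char *in, size_t in_len,
                     unsigned char *out, size_t out_cap)
{
    size_t needed = 4 * ((in_len + 2) / 3); /* 4 chars per started 3-byte group */
    if (needed > out_cap)
        return 0;

    size_t i = 0, o = 0;

    /* Each full 3-byte group becomes 4 characters of 6 bits each. */
    for (; i + 3 <= in_len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = B64_ALPHABET[(v >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 12) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 6) & 0x3F];
        out[o++] = B64_ALPHABET[v & 0x3F];
    }

    /* One or two leftover bytes are padded out with '='. */
    size_t rem = in_len - i;
    if (rem > 0) {
        unsigned long v = (unsigned long)in[i] << 16;
        if (rem == 2)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = B64_ALPHABET[(v >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 12) & 0x3F];
        out[o++] = rem == 2 ? B64_ALPHABET[(v >> 6) & 0x3F] : '=';
        out[o++] = '=';
    }
    return o;
}
```

Every read except the last fills the whole buffer. For that reason, BUFFER_SIZE needs to be a multiple of 3, so that '=' padding can only appear in the final chunk. The 600-byte reads in the trace below satisfy this, and each one encodes to exactly 800 characters. BASE64_BUFFER must be at least 4 * ((BUFFER_SIZE + 2) / 3).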

Here is the relevant portion of my main function:

int main(void)
{
    unsigned char buffer[BUFFER_SIZE];
    unsigned char base64_buffer[BASE64_BUFFER];

    while (1)
    {
        TRACE_PUTS("Reading in data from stdin...");
        size_t read = fread(buffer, 1, sizeof(buffer), stdin); /* Read the data in using fread(3) */

        /* Process the buffer */

        TRACE_PRINTF("Amount read: %zu\n", read);
        TRACE_PUTS("Beginning base64 encode of buffer");
        size_t encoded = base64_encode(buffer, read, base64_buffer, sizeof(base64_buffer));

        /* Write the data to stdout */
        TRACE_PUTS("Writing data to standard output");
        ...

        if (read < sizeof(buffer))
        {
            break; /* We reached EOF or had an error during the read */
        }
    }

    if (ferror(stdin))
    {
        /* Handle errors */
        fprintf(stderr, "%s\n", "There was a problem reading from the file.");
        exit(1);
    }

    puts(""); /* Output a newline before finishing */

    return 0;
}

As you can see, on every iteration the main loop calls fread on stdin to fill a buffer. At the end of the iteration it checks whether the amount read is less than the size of the buffer. If it is, I assume that either an error occurred or EOF was reached, and exit the loop.

I am assuming that it is OK to check whether read < sizeof(buffer), rather than just read != 0, based on this quote from the fread man page:

On success, fread() and fwrite() return the number of items read or written. This number equals the number of bytes transferred only when size is 1. If an error occurs, or the end of the file is reached, the return value is a short item count (or zero).

This implies that if the buffer is not completely filled, either EOF has been reached or an error has occurred.
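
A short count covers both cases. The usual pattern is therefore to stop on the short count and then ask the stream which case happened, using feof() and ferror(). Below is a minimal sketch of the question's loop reduced to just the reading. drain_stream is a hypothetical name, the 600-byte buffer is an assumption taken from the chunk sizes in the trace, and the encoding step is left out.

```c
#include <stdio.h>

/* Hypothetical helper: reads `in` to the end in fixed-size chunks, as the
 * loop in the question does, and returns the total number of bytes read.
 * Sets *had_error to 1 if the stream's error indicator is set, else 0. */
size_t drain_stream(FILE *in, int *had_error)
{
    unsigned char buffer[600]; /* assumed size, taken from the trace */
    size_t total = 0;

    for (;;) {
        size_t got = fread(buffer, 1, sizeof buffer, in);
        total += got; /* the real program would encode these bytes here */
        if (got < sizeof buffer)
            break; /* short count: end of file or a read error */
    }

    /* The short count alone does not say which case occurred; the stream's
     * indicators do. feof(in) is nonzero if end of file was reached. */
    *had_error = ferror(in) != 0;
    return total;
}
```

As an example, take a stream from tmpfile(), which the C standard opens in "wb+" binary mode. If it holds 2000 bytes covering every value from 0x00 to 0xFF, drain_stream should return 2000 and report no error. This also shows that fread gives NUL bytes no special treatment on a binary stream.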

With that established, here is the trace I get when I pipe /bin/echo into my program:

$ cat /bin/echo | bin/base64 >/dev/null # only view the trace output
TRACE: C:/Users/James/Code/c/base64/main.c:23: Reading in data from stdin...
TRACE: C:/Users/James/Code/c/base64/main.c:28: Amount read: 600
TRACE: C:/Users/James/Code/c/base64/main.c:29: Beginning base64 encode of buffer
TRACE: C:/Users/James/Code/c/base64/main.c:43: Writing data to standard output
TRACE: C:/Users/James/Code/c/base64/main.c:23: Reading in data from stdin...
TRACE: C:/Users/James/Code/c/base64/main.c:28: Amount read: 600
TRACE: C:/Users/James/Code/c/base64/main.c:29: Beginning base64 encode of buffer
TRACE: C:/Users/James/Code/c/base64/main.c:43: Writing data to standard output
TRACE: C:/Users/James/Code/c/base64/main.c:23: Reading in data from stdin...
TRACE: C:/Users/James/Code/c/base64/main.c:28: Amount read: 600
TRACE: C:/Users/James/Code/c/base64/main.c:29: Beginning base64 encode of buffer
TRACE: C:/Users/James/Code/c/base64/main.c:43: Writing data to standard output
TRACE: C:/Users/James/Code/c/base64/main.c:23: Reading in data from stdin...
TRACE: C:/Users/James/Code/c/base64/main.c:28: Amount read: 569
TRACE: C:/Users/James/Code/c/base64/main.c:29: Beginning base64 encode of buffer
TRACE: C:/Users/James/Code/c/base64/main.c:43: Writing data to standard output

$

And here is how large /bin/echo actually is:

$ cat /bin/echo | wc -c
28352

So the whole file is 28352 bytes long, but my program reads only 2369 of them (600 + 600 + 600 + 569) before it stops. Any idea why? Does fread treat null bytes specially?

I am using MinGW-w64 with GCC if that helps; thanks.

Bagpipe asked 5/9, 2016 at 22:22

Are you on Windows? Yes, the pathname starts with C:, so you are. You've probably got a Control-Z ('\x1A' or '\32') character in the file. The Windows C run-time (and hence your program) won't treat standard input as a binary file unless you tweak it somehow. As a result, in the default text mode, the Control-Z marks the end of the input.
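
To confirm the diagnosis, you can look for the first 0x1A byte in the file. The file must be opened in binary mode ("rb") so that the run-time does not itself stop at that byte. Below is a minimal sketch; first_ctrl_z_offset is a hypothetical helper name. If the theory is right, the reported offset should be at or somewhat above 2369 (600 + 600 + 600 + 569 in the trace). The offset can be higher because text mode also collapses each \r\n pair into \n, so it delivers fewer bytes than it consumes.

```c
#include <stdio.h>

/* Hypothetical helper: returns the byte offset of the first Control-Z
 * (0x1A) in f, or -1 if there is none. f should be opened in binary
 * mode ("rb") so that the run-time does not treat 0x1A as end of file. */
long first_ctrl_z_offset(FILE *f)
{
    long offset = 0;
    int c;

    while ((c = getc(f)) != EOF) {
        if (c == 0x1A)
            return offset;
        offset++;
    }
    return -1; /* reached the real end of the file without seeing 0x1A */
}
```

A caller might open the file with fopen(path, "rb"), pass the resulting FILE * to first_ctrl_z_offset, and print the result.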

One possible 'somehow' to tweak the mode is _set_fmode(). However, it is more likely that you need _setmode():

_setmode(fileno(stdin), O_BINARY);

I reserve judgement on whether that's the best or only way to do it. You can research the manuals as well as I can. I have no way to test whether fileno() (or perhaps _fileno() in the Microsoft world) is available.
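
Below is a minimal sketch that wraps the call behind a preprocessor check, so the same source also builds on POSIX systems, where text and binary streams behave the same and no mode change is needed. It assumes the Microsoft CRT names _setmode (from <io.h>), _fileno (from <stdio.h>) and _O_BINARY (from <fcntl.h>), all of which MinGW-w64 also provides. set_stdin_binary is a hypothetical name.

```c
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h> /* _O_BINARY */
#include <io.h>    /* _setmode */
#endif

/* Hypothetical wrapper: on Windows, switches stdin to binary mode so that
 * Control-Z bytes and \r\n pairs are passed through unchanged.
 * Returns 0 on success, -1 if _setmode reports an error.
 * On other platforms it does nothing and returns 0. */
int set_stdin_binary(void)
{
#ifdef _WIN32
    /* _setmode returns the previous translation mode, or -1 on failure */
    return _setmode(_fileno(stdin), _O_BINARY) == -1 ? -1 : 0;
#else
    return 0;
#endif
}
```

Call it as the first statement of main, before anything reads from stdin. If it returns -1, print an error message and exit.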

Aery answered 5/9, 2016 at 22:30
