How does fread really work?

The declaration of fread is as follows:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

The question is: is there a difference in reading performance between these two calls to fread:

char a[1000];
fread(a, 1, 1000, stdin);   /* 1 */
fread(a, 1000, 1, stdin);   /* 2 */

Will it read 1000 bytes at once each time?

Wistful answered 21/12, 2011 at 11:57

There may or may not be any difference in performance. There is a difference in semantics.

fread(a, 1, 1000, stdin);

attempts to read 1000 data elements, each of which is 1 byte long.

fread(a, 1000, 1, stdin);

attempts to read 1 data element which is 1000 bytes long.

They're different because fread() returns the number of data elements it was able to read, not the number of bytes. If it reaches end-of-file (or an error condition) before reading the full 1000 bytes, the first version has to indicate exactly how many bytes it read; the second just fails and returns 0.
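A small sketch to check the two return values, assuming a hosted environment where the standard tmpfile() can create a scratch file. The helper name compare_fread_forms is hypothetical, for illustration only:

```c
#include <stdio.h>
#include <string.h>

/* Writes `len` bytes (at most 1000) to a temporary file, rewinds it, and
   reports what the two fread call forms return.
   compare_fread_forms is a hypothetical helper, not a library function. */
int compare_fread_forms(size_t len, size_t *bytewise, size_t *whole)
{
    char data[1000];
    char a[1000];
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    memset(data, 'x', sizeof data);
    if (len > sizeof data || fwrite(data, 1, len, f) != len) {
        fclose(f);
        return -1;
    }

    rewind(f);                          /* also clears the EOF indicator */
    *bytewise = fread(a, 1, 1000, f);   /* counts 1-byte elements */

    rewind(f);
    *whole = fread(a, 1000, 1, f);      /* counts complete 1000-byte elements */

    fclose(f);
    return 0;
}
```

With a 900-byte file the first form reports 900 and the second reports 0; with exactly 1000 bytes they report 1000 and 1.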

In practice, it's probably just going to call a lower-level function that attempts to read 1000 bytes and indicates how many bytes it actually read. For larger reads, it might make multiple lower-level calls. The computation of the value to be returned by fread() is different, but the expense of the calculation is trivial.

There may be a difference if the implementation can tell, before attempting to read the data, that there isn't enough data to read. For example, if you're reading from a 900-byte file, the first version will read all 900 bytes and return 900, while the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read, i.e., 900.

But in general, you should probably choose how to call it based on what information you need from it. Read a single data element if a partial read is no better than not reading anything at all. Read in smaller chunks if partial reads are useful.
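As an illustration of choosing the call by the information you need, here is a sketch for fixed-size records where a partial record is useless. The struct record layout and the count_records name are hypothetical:

```c
#include <stdio.h>

/* Hypothetical fixed-size record, for illustration only. */
struct record {
    int id;
    double value;
};

/* Counts complete records in a stream. With size = sizeof(struct record)
   and nmemb = 1, fread returns 1 only for a complete record, so a truncated
   trailing record simply ends the loop. */
size_t count_records(FILE *f)
{
    struct record rec;
    size_t n = 0;

    while (fread(&rec, sizeof rec, 1, f) == 1)
        n++;
    return n;
}
```

If a truncated trailing record were still worth salvaging, reading with size 1 and checking the byte count would be the better fit.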

Deform answered 21/12, 2011 at 12:16 Comment(10)
"the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read, i.e., 900": shouldn't the file position indicator in the second version stay put, since nothing was read? In other words, shouldn't fread(a, 1000, N, stdin); always advance the file position indicator by a multiple of 1000? – Greaseball
Never mind, found it. C11 at 7.21.8.1.2 and 7.21.8.2.2 says: "If an error occurs, the resulting value of the file position indicator for the stream is indeterminate." – Greaseball
So there is no way to recover the position of the indicator? Or to avoid reading that last chunk that messes with the position indicator? – Altamirano
@David天宇Wong: If you need to recover the position, call ftell before calling fread, and then fseek after. – Deform
I don't really understand, @KeithThompson: fseek will just put me where I want, but how do I know where I want to be? – Altamirano
@David天宇Wong: I don't know where you want to be. If you want to be at the position where you were before the fread call, you can call ftell before calling fread (it returns a value that indicates your current position), then pass that result to fseek after the fread call. – Deform
Well, right: when fread cannot read big chunks, I can switch to a smaller fread. Thanks :) – Altamirano
@David天宇Wong: Why would fread not be able to read big chunks? The only limits on the size of data that can be read by a single fread call should be the size of the file and the size of the memory buffer you're reading into. (It might make multiple calls to some underlying system call, but that's almost entirely transparent.) – Deform
As you said in your answer above, it fails if it reaches EOF. My problem is documented here: #21648235 – Altamirano
The POSIX specification is much stricter: it requires that fread make size calls to fgetc per object, so exactly the same number of fgetc calls will be made in either case (but the return values will be different). – Neighbors
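The ftell/fseek suggestion from the comments can be sketched like this, assuming a seekable stream (a regular file rather than a pipe or terminal). The name read_whole_or_rewind is hypothetical:

```c
#include <stdio.h>

/* Tries to read one complete element of `size` bytes. On a short read it
   restores the position saved with ftell() before the call, so the caller
   can retry with a smaller size. read_whole_or_rewind is a hypothetical
   helper. Returns 1 on a full element, 0 if rewound, -1 on failure. */
int read_whole_or_rewind(void *buf, size_t size, FILE *f)
{
    long pos = ftell(f);            /* remember where we started */
    if (pos < 0)
        return -1;                  /* position unavailable (not seekable) */

    if (fread(buf, size, 1, f) == 1)
        return 1;                   /* got a complete element */

    clearerr(f);                    /* reset EOF/error indicators */
    if (fseek(f, pos, SEEK_SET) != 0)
        return -1;
    return 0;                       /* back at pos, as if nothing was read */
}
```

After a 0 return, the same bytes can be re-read with a smaller element size or with size 1.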

According to the specification, the two may be treated differently by the implementation.

If your file is less than 1000 bytes, fread(a, 1, 1000, stdin) (read 1000 elements of 1 byte each) will still copy all the bytes until EOF. On the other hand, the contents of a after fread(a, 1000, 1, stdin) (read one 1000-byte element) are indeterminate, because there is not enough data to finish reading the 'first' (and only) 1000-byte element.

Of course, some implementations may still copy the 'partial' element into as many bytes as needed.

Osteoplastic answered 21/12, 2011 at 12:12

That would be an implementation detail. In glibc, the two are identical in performance, as fread is implemented basically as follows (ref: http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofread.c):

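#include <stdio.h>
#include <unistd.h>

/* Simplified sketch: real glibc copies through the FILE's own buffer
   (internally _IO_sgetn) rather than calling read() on the descriptor. */
size_t fread(void *buf, size_t size, size_t count, FILE *f)
{
    size_t bytes_requested = size * count;
    if (bytes_requested == 0)   /* also avoids dividing by size == 0 */
        return 0;
    ssize_t bytes_read = read(fileno(f), buf, bytes_requested);
    return bytes_read < 0 ? 0 : (size_t)bytes_read / size;
}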

Note that the C and POSIX standards do not guarantee that a complete object of the requested size is read every time. If a complete object cannot be read (e.g. stdin only has 999 bytes but you've requested size == 1000), the value of that partial element is indeterminate, and if a read error occurs, so is the file position indicator (C99 §7.19.8.1/2).

Edit: See the other answers about POSIX.

Adoptive answered 21/12, 2011 at 12:25 Comment(5)
You mention the POSIX standard, but it requires fread to be implemented in terms of fgetc, which is much more deterministic than the C requirement. – Neighbors
Awesome answer! Exactly what everyone landing here needs. I'm surprised it has so few votes. – Brig
Is it the same for fwrite as well? – Brig
Important point: you can break the file when reading records whose size is greater than 1. – Furfuran
@kennythm Couldn't read be called several times before fread returns, to satisfy a caller that wants to fread 1 MB? – Lexicography

fread calls getc internally. In Minix, the number of times getc is called is simply size*nmemb (fewer if EOF is reached first), so it depends only on the product of the two. So both fread(a, 1, 1000, stdin) and fread(a, 1000, 1, stdin) will call getc 1000 (= 1000*1) times. Here is the simple implementation of fread from Minix:

#include <stdio.h>

size_t fread(void *ptr, size_t size, size_t nmemb, register FILE *stream)
{
    register char *cp = ptr;
    register int c;
    size_t ndone = 0;
    register size_t s;

    if (size)                       /* a zero size reads nothing */
        while (ndone < nmemb) {
            s = size;
            do {                    /* one getc() per byte of the element */
                if ((c = getc(stream)) != EOF)
                    *cp++ = c;
                else
                    return ndone;   /* a partial element is not counted */
            } while (--s);
            ndone++;
        }

    return ndone;
}
Flipper answered 21/12, 2011 at 13:22 Comment(1)
Genuine answer, in my opinion. – Guilt

There may be no performance difference, but those calls are not the same.

  • fread returns the number of elements read, so those calls will return different values.
  • If an element cannot be completely read, its value is indeterminate:

If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate. (ISO/IEC 9899:TC2 7.19.8.1)

There's not much difference in the glibc implementation, which just multiplies the element size by the number of elements to determine how many bytes to read and divides the amount read by the member size in the end. But the version specifying an element size of 1 will always tell you the correct number of bytes read. However, if you only care about completely read elements of a certain size, using the other form saves you from doing a division.
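If you want both the exact byte count and the number of complete elements, one option is to read with size 1 and do the division yourself. A minimal sketch; read_elements is a hypothetical helper name and elem_size is assumed to be nonzero:

```c
#include <stdio.h>

/* Reads up to `cap` bytes with size 1, so fread's return value is an exact
   byte count, then splits it into complete elements and leftover bytes.
   read_elements is hypothetical; elem_size must be nonzero. */
size_t read_elements(void *buf, size_t cap, size_t elem_size, FILE *f,
                     size_t *leftover)
{
    size_t nbytes = fread(buf, 1, cap, f);   /* exact number of bytes read */
    *leftover = nbytes % elem_size;          /* bytes of a partial element */
    return nbytes / elem_size;               /* number of complete elements */
}
```

For a 900-byte stream and 250-byte elements this reports 3 complete elements and 150 leftover bytes, whose values are well defined because they were read as individual bytes.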

Highcolored answered 21/12, 2011 at 12:19

One more sentence from http://pubs.opengroup.org/onlinepubs/000095399/functions/fread.html is notable:

The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object.

In short, in both cases the data will be accessed via fgetc().
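The POSIX wording above can be transcribed almost literally into C. This sketch mirrors that abstract description (size calls to fgetc() per object, results stored as unsigned char in order); it is not how any particular libc is optimized, and the name fgetc_fread is hypothetical:

```c
#include <stdio.h>

/* For each of `nitems` objects, make `size` calls to fgetc() and store the
   results, in order, as unsigned char. Returns the number of complete
   objects read. fgetc_fread is a hypothetical name. */
size_t fgetc_fread(void *ptr, size_t size, size_t nitems, FILE *stream)
{
    unsigned char *out = ptr;
    size_t done = 0;

    if (size == 0)
        return 0;
    while (done < nitems) {
        for (size_t i = 0; i < size; i++) {
            int c = fgetc(stream);
            if (c == EOF)
                return done;          /* a partial object is not counted */
            *out++ = (unsigned char)c;
        }
        done++;
    }
    return done;
}
```

Either call shape makes the same sequence of fgetc() calls; only the count returned differs.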

Gallopade answered 21/12, 2011 at 12:20 Comment(6)
Yeah, I also feel so, but that page says "The functionality described on this reference page is aligned with the ISO C standard." That seems doubtful? – Gallopade
@Mr.32: the standard says the same thing about calls to fgetc, so Posix is indeed aligned with C99. But the standard doesn't give a conforming program any means to determine whether fgetc is "really" called, or whether fread does something else that's equivalent. 5.1.2.3 explains that the standard only describes the behavior of an "abstract machine", and lists in what ways the actual program must match that behavior. This is called the "as-if" rule in C++ but not C (my mistake earlier). Non-observable behavior need not be identical. – Midge
So, even if a particular implementation gives you some means to count how many times fgetc is called (perhaps by letting you link your program against your own version of that function, for example by modifying and recompiling libc), it can do that with the caveat that the function you're replacing is not called always and only when the standard describes the abstract machine as calling it. – Midge
@SteveJessop "Non-observable behavior need not be identical." So why is it documented in POSIX? – Wistful
@Beginner: because a description of the behavior of the abstract machine is a convenient way to describe the effect of fread (or any other bit of C code). It's documented that way in Posix simply because it's documented that way in the standard. – Midge
@SteveJessop Any detectable difference between the library implementation and an implementation in terms of fgetc is observable, and is non-conformant. Of course one can debate what "detectable" consists of. – Neighbors

I wanted to clarify the answers here. fread performs buffered I/O. The actual block sizes fread reads are determined by the C implementation being used.
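If you want to influence that buffering from your own code, standard C offers setvbuf(), though the implementation may treat the requested size as a hint. A minimal sketch; set_full_buffering is a hypothetical helper and the 64 KiB figure is an arbitrary example:

```c
#include <stdio.h>

/* Requests full buffering of about `bufsize` bytes for `f`. It must be
   called after opening `f` and before any other operation on it. With a
   NULL buffer the library allocates its own, and whether it honours
   `bufsize` is implementation-dependent. set_full_buffering is a
   hypothetical helper. Returns 0 on success. */
int set_full_buffering(FILE *f, size_t bufsize)
{
    return setvbuf(f, NULL, _IOFBF, bufsize);
}
```

For example, set_full_buffering(f, 64 * 1024) right after fopen(), before the first fread().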

All modern C libraries will have the same performance with the two calls:

fread(a, 1, 1000, file);
fread(a, 1000, 1, file);

Even something like:

for (int i = 0; i < 1000; i++)
  a[i] = fgetc(file);   /* EOF check omitted for brevity */

should result in the same disk access patterns, although fgetc would be slower due to more calls into the standard C library and, in some cases, the need for the disk to perform additional seeks that would otherwise have been optimized away.

Getting back to the difference between the two forms of fread: the former returns the actual number of bytes read, while the latter returns 0 if the file is shorter than 1000 bytes and 1 otherwise. In both cases the buffer would be filled with the same data, i.e. the contents of the file up to 1000 bytes.

In general, you probably want to keep the 2nd parameter (size) set to 1 so that you get the number of bytes read.

Sachet answered 7/6, 2012 at 22:20 Comment(3)
"All modern C libraries will have the same performance with the two calls" -- yes. "in some cases the need for a disk to perform additional seeks which would have otherwise been optimized away" -- no. fgetc simply reads from stdio's in-memory buffer. And even if the stream has been set to be unbuffered, the underlying OS buffers disk reads. – Neighbors
@Jim: fgetc reads from stdio in a different way than fread. The obvious result of this is that fgetc will always maximize the number of seeks/system calls (bad), whereas fread will minimize the number of seeks/system calls, as you are providing libc with more information about what you are doing. – Sachet
Sorry, but you have no idea what you're talking about ... there's no way in which fread or fgetc differ that affects the number of seeks, and you have provided no support for this absurd claim. Note that the definition of fread in the C99 and POSIX standards is given in terms of fgetc, as discussed elsewhere on this page. – Neighbors
