How does fread really work?

Asked 21/12, 2011 at 11:57 Answered 7/6, 2012 at 22:20

The declaration of fread is as follows:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

The question is: is there a difference in read performance between the following two calls to fread?

char a[1000];

fread(a, 1, 1000, stdin);
fread(a, 1000, 1, stdin);

Will it read 1000 bytes at once each time?

Wistful answered 21/12, 2011 at 11:57 Comment(0)

111

There may or may not be any difference in performance. There is a difference in semantics.

fread(a, 1, 1000, stdin);

attempts to read 1000 data elements, each of which is 1 byte long.

fread(a, 1000, 1, stdin);

attempts to read 1 data element which is 1000 bytes long.

They're different because fread() returns the number of data elements it was able to read, not the number of bytes. If it reaches end-of-file (or an error condition) before reading the full 1000 bytes, the first version has to indicate exactly how many bytes it read; the second just fails and returns 0.
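A minimal sketch of that difference, assuming a 900-byte temporary file created with tmpfile(). The helper names make_temp_file, read_as_bytes, and read_as_one_element are made up for this illustration and are not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: creates a temporary file holding `len` bytes
   and rewinds it to the start. Returns NULL on failure. */
static FILE *make_temp_file(size_t len)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (fputc('x', f) == EOF) {
            fclose(f);
            return NULL;
        }
    }
    rewind(f);
    return f;
}

/* 1000 elements of 1 byte each: the result is a byte count. */
static size_t read_as_bytes(FILE *f, char *buf)
{
    return fread(buf, 1, 1000, f);
}

/* 1 element of 1000 bytes: the result is 0 or 1. */
static size_t read_as_one_element(FILE *f, char *buf)
{
    return fread(buf, 1000, 1, f);
}
```

With a 900-byte file, read_as_bytes returns 900, while read_as_one_element (called after rewinding) returns 0 because no complete 1000-byte element is available.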

In practice, it's probably just going to call a lower-level function that attempts to read 1000 bytes and indicates how many bytes it actually read. For larger reads, it might make multiple lower-level calls. The computation of the value to be returned by fread() is different, but the expense of the calculation is trivial.

There may be a difference if the implementation can tell, before attempting to read the data, that there isn't enough data to read. For example, if you're reading from a 900-byte file, the first version will read all 900 bytes and return 900, while the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read, i.e., 900.

But in general, you should probably choose how to call it based on what information you need from it. Read a single data element if a partial read is no better than not reading anything at all. Read in smaller chunks if partial reads are useful.
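As a rough illustration of that choice, the sketch below reads whole fixed-size records in one case and raw bytes in the other. The struct record type and both function names are hypothetical, chosen only for this example.

```c
#include <stdio.h>

/* Hypothetical fixed-size record, used only for illustration. */
struct record {
    int id;
    double value;
};

/* Reads whole records from f into out (capacity max_records).
   A trailing partial record is not counted, because fread only
   reports complete elements. */
static size_t read_records(FILE *f, struct record *out, size_t max_records)
{
    return fread(out, sizeof out[0], max_records, f);
}

/* Reads up to cap raw bytes; a short count is still useful here,
   so the element size is 1. */
static size_t read_bytes(FILE *f, unsigned char *out, size_t cap)
{
    return fread(out, 1, cap, f);
}
```

If the file holds two complete records followed by three stray bytes, read_records reports 2, whereas read_bytes reports every byte that was available.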

Deform answered 21/12, 2011 at 12:16 Comment(10)

"the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read, i.e., 900" Shouldn't it be that in the second version the file position indicator doesn't advance, since nothing was read? In other words, shouldn't fread(a, 1000, N, stdin); always advance the file position indicator by a multiple of 1000? – Greaseball 27/1, 2014 at 11:26

Nevermind, found it. C11 at 7.21.8.1.2 and 7.21.8.2.2 says: If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. – Greaseball 27/1, 2014 at 11:27

So there is no way to recover the position of the indicator? Or to avoid reading that last chunk that messes with the position indicator? – Altamirano 8/2, 2014 at 16:1

@David天宇Wong: If you need to recover the position, call ftell before calling fread, and then fseek after. – Deform 8/2, 2014 at 16:2

I don't really understand, @KeithThompson. fseek will just put me where I want, but how do I know where I want to be? – Altamirano 8/2, 2014 at 16:32

@David天宇Wong: I don't know where you want to be. If you want to be at the position where you were before the fread call, you can call ftell before calling fread (it returns a value that indicates your current position), then pass that result to fseek after the fread call. – Deform 8/2, 2014 at 19:38
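A minimal sketch of the ftell/fseek approach described in the comment above, assuming a seekable stream (this generally will not work on stdin when it is a pipe or terminal). The function name read_element_or_rewind is made up for this illustration.

```c
#include <stdio.h>

/* Tries to read exactly one element of `size` bytes. If the element
   cannot be read in full, seeks back to where the stream was before
   the attempt. Returns 1 on success, 0 if the element could not be
   read, and -1 if the position could not be saved or restored
   (for example, because the stream is not seekable). */
static int read_element_or_rewind(FILE *f, void *buf, size_t size)
{
    long start = ftell(f);          /* remember the current position */
    if (start == -1L)
        return -1;
    if (fread(buf, size, 1, f) == 1)
        return 1;
    if (fseek(f, start, SEEK_SET) != 0)
        return -1;
    return 0;
}
```

After a failed attempt the caller is back at the saved position and can retry with a smaller element size.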

well right when fread cannot read big chunks, so I can switch to a smaller fread. Thanks :) – Altamirano 8/2, 2014 at 20:42

@David天宇Wong: Why would fread not be able to read big chunks? The only limits on the size of data that can be read by a single fread call should be the size of the file and the size of the memory buffer you're reading into. (It might make multiple calls to some underlying system call, but that's almost entirely transparent.) – Deform 8/2, 2014 at 20:43

as you said in your answer above, it fails if it reaches EOF. My problem is documented here : #21648235 – Altamirano 8/2, 2014 at 20:49

The POSIX specification is much stricter ... it requires that fread does size fgetc's per object, so the exact same number of fgetc's will be done in either case (but the return values will be different). – Neighbors 15/8, 2014 at 6:34

According to the specification, the two may be treated differently by the implementation.

If your file is shorter than 1000 bytes, fread(a, 1, 1000, stdin) (read 1000 elements of 1 byte each) will still copy all the bytes up to EOF. On the other hand, what fread(a, 1000, 1, stdin) (read 1 element of 1000 bytes) stores in a is indeterminate, because there is not enough data to finish reading the 'first' (and only) 1000-byte element.

Of course, some implementations may still copy the 'partial' element into as many bytes as needed.

Osteoplastic answered 21/12, 2011 at 12:12 Comment(0)

That would be an implementation detail. In glibc, the two are identical in performance, as it's implemented basically as (Ref http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofread.c):

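size_t fread (void* buf, size_t size, size_t count, FILE* f)
{
    /* Simplified sketch, not the actual glibc source: f->fd stands in for
       the stream's underlying descriptor, and buffering is omitted. */
    size_t bytes_requested = size * count;
    if (bytes_requested == 0)
        return 0;   /* also avoids dividing by zero below */
    size_t bytes_read = read(f->fd, buf, bytes_requested);
    return bytes_read / size;
}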

Note that the C ~~and POSIX~~ standard does not guarantee that a complete object of size size is read every time. If a complete object cannot be read (e.g. stdin has only 999 bytes but you've requested size == 1000), the file will be left in an indeterminate state (C99 §7.19.8.1/2).

Edit: See the other answers about POSIX.

Adoptive answered 21/12, 2011 at 12:25 Comment(5)

You mention the POSIX standard but it requires fread to be implemented in terms of fgetc, which is much more deterministic than the C requirement. – Neighbors 15/8, 2014 at 6:36

Awesome answer! Exactly what everyone landing here needs! I'm surprised it has so few votes. – Brig 3/9, 2014 at 8:30

Is it the same for fwrite as well? – Brig 3/9, 2014 at 8:35

Important point: You can break the file when reading >1 sized records. – Furfuran 27/10, 2015 at 3:38

@kennythm Might read not be called several times before fread returns, in order to satisfy a caller that wants to fread, say, 1 MB? – Lexicography 1/6, 2022 at 9:34

fread calls getc internally. In Minix, the number of times getc is called is simply size*nmemb, so it depends only on the product of the two. Both fread(a, 1, 1000, stdin) and fread(a, 1000, 1, stdin) will therefore call getc 1000 (= 1000*1) times. Here is the simple implementation of fread from Minix:

size_t fread(void *ptr, size_t size, size_t nmemb, register FILE *stream){
    register char *cp = ptr;
    register int c;
    size_t ndone = 0;
    register size_t s;

    if (size)
        while ( ndone < nmemb ) {
            s = size;
            do {
                if ((c = getc(stream)) != EOF)
                    *cp++ = c;
                else
                    return ndone;
            } while (--s);
            ndone++;
        }

    return ndone;
}

Flipper answered 21/12, 2011 at 13:22 Comment(1)

genuine answer in my opinion – Guilt 1/5, 2020 at 15:56

There may be no performance difference, but those calls are not the same.

fread returns the number of elements read, so those calls will return different values.
If an element cannot be completely read, its value is indeterminate:

If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate. (ISO/IEC 9899:TC2 7.19.8.1)

There's not much difference in the glibc implementation, which just multiplies the element size by the number of elements to determine how many bytes to read and divides the amount read by the member size in the end. But the version specifying an element size of 1 will always tell you the correct number of bytes read. However, if you only care about completely read elements of a certain size, using the other form saves you from doing a division.

Highcolored answered 21/12, 2011 at 12:19 Comment(0)

One more sentence from http://pubs.opengroup.org/onlinepubs/000095399/functions/fread.html is notable:

The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object.

In short, in both cases the data will be read via fgetc().

Gallopade answered 21/12, 2011 at 12:20 Comment(6)

Yes, I also think so, but that page says "The functionality described on this reference page is aligned with the ISO C standard." That seems doubtful? – Gallopade 21/12, 2011 at 12:35

@Mr.32: the standard says the same thing about calls to fgetc, so Posix is indeed aligned with C99. But the standard doesn't give a conforming program any means to determine whether fgetc is "really" called, or whether fread does something else that's equivalent. 5.1.2.3 explains that the standard only describes the behavior of an "abstract machine", and lists in what ways the actual program must match that behavior. This is called the "as-if" rule in C++ but not C (my mistake earlier). Non-observable behavior need not be identical. – Midge 21/12, 2011 at 13:13

So, even if a particular implementation gives you some means to count how many times fgetc is called (perhaps by letting you link your program against your own version of that function, for example by modifying and recompiling libc), it can do that with the caveat that the function you're replacing is not called always and only when the standard describes the abstract machine as calling it. – Midge 21/12, 2011 at 13:15

@SteveJessop "Non-observable behavior need not be identical." So why is it documented in POSIX? – Wistful 21/12, 2011 at 14:44

@Beginner: because a description of the behavior of the abstract machine is a convenient way to describe the effect of fread (or any other bit of C code). It's documented that way in Posix simply because it's documented that way in the standard. – Midge 21/12, 2011 at 15:13

@SteveJessop Any detectable difference between the library implementation and an implementation in terms of fgetc is observable, and is non-conformant. Of course one can debate what "detectable" consists of. – Neighbors 15/8, 2014 at 6:41

I wanted to clarify the answers here. fread performs buffered IO. The actual read block sizes fread uses are determined by the C implementation being used.

All modern C libraries will have the same performance with the two calls:

fread(a, 1, 1000, file);
fread(a, 1000, 1, file);

Even something like:

for (int i=0; i<1000; i++)
  a[i] = fgetc(file);

should result in the same disk access patterns, although fgetc would be slower due to more calls into the standard C library and, in some cases, the need for the disk to perform additional seeks which would otherwise have been optimized away.

Getting back to the difference between the two forms of fread: the former returns the actual number of bytes read. The latter returns 0 if fewer than 1000 bytes are available, otherwise it returns 1. In both cases the buffer would typically be filled with the same data, i.e. the contents of the file up to 1000 bytes.

In general, you probably want to keep the 2nd parameter (size) set to 1 so that you get the number of bytes read.
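A minimal sketch of reading with size set to 1, assuming the caller wants a running byte count and needs to tell an error apart from end-of-file. The function name count_remaining_bytes and the 1000-byte chunk size are arbitrary choices for this illustration.

```c
#include <stdio.h>

/* Reads the rest of stream f in chunks and returns the total number of
   bytes read. Sets *had_error to 1 if reading stopped because of an
   error rather than end-of-file, and to 0 otherwise. */
static size_t count_remaining_bytes(FILE *f, int *had_error)
{
    char buf[1000];
    size_t total = 0;
    size_t n;

    /* size == 1, so fread's return value is a byte count, and a short
       final chunk is still reported accurately. */
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += n;

    *had_error = ferror(f) != 0;
    return total;
}
```

Because fread returns 0 both at end-of-file and on error, the ferror check after the loop is what distinguishes the two.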

Sachet answered 7/6, 2012 at 22:20 Comment(3)

"All modern C libraries will have the same performance with the two calls" -- yes. "in some cases the need for a disk to perform additional seeks which would have otherwise been optimized away" -- no. fgetc simply reads from stdio's in-memory buffer. And even if the stream has been set to be unbuffered, the underlying OS buffers disk reads. – Neighbors 15/8, 2014 at 6:46

@Jim: fgetc reads from stdio in a different way than fread. The obvious result of this is that fgetc will always maximize the number of seeks/system calls (bad), whereas fread will minimize the number of seeks/system calls, as you are providing libc with more information about what you are doing. – Sachet 16/8, 2014 at 0:49

Sorry, but you have no idea what you're talking about ... there's no way in which fread or fgetc differ that affects the number of seeks, and you have provided no support for this absurd claim. Note that the definition of fread in the C99 and POSIX standards is given in terms of fgetc, as discussed elsewhere on this page. – Neighbors 16/8, 2014 at 2:22
