std::fstream buffering vs manual buffering (why 10x gain with manual buffering)?
I have tested two writing configurations:

  1. Fstream buffering:

    // Initialization
    const unsigned int length = 8192;
    char buffer[length];
    std::ofstream stream;
    stream.rdbuf()->pubsetbuf(buffer, length);
    stream.open("test.dat", std::ios::binary | std::ios::trunc);
    
    // To write, I use:
    stream.write(reinterpret_cast<char*>(&x), sizeof(x));
    
  2. Manual buffering:

    // Initialization
    const unsigned int length = 8192;
    char buffer[length];
    std::ofstream stream("test.dat", std::ios::binary | std::ios::trunc);
    
    // Then I manually fill the buffer with the data
    
    // To write, I use:
    stream.write(buffer, length);
    

I expected the same result...

But my manual buffering improves performance by a factor of 10 when writing a 100 MB file, while the fstream buffering changes nothing compared to the default situation (without setting a buffer).

Does anyone have an explanation for this situation?

EDIT: Here is the news: a benchmark has just been run on a supercomputer (Linux 64-bit architecture, latest Intel Xeon 8-core, Lustre filesystem and ... hopefully well-configured compilers). And I cannot explain the reason for the "resonance" with a 1 kB manual buffer...

EDIT 2: And the resonance at 1024 B (if someone has an idea about that, I'm interested): [benchmark chart]

Rolph answered 21/10, 2012 at 10:45 Comment(14)
g++ 4.7.1 on Ubuntu 12.04 (inside a VirtualBox on Windows 7 x64 Pro) with an SSD disk. I will test that on a supercomputer and then come back with the results.Rolph
And can you check default buffering size, like in this question: #10351259Oyler
And of course with optimizations turned on? Sorry for the question, but this is always the first thing to ask when somebody talks about measuring performance and doesn't say anything about the platform or compilation method.Mercy
You are right to ask this question, even if the answer is "of course" : g++-4.7 -Wall -Wextra -Winline -O3 -std=c++0x test.cpp -o test. I will come back with more complete results.Rolph
One question for my tests: is it better to call pubsetbuf before or after opening the file, or does it not change anything?Rolph
I recommend you check the source for the C++ library, to see what it does differently depending on the buffer.Perutz
@Rolph Well, cppreference.com says it doesn't even have an effect when calling it after opening the file on gcc 4.6's standard library, though I don't know if this applies to all versions of gcc.Mercy
Thanks. But in fact, in the version I tested, I call it before opening the file.Rolph
@Rolph Well, then maybe post the version that corresponds to your posted results.Mercy
Could you post the complete, compilable test program you're benchmarking (even if on an external site like ideone.com)?Frey
@Rolph After having seen your time vs. buffer size diagrams I have deleted my answer and upvoted your question. Please post your complete, compilable test program.Goodlooking
@MichaelBurr : Ok, I will do that as soon as I have some available time, because currently this benchmark relies on some classes of the library I am currently writing.Rolph
Please note that the ofstream buffer size in the manual version is always fixed by the ofstream constructor (BUFSIZ in gcc); you are only changing the size of that char array. In the first version that char array is different: it serves as the ofstream buffer (which means that when you reach its limit, the system write function is called). You should make sure to call pubsetbuf before open; otherwise it has no effect (and both versions should then have the same performance, at least in gcc). I have to admit, nothing makes sense; I hope you can share your code. That's kind of impossible now :).Eton
I made a std::streambuf that triple buffers async flushes, and found big improvements in my I/O bound code. I wonder how that would affect this question?Kamerad

This is basically due to function call overhead and indirection. The ofstream::write() method is inherited from ostream. That function is not inlined in libstdc++, which is the first source of overhead. Then ostream::write() has to call rdbuf()->sputn() to do the actual writing, which is a virtual function call.

On top of that, libstdc++ redirects sputn() to another virtual function xsputn() which adds another virtual function call.

If you put the characters into the buffer yourself, you can avoid that overhead.

Inessive answered 21/10, 2012 at 17:53 Comment(2)
I have to admit that I cannot understand why this answer is correct. Both versions are using stream::write, so why should it be different? pubsetbuf then open should have the same effect as ifstream cstr.Eton
@AJed: It isn't explicit in the question, but the assumption is that in the first case, when writing x, these writes are small, so you end up doing a lot more calls to stream.write that way for the same amount of data.Inessive

I would like to explain the cause of the peak in the second chart.

In fact, the virtual functions used by std::ofstream cause a performance drop similar to what we see in the first chart, but that does not explain why the performance hit was greatest when the manual buffer size was less than 1024 bytes.

The problem relates to the high cost of the writev() and write() system calls and to the internal implementation of std::basic_filebuf, the internal class of std::ofstream.

To show how write() influences performance, I ran a simple test with the dd tool on my Linux machine, copying a 10 MB file with different buffer sizes (the bs option):

test@test$ time dd if=/dev/zero of=zero bs=256 count=40000
40000+0 records in
40000+0 records out
10240000 bytes (10 MB) copied, 2.36589 s, 4.3 MB/s

real    0m2.370s
user    0m0.000s
sys     0m0.952s

test@test$ time dd if=/dev/zero of=zero bs=512 count=20000
20000+0 records in
20000+0 records out
10240000 bytes (10 MB) copied, 1.31708 s, 7.8 MB/s

real    0m1.324s
user    0m0.000s
sys     0m0.476s

test@test: time dd if=/dev/zero of=zero bs=1024 count=10000
10000+0 records in
10000+0 records out
10240000 bytes (10 MB) copied, 0.792634 s, 12.9 MB/s

real    0m0.798s
user    0m0.008s
sys     0m0.236s

test@test: time dd if=/dev/zero of=zero bs=4096 count=2500
2500+0 records in
2500+0 records out
10240000 bytes (10 MB) copied, 0.274074 s, 37.4 MB/s

real    0m0.293s
user    0m0.000s
sys     0m0.064s

As you can see, the smaller the buffer, the lower the write speed and the more time dd spends in system space. So the read/write speed decreases when the buffer size decreases.

But why did the speed peak when the manual buffer size was less than 1024 bytes in the topic creator's manual buffer tests? Why was it almost constant there?

The explanation relates to the std::ofstream implementation, especially to the std::basic_filebuf.

By default it uses a 1024-byte buffer (the BUFSIZ variable). So, when you write your data in pieces smaller than 1024 bytes, the writev() (not write()) system call is made at most once per two ofstream::write() operations (with 1023-byte pieces, for example, the first is written to the buffer, and the second forces the first and second to be written out). From this we can conclude that before the peak the ofstream::write() speed hardly depends on the manual buffer size, since the system call happens at most once per two writes.

When you write 1024 bytes or more at once with ofstream::write(), the writev() system call is made for every ofstream::write(). So you see the speed increase once the manual buffer is larger than 1024 bytes (after the peak).

Moreover, if you set the std::ofstream buffer to more than 1024 bytes (for example, an 8192-byte buffer) using streambuf::pubsetbuf() and then call ostream::write() with 1024-byte pieces, you will be surprised that the write speed is the same as with the 1024-byte buffer. This is because the implementation of std::basic_filebuf - the internal class of std::ofstream - is hard-coded to force a system writev() call for each ofstream::write() call whose data is 1024 bytes or larger (see the basic_filebuf::xsputn() source code). There is also an open issue in the GCC Bugzilla, reported on 2014-11-05.

So, the solution to this problem can be one of the following:

  • replace std::filebuf with your own class and redefine std::ofstream
  • divide the buffer that has to be passed to ofstream::write() into pieces smaller than 1024 bytes and pass them to ofstream::write() one by one
  • don't pass small pieces of data to ofstream::write(), to avoid the performance hit from std::ofstream's virtual functions
Lindsey answered 2/2, 2018 at 15:24 Comment(1)
The gcc Bugzilla issue has been closed, and the problem seems to have been fixed in gcc 13.Hydrogenate

I'd like to add to the existing responses that this performance behavior (all the overhead from the virtual method calls/indirection) is typically not an issue when writing large blocks of data. What seems to have been omitted from the question and these prior answers (although probably implicitly understood) is that the original code was writing a small number of bytes each time. Just to clarify for others: if you are writing large blocks of data (~kB+), there is no reason to expect manual buffering to make a significant performance difference compared to using std::fstream's own buffering.

Affecting answered 19/7, 2018 at 16:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.