Why is istream/ostream slow [duplicate]

Asked 8/9, 2013 at 21:21 Answered 16/12, 2020 at 14:56

At 50:40 of http://channel9.msdn.com/Events/GoingNative/2013/Writing-Quick-Code-in-Cpp-Quickly Andrei Alexandrescu jokes about how inefficient/slow istream is.

I had an issue in the past with ostream being slow and fwrite being significantly faster (saving many seconds on a single run of the main loop), but I never understood why, nor looked into it.

What makes istream and ostream slow in C++? Or at least slow compared to other things (like fread/fget, fwrite) which would equally satisfy the needs.

Crockery answered 8/9, 2013 at 21:21 Comment(5)

IIRC the C++ streams have to sync with the C I/O "constructs", if you will (for compatibility reasons). I believe you can make them faster by turning that syncing off (granted, you'll have to refrain from doing things like printf afterwards) – Glow 8/9, 2013 at 21:23

@Borgleader: What C "constructs" would ostream sync to (it was a file output stream, not std::cout) and why is it slower than C fwrite? – Crockery 8/9, 2013 at 21:25

Take a look at this answer: https://mcmap.net/q/44966/-why-is-reading-lines-from-stdin-much-slower-in-c-than-python – Glow 8/9, 2013 at 21:30

@Borgleader: That definitely answers the cin question. +1 – Crockery 8/9, 2013 at 21:32

Related: #4340896 – Balfour 8/9, 2013 at 21:50

Actually, IOStreams don't have to be slow! It is a matter of implementing them in a reasonable way to make them fast, though. Most standard C++ libraries don't seem to pay much attention to implementing IOStreams. A long time ago, when my CXXRT was still maintained, it was about as fast as stdio - when used correctly!

Note that there are a few performance traps laid out for users of IOStreams, however. The following guidelines apply to all IOStream implementations but especially to those which are tailored to be fast:

When using std::cin, std::cout, etc. you need to call std::ios_base::sync_with_stdio(false)! Without this call, any use of the standard stream objects is required to synchronize with C's standard streams. Of course, when using std::ios_base::sync_with_stdio(false) it is assumed that you don't mix std::cin with stdin, std::cout with stdout, etc.
Do not use std::endl as it mandates many unnecessary flushes of any buffer. Likewise, don't set std::ios_base::unitbuf or use std::flush unnecessarily.
When creating your own stream buffers (OK, few users do), make sure they do use an internal buffer! Processing individual characters jumps through multiple conditions and a virtual function which makes it hideously slow.
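
As a minimal sketch of the first two guidelines (the helper names `use_unsynced_standard_streams`, `write_numbers` and `write_numbers_example` are made up for illustration and are not part of any library): turn synchronization off once, before any standard-stream I/O, and end lines with '\n' so that the buffer is flushed only when you ask for it.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>

// Hypothetical helper: call once, before any I/O on std::cin/std::cout.
// Afterwards, do not mix std::cout with printf (or std::cin with scanf).
void use_unsynced_standard_streams() {
    std::ios_base::sync_with_stdio(false);
}

// Hypothetical helper: writes 0..n-1, one number per line.
void write_numbers(std::ostream& out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out << i << '\n';  // '\n' ends the line without forcing a flush
    out.flush();           // one explicit flush after all output is written
}

// Usage on an in-memory stream, so the result can be inspected.
void write_numbers_example() {
    std::ostringstream os;
    write_numbers(os, 3);
    assert(os.str() == "0\n1\n2\n");
}
```

In a real program you would presumably call `use_unsynced_standard_streams()` at the top of `main` and pass `std::cout` to `write_numbers`.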

Houston answered 8/9, 2013 at 21:39 Comment(2)

+1 For pointing out that it's mostly a problem with the implementation, not the library itself. Efficient iostreams implementation is also one of the main concerns in the C++ Performance Report published by the ISO committee in 2006. – Brodeur 9/9, 2013 at 7:18

@ComicSansMS: As it happens, much of the material on performance of IOStreams is based on my contributions :-) (the contributions are not attributed to their respective authors; the contributors are listed on page 6, however). – Macedonia 9/9, 2013 at 8:52

There are several reasons why [i]ostreams are slow by design:

Shared formatting state: every formatted output operation has to check all formatting state that might have been previously mutated by I/O manipulators. For this reason iostreams are inherently slower than printf-like APIs (especially with format string compilation like in Rust or {fmt} that avoid parsing overhead) where all formatting information is local.
Uncontrolled use of locales: all formatting goes through an inefficient locale layer even if you don't want this, for example when writing a JSON file. See N4412: Shortcomings of iostreams.
Inefficient top-level API: formatting a message with iostreams normally consists of multiple formatting function calls because arguments and I/O manipulators are interleaved with parts of the message. For example, there are three such calls (godbolt) in
```
std::cout << "The answer is " << answer << ".\n";
```
compared to just one (godbolt) in the equivalent printf call:
```
printf("The answer is %d.\n", answer);
```
Each of these formatting function invocations has substantial overhead (see above). In replacement-based APIs like printf there is little overhead for literal portions of the string.
Extra buffering and synchronization. This can be disabled with sync_with_stdio(false) at the cost of poor interoperability with other I/O facilities.
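
To make the "shared formatting state" point concrete, here is a small sketch (the function names `sticky_format_demo` and `sticky_format_example` are invented for this illustration). A manipulator such as std::hex changes state stored in the stream object, so every later formatted output has to consult that state, and the caller has to undo it explicitly.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical demo: formats the same value three times on one stream.
std::string sticky_format_demo() {
    std::ostringstream out;
    out << std::hex << 255;         // switches the stream to hexadecimal
    out << ' ' << 255;              // still hexadecimal: the state persisted
    out << ' ' << std::dec << 255;  // decimal again only after an explicit reset
    return out.str();               // "ff ff 255"
}

void sticky_format_example() {
    assert(sticky_format_demo() == "ff ff 255");
}
```

With a printf-style call the base is chosen per conversion (`%x` versus `%d`), so there is no stream-wide state to check or restore.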

Fukuoka answered 16/12, 2020 at 14:56 Comment(0)

Perhaps this can give some idea of what you're dealing with:

```
#include <stdio.h>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <fstream>
#include <time.h>
#include <string>
#include <algorithm>

unsigned count1(FILE *infile, char c) {
    int ch;
    unsigned count = 0;

    while (EOF != (ch=getc(infile)))
        if (ch == c)
            ++count;
    return count;
}

unsigned int count2(FILE *infile, char c) {
    static char buffer[8192];
    int size;
    unsigned int count = 0;

    while (0 < (size = fread(buffer, 1, sizeof(buffer), infile)))
        for (int i=0; i<size; i++)
            if (buffer[i] == c)
                ++count;
    return count;
}

unsigned count3(std::istream &infile, char c) {
    return std::count(std::istreambuf_iterator<char>(infile),
                    std::istreambuf_iterator<char>(), c);
}

unsigned count4(std::istream &infile, char c) {
    return std::count(std::istream_iterator<char>(infile),
                    std::istream_iterator<char>(), c);
}

unsigned int count5(std::istream &infile, char c) {
    static char buffer[8192];
    unsigned int count = 0;

    while (infile.read(buffer, sizeof(buffer)))
        count += std::count(buffer, buffer+infile.gcount(), c);
    count += std::count(buffer, buffer+infile.gcount(), c);
    return count;
}

unsigned count6(std::istream &infile, char c) {
    unsigned int count = 0;
    char ch;

    while (infile >> ch)
        if (ch == c)
            ++count;
    return count;
}

template <class F, class T>
void timer(F f, T &t, std::string const &title) {
    unsigned count;
    clock_t start = clock();
    count = f(t, 'N');
    clock_t stop = clock();
    std::cout << std::left << std::setw(30) << title << "\tCount: " << count;
    std::cout << "\tTime: " << double(stop-start)/CLOCKS_PER_SEC << "\n";
}

int main() {
    char const *name = "equivs2.txt";

    FILE *infile=fopen(name, "r");

    timer(count1, infile, "ignore");

    rewind(infile);
    timer(count1, infile, "using getc");

    rewind(infile);
    timer(count2, infile, "using fread");

    fclose(infile);

    std::ifstream in2(name);
    timer(count3, in2, "ignore");

    in2.clear();
    in2.seekg(0);
    timer(count3, in2, "using streambuf iterators");

    in2.clear();
    in2.seekg(0);
    timer(count4, in2, "using stream iterators");

    in2.clear();
    in2.seekg(0);
    timer(count5, in2, "using istream::read");

    in2.clear();
    in2.seekg(0);
    timer(count6, in2, "using operator>>");

    return 0;
}
```

Running this, I get results like this (with MS VC++):

```
ignore                          Count: 1300     Time: 0.309
using getc                      Count: 1300     Time: 0.308
using fread                     Count: 1300     Time: 0.028
ignore                          Count: 1300     Time: 0.091
using streambuf iterators       Count: 1300     Time: 0.091
using stream iterators          Count: 1300     Time: 0.613
using istream::read             Count: 1300     Time: 0.028
using operator>>                Count: 1300     Time: 0.619
```

and this (with MinGW):

```
ignore                          Count: 1300     Time: 0.052
using getc                      Count: 1300     Time: 0.044
using fread                     Count: 1300     Time: 0.036
ignore                          Count: 1300     Time: 0.068
using streambuf iterators       Count: 1300     Time: 0.068
using stream iterators          Count: 1300     Time: 0.131
using istream::read             Count: 1300     Time: 0.037
using operator>>                Count: 1300     Time: 0.121
```

As we can see in the results, it's not really a matter of iostreams being categorically slow. Rather, a great deal depends on exactly how you use iostreams (and to a lesser extent FILE * as well). There's also a pretty substantial variation just between these two implementations.

Nonetheless, the fastest versions with each (fread and istream::read) are essentially tied. With VC++, getc is quite a bit slower than either istream::read or istreambuf_iterator.

Bottom line: getting good performance from iostreams requires a little more care than with FILE * -- but it's certainly possible. They also give you more options: convenience when you don't care all that much about speed, and performance directly competitive with the best you can get from C-style I/O, with a little extra work.

Experience answered 8/9, 2013 at 22:54 Comment(5)

Since my edit got rejected: your istream::read-version has a bug. The last chunk of characters isn’t checked, see here. – Jehiel 15/9, 2013 at 16:1

Handy. Also, if you copy count6 to a new count7 with "while (infile.get(ch))", you'll see that it is twice as fast as operator>> but still twice as slow as getc. – Cheston 12/7, 2017 at 0:36
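
For reference, the variant described in this comment could look roughly like the sketch below. It assumes the same signature as count6 from the answer; `count7_example` is a made-up name used only to show a call on an in-memory stream, and no timings are implied.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Sketch of the count7 variant: same loop as count6, but using the
// unformatted istream::get(char&) instead of the formatted operator>>.
unsigned count7(std::istream &infile, char c) {
    unsigned int count = 0;
    char ch;

    while (infile.get(ch))  // unformatted read: whitespace is NOT skipped
        if (ch == c)
            ++count;
    return count;
}

// Usage on an in-memory stream.
void count7_example() {
    std::istringstream in("NaN N\nN");
    assert(count7(in, 'N') == 4);
}
```

Note that operator>> skips whitespace while get does not, so the two versions only agree when the character being counted is not whitespace.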

@NickWestgate: Yeah--no matter how many I add, there are at least three more that could be added. If (for example) another method were faster than anything else, I'd probably add it--but another that's more or less in the middle of the pack just doesn't seem like it's worth bothering... – Experience 12/7, 2017 at 0:48

Well it would be useful for those (like me) who are comparing the current state of some code to the other options. I'm pretty disappointed that istream::get spends a lot of time entering and exiting critical sections in some single-threaded code I maintain. ; - ) Anyway, thanks for the handy test suite. – Cheston 12/7, 2017 at 0:59

File I/O is inherently noisy on Windows and probably Linux as well due to caching. – Fechter 14/7, 2020 at 10:12

While this question is quite old, I'm amazed nobody has mentioned iostream object construction.

That is, whenever you create an STL iostream (or one of the other stream variants) and step into the code, you can see the constructor call an internal Init function. In there, operator new is called to create a new locale object, which is likewise destroyed when the stream is destroyed.

This is hideous, IMHO. And certainly contributes to slow object construction/destruction, because memory is being allocated/deallocated using a system lock, at some point.

Further, some of the STL streams allow you to specify an allocator, so why is the locale created NOT using the specified allocator?

Using streams in a multithreaded environment, you could also imagine the bottleneck imposed by calling operator new every time a new stream object is constructed.

Hideous mess if you ask me, as I am finding out myself right now!
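
If construction cost is the concern, one common workaround is to construct a stream once and reuse it. The sketch below assumes that approach; `format_all` and `format_all_example` are hypothetical names, and whether it helps measurably depends on the standard library implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats every value with ONE stream object, so the
// stream (and whatever its constructor allocates) is created only once.
std::vector<std::string> format_all(const std::vector<int>& values) {
    std::ostringstream out;  // constructed once, outside the loop
    std::vector<std::string> result;
    result.reserve(values.size());
    for (int v : values) {
        out.str(std::string());  // discard the previous contents
        out.clear();             // reset any error flags
        out << "value=" << v;
        result.push_back(out.str());
    }
    return result;
}

void format_all_example() {
    const std::vector<std::string> r = format_all({1, 22});
    assert(r.size() == 2);
    assert(r[0] == "value=1");
    assert(r[1] == "value=22");
}
```

Keep in mind that formatting state (for example std::hex) also survives the reuse, so it has to be reset by hand if an earlier iteration changed it.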

Heteromerous answered 14/3, 2018 at 19:58 Comment(3)

Karl Knechtel says here: "(...) This task is almost certainly I/O bound and there is way too much FUD going around about the cost of creating std::string objects in C++ or using <iostream> in and of itself." – Apologize 16/3, 2018 at 4:36

Somebody else has exactly the same reasoning.... – Heteromerous 21/10, 2021 at 18:8

The rationale provided by the llvm project for never using <iostream> is interesting. – Empanel 12/10, 2022 at 20:57

On a similar topic, STL says: "You can call setvbuf() to enable buffering on stdout."

https://web.archive.org/web/20170329163751/https://connect.microsoft.com/VisualStudio/feedback/details/642876/std-wcout-is-ten-times-slower-than-wprintf-performance-bug-in-c-library
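
A minimal sketch of what such a setvbuf call could look like. The helper names and the 64 KiB size are arbitrary choices for illustration, not something taken from the linked report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: asks the C library to fully buffer `stream`, using a
// buffer of roughly `size` bytes. It should be called before any other
// operation on the stream. Returns true on success.
bool enable_full_buffering(std::FILE* stream, std::size_t size) {
    // A null buffer pointer lets the library allocate the buffer itself.
    return std::setvbuf(stream, nullptr, _IOFBF, size) == 0;
}

// Usage on a scratch file, which is removed automatically when closed.
void enable_full_buffering_example() {
    std::FILE* f = std::tmpfile();
    assert(f != nullptr);
    const bool ok = enable_full_buffering(f, 1 << 16);
    assert(ok);
    (void)ok;  // avoid an unused-variable warning when NDEBUG is defined
    std::fputs("buffered\n", f);
    std::fclose(f);
}
```

For the case in the quote you would pass `stdout` instead of the scratch file, as the first thing in `main`.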

Nevanevada answered 9/9, 2013 at 1:3 Comment(0)
