Why does GCC's ifstream >> double allocate so much memory?

I need to read a series of numbers from a space-separated human-readable file and do some math, but I've run into some truly bizarre memory behavior just reading the file.

If I read the numbers and immediately discard them...

#include <fstream>

int main(int, char**) {
    std::ifstream ww15mgh("ww15mgh.grd");
    double value;
    while (ww15mgh >> value);
    return 0;
}

My program allocates 59MB of memory according to valgrind, scaling linearly with respect to the size of the file:

$ g++ stackoverflow.cpp
$ valgrind --tool=memcheck --leak-check=yes ./a.out 2>&1 | grep total
==523661==   total heap usage: 1,038,970 allocs, 1,038,970 frees, 59,302,487 bytes allocated

But, if I use ifstream >> string instead and then use sscanf to parse the string, my memory usage looks a lot more sane:

#include <fstream>
#include <string>
#include <cstdio>

int main(int, char**) {
    std::ifstream ww15mgh("ww15mgh.grd");
    double value;
    std::string text;
    while (ww15mgh >> text)
        std::sscanf(text.c_str(), "%lf", &value);
    return 0;
}
$ g++ stackoverflow2.cpp
$ valgrind --tool=memcheck --leak-check=yes ./a.out 2>&1 | grep total
==534531==   total heap usage: 3 allocs, 3 frees, 81,368 bytes allocated

To rule out the IO buffer as the issue, I've tried both ww15mgh.rdbuf()->pubsetbuf(0, 0); (which makes the program take ages and still do 59MB worth of allocations) and pubsetbuf with an enormous stack-allocated buffer (still 59MB). The behavior reproduces with both gcc 10.2.0 and clang 11.0.1 when using /usr/lib/libstdc++.so.6 from gcc-libs 10.2.0 and /usr/lib/libc.so.6 from glibc 2.32. The system locale is set to en_US.UTF-8, but this also reproduces if I set the environment variable LC_ALL=C.
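
For reference, the buffer experiments were roughly the following (a sketch, not verbatim; the 1 MiB size is arbitrary, and calling pubsetbuf before open() is simply the ordering libstdc++ is usually said to require for the buffer to take effect):

#include <fstream>

int main(int, char**) {
    char buf[1 << 20];                            // large stack buffer for the second experiment
    std::ifstream ww15mgh;
    // First experiment: ww15mgh.rdbuf()->pubsetbuf(0, 0);  // unbuffered: very slow, same ~59MB of allocations
    ww15mgh.rdbuf()->pubsetbuf(buf, sizeof buf);  // second experiment: huge buffer, still ~59MB of allocations
    ww15mgh.open("ww15mgh.grd");
    double value;
    while (ww15mgh >> value);
    return 0;
}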

The ARM CI environment where I first noticed the problem is cross-compiled on Ubuntu Focal using GCC 9.3.0, libstdc++6 10.2.0 and libc 2.31.

Following advice in the comments, I tried LLVM's libc++ and get perfectly sane behavior with the original program:

$ clang++ -std=c++14 -stdlib=libc++ -I/usr/include/c++/v1 stackoverflow.cpp
$ valgrind --tool=memcheck --leak-check=yes ./a.out 2>&1 | grep total
==700627==   total heap usage: 3 allocs, 3 frees, 8,664 bytes allocated

So, this behavior seems to be unique to GCC's implementation of fstream. Is there something I could do differently in constructing or using the ifstream that would avoid allocating tons of heap memory when compiled in a GNU environment? Is this a bug in their <fstream>?

As discovered in the comments discussion, the actual memory footprint of the program is perfectly sane (84 KB); it's just allocating and freeing the same small bit of memory hundreds of thousands of times, which creates a problem when using custom allocators like ASAN's, which avoid re-using heap space. I posted a follow-up question asking how to cope with this kind of problem at the "ASAN" level.

A gitlab project that reproduces the issue in its CI pipeline was generously contributed by Stack Overflow user @KamilCuk.

Mines answered 13/1, 2021 at 13:44 Comment(23)
I have no idea, but - just out of curiosity - I would prepare a file two or ten times that long and check if the memory consumption depends on the size of the data....Fizgig
@Fizgig The normal file is 9.2MB worth of ASCII numbers and the program allocates 59MB using >>double. If I double the size, the program allocates 119MB. If I triple the file size, the program allocates 178MB. So allocations seem to scale linearly with respect to the number of istream>>double calls.Mines
I also tried reading the regular 9.2MB file multiple times (with .clear() and .seekg(0) in between), which yielded the same results.Mines
Looks like a memory leak inside the istream class's operator >>(double&) implementation... Still no idea how to fix it. :(Fizgig
I don't think there's anything really wrong with your code, you could try to use Clang with libc++ and see if that implementation uses less memory.Nork
@Nork It uses much less -- 8KB total no matter how much input data I throw at it. I'll update the question.Mines
So, to be clear, this is with libstdc++?Mustachio
@Yakk-AdamNevraumont Correct. When compiling under GCC normally, ldd shows it linked against /usr/lib/libstdc++.so.6, which my system identifies as being part of gcc-libsMines
On current laptops, 60Megabytes is small..Petronilapetronilla
On embedded ARM boards, it starts to become a problem. When testing under ASAN (which adds padding to each allocation), it actually broke my ARM CI runner. Also, it's 6.5 times the size of the input file, which is ridiculous.Mines
What locale are you using? Does this reproduce with a different locale? Does it change in a different locale? What glibc, libstdc++ and gcc versions are you using? I get only 2 allocs.Stationery
@Stationery edited the post, but on my system the locales en_US.UTF-8 and C both reproduce it. Glibc is 2.32, gcc and libstdc++ are both 10.2.0. In my ARM environment gcc is 9.3.0.Mines
I went through the libstdc++ sources - __convert_to_v, _M_extract_float and do_get are of interest. But to investigate further, please do valgrind --tool=massif and then ms_print massif.something and show all allocations.Stationery
Note that the number shown by valgrind is the sum of all allocations. There was not necessarily 59 megabytes of memory used at any one time. It's reasonable to guess that each input operation allocates and then deallocates some substantial amount of memory, but that doesn't mean your program actually has a large memory footprint: it's just using the same memory over and over again. @KamilCuk's massif suggestion will show you the peak memory usage and I bet it's small.Moschatel
For example, if each operation allocated and freed 64 Kbytes, and you processed 1 million inputs, valgrind would show 64 Gbytes, yet your program would run just fine with only 64 Kbytes of memory. Obviously the number shown is going to grow linearly with the number of inputs, but that doesn't indicate any kind of excessive usage or leak. libc++ probably just uses a different algorithm where it keeps a block of memory instead of allocating and freeing every time.Moschatel
@NateEldredge @Stationery ms_print output here. This --tool is new to me so I'm not sure I'm reading its output correctly, but it looks like, as you suspected (and as I inferred from the lack of leaks reported by --tool=memcheck), it's just allocating and freeing the same small chunk of memory a million times rather than actually leaking. This still creates a problem for me, though, because my real project is compiled with ASAN's memory allocator, which adds padding and doesn't re-use heap space.Mines
Right, the program's actual memory usage is a totally reasonable 81 Kbytes. So I think your real question is how to configure ASAN to deal with this allocation pattern more efficiently.Moschatel
@Stationery gitlab.com/-/snippets/2060515Mines
@NateEldredge that would be a good question, but at the moment I'm more interested in how/why @Stationery can't reproduce it (2 allocs). If that investigation goes nowhere, though, I'll accept that as an answer and open a new question about how to not run out of RAM with ASAN turned on.Mines
@Stationery your gitlab pipeline doesn't reproduce the issue because it doesn't download the file (so the while loops immediately exit). Try this one. (Line 372)Mines
In this talk: youtube.com/watch?v=4P_kbF0EbZM it is said that the current algorithm for double <-> str conversion often uses bignum (arbitrary-precision integers) that need heap allocations. Maybe this has something to do with this.Rake
Just out of curiosity, once you have established that there aren't any leaks, does rising the optimization level change the results?Burnisher
@Burnisher nope. same results with -O3.Mines

It really doesn't. The number 59,302,487 shown by valgrind is the sum of all allocations, and does not represent the actual memory consumption of the program.

It turns out that the libstdc++ implementation of the relevant operator>> creates a temporary std::string for scratch space, and reserves 32 bytes for it. This is then deallocated immediately after being used. See num_get::do_get. With overhead, this perhaps actually allocates 56 bytes or so, which multiplied by about 1 million repetitions does mean, in a sense, that a total of 59 megabytes were allocated, and of course this is why that number scales linearly with the number of inputs. But it was the same 56 bytes being allocated and freed over and over again. This is perfectly innocent behavior by libstdc++ and isn't a leak or excessive memory consumption.
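
You can reproduce the same shape of numbers with a loop that has nothing to do with iostreams at all (a toy sketch, not the libstdc++ code): each iteration allocates a small block and frees it again, so memcheck's "total heap usage" grows without bound while the live footprint stays tiny:

#include <string>

int main() {
    for (int i = 0; i < 1000000; ++i) {
        std::string scratch;
        scratch.reserve(32);   // small heap allocation, too big for the small-string buffer
    }                          // freed again at the end of every iteration
    return 0;
}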

I didn't check the libc++ source, but a good bet would be that it uses scratch space on the stack instead of the heap.
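
Something in this spirit would do it (purely illustrative, not libc++'s actual code): the text of a double always fits in a small fixed buffer, so the scratch space can live on the stack and nothing needs to be allocated per extraction:

#include <cctype>
#include <cstdlib>
#include <istream>

// Illustrative only: collect the next token into stack scratch space and
// convert it there, so no heap allocation happens per number.
bool extract_double(std::istream& in, double& out) {
    const auto eof = std::istream::traits_type::eof();
    char buf[64];                                  // ample for any reasonable textual double
    std::size_t n = 0;
    while (in.peek() != eof && std::isspace(in.peek()))
        in.get();                                  // skip leading whitespace
    while (in.peek() != eof && !std::isspace(in.peek()) && n + 1 < sizeof buf)
        buf[n++] = static_cast<char>(in.get());    // copy the token into the stack buffer
    buf[n] = '\0';
    char* end = nullptr;
    out = std::strtod(buf, &end);                  // convert in place
    return n > 0 && end == buf + n;
}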

As determined in comments, your real problem is that you are running this under AddressSanitizer, which delays the reuse of freed memory in order to help catch use-after-free errors. I have some thoughts about how to address that (no pun intended) and will post them on How do I exclude allocations in a tight loop from ASAN?

Moschatel answered 14/1, 2021 at 16:38 Comment(0)

Unfortunately, the C++ stream-based I/O library is generally underused since everybody "knows" that it performs poorly, so there's a chicken-and-egg problem: a bad reputation leads to little use, which leads to sparse bug reports, which leads to little pressure for a fix.

I'd say that the largest users of C++'s streams are the basic CS/IT education sector and "quick one-off scripts" (which invariably outlive their authors), and in neither case does anyone really care about performance.

What you're seeing is just a wasteful implementation - it constantly allocates and deallocates somewhere in the guts, but it doesn't leak memory as far as I can tell. I don't think that there's any sort of a "pattern" that will guarantee better performance in a non-brittle way while using stream I/O.

The best strategy to win at this in an embedded setting is not to play the game at all. Forget about C++ stream I/O and all will be well. There are alternative formatted I/O libraries that bring back C++'s type safety and perform much better, and then you're not beholden to standard library implementation bugs/inefficiencies. Or just use sscanf if you don't want to add dependencies.
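
For example, reading the file with plain C stdio sidesteps the stream extractors entirely (a sketch using the question's filename; whether fscanf itself allocates internally per call is up to the C library):

#include <cstdio>

int main() {
    std::FILE* f = std::fopen("ww15mgh.grd", "r");
    if (!f)
        return 1;
    double value;
    while (std::fscanf(f, "%lf", &value) == 1) {
        // ... do the math on each value here ...
    }
    std::fclose(f);
    return 0;
}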

Autopilot answered 14/1, 2021 at 16:29 Comment(1)
IMHO, it's not generally true that "constantly allocates and deallocates somewhere in the guts" is a "wasteful implementation". That assumes that heap allocation, and this one especially, is slow. In fact, this pattern can be very fast.Downstroke
