I need to read a series of numbers from a space-separated human-readable file and do some math, but I've run into some truly bizarre memory behavior just reading the file.
If I read the numbers and immediately discard them...
```cpp
#include <fstream>

int main(int, char**) {
    std::ifstream ww15mgh("ww15mgh.grd");
    double value;
    while (ww15mgh >> value);
    return 0;
}
```
My program makes about a million heap allocations totaling 59MB according to valgrind, and the total scales linearly with the size of the file:

```
$ g++ stackoverflow.cpp
$ valgrind --tool=memcheck --leak-check=yes ./a.out 2>&1 | grep total
==523661== total heap usage: 1,038,970 allocs, 1,038,970 frees, 59,302,487 bytes allocated
```
But if I use `ifstream >> string` instead and then use `sscanf` to parse the string, my memory usage looks a lot more sane:
```cpp
#include <fstream>
#include <string>
#include <cstdio>

int main(int, char**) {
    std::ifstream ww15mgh("ww15mgh.grd");
    double value;
    std::string text;
    while (ww15mgh >> text)
        std::sscanf(text.c_str(), "%lf", &value);
    return 0;
}
```
```
$ g++ stackoverflow2.cpp
$ valgrind --tool=memcheck --leak-check=yes ./a.out 2>&1 | grep total
==534531== total heap usage: 3 allocs, 3 frees, 81,368 bytes allocated
```
To rule out the I/O buffer as the issue, I've tried both `ww15mgh.rdbuf()->pubsetbuf(0, 0);` (which makes the program take ages and still performs 59MB worth of allocations) and `pubsetbuf` with an enormous stack-allocated buffer (still 59MB). The behavior reproduces when compiled with either `gcc` 10.2.0 or `clang` 11.0.1 when using `/usr/lib/libstdc++.so.6` from `gcc-libs` 10.2.0 and `/usr/lib/libc.so.6` from `glibc` 2.32. The system locale is set to `en_US.UTF-8`, but this also reproduces if I set the environment variable `LC_ALL=C`.
The ARM CI environment where I first noticed the problem is cross-compiled on Ubuntu Focal using GCC 9.3.0, `libstdc++6` 10.2.0 and `libc` 2.31.
Following advice in the comments, I tried LLVM's libc++ and got perfectly sane behavior with the original program:

```
$ clang++ -std=c++14 -stdlib=libc++ -I/usr/include/c++/v1 stackoverflow.cpp
$ valgrind --tool=memcheck --leak-check=yes ./a.out 2>&1 | grep total
==700627== total heap usage: 3 allocs, 3 frees, 8,664 bytes allocated
```
So this behavior seems to be specific to GCC's standard library (libstdc++) implementation of `fstream`, since clang reproduces it with libstdc++ but not with libc++. Is there something I could do differently in constructing or using the `ifstream` that would avoid allocating tons of heap memory when built against libstdc++? Is this a bug in their `<fstream>`?
As discovered in the comments discussion, the program's actual memory footprint is perfectly sane (84 KB); it's just allocating and freeing the same small block of memory about a million times, which becomes a problem with allocators such as ASAN's that avoid re-using freed heap space. I posted a follow-up question asking how to cope with this kind of problem at the ASAN level.
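The valgrind totals quoted earlier support this reading. Assuming the `>> double` program makes the same three baseline allocations (81,368 bytes) as the string-based one, all of its remaining allocations have the same size:

$$
\frac{59\,302\,487 - 81\,368\ \text{bytes}}{1\,038\,970 - 3\ \text{allocations}} = \frac{59\,221\,119}{1\,038\,967} = 57\ \text{bytes per allocation}
$$

Under that assumption the extra heap traffic is a long series of identical 57-byte allocate/free pairs, consistent with one small fixed-size allocation per `>> double` call and with the small peak reported by massif.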
A gitlab project that reproduces the issue in its CI pipeline was generously contributed by Stack Overflow user @KamilCuk.
Comments:

… `>>double`. If I double the size, the program allocates 119MB. If I triple the file size, the program allocates 178MB. So allocations seem to scale linearly with respect to the number of `istream>>double` calls. – Mines

… `.clear()` and `.seekg(0)` in between), which yielded the same results. – Mines

… `istream` class's `operator >>(double&)` implementation... Still no idea how to fix it. :( – Fizgig

… `/usr/lib/libstdc++.so.6`, which my system identifies as being part of `gcc-libs`. – Mines

… `2 allocs`. – Stationery

… `en_US.UTF-8` and `C` both reproduce. Glibc is 2.32, gcc and libstdc++ are both 10.2.0. In my ARM environment gcc is 9.3.0. – Mines

… `do_get` are of interest. But investigate further: please do `valgrind --tool=massif` and then `ms_print massif.something` and show all allocations. – Stationery

… `massif` suggestion will show you the peak memory usage and I bet it's small. – Moschatel

… `ms_print` output here. This `--tool` is new to me so I'm not sure I'm reading its output correctly, but it looks like, as you suspected (and as I inferred from the lack of leaks reported by `--tool=memcheck`), it's just allocating and freeing the same small chunk of memory a million times rather than actually leaking. This still creates a problem for me, though, because my real project is compiled with ASAN's memory allocator, which adds padding and doesn't re-use heap space. – Mines

… `2 allocs`). If that investigation goes nowhere, though, I'll accept that as an answer and open a new question about how to not run out of RAM with ASAN turned on. – Mines

… `while` loops immediately exit). Try this one. (Line 372) – Mines