I am porting some C99 code that makes heavy use of variable length arrays (VLA) to C++.
I replaced the VLAs (stack allocation) with an array class that allocates memory on the heap. The performance hit was huge, a slowdown of a factor of 3.2 (see benchmarks below). What fast VLA replacement can I use in C++? My goal is to minimize performance hit when rewriting the code for C++.
One idea that was suggested to me was to write an array class that contains a fixed-size storage within the class (i.e. can be stack-allocated) and uses it for small arrays, and automatically switches to heap allocation for larger arrays. My implementation of this is at the end of the post. It works fairly well, but I still cannot reach the performance of the original C99 code. To come close to it, I must increase this fixed-size storage (MSL
below) to sizes which I am not comfortable with. I don't want to allocate too-huge arrays on the stack even for the many small arrays that don't need it because I worry that it will trigger a stack overflow. A C99 VLA is actually less prone to this because it will never use more storage than needed.
I came upon std::dynarray
, but my understanding is that it was not accepted into the standard (yet?).
I know that clang and gcc support VLAs in C++, but I need it to work with MSVC too. In fact better portability is one of the main goals of rewriting as C++ (the other goal being making the program, which was originally a command line tool, into a reusable library).
Benchmark
MSL
refers to the array size above which I switch to heap-allocation. I use different values for 1D and 2D arrays.
Original C99 code: 115 seconds.
MSL = 0 (i.e. heap allocation): 367 seconds (3.2x).
1D-MSL = 50, 2D-MSL = 1000: 187 seconds (1.63x).
1D-MSL = 200, 2D-MSL = 4000: 143 seconds (1.24x).
1D-MSL = 1000, 2D-MSL = 20000: 131 (1.14x).
Increasing MSL
further improves performance more, but eventually the program will start returning wrong results (I assume due to stack overflow).
These benchmarks are with clang 3.7 on OS X, but gcc 5 shows very similar results.
Code
This is the current "smallvector" implementation I use. I need 1D and 2D vectors. I switch to heap-allocation above size MSL
.
template<typename T, size_t MSL=50>
class lad_vector {
const size_t len;
T sdata[MSL];
T *data;
public:
explicit lad_vector(size_t len_) : len(len_) {
if (len <= MSL)
data = &sdata[0];
else
data = new T[len];
}
~lad_vector() {
if (len > MSL)
delete [] data;
}
const T &operator [] (size_t i) const { return data[i]; }
T &operator [] (size_t i) { return data[i]; }
operator T * () { return data; }
};
template<typename T, size_t MSL=1000>
class lad_matrix {
const size_t rows, cols;
T sdata[MSL];
T *data;
public:
explicit lad_matrix(size_t rows_, size_t cols_) : rows(rows_), cols(cols_) {
if (rows*cols <= MSL)
data = &sdata[0];
else
data = new T[rows*cols];
}
~lad_matrix() {
if (rows*cols > MSL)
delete [] data;
}
T const * operator[] (size_t i) const { return &data[cols*i]; }
T * operator[] (size_t i) { return &data[cols*i]; }
};
GCC
does VLAs as an extension and runs on all those platforms. – Snowslidealloca
(plaform-specific function, but exists on Linux/Windows/OS X): man7.org/linux/man-pages/man3/alloca.3.html It dynamically allocates memory on the stack. – Levensonalloca
to create an on-stack array class? The difficulty I'm encountering is that if I callalloca
in the constructor of the class, it will use the constructor's stack (and the pointer won't be valid once the constructor has returned). – MysteriousT
is alwaysint
,char
orbool
in this application, never floating point. It is a complex combinatorial algorithm. Also, if the root of the performance problem is that auto-vectorization is lost, then why does the optimization I tried work as well as it does? – Mysteriousalloca
would need to be called in the function whose stack should be used. That is, not in the constructor of the vector class (or the initialization list.) The class could take the pointer as a constructor argument, likelad_vector vec( (int*)alloca(10 * sizeof(int)), 10 );
. Maybe make a macro for this (but not an inline function), to get a syntax likelad_vector vec = MAKE_LADVECTOR(10);
– Levenson