Not making large arrays static, even when they're constexpr, can have a dramatic performance impact and lead to many missed optimizations. It may slow down your code by orders of magnitude. Your variables are still local, and the compiler may decide to initialize them at runtime instead of storing them as data in the executable.
Consider the following example:
template <int N>
void foo();

void bar(int n)
{
    // array of four function pointers to void(void)
    constexpr void(*table[])(void) {
        &foo<0>,
        &foo<1>,
        &foo<2>,
        &foo<3>
    };
    // look up function pointer and call it
    table[n]();
}
You probably expect gcc-10 -O3 to compile bar() to a jmp to an address which it fetches from a table, but that is not what happens:
bar(int):
    mov eax, OFFSET FLAT:_Z3fooILi0EEvv
    movsx rdi, edi
    movq xmm0, rax
    mov eax, OFFSET FLAT:_Z3fooILi2EEvv
    movhps xmm0, QWORD PTR .LC0[rip]
    movaps XMMWORD PTR [rsp-40], xmm0
    movq xmm0, rax
    movhps xmm0, QWORD PTR .LC1[rip]
    movaps XMMWORD PTR [rsp-24], xmm0
    jmp [QWORD PTR [rsp-40+rdi*8]]
.LC0:
    .quad void foo<1>()
.LC1:
    .quad void foo<3>()
This is because GCC decides not to store table in the executable's data section, but instead initializes a local variable with its contents every time the function runs. In fact, if we remove constexpr here, the compiled binary is 100% identical.
This can easily be 10x slower than the following code:
template <int N>
void foo();

void bar(int n)
{
    static constexpr void(*table[])(void) {
        &foo<0>,
        &foo<1>,
        &foo<2>,
        &foo<3>
    };
    table[n]();
}
Our only change is that we have made table static, but the impact is enormous:
bar(int):
    movsx rdi, edi
    jmp [QWORD PTR bar(int)::table[0+rdi*8]]
bar(int)::table:
    .quad void foo<0>()
    .quad void foo<1>()
    .quad void foo<2>()
    .quad void foo<3>()
In conclusion, never make your lookup tables non-static local variables, even if they're constexpr. Clang actually optimizes such lookup tables well, but other compilers don't. See Compiler Explorer for a live example.
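If you want to try the static version outside Compiler Explorer, here is a minimal self-contained sketch. The body of foo, the last_called variable, and the bounds assert are my own assumptions for illustration; the snippets above only declare foo and never define it.

```cpp
#include <cassert>

// Hypothetical: records which foo<N> ran most recently.
int last_called = -1;

// Assumed body for illustration; the answer's snippets only declare foo.
template <int N>
void foo() { last_called = N; }

void bar(int n)
{
    // The table has exactly four entries, so guard the index.
    assert(n >= 0 && n < 4);

    // static gives the table static storage duration: it exists once for
    // the whole program instead of once per call to bar().
    static constexpr void(*table[])(void) {
        &foo<0>,
        &foo<1>,
        &foo<2>,
        &foo<3>
    };

    // look up function pointer and call it
    table[n]();
}
```

Calling bar(2) should dispatch to foo<2>. Whether your compiler then emits the single indexed jmp shown above is something to verify in its own output.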
constexpr, unlike const, can always be put in read-only storage (you can't cast away constexpr [?]) and that would obviate the need to push a fresh copy on the stack, even when it would be required for const variables. Sort of like pooling constant literal strings. But I don't know if that's legal. – Wapentake
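On the legality question: as far as I understand the object model, a non-static local array is a separate object in each invocation, and two objects whose lifetimes overlap must have distinct addresses. So a compiler can only share one read-only copy when it can show that no code observes the difference. The sketch below makes the difference observable through recursion; the function names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: recurses once and reports whether the outer and the
// inner invocation see the same array address.
bool local_shares_address(const int* outer = nullptr)
{
    constexpr int table[] {1, 2, 3, 4};      // automatic storage: one object per call
    if (outer == nullptr)
        return local_shares_address(table);  // hand our array's address to the inner call
    return outer == table;                   // compare it with the inner call's array
}

// Same shape, but the array has static storage duration.
bool static_shares_address(const int* outer = nullptr)
{
    static constexpr int table[] {1, 2, 3, 4};  // static storage: one object in total
    if (outer == nullptr)
        return static_shares_address(table);
    return outer == table;
}

void check_addresses()
{
    assert(!local_shares_address());  // two live invocations, two distinct arrays
    assert(static_shares_address());  // both invocations refer to the same array
}
```

Because the local array's address escapes into the recursive call, a conforming compiler has to keep the two arrays apart here. That requirement is one plausible reason the pooling is not applied unconditionally, even for constexpr arrays.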