Optimized handling of tag (empty struct) function parameters

In some circumstances, we use tags to discriminate between functions. A tag is usually an empty struct:

struct Tag { };

Suppose I have a function that uses this tag:

void func(Tag, int a);

Now, let's call this function:

func(Tag(), 42);

And check out the resulting x86-64 disassembly (from godbolt):

mov     edi, 42
jmp     func(Tag, int)            # TAILCALL

It is fine, the tag gets completely optimized away: there is no register/stack space allocated for it.

But, if I check out other platforms, the tag has some presence.

On ARM, r0 is used as the tag, and it gets zeroed (seems unnecessary):

mov     r1, #42
mov     r0, #0
b       func(Tag, int)

With MSVC, ecx is used as the tag, and it is "initialized" from the stack (again, seems unnecessary):

movzx   ecx, BYTE PTR $T1[rsp]
mov     edx, 42                             ; 0000002aH
jmp     void func(Tag,int)                 ; func

My question is: is there a tag technique that is equally well optimized on all these platforms?


Note: I can't find where the SysV ABI specifies that empty classes can be optimized away in parameter passing... (and the Itanium C++ ABI even says: "Empty classes will be passed no differently from ordinary classes".)
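
As background for why the ABI has to decide anything here: in C++ an empty class is still a complete object type with nonzero size, so a by-value Tag parameter nominally has storage that a calling convention may or may not choose to pass. A minimal sketch of that property follows; the helper names tag_is_empty and tag_size are made up for illustration and appear nowhere in the question.

```cpp
#include <cstddef>
#include <type_traits>

struct Tag { };

// Hypothetical helper: true if Tag has no non-static data members.
constexpr bool tag_is_empty() { return std::is_empty<Tag>::value; }

// Hypothetical helper: the size of the tag object. sizeof never yields 0
// for a complete class type, so even an empty tag nominally occupies storage.
constexpr std::size_t tag_size() { return sizeof(Tag); }

static_assert(tag_is_empty(), "Tag should be an empty class");
static_assert(tag_size() >= 1, "an empty class still has nonzero size");
```

The byte-sized load in the MSVC output above is consistent with this: the tag is treated like any other small object rather than being dropped.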

Swoon answered 22/3, 2019 at 16:26 Comment(1)
You could specialize a template but I really dislike specializations since they have so many drawbacks. – Gayle
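
The template idea from this comment could look roughly like the sketch below. It is only one possible reading of the comment, not code from the thread: the tag becomes a template argument instead of a function parameter, so there is no tag argument for any calling convention to pass. The int return values are an assumption added purely so the two variants can be told apart; the func in the question returns void.

```cpp
struct TagA { };
struct TagB { };

// Primary template is declared but not defined, so using an unsupported
// tag fails at link time instead of silently picking a default.
template <class Tag> int func(int a);

// One explicit specialization per tag. The tag selects the function
// but never appears in the parameter list.
template <> inline int func<TagA>(int a) { return a + 1; }
template <> inline int func<TagB>(int a) { return a * 2; }
```

A call then reads func<TagA>(42) instead of func(TagA(), 42). The trade-off is that the tag can no longer be deduced from an argument, which is part of what makes classic tag dispatch convenient.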

I think the basic problem here is that when generating the standalone version of a function, the compiler has to generate code that can be called by anyone from anywhere according to the respective calling convention. And when generating a call to a function without knowing its definition, all the compiler really knows is that this function expects to be called according to the calling convention. Based on that, it would seem that, unless the calling convention specifies that function parameters of empty type are removed, the compiler can't really optimize away parameters from function calls in general.

Now, it may be technically legal for a C++ compiler to make up whatever calling convention it sees fit for a given function signature on the spot unless the function has non-C++ language linkage (e.g., an extern "C" function). But in practice, this would most likely not be all that simple. First of all, you then need an algorithm that can decide what the best calling convention for a given function signature looks like in general. And second, the ability to link code that was not necessarily all generated with exactly the same version of exactly the same compiler using exactly the same flags, while not required by the C++ standard, is probably relevant in practice. Function calling convention optimization is certainly not impossible. But I'm not aware of any C++ compiler that actually does it (when generating object code).

One possible solution would be to, e.g., use different names for the actual function implementations and have simple inline wrapper functions that translate calls with Tag types to the respective implementations:

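struct TagA { };
struct TagB { };

// Thin inline wrapper: the tag selects the overload, only `a` is forwarded.
inline void func(int a, TagA)
{
    void funcA(int a);  // block-scope declaration; funcA is defined elsewhere
    funcA(a);
}

// Same pattern for the second tag.
inline void func(int a, TagB)
{
    void funcB(int a);  // block-scope declaration; funcB is defined elsewhere
    funcB(a);
}

void call() {
    func(42, TagA());
    func(42, TagB());
}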
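
For experimenting locally, here is a self-contained variant of the same wrapper idea in which funcA and funcB are actually defined, so the program links. The bodies and the global calls log are assumptions added for illustration only; in real code the implementations would normally live in another translation unit.

```cpp
#include <vector>

struct TagA { };
struct TagB { };

// Illustration-only log of which implementation ran and with what value.
std::vector<int> calls;

// Hypothetical implementations: note that neither takes a tag parameter.
void funcA(int a) { calls.push_back(a); }
void funcB(int a) { calls.push_back(-a); }

// Inline wrappers: overload resolution picks one by tag type,
// then only the int is forwarded to the real function.
inline void func(int a, TagA) { funcA(a); }
inline void func(int a, TagB) { funcB(a); }

void call() {
    func(42, TagA());
    func(42, TagB());
}
```

The benefit depends on the wrappers actually being inlined. In an unoptimized build the wrapper is typically called like any other function, and the tag is then passed to it according to the platform's calling convention.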


Also, note that, while the compiler may generate the function calls like that in the initial object files, link-time optimizations may be able to get rid of unused parameters in the end. At least one major compiler even documents such behavior…

Gink answered 22/3, 2019 at 17:13 Comment(2)
Yup, the x86-64 System V calling convention packs small structs into up to 2 registers. I hadn't realized that empty structs could use 0 registers, but that is consistent with the existing rules. Windows x64, by contrast, has very strict arg -> register mappings to simplify variadic functions (every arg fills exactly one 8-byte arg-passing slot), so e.g. a float first arg in XMM0 still bumps the 2nd arg to RDX even if it's the first integer arg. – Downstroke
clang will sort of optimize away passing args it knows the caller won't use, with __attribute__((noinline)). I was playing around with godbolt.org/z/vur-oK to add enough args that there were no registers left for Tag, to see if it took a slot in memory, and noticed that clang only copies 16 bytes of uninitialized stack memory in call() before calling func(), and doesn't set up integers in registers. gcc only does that much IPA when making a custom .constprop clone of the non-inline function, but attribute noclone disables that unnecessarily. – Downstroke
