Passing structs by-value in LLVM IR

I'm generating LLVM IR for JIT purposes, and I notice that LLVM's calling conventions don't seem to match the C calling conventions when aggregate values are involved. For instance, when I declare a function as taking a {i32, i32} (that is, a struct {int a, b;} in C terms) parameter, it appears to pass each of the struct elements in its own x86-64 GPR to the function, even though the x86-64 ABI specifies (sec. 3.2.3) that such a struct should be packed in a single 64-bit GPR.

This is in spite of LLVM's documentation claiming to match the C calling convention by default:

"ccc" - The C calling convention

This calling convention (the default if no other calling convention is specified) matches the target C calling conventions. This calling convention supports varargs function calls and tolerates some mismatch in the declared prototype and implemented declaration of the function (as does normal C).

My question, then, is: Am I doing something wrong to cause LLVM to not match the C calling convention, or is this known behavior? (At the very least, the documentation seems to be wrong, no?)

I can find only very few references to the issue on the web, such as this bug report from 2007, which claims to be fixed. It also claims that "First, LLVM has no way to deal with aggregates as singular Value*'s". I don't know whether that was true in 2007, but it doesn't seem to be true now, given the extractvalue/insertvalue instructions. I also found this SO question, whose second (non-accepted) answer seems to implicitly accept that argument coercion has to be done manually.

I'm currently building code for doing argument coercion in my IR generator, but it is complicating my design considerably (not to mention making it architecture-specific), so if I'm simply doing something wrong, I'd rather know about that. :)

Repine answered 11/9, 2016 at 16:4
Comments:
I don't know the big-picture answer, but if you use clang and emit LLVM IR (-S -emit-llvm), it looks like the argument coercion is done in the frontend. E.g., gist.github.com/isbadawi/319cca53c18358464e957301de22561c – Lignocellulose
@IsmailBadawi: Yes, I was aware; it was one of the things that gave me reason to think that LLVM didn't handle it properly. But then again, I thought that a non-trivial front-end like Clang perhaps had reason to do the coercion manually anyway, so I didn't want to read too much into it. – Repine

LLVM's support for C-compatible calling conventions is extremely limited, I'm afraid. Several folks have wished for more direct calling-convention support in LLVM (or a related library), but so far none has emerged. That logic currently lives in the C-language frontend (Clang, for example).

What LLVM provides is a mapping from specific LLVM IR types to specific C ABI lowerings for a specific CPU backend. You can see which IR types to use for a given C function by using Clang to emit LLVM IR, much as the comment above suggests: https://c.compiler-explorer.com/z/8jWExWPYq

struct S { int x, y; };

void f(struct S s);

void test(int x, int y) {
    struct S s = {x, y};
    f(s);
}

Turns into:

define dso_local void @test(i32 noundef %0, i32 noundef %1) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca %struct.S, align 4
  store i32 %0, ptr %3, align 4
  store i32 %1, ptr %4, align 4
  %6 = getelementptr inbounds %struct.S, ptr %5, i32 0, i32 0
  %7 = load i32, ptr %3, align 4
  store i32 %7, ptr %6, align 4
  %8 = getelementptr inbounds %struct.S, ptr %5, i32 0, i32 1
  %9 = load i32, ptr %4, align 4
  store i32 %9, ptr %8, align 4
  %10 = load i64, ptr %5, align 4
  call void @f(i64 %10)
  ret void
}

declare void @f(i64) #1
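The `load i64, ptr %5, align 4` above just reinterprets the eight bytes of the struct as one 64-bit integer. A minimal C sketch of that coercion, plus the reverse step a callee would perform, assuming a little-endian target such as x86-64 with 32-bit `int`; the helper names `coerce_s` and `uncoerce_s` are hypothetical:

```c
#include <stdint.h>
#include <string.h>

struct S { int x, y; };

/* The coercion below is only valid because S is exactly 8 bytes. */
_Static_assert(sizeof(struct S) == sizeof(uint64_t), "struct S must be 8 bytes");

/* Caller side: reinterpret the struct's bytes as one 64-bit integer,
   the C analogue of Clang's `load i64, ptr %5, align 4`.
   memcpy sidesteps strict-aliasing problems. */
static uint64_t coerce_s(struct S s) {
    uint64_t bits;
    memcpy(&bits, &s, sizeof bits);
    return bits;
}

/* Callee side: spill the incoming 64-bit value back into struct memory. */
static struct S uncoerce_s(uint64_t bits) {
    struct S s;
    memcpy(&s, &bits, sizeof s);
    return s;
}
```

On a little-endian target, coercing `{1, 2}` yields `0x0000000200000001`: `x` ends up in the low 32 bits and `y` in the high 32 bits of the single GPR, which is the packing the question describes.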

There is sadly some non-trivial logic to map specific C types into the LLVM IR that will match the ABI when lowered for a platform. Outside of extremely simple types (basic C integer types, pointers, float, double, maybe a few others), these aren't even portable between the different architecture ABIs/calling-conventions.
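To give a feel for that logic, here is a minimal sketch of the x86-64 System V eightbyte classification for only the simplest case: a flat struct of naturally aligned `int`, `long`, `float`, and `double` fields (no nesting, arrays, unions, bit-fields, or `long double`). The `FieldKind` enum and `lower_struct` function are hypothetical, and the IR type strings are modeled on what I understand Clang emits for these cases; verify them against Clang's own output before relying on them.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical field kinds for a flat C struct, in declaration order. */
typedef enum { F_INT, F_LONG, F_FLOAT, F_DOUBLE } FieldKind;

static size_t field_size(FieldKind k) {
    return (k == F_INT || k == F_FLOAT) ? 4 : 8;
}

/* Writes the coerced LLVM IR parameter types for passing the struct by
   value under x86-64 System V (e.g. "i64" or "double, i64") into out,
   or "memory" if it goes on the stack. Requires n >= 1 and cap >= 1.
   Assumes natural alignment and only the four field kinds above. */
static void lower_struct(const FieldKind *fields, size_t n,
                         char *out, size_t cap) {
    size_t offset = 0, align = 1;
    int has_int[2] = {0, 0}, has_double[2] = {0, 0};
    size_t data_end[2] = {0, 0}; /* bytes of field data in each eightbyte */

    for (size_t i = 0; i < n; i++) {
        size_t sz = field_size(fields[i]);
        offset = (offset + sz - 1) / sz * sz; /* natural alignment */
        if (sz > align) align = sz;
        size_t eb = offset / 8; /* aligned fields never straddle eightbytes */
        if (eb < 2) {
            if (fields[i] == F_INT || fields[i] == F_LONG) has_int[eb] = 1;
            if (fields[i] == F_DOUBLE) has_double[eb] = 1;
            data_end[eb] = offset + sz - eb * 8;
        }
        offset += sz;
    }
    size_t size = (offset + align - 1) / align * align;

    if (size > 16) { /* class MEMORY: passed on the stack */
        snprintf(out, cap, "memory");
        return;
    }

    size_t len = 0;
    out[0] = '\0';
    for (size_t eb = 0; eb < (size + 7) / 8; eb++) {
        const char *ty;
        if (has_int[eb])            /* INTEGER class: goes in a GPR */
            ty = data_end[eb] == 4 ? "i32" : "i64";
        else if (data_end[eb] == 4) /* SSE class: goes in an XMM register */
            ty = "float";
        else
            ty = has_double[eb] ? "double" : "<2 x float>";
        int w = snprintf(out + len, cap - len, "%s%s", eb ? ", " : "", ty);
        if (w < 0 || (size_t)w >= cap - len) return; /* out of space */
        len += (size_t)w;
    }
}
```

For example, `{F_INT, F_INT}` gives `"i64"` (the `struct S` case above), `{F_INT, F_INT, F_INT}` gives `"i64, i32"`, and `{F_DOUBLE, F_LONG}` gives `"double, i64"`. Even this toy version shows why the mapping is target-specific: every rule in it comes from the x86-64 System V document, not from anything in LLVM IR.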

FWIW, the situation is even worse for C++, which has much more complexity here, I'm afraid.

So your choices are to:

  1. Use a very small set of types in a limited range of signatures that you build custom logic to lower correctly into LLVM IR, checking that it matches what Clang (or another C frontend) produces in every case.
  2. Directly use Clang or another C frontend to emit the LLVM IR.
  3. Take on the major project of extracting this ABI/calling-convention logic from Clang into a re-usable library. There has in the past been appetite for this in the LLVM/Clang communities, but it is a very large and complex undertaking from my understanding. There are some partial efforts (specifically for C and JITs) that you may be able to find and re-use, but I don't have a good memory of where all those are.
Tebet answered 4/1, 2023 at 8:12
Comments:
Just for the record, I've already (long since) done the work of lowering C types into LLVM myself. It's not as bad as you make it out to be: I do it in about 500 lines of C code (for x86-64 only, though), and that includes an implementation of va_arg. It's not exactly what I'd call "practical", and I'd still much rather LLVM could do it natively, of course, but it is very doable. I've also long considered, but not yet gotten around to, writing a simple-ish library for a higher-level IR that compiles to LLVM IR to do those kinds of things. – Repine
The harder thing is to factor the current logic out of Clang and continue to share it. An independent implementation is more tractable but has its own challenges (the long tail of minute differences, especially across different architectures and platform ABIs, etc.). – Tebet
Also, I'm mostly answering this for posterity and future searches. I didn't imagine it'd help you out given the time lag. Sorry about that. – Tebet
No, I get it, and it was nice to be able to mark the question as answered. I also think that much of the confusion comes from, as I mentioned, LLVM's documentation claiming the calling convention to be C-compatible when in fact it isn't. I seem to remember reporting this as a bug back then, but I can't find the bug report now. – Repine
Agreed. FWIW, I would at most call it "C-inspired". ;] But I'm not contributing that much these days, just took a turn answering some SO questions. – Tebet
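The va_arg part of the x86-64 lowering mentioned in the comments is a good example of ABI logic a frontend has to own. Below is a minimal simulation of its register-save-area path for a single 8-byte INTEGER-class argument, following the algorithm in the System V psABI. It is a model under stated assumptions, not the real `va_list`: `SimVaList` only mirrors the four psABI `va_list` fields, `sim_va_arg_i64` is a hypothetical name, and SSE arguments (`fp_offset`) and multi-eightbyte aggregates are left out.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the four fields of the x86-64 System V va_list element. */
typedef struct {
    unsigned int gp_offset;  /* offset of next GPR slot in reg_save_area (0..48) */
    unsigned int fp_offset;  /* offset of next XMM slot (48..176); unused here */
    void *overflow_arg_area; /* next stack-passed argument */
    void *reg_save_area;     /* spilled RDI, RSI, RDX, RCX, R8, R9, then XMM0-7 */
} SimVaList;

/* Fetches one 8-byte INTEGER-class argument (long, pointer, ...). */
static int64_t sim_va_arg_i64(SimVaList *ap) {
    int64_t value;
    if (ap->gp_offset <= 48 - 8) {
        /* A GPR slot is left: read it from the register save area. */
        memcpy(&value, (char *)ap->reg_save_area + ap->gp_offset, sizeof value);
        ap->gp_offset += 8;
    } else {
        /* All six GPRs consumed: read from the stack, advance by 8 bytes. */
        memcpy(&value, ap->overflow_arg_area, sizeof value);
        ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;
    }
    return value;
}
```

A real implementation emits this as IR at each `va_arg` site and must also handle SSE-class values, arguments split across register classes, and over-aligned stack arguments.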
