Compiler output language - LLVM IR vs C

5

19

For writing a compiler, what are the advantages and disadvantages of using LLVM IR vs C for a target language? I know both are used, and I imagine that the final machine code would be similar if I were to use clang to compile the C. So what are other things to consider?

Krisha answered 22/4, 2012 at 2:8 Comment(0)
12

I've used LLVM IR for a few compiler back ends and have worked with compilers that use C as a back end. One thing that I found that gave the LLVM IR an advantage is that it is typed. It is hard to make completely ill-formed output without getting errors from the LLVM libraries.

It is also easier to keep a close correlation between the source code and the IR for debugging, in my opinion.

Plus, you get all the cool LLVM command line tools to analyse and process the IR your front end emits.
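
To illustrate the point about getting errors at generation time, here is a minimal sketch in C. It is not LLVM's API: `Builder`, `Value`, `build_value` and `build_add` are hypothetical names for a toy IR builder in which every value carries a type, so a mismatched instruction is rejected the moment the front end tries to create it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy IR with two value types, loosely modeled on i32 and double. */
typedef enum { TY_I32, TY_F64 } Type;

typedef struct {
    Type type;  /* type carried by every value */
    int id;     /* number of the temporary holding the value */
} Value;

typedef struct {
    int next_id;  /* next free temporary number */
    int errors;   /* how many instructions were rejected */
} Builder;

const char *type_name(Type t) { return t == TY_I32 ? "i32" : "double"; }

/* Create a fresh value of the given type (stands in for a parameter or constant). */
Value build_value(Builder *b, Type t) {
    assert(b != NULL);
    Value v = { t, b->next_id++ };
    return v;
}

/* Build an addition. Returns 0 and stores the result in *out on success;
   returns -1 right away if the operand types differ. */
int build_add(Builder *b, Value lhs, Value rhs, Value *out) {
    assert(b != NULL && out != NULL);
    if (lhs.type != rhs.type) {
        fprintf(stderr, "type error: add %s, %s\n",
                type_name(lhs.type), type_name(rhs.type));
        b->errors++;
        return -1;
    }
    *out = build_value(b, lhs.type);
    printf("  %%%d = %s %s %%%d, %%%d\n", out->id,
           lhs.type == TY_I32 ? "add" : "fadd",
           type_name(lhs.type), lhs.id, rhs.id);
    return 0;
}
```

Calling `build_add` with an `i32` and a `double` operand fails immediately, at the call site in the front end. With the real LLVM libraries, the module verifier plays a similar role. A back end that prints C text would typically only see the equivalent mistake once a C compiler is run on the output.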

Neoplasm answered 22/4, 2012 at 12:24 Comment(2)
What do you mean IR is typed? Isn't C typed as well? – Krisha
Right, C is typed. But you don't get an indication of the error until you try to compile the C code. With LLVM IR you get an indication of the error when you generate the IR. Much easier to debug. – Neoplasm
3

I doubt you can implement proper debugging support for your language when targeting C.

Gust answered 22/4, 2012 at 13:19 Comment(1)
That was exactly the reason I've been searching for this thread. I see no way there could be "source maps" on debug symbols, because there are backwards-incompatible changes in C compilers' debug symbol generators. One would have to update the debug symbol mapping software with each change in the supported C compilers. – Hourly
3

LLVM advantages:

  1. JIT - you can compile and run your code dynamically. Sure, the same is possible with C (e.g., using an embedded tcc), but it is a much less robust and portable option.
  2. You can run your own optimisation passes over the generated IR.
  3. Reflection for free - inspecting the generated code is much easier with LLVM.
  4. The LLVM library is not as big as most C compilers (not counting tcc, of course).

LLVM drawbacks:

  1. The IR is not portable; you have to change it slightly depending on your target. There is a somewhat portable subset of LLVM IR, but relying on it is still a dodgy practice.
  2. There is a runtime dependency on the C++ libraries, which might be a bit of an issue.
Holdfast answered 23/4, 2012 at 11:43 Comment(1)
You forgot: if you want C interop (which language doesn't?) you have to code all those nasty C ABIs yourself, because LLVM doesn't do that all by itself (it splits that work 50/50 with Clang). – Trail
1

C is the obvious choice for architectures and OSes for which there is no Clang, or for which it is in an experimental state.

C is more widely accepted, but LLVM IR allows you to spoon-feed the LLVM engine. Not all paths to IR are equal.

Tarrel answered 22/4, 2012 at 9:51 Comment(0)
1

I will use LLVM to refer to the framework, and LLVM IR to refer to the target language.

C Advantages

  1. Cross-platform
  2. Debugging (Please read below. It is partly related to point 4.)
  3. Interoperability
  4. Ease of use

LLVM IR Advantages

  1. Performance
  2. Customization options
  3. Memory footprint
  4. Strong typing/Safety

C

  1. C compilers exist for all sorts of embedded systems, even though LLVM has gained more targets lately. It can be argued that C has a slight advantage over LLVM IR (intermediate representation) in this category.

  2. The main advantage of targeting C instead of LLVM IR is that the generated code is at a higher level. With a standard debugger such as GDB, it can be argued that it is easier to reason about the behavior of the generated code. It is also easier to build a debugger for the language compiled to C on top of a debugger such as GDB.

  3. The third point, interoperability, is fuzzier. However, C has a stable, well-documented application binary interface (ABI) on each major platform. It is thus easier to write libraries and interface them with other programs written in C and/or C++. In addition, many languages, such as Java, provide standardized interfaces to C.

  4. It can be argued that it is easier to get started and get something working by targeting C.
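
On point 2, one standard mechanism a compiler that emits C can use is the `#line` directive, which tells the C compiler to attribute the following code to a given line of another file. The sketch below assumes a hypothetical source file `prog.src` whose line 42 produced the generated function; the function names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Pretend the next line was generated from line 42 of "prog.src". */
#line 42 "prog.src"
int reported_line(void) { return __LINE__; }          /* numbered as line 42 */
const char *reported_file(void) { return __FILE__; }  /* file name is now "prog.src" */

/* Sanity check that the compiler really uses the remapped position. */
void check_mapping(void) {
    assert(reported_line() == 42);
    assert(strcmp(reported_file(), "prog.src") == 0);
}
```

C compilers generally use the same remapped positions for diagnostics and for the line tables in the debug information, so a debugger such as GDB can step through the original source rather than the generated C. This only covers line numbers. Mapping variables and types of the source language back from their C encoding is a separate problem that `#line` does not solve.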

LLVM

  1. C is quite a high-level language, and if the generated code is not written idiomatically, performance may degrade (depending on the target compiler and the assumptions it makes). Some papers, such as "An LLVM Backend for GHC", illustrate disadvantages of C and advantages of LLVM IR as a target language.

  2. Since LLVM (the framework) is built as a collection of reusable units, it is easy to write passes specific to your source language. It is also easier to write a custom GC (as of 2020 there is some support for this). With C this is also possible, and there are garbage collectors such as Boehm GC. However, C is not designed as an intermediate language.

  3. Memory footprint. Generated C code has a larger memory footprint than LLVM bitcode. If you are compiling and linking a big system, you are likely to see compilation-time advantages from targeting LLVM.

  4. While C is a weakly typed language, LLVM IR is a strongly typed one. It can, therefore, be argued that it is safer to target LLVM IR.
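
To make the comparison more concrete, here is a rough sketch of a toy back end that emits both targets for the same expression. Everything in it is an assumption for illustration: the `Expr` tree, the fixed function signature `f(a, b, c)`, the `t0`, `t1`, ... temporaries (source variables are assumed not to use those names), and the restriction to 32-bit integers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree for a toy language with i32 values only. */
typedef enum { EX_VAR, EX_ADD, EX_MUL } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *name;              /* variable name, used when kind == EX_VAR */
    const struct Expr *lhs, *rhs;  /* operands, used for EX_ADD and EX_MUL */
} Expr;

/* Append formatted text to a NUL-terminated buffer, truncating if it is full. */
static void append(char *buf, size_t cap, const char *fmt, ...) {
    size_t len = strlen(buf);
    va_list ap;
    if (len >= cap) return;
    va_start(ap, fmt);
    vsnprintf(buf + len, cap - len, fmt, ap);
    va_end(ap);
}

/* C target: the expression nests just like the tree; no types are spelled out. */
static void emit_c_expr(const Expr *e, char *buf, size_t cap) {
    assert(e != NULL);
    if (e->kind == EX_VAR) { append(buf, cap, "%s", e->name); return; }
    append(buf, cap, "(");
    emit_c_expr(e->lhs, buf, cap);
    append(buf, cap, "%s", e->kind == EX_ADD ? " + " : " * ");
    emit_c_expr(e->rhs, buf, cap);
    append(buf, cap, ")");
}

void emit_c_function(const Expr *body, char *buf, size_t cap) {
    buf[0] = '\0';
    append(buf, cap, "int f(int a, int b, int c) {\n  return ");
    emit_c_expr(body, buf, cap);
    append(buf, cap, ";\n}\n");
}

/* LLVM IR target: one typed instruction per operation, each result in a fresh
   temporary. The operand that holds the value of e is written to `result`. */
static void emit_ir_expr(const Expr *e, char *buf, size_t cap,
                         int *next_tmp, char *result, size_t result_cap) {
    char lhs[32], rhs[32];
    assert(e != NULL);
    if (e->kind == EX_VAR) { snprintf(result, result_cap, "%%%s", e->name); return; }
    emit_ir_expr(e->lhs, buf, cap, next_tmp, lhs, sizeof lhs);
    emit_ir_expr(e->rhs, buf, cap, next_tmp, rhs, sizeof rhs);
    snprintf(result, result_cap, "%%t%d", (*next_tmp)++);
    append(buf, cap, "  %s = %s i32 %s, %s\n", result,
           e->kind == EX_ADD ? "add" : "mul", lhs, rhs);
}

void emit_ir_function(const Expr *body, char *buf, size_t cap) {
    char result[32];
    int next_tmp = 0;
    buf[0] = '\0';
    append(buf, cap, "define i32 @f(i32 %%a, i32 %%b, i32 %%c) {\nentry:\n");
    emit_ir_expr(body, buf, cap, &next_tmp, result, sizeof result);
    append(buf, cap, "  ret i32 %s\n}\n", result);
}
```

For the tree `(a + b) * c`, the C emitter produces a single `return ((a + b) * c);`, while the IR emitter produces `%t0 = add i32 %a, %b` followed by `%t1 = mul i32 %t0, %c` and `ret i32 %t1`. The IR spells out the type on every instruction, which is what point 4 refers to. Note that the two outputs are not exactly equivalent: signed overflow is undefined in C, whereas a plain `add` or `mul` in LLVM IR wraps around.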

Gharry answered 22/7, 2020 at 13:24 Comment(0)
