Easiest way to work with intermediate format

3

1

A tool I'm working on needs to take the intermediate format generated by the compiler, add some code to it and then give that modified intermediate code to the backend of the compiler to generate the final code.

After a little research on gcc, I found that the GIMPLE format is easy to understand, but I'm not sure how hard it is to modify GIMPLE code, and I don't know of any way to restart compilation from there other than writing a plugin and adding my own pass. People have also warned me that gcc's documentation is scarce and that it is hard to get unstuck when working on it.

Another option is to use the LLVM bytecode. But I have never worked with LLVM, so I don't know how complex my task would be with it. There may be even better options I'm not aware of. Therefore, I just want to know the best option. My preferences are the following.

  • Platform independence
  • Easy to use
  • Well documented
  • More people using it, so more help available
Strontian answered 2/2, 2012 at 17:50 Comment(5)
Data point regarding LLVM: They don't shy away from changing APIs that suck. You could argue that this makes keeping up a hassle, but it also means what exists today is likely good and likely to improve instead of becoming old and busted. It's also modern C++; if you can swallow that, it'll make your life easier. - Rillings
@delnan, I have read that LLVM is not very platform independent. Is that true? I want an intermediate code which can be compiled for any platform. Regarding LLVM using C++, I'm pretty cool with that; in fact, I prefer C++ over C. - Strontian
Define platform independent. There aren't as many backends as for GCC, but numerous common architectures are covered and I'm not aware of any issues preventing the addition of more backends. But make no mistake, there are issues that won't be resolved soon (for instance, C++ exception handling on Windows). - Rillings
MetallicPriest, can you describe the task your tool is meant to solve? Is it a form of instrumentation? Can you give some examples like "Convert each X into X Y Z"? AFAIK, LLVM is better designed for writing third-party passes; its IR can easily be dumped and reloaded. - Argument
osgx: Basically I want to do both instrumentation and modification. Modification would be done after performing instrumentation. A part of the instrumentation is to count the number of instructions. - Strontian
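To make the "count the instructions, then modify" idea from the comments concrete, here is a minimal sketch over a toy in-memory IR. Everything in it is an assumption made for illustration: the `BasicBlock` and `Function` classes, the instruction strings, and the `@icount` counter are hypothetical and are neither GIMPLE nor LLVM IR. A real pass would use the compiler's own data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BasicBlock:
    # Hypothetical toy IR: a label plus a list of instruction strings.
    label: str
    instructions: List[str] = field(default_factory=list)


@dataclass
class Function:
    name: str
    blocks: List[BasicBlock] = field(default_factory=list)


def count_instructions(func: Function) -> int:
    """Analysis step: static instruction count for one function."""
    return sum(len(block.instructions) for block in func.blocks)


def instrument(func: Function, counter: str = "@icount") -> Function:
    """Transformation step: return a copy of func in which every block
    starts with a pseudo-instruction that adds the block's original
    instruction count to a global counter."""
    new_blocks = []
    for block in func.blocks:
        bump = f"add {counter}, {len(block.instructions)}"
        new_blocks.append(BasicBlock(block.label, [bump, *block.instructions]))
    return Function(func.name, new_blocks)


# Hypothetical toy function with two basic blocks.
toy = Function("f", [
    BasicBlock("entry", ["x = load a", "y = add x, 1", "br exit"]),
    BasicBlock("exit", ["ret y"]),
])

instrumented = instrument(toy)
for block in instrumented.blocks:
    print(block.label, block.instructions)
```

The analysis and the transformation are kept as separate functions, and the transformation returns a new function instead of mutating its input; that mirrors the usual split between analysis passes and transformation passes, though the toy IR itself says nothing about how hard the same task is in gcc or LLVM.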
1

As you probably already know, MELT is a high-level domain-specific language for extending GCC. With it you can work on GIMPLE (and the other internal representations) quite easily, and also modify those representations inside GCC.

However, extending GCC still means some work, because the GIMPLE and Tree representations (along with others, e.g. edges) are complex.

Abrogate answered 2/2, 2012 at 17:57 Comment(0)
1

According to your description, LLVM fits the bill perfectly. One of its main aims is to serve as a flexible library and framework for manipulating IR code. The countless optimization, transformation and analysis "passes" it comes with serve both as proof and as great examples. IMO LLVM also answers the 4 points you list in your question very well:

  • Platform independence: LLVM runs on the major platforms (Linux, Mac and Windows) and knows how to generate code for many CPU types.
  • Easy to use: IR and compiler backends are a difficult area to hack, but as far as such things go LLVM is a good candidate since it's a relatively new project, well documented, with a very clean code base.
  • Well documented: knock yourself out
  • More people using it: very active development and usage, with some corporations already heavily invested in it (most notably Apple and Google).
Patch answered 6/2, 2012 at 17:42 Comment(0)
0

This may not be helpful at all, but I wondered about the processing passes of gcc. The abridged (pared down mostly to exec/fork calls) output from strace -f -o gcc.strace gcc -c tstamp.c:

7141  execve("/usr/bin/gcc", ["gcc", "-c", "tstamp.c"], [/* 52 vars */]) = 0
7141  open("/tmp/ccqzaCI4.s", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 3
7141  close(3)                          = 0
7141  vfork( <unfinished ...>
7142  execve("/usr/libexec/gcc/i686-redhat-linux/4.6.1/cc1", ["/usr/libexec/gcc/i686-redhat-lin"..., "-quiet", "tstamp.c", "-quiet", "-dumpbase", "tstamp.c", "-mtune=generic", "-march=i686", "-auxbase", "tstamp", "-o", "/tmp/ccqzaCI4.s"], [/* 55 vars */] <unfinished ...>
7141  <... vfork resumed> )             = 7142
7141  waitpid(7142,  <unfinished ...>
7142  <... execve resumed> )            = 0
7142  open("tstamp.c", O_RDONLY|O_NOCTTY|O_LARGEFILE) = 3
7142  close(3)                          = 0
7142  open("/tmp/ccqzaCI4.s", O_RDWR|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
7142  open("/usr/include/stdio.h", O_RDONLY|O_NOCTTY|O_LARGEFILE) = 4
... (opens and closes every include file)
7142  close(4)                          = 0
7142  close(3)                          = 0
7142  exit_group(0)                     = ?
7141  <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 7142
7141  vfork( <unfinished ...>
7143  execve("/usr/bin/as", ["as", "--32", "-o", "tstamp.o", "/tmp/ccqzaCI4.s"], [/* 55 vars */] <unfinished ...>
7141  <... vfork resumed> )             = 7143
7141  waitpid(7143,  <unfinished ...>
7143  <... execve resumed> )            = 0
7143  unlink("tstamp.o")                = 0
7143  open("tstamp.o", O_RDWR|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
7143  open("/tmp/ccqzaCI4.s", O_RDONLY|O_LARGEFILE) = 4
7143  close(4)                          = 0
7143  close(3)                          = 0
7143  exit_group(0)                     = ?
7141  <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 7143
7141  unlink("/tmp/ccqzaCI4.s")         = 0
7141  exit_group(0)                     = ?

cc1 has all the applicable logic. I imagine that is a complex program, especially after typing:

/usr/libexec/gcc/i686-redhat-linux/4.6.1/cc1 --help

and

/usr/libexec/gcc/i686-redhat-linux/4.6.1/cc1 --help=C
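
The trace above can be reduced to the programs the driver actually starts by keeping only the `execve` lines. The following is a small sketch of that filtering, assuming the `pid  execve("path", ...` line layout shown in the trace; the function name `executed_programs` is made up for this example, and the sample lines are copied from the trace with the long `cc1` argument list shortened.

```python
import re

# Matches lines such as: 7142  execve("/path/to/cc1", ["cc1", ...
EXECVE_RE = re.compile(r'^(\d+)\s+execve\("([^"]+)"')


def executed_programs(trace_lines):
    """Return (pid, program path) for each execve line, in trace order."""
    found = []
    for line in trace_lines:
        match = EXECVE_RE.match(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found


# Lines taken from the trace above; the cc1 argument list is shortened.
sample = [
    '7141  execve("/usr/bin/gcc", ["gcc", "-c", "tstamp.c"], [/* 52 vars */]) = 0',
    '7141  vfork( <unfinished ...>',
    '7142  execve("/usr/libexec/gcc/i686-redhat-linux/4.6.1/cc1", '
    '["/usr/libexec/gcc/i686-redhat-lin"..., "-quiet"], [/* 55 vars */] <unfinished ...>',
    '7142  <... execve resumed> )            = 0',
    '7143  execve("/usr/bin/as", ["as", "--32", "-o", "tstamp.o", '
    '"/tmp/ccqzaCI4.s"], [/* 55 vars */] <unfinished ...>',
]

for pid, program in executed_programs(sample):
    print(pid, program)
```

On these lines it lists the driver (`gcc`), then `cc1`, then `as`, which is the sequence the trace shows for a `-c` compile.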
Thetos answered 2/2, 2012 at 18:27 Comment(9)
You don't need strace; just run gcc -v (with your other arguments) and it will show you which processes it starts. Indeed, most of the compilation proper is done by cc1. - Abrogate
To see the internal compiler passes (not the linker or the assembler), start gcc with these top-secret options: -fdump-rtl-all-all -fdump-tree-all-all -fdump-ipa-all-all. This dumps the internal representation of the program after EACH translation and optimization phase. To see how the linker works, use the gcc option -Wl,--verbose. But... stop. You have 18k and this post is not an answer; it is more like a question, isn't it? - Argument
@osgx: agreed, it is not a full answer, only contributing clues for further research. The information you added is very valuable. - Thetos
But according to #9104633, Metallic already knows about some of the -fdump- options. - Argument
@osgx: Okay, looks like my contribution doesn't help. I will delete it shortly (since I don't think you have enough rep to see deleted answers). - Thetos
You can't. If you delete your answer, the top-secret set of gcc options will be lost forever. They are so secret that even Google doesn't know about them. - Argument
@osgx: Maybe you will add that as an answer? - Thetos
Again, no. That would not be an answer to Metallic's question; I can't even understand what kind of pragma he wants to implement. Can you ask a new question, "What are the passes of GCC and what programs are called by GCC", or something like that? - Argument
@osgx: I have asked here. - Thetos
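
For anyone who wants to try the options from the comments, here is a small sketch that assembles the command line in one place. The dump flags and `-v` are exactly the ones quoted above; the helper names `dump_command` and `run_dump` are made up for this example, and `tstamp.c` is simply the file name used in the answer.

```python
import shutil
import subprocess

# Dump flags exactly as quoted in the comment thread above.
DUMP_FLAGS = ["-fdump-rtl-all-all", "-fdump-tree-all-all", "-fdump-ipa-all-all"]


def dump_command(source, compiler="gcc", verbose=True):
    """Build a command line that compiles `source` and dumps every pass."""
    cmd = [compiler, "-c", source]
    if verbose:
        cmd.append("-v")  # also print the sub-processes the driver starts
    return cmd + DUMP_FLAGS


def run_dump(source, compiler="gcc"):
    """Run the command if the compiler is installed; return True on success."""
    if shutil.which(compiler) is None:
        return False
    return subprocess.run(dump_command(source, compiler)).returncode == 0


print(" ".join(dump_command("tstamp.c")))
```

Only `dump_command` is exercised here; `run_dump` actually invokes the compiler, and with these flags it can leave a large number of dump files behind, so it is best run in a scratch directory.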
