How do C/C++ compilers work? [closed]
E

11

35

After over a decade of C/C++ coding, I've noticed the following pattern - very good programmers tend to have detailed knowledge of the innards of the compiler.

I'm a reasonably good programmer, and I have an ad-hoc collection of compiler "superstitions", so I'd like to reboot my knowledge and start from the basics.

Can anyone recommend links to online resources or favorite books? I'm particularly interested in C/C++ compiling, optimization, GCC and LLVM.

Engle answered 6/7, 2009 at 5:0 Comment(3)
Take a look here for resources: https://mcmap.net/q/63539/-learning-to-write-a-compiler-closed - Fokos
GCC has an "internals" manual that documents some specific internal details, like its machine-description files, the data structures it uses to represent function logic, and the GIMPLE and RTL internal representations: gcc.gnu.org/onlinedocs/gccint. But it's not really an overview of how it works. - Decision
What is the actual question here? Is it "how do compilers work"? Or "please recommend me a learning resource about how compilers work"? - Gruver
S
28

Start with the Dragon Book (put more stress on the code optimization and code generation parts).

Go on to write a toy compiler for an educational programming language like Decaf or Cool. You may use parser generators (lex and yacc) for your front end, to make life easier and let you focus on the more important stuff.

Then read the GCC internals book along with browsing the GCC source code.

Subalternate answered 6/7, 2009 at 5:26 Comment(6)
Thanks, nice sequence. I take it the Dragon Book is: en.wikipedia.org/wiki/index.html?curid=188976 - Engle
Yes, that is the Dragon Book. I read the 1st edition. It had a much simpler dragon... - Gladiator
Gah. People keep recommending this. Not me. Start with a casual introduction, say "Let's Build a Compiler", then look at a Computer Sciencey reference with all the math and theory. - Machos
I'd recommend against trying to understand GCC. It's fairly unusual as far as compilers go, and its architecture is poor by design (as in, the design is crippled on purpose. Yes, I'm serious. No, I'm not just making a joke at GCC's expense). - Berkowitz
When it comes to understanding what you are doing, LEX and YACC just add an extra layer of technology that obscures your view of what's going on. If the goal is UNDERSTANDING how a compiler works, a recursive descent parser will give you a better understanding than using LEX and YACC. And generally speaking, if you're just doing it as a learning exercise, you are probably not going to write an optimising compiler in your free time without someone else helping you. - Bangalore
If you are interested in compiler optimizations only, then you can try SUIF. - Subalternate
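To make the recursive descent suggestion from the comments above a bit more concrete, here is a minimal sketch in Python (chosen only for brevity). It assumes a tiny arithmetic grammar that is not taken from any of the recommended books, and the names `tokenize`, `Parser`, and `parse` are made up for this illustration. Each grammar rule becomes one method, which is the whole idea of the technique.

```python
import re

# Grammar assumed for this sketch:
#   expr   := term   (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number, operator, and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each returns a nested-tuple tree."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """Parse a whole expression and reject trailing tokens."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

For example, `parse("1 + 2 * 3")` returns `('+', 1, ('*', 2, 3))`: operator precedence falls out of which rule calls which, with no table or generator involved.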
N
11
Needlefish answered 6/7, 2009 at 5:3 Comment(1)
I was thumbing through the GCC internals manual, and it doesn't seem useful for "learning" how a compiler works. It's not a teaching document; it assumes that you already have a knowledge of the subject. - Bangalore
B
11

Compiler texts are good, but they are a bit heavy for teaching yourself. Jack Crenshaw has a "book", originally a series of articles that you can download and read, called "Let's Build a Compiler." It follows a "learn by doing" methodology that is great if you didn't get anything out of taking formal classes on the subject, or if it's been WAY too many years since you took them (that's my case). It holds your hand and leads you through writing a compiler instead of smacking you around with lambda calculus and deep theoretical issues that only academia cares about. It was a good way to stir up those brain cells that only had a fuzzy memory of writing something on the VAX (YEAH, that's right, a VAX!) many, many moons ago at school. It's written very conversationally and is easy to just sit down and read, unlike most textbooks, which require several pots of coffee just to get past the first chapter.

Once you have a basis for understanding, more traditional texts such as the Dragon Book are great references to expand on it. (And personally I like the dead-tree versions. I printed out Jack's; it's much easier to read in a comfortable position than on a laptop. And the ebook readers are too expensive for something that doesn't actually feel like you're reading a real book yet.)

What some might call a "downside" is that it's written in Pascal, but I thought that just made me think about it more than if someone had given me a working C program to start with. Apart from that, it was written with the 68000 in mind, which is only being used in embedded systems at this point in time. Again, for me this wasn't a problem: I knew 68000 asm, and 68000 asm is easier to read than some other asm.

Bangalore answered 6/7, 2009 at 23:9 Comment(0)
G
9

If you want a dead-tree edition, try The Art of Compiler Design: Theory and Practice.

Godbey answered 6/7, 2009 at 5:13 Comment(0)
K
4

As noted by Pete Eddy, Jack Crenshaw's tutorial is excellent for newbies. But if you want to see how a real, production C compiler works (one which was designed by brilliant engineers instead of created by throwing code at the wall until something stuck), get yourself a copy of Fraser and Hanson's A Retargetable C Compiler: Design and Implementation, which contains the source code to the very clean lcc compiler. Explanations of the design and implementation are mixed in with the code. It is not a first book for a beginner, but it will repay careful study, and you can get a used copy for $35.

For a longer blurb about lcc, see Compile C Faster on Linux.

The lcc web page also has links to a number of good textbooks. I don't know of an intro text that I really like, however.

P.S. Sorry you got ripped off at Uni.

Kurtis answered 7/7, 2009 at 2:14 Comment(3)
Thanks for the tip - I will check lcc out. - Engle
Brilliant engineers? Jack Crenshaw designed parts of the space shuttle, and home-made computers were a HOBBY of his. Not to dispute the intellect of the folks who wrote lcc, but you don't have to be brilliant to design a compiler. It's really not that hard. - Bangalore
The reference was not to Crenshaw but to gcc. RMS is many things, but brilliant engineer is not one of them. Then add 1000 monkeys and stir well... - Kurtis
B
3

See Fabrice Bellard's OTCC source code:

http://bellard.org/otcc/

Bazan answered 11/7, 2009 at 10:54 Comment(0)
E
2

Depending on what exactly you want to know, you should have a look at the pipes-and-filters pattern, because as far as I know this (or something similar) has been used in a lot of compilers in recent years.

If my compiler knowledge is not too outdated, it works like this:

Parse the source code into a symbolic representation

Clean up the symbolic representation and do some normalization

Optimize the symbolic tree based on certain rules

Write out executable code based on the symbolic tree

Of course, dependencies etc. have to be resolved too.

And of course, having a look at the GCC or javac source code may help in getting a more detailed understanding.
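The four steps above can be sketched as a chain of small functions, each consuming the previous one's output, which is roughly what a pipes-and-filters layout amounts to here. This is only an illustration under loud assumptions: Python's own `ast` module stands in for a real parser, the input language is limited to integers, names, and `+ - *`, and the stack-machine mnemonics (`PUSH`, `LOAD`, `ADD`, ...) are invented for the example rather than taken from GCC, javac, or any real target.

```python
import ast
import operator

SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def parse(source):
    """Stage 1: source text -> symbolic tree made of nested tuples."""
    def convert(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return ("num", node.value)
        if isinstance(node, ast.Name):
            return ("var", node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in SYMBOLS:
            return (SYMBOLS[type(node.op)], convert(node.left), convert(node.right))
        raise SyntaxError("construct not supported by this sketch")
    return convert(ast.parse(source, mode="eval").body)


def normalize(tree):
    """Stage 2: move constants to the right of commutative operators."""
    if tree[0] in ("num", "var"):
        return tree
    op, left, right = tree[0], normalize(tree[1]), normalize(tree[2])
    if op in ("+", "*") and left[0] == "num" and right[0] != "num":
        left, right = right, left
    return (op, left, right)


def optimize(tree):
    """Stage 3: fold constant subtrees and drop x + 0, x - 0, x * 1."""
    if tree[0] in ("num", "var"):
        return tree
    op, left, right = tree[0], optimize(tree[1]), optimize(tree[2])
    if left[0] == "num" and right[0] == "num":
        return ("num", FOLD[op](left[1], right[1]))
    if op in ("+", "-") and right == ("num", 0):
        return left
    if op == "*" and right == ("num", 1):
        return left
    return (op, left, right)


def emit(tree):
    """Stage 4: symbolic tree -> instructions for an imaginary stack machine."""
    if tree[0] == "num":
        return [f"PUSH {tree[1]}"]
    if tree[0] == "var":
        return [f"LOAD {tree[1]}"]
    return emit(tree[1]) + emit(tree[2]) + [MNEMONICS[tree[0]]]


STAGES = [parse, normalize, optimize, emit]


def compile_source(source):
    """Feed the source through every stage in order."""
    result = source
    for stage in STAGES:
        result = stage(result)
    return result
```

With these definitions, `compile_source("2 * 3 + x * 1")` yields `['PUSH 6', 'LOAD x', 'ADD']`, and `compile_source("1 * x")` yields `['LOAD x']`. The second case only simplifies because the normalization stage first moved the constant to the right, which hints at why a clean-up pass usually runs before the optimizer.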

Explant answered 6/7, 2009 at 5:14 Comment(0)
G
1

It may also be valuable to pick up and read the source code to a compiler. I doubt that GCC is the best first choice, since it is burdened with full compatibility with more than 20 years of evolution of the language. But I'm also sure that a reading of its source, guided by one of the internal reference manuals, would be educational.

I'd seriously consider looking at the source to a scripting language that is internally compiled to a bytecode for a virtual machine. Several languages fit that description, but I would start with Lua. The language is small, and the VM is novel. The source code is also small and the bits I've looked at have been very clear although lightly commented.
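For a feel of what "compiled to a bytecode for a virtual machine" means before opening the Lua source, here is a toy interpreter loop. Lua's VM is register-based, so the sketch uses registers too, but everything else is an assumption made for illustration: the opcode names, the `(opcode, a, b, c)` tuple encoding, and the `run` function are invented and do not match Lua's actual instruction set or implementation.

```python
def run(program, num_registers=4):
    """Execute a list of (opcode, a, b, c) tuples on a tiny register machine."""
    regs = [0] * num_registers
    pc = 0  # index of the next instruction to execute
    while pc < len(program):
        op, a, b, c = program[pc]
        pc += 1
        if op == "LOADK":        # regs[a] = constant b
            regs[a] = b
        elif op == "ADD":        # regs[a] = regs[b] + regs[c]
            regs[a] = regs[b] + regs[c]
        elif op == "MUL":        # regs[a] = regs[b] * regs[c]
            regs[a] = regs[b] * regs[c]
        elif op == "JMPLT":      # if regs[a] < regs[b], jump to instruction c
            if regs[a] < regs[b]:
                pc = c
        elif op == "RET":        # stop and return regs[a]
            return regs[a]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return None


# Hand-written "bytecode" for: total = 0; for i in 1..5: total += i
# r0 = total, r1 = i, r2 = limit, r3 = the constant 1
SUM_TO_FIVE = [
    ("LOADK", 0, 0, None),     # 0: total = 0
    ("LOADK", 1, 0, None),     # 1: i = 0
    ("LOADK", 2, 5, None),     # 2: limit = 5
    ("LOADK", 3, 1, None),     # 3: one = 1
    ("ADD", 1, 1, 3),          # 4: i = i + 1
    ("ADD", 0, 0, 1),          # 5: total = total + i
    ("JMPLT", 1, 2, 4),        # 6: if i < limit, go back to 4
    ("RET", 0, None, None),    # 7: return total
]
```

Calling `run(SUM_TO_FIVE)` returns 15. A real VM adds function calls, value types, and garbage collection on top, but the fetch-decode-execute loop keeps this basic shape.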

Gladiator answered 6/7, 2009 at 8:3 Comment(0)
M
0

Have a look at Kaleidoscope. You can write your own compiler in just a few days with LLVM.

Moralist answered 1/9, 2009 at 16:32 Comment(1)
I doubt that claim, considering that LLVM does not even support making C library calls. That's something you must implement in your frontend, and for every platform you want to support. E.g. Clang has an implementation and so has Rust (Rust also uses LLVM as its backend). This alone takes weeks to implement, and without it you cannot call any external system API. If you want a backend that is easy to use and that does offer you C API calls, check out QBE: c9x.me/compile - Deloisedelong
D
0

"very good programmers tend to have detailed knowledge of the innards of the compiler."

Very good programmers don't care about how compilers work, as compilers have no say in how programming languages work. Their task is to transform source code into instructions that CPUs or interpreters can execute.

Compilers don't define programming languages, their rules, their expected behavior, or their edge cases; language standards do that, and compilers have to follow those standards. They also don't define how CPUs work; CPU vendors or consortia define that, and compilers have to work with what they get from those.

Also, GCC works very differently from LLVM in many aspects, so it would not be very smart to optimize your C or C++ code for either of these compilers, as the code would then perform poorly or not even compile with the other one. An optimization in your code that works great today may also work horribly in the future, as compilers change with releases. You must never rely on your compiler doing or not doing something, as with the next release everything may be different from today.

So unless you want to write your own C/C++ compiler, you don't need to know how current compilers operate as a C/C++ programmer. The next generation of compilers may be AI-based, and nothing you learn today may be relevant anymore in 5 to 10 years. Also, C or C++ may not even be a relevant language anymore in the foreseeable future, considering that modern languages are way more powerful, have better error checking, and are easier to write (nothing new so far), but meanwhile also achieve comparable speeds. E.g. I have plenty of code samples where Swift code is as fast as C++ code and where Rust code can even beat C code in speed, yet both offer far better compile-time and runtime error checking and make it much harder to shoot yourself in the foot than C or C++.

If you want to learn how to write a compiler, there are tons of books about compiler theory, but I don't recommend trying to write your own compiler backend, as no matter what you come up with, it will be worse than any existing backend out there. If anything, write a compiler frontend and use an existing backend; all you need for a frontend is syntax parsing and converting your own language to the intermediate representation that your backend expects. From there the backend takes over and does all the hard work for you.
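As a rough idea of what "converting your own language to the intermediate representation" can look like, the sketch below lowers an already-parsed expression tree into LLVM-style textual IR. The assumptions are significant: the tree is handed over as nested tuples (the parsing step is skipped), every value is a 32-bit integer, and `emit_function` is a hypothetical helper written for this illustration rather than any LLVM API. The output has not been run through LLVM's tools here, so treat it as the general shape rather than verified input.

```python
from itertools import count

# Maps the source operators of this sketch to IR instruction names.
OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def emit_function(name, params, tree):
    """Lower a nested-tuple expression into one IR-style function body.

    tree is an int, a parameter name, or a tuple (op, left, right).
    """
    lines = []
    temps = count(1)  # numbering for fresh temporaries %t1, %t2, ...

    def lower(node):
        """Emit code for node and return the operand that holds its value."""
        if isinstance(node, int):
            return str(node)
        if isinstance(node, str):
            if node not in params:
                raise NameError(f"unknown variable {node!r}")
            return f"%{node}"
        op, left, right = node
        lhs, rhs = lower(left), lower(right)
        result = f"%t{next(temps)}"
        lines.append(f"  {result} = {OPCODES[op]} i32 {lhs}, {rhs}")
        return result

    value = lower(tree)
    signature = ", ".join(f"i32 %{p}" for p in params)
    header = f"define i32 @{name}({signature}) {{"
    return "\n".join([header, *lines, f"  ret i32 {value}", "}"])
```

For instance, `emit_function("f", ["a", "b"], ("+", ("*", "a", "b"), 2))` produces a function whose body is `%t1 = mul i32 %a, %b`, then `%t2 = add i32 %t1, 2`, then `ret i32 %t2`. Register allocation, instruction selection, and optimization are exactly the parts this leaves to the backend.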

Deloisedelong answered 13/3 at 17:59 Comment(0)
