Generating intermediate code in a compiler. Is an AST or parse tree always necessary when dealing with conditionals?
I'm taking a compiler-design class where we have to implement our own compiler (using flex and bison). I have experience with parsing (writing EBNF grammars and recursive-descent parsers), but this is my first time writing a compiler.

The language design is pretty open-ended (the professor has left it up to us). In class, the professor went over generating intermediate code. He said that it is not necessary for us to construct an Abstract Syntax Tree or a parse tree while parsing, and that we can generate the intermediate code as we go.

I found this confusing for two reasons:

  • What if you call a function before it is defined? How can you resolve the branch target? I guess you would have to make it a rule that functions must be defined before they are used, or at least declared in advance (like C does with prototypes?)

  • How would you deal with conditionals? If you have an if-else or even just an if, how can you resolve the branch target for the if when the condition is false (if you're generating code as you go)?

I planned on generating an AST and then walking the tree after I create it, to resolve the addresses of functions and branch targets. Is this correct or am I missing something?

Promotion answered 19/3, 2011 at 1:28 Comment(0)

The general solution to both of your issues is to keep a list of addresses that need to be "patched." You generate the code and leave holes for the missing addresses or offsets. At the end of the compilation unit, you go through the list of holes and fill them in.
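
A minimal sketch of that patch list in C, assuming a toy instruction set and a fixed-size symbol table. Every name here (`Unit`, `emit_ref`, `define_symbol`, `resolve_all`, the opcodes) is hypothetical and chosen only for illustration; it is not taken from any particular compiler.

```c
#include <assert.h>

/* Hypothetical toy instruction set: an opcode plus one operand. */
enum { OP_CALL, OP_JMP, OP_JZ, OP_RET, OP_HALT };
enum { MAX_CODE = 256, MAX_FIXUPS = 64, MAX_SYMBOLS = 16, HOLE = -1 };

typedef struct { int op; int arg; } Instr;

typedef struct {
    Instr code[MAX_CODE];
    int   ncode;
    /* Each fixup records which instruction has a hole and which symbol fills it. */
    struct { int at; int symbol; } fixups[MAX_FIXUPS];
    int   nfixups;
    int   symbol_addr[MAX_SYMBOLS];   /* HOLE until the symbol is defined */
} Unit;

void unit_init(Unit *u) {
    u->ncode = 0;
    u->nfixups = 0;
    for (int i = 0; i < MAX_SYMBOLS; i++) u->symbol_addr[i] = HOLE;
}

/* Append one instruction and return its address. */
int emit(Unit *u, int op, int arg) {
    assert(u->ncode < MAX_CODE);
    u->code[u->ncode].op = op;
    u->code[u->ncode].arg = arg;
    return u->ncode++;
}

/* Emit an instruction whose operand is a symbol's address. If the symbol is
   not defined yet, leave a hole and remember where it is. */
int emit_ref(Unit *u, int op, int symbol) {
    assert(symbol >= 0 && symbol < MAX_SYMBOLS);
    int at = emit(u, op, u->symbol_addr[symbol]);
    if (u->symbol_addr[symbol] == HOLE) {
        assert(u->nfixups < MAX_FIXUPS);
        u->fixups[u->nfixups].at = at;
        u->fixups[u->nfixups].symbol = symbol;
        u->nfixups++;
    }
    return at;
}

/* Bind a symbol to the address of the next instruction to be emitted. */
void define_symbol(Unit *u, int symbol) {
    assert(symbol >= 0 && symbol < MAX_SYMBOLS);
    u->symbol_addr[symbol] = u->ncode;
}

/* End of compilation unit: fill every hole whose symbol is now known.
   Returns the number of references that are still unresolved. */
int resolve_all(Unit *u) {
    int unresolved = 0;
    for (int i = 0; i < u->nfixups; i++) {
        int addr = u->symbol_addr[u->fixups[i].symbol];
        if (addr == HOLE) unresolved++;
        else u->code[u->fixups[i].at].arg = addr;
    }
    return unresolved;
}
```

Calling `emit_ref(&u, OP_CALL, f)` before `define_symbol(&u, f)` leaves a hole in the call instruction; `resolve_all(&u)` fills it in once the whole unit has been emitted, and a non-zero return value would indicate a function that was used but never defined.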

In FORTH, the "list" of patches is kept on the control stack and is unwound as each control structure terminates, so every hole is filled as soon as its target address becomes known. See FORTH Dimensions.
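
The same stack discipline can be sketched in C for if/else. This is only an illustration under assumed names (`gen_if`, `gen_else`, `gen_endif`, the `ctl` stack and the opcodes are all hypothetical); in a bison grammar these calls would typically sit in mid-rule and end-of-rule actions.

```c
#include <assert.h>

/* Hypothetical opcodes: JZ jumps when the condition is false. */
enum { OP_JMP, OP_JZ, OP_NOP };
enum { MAX_CODE = 256, MAX_DEPTH = 32, HOLE = -1 };

typedef struct { int op; int target; } Instr;

Instr code[MAX_CODE];
int   here;              /* address of the next instruction */
int   ctl[MAX_DEPTH];    /* control stack: addresses of unpatched branches */
int   ctl_top;

int emit(int op, int target) {
    assert(here < MAX_CODE);
    code[here].op = op;
    code[here].target = target;
    return here++;
}

void push_hole(int at) { assert(ctl_top < MAX_DEPTH); ctl[ctl_top++] = at; }
int  pop_hole(void)    { assert(ctl_top > 0); return ctl[--ctl_top]; }

/* After the condition: branch over the "then" part, target unknown for now. */
void gen_if(void) {
    push_hole(emit(OP_JZ, HOLE));
}

/* At "else": jump over the else part, then point the pending JZ here. */
void gen_else(void) {
    int jmp = emit(OP_JMP, HOLE);
    code[pop_hole()].target = here;
    push_hole(jmp);
}

/* At the end of the statement: whatever branch is still pending lands here. */
void gen_endif(void) {
    code[pop_hole()].target = here;
}
```

Because each construct pops exactly what it pushed, nested conditionals need no extra bookkeeping: an inner `if` finishes patching its own hole before the outer one resumes.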

Anecdote: an early Lisp compiler (I believe it was Lisp) generated a list of machine code instructions in symbolic format, with forward references to the list of machine code for each branch of a conditional. It then generated the binary code by walking the list backwards. This way, the code location of every forward branch target was already known when the branch instruction had to be emitted.

Minutiae answered 19/3, 2011 at 1:42 Comment(3)
Alternatively, you can emit assembly code and let the assembler worry about that part of the problem. Or is that cheating? - Plowboy
Sure, then the assembler will need to make two passes (or one and a half if it uses patching) over the code. Many compilers, FORTH in particular, target binary machine code directly. This could be for performance (both speed and space) or other pragmatic reasons (such as no available assembler). - Minutiae
Some of us push the patching onto the linker. This means you can spit out an object code stream (with placeholders for patches; I actually link the patch values together when I can) and issue the patches when the final location is known, at least for a very fast, one-pass compiler I built. - Stafani

The Crenshaw tutorial ("Let's Build a Compiler") is a concrete example of not using an AST of any kind. It builds a working compiler (including conditionals, obviously) with immediate code generation targeting m68k assembly.

You can read through the document in an afternoon, and it is worth it.
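
A rough sketch, in C, of how conditionals can be emitted as assembly text with no tree and no patching: each construct asks for a fresh label, prints branches to it, and later prints the label itself, leaving address resolution to the assembler. The helper names (`new_label`, `gen_if`, `gen_else`, `gen_endif`, `Out`) and the 68k-style mnemonics in the output are assumptions made for this illustration, not code from the tutorial.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Output buffer standing in for the assembly file. */
typedef struct { char text[512]; size_t len; } Out;

int next_label;
int new_label(void) { return next_label++; }

void append(Out *o, const char *s) {
    size_t n = strlen(s);
    assert(o->len + n < sizeof o->text);
    memcpy(o->text + o->len, s, n + 1);   /* copy including the terminator */
    o->len += n;
}

/* Print a branch to a numbered label, e.g. "\tBEQ L0\n". */
void emit_branch(Out *o, const char *mnemonic, int label) {
    char line[48];
    snprintf(line, sizeof line, "\t%s L%d\n", mnemonic, label);
    append(o, line);
}

/* Print the label definition, e.g. "L0:\n". */
void emit_label(Out *o, int label) {
    char line[48];
    snprintf(line, sizeof line, "L%d:\n", label);
    append(o, line);
}

/* After the condition code: skip the "then" part when the test fails. */
int gen_if(Out *o) {
    int l_else = new_label();
    emit_branch(o, "BEQ", l_else);
    return l_else;
}

/* At "else": jump past the else part, then place the label the BEQ targets. */
int gen_else(Out *o, int l_else) {
    int l_end = new_label();
    emit_branch(o, "BRA", l_end);
    emit_label(o, l_else);
    return l_end;
}

/* At the end of the statement: place the last pending label. */
void gen_endif(Out *o, int label) {
    emit_label(o, label);
}
```

In a recursive-descent parser the pending label can simply live in a local variable of the function that parses the `if` statement, so nesting is handled by the call stack.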

Plowboy answered 11/6, 2011 at 23:33 Comment(0)
