Compiler Optimizations Questions
  1. What are some of the ways a compiler eliminates recomputation of repeated subexpressions? How do you keep track of the subexpressions, and how do you identify the repeated ones?
  2. Besides the use of bitwise operators, what are some of the strength reduction techniques common compilers use?
Pearson answered 6/5, 2009 at 0:7 Comment(1)
You should split this question into two parts, and fix the title. (I don't have the power yet).Lyallpur
R
3
  1. I believe many compilers use SSAPRE (Static Single Assignment Partial Redundancy Elimination) to eliminate repeated expressions. This requires the code to be in SSA form, allowing many more optimizations.

  2. I'm not really sure about this one, but look at this list of LLVM passes. LLVM is an optimizing compiler infrastructure whose generated code is often faster than even GCC's. There is a small explanation of each pass. If you need more info, look at the LLVM source for these passes. It is written in C++ but is quite clean and understandable.

Edit: By the way, if you're developing a compiler, I highly recommend LLVM, it is very easy to use and generates highly optimized code.

Relief answered 6/5, 2009 at 0:27 Comment(3)
WOW Thank you. Those links were really helpful.Pearson
Definitely agree on LLVM. We use it in my research group. It's great.Amparoampelopsis
Don't recommend SSAPRE. It requires HSSA, which your compiler almost certainly won't provide.Lyallpur
A
6

For 1, the name of the optimization you're looking for is common subexpression elimination (CSE). Depending on your representation, this can be fairly easy. Usually, a compiler will have some intermediate representation of a program where operations are broken down as much as possible and linearized. So for example, the expression c = a * b + a * b might be broken down as:

v1 = a * b
v2 = a * b
c = v1 + v2

So you could do CSE at a very low level by looking for operations with the same operator and operands. When you encounter a duplicate (v2 in this case), you replace all instances of it with the original. So we could simplify the code above to be:

v1 = a * b
c = v1 + v1

This generally assumes that you only assign each variable once (static single assignment form), but you can implement something like this without that restriction. This gets more complicated when you try to perform this optimization across branches. As Zifre mentions, look into Partial Redundancy Elimination.
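The "look for operations with the same operator and operands" step above can be sketched with a hash table keyed on (op, lhs, rhs). This is a minimal illustration, not any real compiler's implementation; the tuple IR layout and function name are made up for the example:

```python
# Local common subexpression elimination over a tiny three-address IR.
# Each instruction is (dest, op, lhs, rhs); assumes SSA (each dest assigned once).
def local_cse(instrs):
    available = {}   # (op, lhs, rhs) -> dest of the first occurrence
    replaced = {}    # duplicate dest -> original dest
    out = []
    for dest, op, lhs, rhs in instrs:
        # Rewrite operands that referred to an eliminated duplicate.
        lhs = replaced.get(lhs, lhs)
        rhs = replaced.get(rhs, rhs)
        key = (op, lhs, rhs)
        if key in available:
            replaced[dest] = available[key]   # drop the duplicate instruction
        else:
            available[key] = dest
            out.append((dest, op, lhs, rhs))
    return out

# c = a * b + a * b, broken down as in the text:
code = [("v1", "*", "a", "b"),
        ("v2", "*", "a", "b"),
        ("c",  "+", "v1", "v2")]
print(local_cse(code))
# → [('v1', '*', 'a', 'b'), ('c', '+', 'v1', 'v1')]
```

This only works within a single basic block; extending it across branches is exactly where the partial redundancy elimination mentioned above comes in.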

Either way, you get some basic improvement, and all you need to keep track of are basic expressions. You may want to take this a step further and look for arithmetic identities. For instance, a * b is the same as b * a. Also, x * (y + z) = x * y + x * z. This makes your optimization more complicated, and it's not clear that it would give you that much performance improvement. Anecdotally, most of the benefit from a CSE optimization comes from address computations like array accesses, and you won't need complicated identities like the ones above.

For 2, what strength reductions are useful really depends on the architecture you compile for. Usually this just involves transforming multiplications and divisions into shifts, additions, and subtractions.
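As a sketch of the multiplication case: a constant multiplier can be decomposed into one shift-and-add per set bit. The function below is illustrative only; a real compiler would also weigh the resulting shift/add count against the target's multiply latency before applying the rewrite:

```python
# Strength reduction sketch: rewrite x * C (C a non-negative constant)
# as a sum of shifts, one per set bit of C. For example,
# x * 10 becomes (x << 3) + (x << 1).
def multiply_by_constant(x, c):
    result = 0
    shift = 0
    while c:
        if c & 1:
            result += x << shift   # this set bit contributes x * 2**shift
        c >>= 1
        shift += 1
    return result

print(multiply_by_constant(7, 10))  # → 70
```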

Amparoampelopsis answered 6/5, 2009 at 0:35 Comment(2)
what if the repeated ones are not in the same expression? For example x = a * b and somewhere down y = a * b. Is there some way I can detect the repeated a * b?Pearson
It is a repeat though. In that case, you replace all the future uses of y with x.Amparoampelopsis
R
5

I would highly recommend two printed references on these subjects:

  1. Advanced Compiler Design & Implementation by Steven S. Muchnick
  2. Building an Optimizing Compiler by Robert Morgan

The Muchnick book is on the formal side but is very readable and has good descriptions of all of the important optimization techniques. The Morgan book has a much more hands-on feel and would be a great basis for a compiler project focused on optimization techniques. Neither book has much to say about lexical analysis or parsing; knowledge of these subjects is assumed.

Renelle answered 6/5, 2009 at 0:44 Comment(0)
S
1

To add one more book to the list of recommendations, check out "Hacker's Delight" by Henry S. Warren. It's a great compendium of techniques for optimizing common operations, like transforming integer divisions into multiplications.
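One well-known technique of that kind is replacing unsigned division by a constant with a multiply and a shift using a precomputed "magic number". The sketch below uses the standard constant for 32-bit unsigned division by 3 (ceil(2^33 / 3) = 0xAAAAAAAB); the function name is mine:

```python
# Division by a constant via multiplication: for unsigned 32-bit x,
# x // 3 == (x * M) >> 33, where M = ceil(2**33 / 3) = 0xAAAAAAAB.
# Compilers emit this because a multiply is far cheaper than a divide.
M = 0xAAAAAAAB

def div3_u32(x):
    assert 0 <= x < 2**32
    return (x * M) >> 33

for x in (0, 1, 2, 3, 100, 2**32 - 1):
    assert div3_u32(x) == x // 3
print("ok")
```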

Seashore answered 7/5, 2009 at 23:5 Comment(0)
L
0

You're looking for partial-redundancy elimination (PRE). Both CSE (from the other answers) and loop-invariant code motion are subsumed by PRE. (A variation of PRE is Lazy Code Motion, which I believe is optimal).
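To illustrate what "partial" means here (an example of my own, not taken from any of the references): an expression can be redundant along one path into a join but not the other. PRE inserts compensation code on the path where it is missing, making the later computation fully redundant and removable:

```python
# Before PRE: a * b is computed on the "then" path and again after the
# join, so whenever cond is true it is evaluated twice.
def before(a, b, cond):
    if cond:
        x = a * b
    else:
        x = 0
    y = a * b          # partially redundant: already computed when cond was true
    return x + y

# After PRE: a * b is also computed on the "else" path (compensation
# code), so the computation after the join can reuse the temporary.
def after(a, b, cond):
    if cond:
        t = a * b
        x = t
    else:
        t = a * b      # inserted by PRE to make the redundancy full
        x = 0
    y = t              # recomputation eliminated
    return x + y

assert before(3, 4, True) == after(3, 4, True)
assert before(3, 4, False) == after(3, 4, False)
```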

Check out Keith Cooper's lecture notes, which seem to describe the techniques very well.

Do NOT use SSAPRE. AFAIK, this requires a particular form of SSA known as HSSA, which has a few downsides:

  • It's pretty complicated
  • It requires global value numbering (and so SSAPRE doesn't provide value numbering, as it's expected to exist already).
  • It doesn't provide anything if your language doesn't support pointers to stack variables (and if it does, stop writing your own analysis and use LLVM or gcc).
  • gcc used HSSA for a while, but they have moved away from it.
  • LLVM experimented with it, but AFAIK they don't use it anymore.

EDIT:

Muchnick's book has a detailed description; it's linked in another answer.

Lyallpur answered 30/7, 2009 at 15:7 Comment(3)
do you know why GCC has moved away from HSSA, and why LLVM decided not to use it? Current versions of GCC (7.0) mark variables with VDEF and VUSE, this looks similar to HSSA to me. Do you know if they are related?Birdwell
Sorry, it's been a long time and I wasn't involved. I literally don't remember anything about this :(Lyallpur
Know that feel bro.Birdwell
