One of my side projects is a decompiler that turns native code into LLVM IR, simplifies it and outputs pseudo-C. An essential phase of the program's process is pattern-independent control flow structuring, which finds regions in a program and turns them into structured control flow statements (i.e. not gotos).
I had to roll out my own code for finding regions because LLVM's regions don't look exactly how the paper expects them. However, finding regions requires you to have a post-dominator tree, and it turns out that LLVM's post-dominator tree building algorithm can't make one for functions that have no end block, like functions that "end" with an endless loop.
This appears to be because the tree-building algorithm needs a starting point. Normally, the starting point is the function's returning block, since it post-dominates every other block; but there isn't any in functions that spin into an endless loop, since they don't have any return
or unreachable
terminator. (At this point, it might be worth noting that LLVM's region code also relies on a post-dominator tree and is also useless for functions for which it can't be built.)
It seems to me that even though this algorithm fails, the fact that a function doesn't return doesn't mean that you can't make a post-dominator tree for it.1 In fact, if that endless loop has a single back-edge (which is something that I can ensure), the node with that back edge necessarily post-dominates every other node in the graph, so it should be possible to make a post-dominator tree.
If I could find that node, I could probably feed it to LLVM's post-dom infrastructure and get a reasonable post-dominator tree out of it. Unfortunately, I'm not very imaginative and the only straightforward way that I can think of to identify that crucial node is that "it's the one that post-dominates everything", which certainly won't help me bootstrap a post-dominator tree.
Finding back edges isn't particularly hard. As Doug Currie says, you can do it with a simple DFS, and in fact, another part of my project does exactly that. However, in the case of a function with an endless loop and nested, terminating loops, I don't know how I would tell the inner back edge from the outer back edge without domination information. (If it can help, at this stage of the process, every loop is guaranteed to have a single entry node and at most one exit node.)
So how can I build the post-dominator tree of a function that doesn't have any returning basic block?
1. My compiler and graph theory background is entirely self-taught. This might not be accurate.
if (5 == 7) break;
? – Conker