EDIT 4: A bit of background/preamble.
"The only way to jump backward in the code is via a loop." in Peter's answer isn't strictly true. You could jump back and forth without it meaning it's a loop. A simplified case would be something like this:
0: goto 2
1: goto 3
2: goto 1
Of course, this particular example is very artificial and a bit silly. However, making assumptions as to how the source-to-bytecode compiler is going to behave could lead to surprises. As Peter and I have shown in our respective answers, two popular compilers can produce a rather different output (even without obfuscation). It rarely matters, because all of this tends to be optimised rather well by the JIT compiler when you execute the code.
This being said, in the vast majority of cases, jumping backwards will be a reasonable indication as to where a loop starts. Compared with the rest, finding out the entry point of a loop is the "easy" part.
Before considering any loop start/exit instrumentation, you should look into the definitions of what entry, exit and successors are. Although a loop will only have one entry point, it may have multiple exit points and/or multiple successors, typically caused by break
statements (sometimes with labels), return
statements and/or exceptions (explicitly caught or not). While you haven't given details regarding the kind of instrumentations you're investigating, it's certainly worth considering where you want to insert code (if that's what you want to do). Typically, some instrumentation may have to be done before each exit statement or instead of each successor statement (in which case you'll have to move the original statement).
Soot is a good framework to do this. It has a number of intermediate representations that make bytecode analysis more convenient (e.g. Jimple).
You can build a BlockGraph based on your method body, for example an ExceptionalBlockGraph. Once you've decomposed the control flow graph into such a block graph, from the nodes, you should be able to identity the dominators (i.e. blocks that have an arrow coming back to them). This will give you the start of the loop.
You may find something similar done in sections 4.3 to 4.7 of this dissertation.
EDIT:
Following the discussion with @Peter in comments to his answer. Talking the same example:
public int foo(int i, int j) {
while (true) {
try {
while (i < j)
i = j++ / i;
} catch (RuntimeException re) {
i = 10;
continue;
}
break;
}
return j;
}
This time, compiled with the Eclipse compiler (no specific option: simply autocompilation from within the IDE).
This code hasn't been obfuscated (apart from being bad code, but that's a different matter).
Here is the result (from javap -c
):
public int foo(int, int);
Code:
0: goto 10
3: iload_2
4: iinc 2, 1
7: iload_1
8: idiv
9: istore_1
10: iload_1
11: iload_2
12: if_icmplt 3
15: goto 25
18: astore_3
19: bipush 10
21: istore_1
22: goto 10
25: iload_2
26: ireturn
Exception table:
from to target type
0 15 18 Class java/lang/RuntimeException
There is a loop between 3 and 12 (jumped in starting a 10) and another loop, due to the exception occurring from the division by zero at 8 to 22.
Unlike the javac
compiler result, where one could make as guess that there was an outer loop between 0 and 22 and an inner loop between 0 and 12, the nesting is less obvious here.
EDIT 2:
To illustrate the kind of problems you may get with a less awkward example. Here is a relatively simple loop:
public void foo2() {
for (int i = 0; i < 5; i++) {
System.out.println(i);
}
}
After (normal) compilation within Eclipse, javap -c
gives this:
public void foo2();
Code:
0: iconst_0
1: istore_1
2: goto 15
5: getstatic #25; //Field java/lang/System.out:Ljava/io/PrintStream;
8: iload_1
9: invokevirtual #31; //Method java/io/PrintStream.println:(I)V
12: iinc 1, 1
15: iload_1
16: iconst_5
17: if_icmplt 5
20: return
Before doing anything within the loop, you jump straight from 2 to 15. Block 15 to 17 is the header of the loop (the "entry point"). Sometimes, the header block could contain far more instructions, especially if the exit condition involves more evaluation, or if it's a do {} while()
loop.
The concept of "entry" and "exit" of a loop may not always reflect what you'd write sensibly as Java source code (including the fact that you can rewrite for
loops as while
loops, for example). Using break
can also lead to multiple exit points.
By the way, by "block", I mean a sequence of bytecode into which you can't jump and out of which you can't jump in the middle: they're only entered from the first line (not necessarily from the previous line, possibly from a jump from somewhere else) and exited from the last (not necessarily to the following line, it can jump somewhere else too).
EDIT 3:
It seems that new classes/methods to analyse loops have been added since last time I had looked at Soot, which make it a bit more convenient.
Here is a complete example.
The class/method to analyse (TestLoop.foo()
)
public class TestLoop {
public void foo() {
for (int j = 0; j < 2; j++) {
for (int i = 0; i < 5; i++) {
System.out.println(i);
}
}
}
}
When compiled by the Eclipse compiler, this produces this bytecode (javap -c
):
public void foo();
Code:
0: iconst_0
1: istore_1
2: goto 28
5: iconst_0
6: istore_2
7: goto 20
10: getstatic #25; //Field java/lang/System.out:Ljava/io/PrintStream;
13: iload_2
14: invokevirtual #31; //Method java/io/PrintStream.println:(I)V
17: iinc 2, 1
20: iload_2
21: iconst_5
22: if_icmplt 10
25: iinc 1, 1
28: iload_1
29: iconst_2
30: if_icmplt 5
33: return
Here is a program that loads the class (assuming it's on the classpath here) using Soot and displays its blocks and loops:
import soot.Body;
import soot.Scene;
import soot.SootClass;
import soot.SootMethod;
import soot.jimple.toolkits.annotation.logic.Loop;
import soot.toolkits.graph.Block;
import soot.toolkits.graph.BlockGraph;
import soot.toolkits.graph.ExceptionalBlockGraph;
import soot.toolkits.graph.LoopNestTree;
public class DisplayLoops {
public static void main(String[] args) throws Exception {
SootClass sootClass = Scene.v().loadClassAndSupport("TestLoop");
sootClass.setApplicationClass();
Body body = null;
for (SootMethod method : sootClass.getMethods()) {
if (method.getName().equals("foo")) {
if (method.isConcrete()) {
body = method.retrieveActiveBody();
break;
}
}
}
System.out.println("**** Body ****");
System.out.println(body);
System.out.println();
System.out.println("**** Blocks ****");
BlockGraph blockGraph = new ExceptionalBlockGraph(body);
for (Block block : blockGraph.getBlocks()) {
System.out.println(block);
}
System.out.println();
System.out.println("**** Loops ****");
LoopNestTree loopNestTree = new LoopNestTree(body);
for (Loop loop : loopNestTree) {
System.out.println("Found a loop with head: " + loop.getHead());
}
}
}
Check the Soot documentation for more details on how to load classes. The Body
is a model for the body of the loop, i.e. all the statements made from the bytecode. This uses the intermediate Jimple representation, which is equivalent to the bytecode, but easier to analyse and process.
Here is the output of this program:
Body:
public void foo()
{
TestLoop r0;
int i0, i1;
java.io.PrintStream $r1;
r0 := @this: TestLoop;
i0 = 0;
goto label3;
label0:
i1 = 0;
goto label2;
label1:
$r1 = <java.lang.System: java.io.PrintStream out>;
virtualinvoke $r1.<java.io.PrintStream: void println(int)>(i1);
i1 = i1 + 1;
label2:
if i1 < 5 goto label1;
i0 = i0 + 1;
label3:
if i0 < 2 goto label0;
return;
}
Blocks:
Block 0:
[preds: ] [succs: 5 ]
r0 := @this: TestLoop;
i0 = 0;
goto [?= (branch)];
Block 1:
[preds: 5 ] [succs: 3 ]
i1 = 0;
goto [?= (branch)];
Block 2:
[preds: 3 ] [succs: 3 ]
$r1 = <java.lang.System: java.io.PrintStream out>;
virtualinvoke $r1.<java.io.PrintStream: void println(int)>(i1);
i1 = i1 + 1;
Block 3:
[preds: 1 2 ] [succs: 4 2 ]
if i1 < 5 goto $r1 = <java.lang.System: java.io.PrintStream out>;
Block 4:
[preds: 3 ] [succs: 5 ]
i0 = i0 + 1;
Block 5:
[preds: 0 4 ] [succs: 6 1 ]
if i0 < 2 goto i1 = 0;
Block 6:
[preds: 5 ] [succs: ]
return;
Loops:
Found a loop with head: if i1 < 5 goto $r1 = <java.lang.System: java.io.PrintStream out>
Found a loop with head: if i0 < 2 goto i1 = 0
LoopNestTree
uses LoopFinder
, which uses an ExceptionalBlockGraph
to build the list of blocks.
The Loop
class will give you the entry statement and the exit statements. You should then be able to add extra statements if you wish. Jimple is quite convenient for this (it's close enough to the bytecode, but has a slightly higher level so as not to deal with everything manually). You can then output your modified .class
file if needed. (See the Soot documentation for this.)