Is there a Java bytecode optimizer that removes useless gotos?
Problem: I have a method that compiles to over 8000 bytes of Java bytecode. HotSpot has a magic limit that makes the JIT not kick in for methods that exceed 8000 bytes. (Yes, it is reasonable to have a huge method. This is a tokenizer loop.) The method is in a library and I don't want to require users of the library to have to configure HotSpot to deactivate the magic limit.

Observation: Decompiling the bytecode shows that Eclipse Java Compiler generates a lot of pointless gotos. (javac is even worse.) That is, there are gotos that are only reachable from jumps. Obviously, the jump that jumps to the goto should instead jump directly where the goto jumps and the goto should be eliminated.

Question: Is there a bytecode optimizer for Java 5 class files that flattens pointless jump chains and then removes unnecessary gotos?

Edit: I mean patterns like:

8698:   goto    8548
8701:   goto    0

Obviously, the second goto can only be reached by a jump to 8701 which might as well be a direct jump to 0.
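The rewrite such an optimizer would perform is simple enough to sketch on a toy model. The array encoding below is purely illustrative (not real bytecode): each goto is redirected to the end of its chain, with a hop counter guarding against cyclic chains.

```java
// Toy model of goto-chain flattening (illustrative only, not real bytecode):
// target[i] >= 0 means instruction i is "goto target[i]"; -1 means a normal instruction.
public class GotoChains {
    // Follow a chain of gotos from `start` to its final destination,
    // bailing out if the chain turns out to be cyclic.
    static int resolve(int[] target, int start) {
        int t = start;
        for (int hops = 0; hops <= target.length; hops++) {
            if (target[t] < 0) {
                return t; // landed on a real instruction
            }
            t = target[t];
        }
        return start; // cyclic chain: leave it alone
    }

    // Rewrite every goto so it points directly at its final destination.
    static void flatten(int[] target) {
        for (int i = 0; i < target.length; i++) {
            if (target[i] >= 0) {
                target[i] = resolve(target, target[i]);
            }
        }
    }
}
```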

On closer investigation, this questionable pattern is more common:

4257:   if_icmpne   4263
4260:   goto    8704
4263:   aload_0

Here, obviously, one would like the compiler to reverse the "not equal" comparison into an "equal" comparison that jumps directly to 8704, eliminating the goto.
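After that transformation, the sequence would read (offsets are hypothetical, since removing the goto shifts all subsequent code):

```
4257:   if_icmpeq   8704
4260:   aload_0
```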

Hazaki answered 3/6, 2009 at 12:37 Comment(5)
Some architectures have a limit on how far a relative branch can go (because they hold the offset in an 8- or 16-bit field), so they often got around that with a relative branch to a non-relative branch that used the full program counter size. Is the JVM like that?Analytic
Do you mean LABELS only reachable from jumps?Shrub
A runtime annotation to give a hint to the JVM sure would seem to be nice in this case....but I don't think such a thing exists (and a quick google doesn't turn up anything.)Skinnydip
Interesting question. Why does the JIT not kick in for methods that exceed a magical limit? Is there a good engineering reason for it?Aglow
I guess I was wrong about the relative jumps - I just looked, and if_icmpne and goto both take 16-bit signed offsets.Analytic
I feel your pain. I once had to write a parser with around 5kloc of if(str.equals(...)) code. I broke it into several methods along the lines of parse1, parse2, etc. If parse1 didn't produce a parsed result, parse2 was called, and so on. This isn't necessarily best practice, but it does do what you need it to.
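A minimal sketch of that cascade (all names and token kinds here are hypothetical):

```java
// Sketch of the parse1/parse2/... cascade described above; names are hypothetical.
// Each stage handles one slice of the grammar and returns null on no match,
// so control falls through to the next stage.
public class Cascade {
    static String parse(String input) {
        String result = parse1(input);
        if (result == null) result = parse2(input);
        if (result == null) result = "UNKNOWN";
        return result;
    }

    static String parse1(String s) { return s.startsWith("<") ? "TAG" : null; }
    static String parse2(String s) { return s.startsWith("&") ? "CHAR_REF" : null; }
}
```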

Neutralize answered 3/6, 2009 at 12:55 Comment(4)
btw: in case you didn't happen to know, you've implemented the "Chain of Responsibility" pattern with the parse1, parse2 etc approach (not that that's a bad or good thing for this problem; just wanted to mention it...)Castled
I used to have multiple methods instead of one huge switch in a loop. However, the huge switch structure fits the spec better and is faster when the JIT does kick in.Hazaki
Well you could always do a smaller switch() and the default case calls another method with a small switch and so on until all of your larger switch cases are covered. This wouldn't be as nice as one large switch table, but it would work. In my case I couldn't use switch since it doesn't work on Strings.Neutralize
Yeah, I've considered having two different switch loops and shipping locals between the two. That's ugly though, and I want to keep the overall structure such that it's easy to mechanically compile the whole thing into gotoful C++ with even less switch. (Yes, I have unusual requirements. :-)Hazaki
One method compiling to over 8000 bytes? Does anybody understand that code? Is it testable? Try to split it up into multiple (private?) methods with meaningful names instead of hassling with the optimizer!

OK, maybe there are legitimate cases for large methods. But sorry, there are no hints of that in the question.

Hymenium answered 3/6, 2009 at 12:49 Comment(2)
Sounds like he's writing a parser -- it is actually very reasonable for a tokenizer loop to get big, and it's actually more readable to keep it together. (I've had folks force me to break up such a method just because it was over n lines and the next month they complained because it was harder to follow...)Castled
Yes, I understand the code. It also maps more closely to spec than any reformulation as recursive descent methods. Yes, it is testable with a large set of unit tests.Hazaki
Does it make a difference if you don't compile with debug symbols (i.e. the -g flag in javac)? That might bring the method down below the magic limit.

Euraeurasia answered 3/6, 2009 at 12:57 Comment(1)
At least in the Eclipse Java Compiler this has no effect on the number of bytecodes in the compiled method. (I assume the debug table is separate.)Hazaki
Would it be impossible to refactor the method into submethods? Modern JITs inline those calls anyway.

Mefford answered 3/6, 2009 at 13:6 Comment(7)
if you've not hand-written a parser before, sometimes the scanner (lexer/tokenizer) can get big, but readability can go way down when you split it up. of course I haven't seen his code, but I've been there...Castled
I've already split the actual tokenizer actions that can be split out without affecting control flow. The control structure itself is huge.Hazaki
@scott, I've hand-written a parser before. If it is too complex (for any particular reason) it might be a reasonable time to consider a suitable parser generator.Shrub
@hsivonen, well, I believe it is time you show us some code then :)Shrub
@Thorbjørn, Sure - most of the time I prefer a parser generator; sometimes the parser grammar is very trivial and it's all about lexing, which often is very straightforward but can grow long quicklyCastled
@Thorbjørn Current rev in the middle of a refactoring: hsivonen.iki.fi/Tokenizer.java There used to be a per code unit read() method. I wanted to inline it manually in order to hoist crucial state variables from fields to locals. As far as I can tell, the only substantial piece eligible for extraction is stuff in CHARACTER_REFERENCE_LOOPHazaki
Based on the snippet, I can see why you have problems with this. It is basically a maze of gotos all alike, with too many for(;;) loops and fallthroughs for my taste. I would suggest rewriting so there is no "break foo" or "continue foo", which will most likely require boolean flags that in turn go into the various for loops. Once you are there, you should be able to break up the logic easily.Shrub
If it's a tokenizer loop, would it be better to do it with a data-driven set of mappings and a bit of reflection as appropriate?

So you would store your token matches in a structure that maps them to data about that token's syntax and methods implementing associated functions. Lookup can be optimised over the structure and you avoid the big loop.

That introduces the issue of keeping the data and implementation in sync, but you could generate the data from your codebase with a doclet or possibly with annotations.

Without knowing exactly what your big method does, we're limited to trying to optimise it the way that you're assuming is best (and which is apparently not possible anyway).
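A minimal sketch of that table-driven dispatch (the token spellings and handler bodies are hypothetical placeholders):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the data-driven dispatch this answer suggests:
// token spellings map to handler functions instead of branches in one big method.
public class Dispatch {
    static final Map<String, Function<String, String>> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put("<", s -> "TAG_OPEN");
        HANDLERS.put("&", s -> "CHAR_REF");
    }

    static String handle(String token, String input) {
        Function<String, String> h = HANDLERS.get(token);
        return h != null ? h.apply(input) : "UNKNOWN";
    }
}
```

Lookup cost then depends on the map rather than on branch layout, which is exactly the trade-off debated in the comments below.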

Demography answered 3/6, 2009 at 14:34 Comment(2)
No, ideally you want the state transitions to compile down to jumps in code—not data structure lookups.Hazaki
Surely that rather depends on whether you can optimise the lookup - both through efficient searching of the match space and through caching of the results. A large, hard-coded series of token matches might be quite resistant to optimisation (which I presume is why you're looking for JIT help). In a previous project I used cached method references with quite good results. Changes in the 'source' only needed to invalidate the reference to prompt a fresh lookup, and normal execution could run at good speed, firing off the method calls without any performance hit from token matching.Demography
Does your performance increase if you run a bytecode shrinker/obfuscator on your class? E.g., yGuard, ProGuard, ...

Maybe you can write a class-file postprocessor using ASM, since your use case is so specific.
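A sketch of what such a postprocessor might look like with the ASM tree API (assumes the asm and asm-tree artifacts on the classpath; this is an illustration, not a tested optimizer):

```java
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.tree.AbstractInsnNode;
import org.objectweb.asm.tree.JumpInsnNode;
import org.objectweb.asm.tree.MethodNode;

// Redirects any jump whose target instruction is a goto so that it jumps
// straight to the goto's own destination; repeats until a fixed point.
public class JumpFlattener {
    static void flatten(MethodNode mn) {
        boolean changed;
        do {
            changed = false;
            for (AbstractInsnNode insn = mn.instructions.getFirst();
                 insn != null; insn = insn.getNext()) {
                if (!(insn instanceof JumpInsnNode)) continue;
                JumpInsnNode jump = (JumpInsnNode) insn;
                // Skip labels, frames and line numbers (opcode < 0) after the target label.
                AbstractInsnNode target = jump.label;
                while (target != null && target.getOpcode() < 0) target = target.getNext();
                // Redirect if the target is another goto (guarding against goto-to-self).
                if (target instanceof JumpInsnNode && target.getOpcode() == Opcodes.GOTO
                        && ((JumpInsnNode) target).label != jump.label) {
                    jump.label = ((JumpInsnNode) target).label;
                    changed = true;
                }
            }
        } while (changed);
    }
}
```

After the rewrite, serializing through a ClassWriter recomputes all branch offsets, which is what actually shrinks the method; dead gotos that are no longer jumped to would still need a separate dead-code pass to be removed.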

Even if you remove all the pointless gotos, does that bring you under the magic limit?

Hormuz answered 30/6, 2009 at 12:29 Comment(0)
A list of bytecode libraries mentions BCEL and ASM, which I'd heard of before, along with many others that do various things.

Abdella answered 30/6, 2009 at 12:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.