Does anyone know why the JSR/RET bytecode pair is deprecated in Java 6?
The only meaningful explanation I found on the net was that they made code analysis by the runtime harder and slower to perform. Does anyone know another reason?
JSR and RET make bytecode verification a lot more difficult than it might otherwise be, due to the relaxation of some normal bytecode constraints (such as having a consistent stack shape on entry to a JSR). The upside is very minor (potentially slightly smaller methods in some cases), and the continuing difficulties in the verifier dealing with odd JSR/RET patterns (plus the potential security vulnerabilities and the associated runtime cost of full verification) made it a feature not worth keeping.
Stack maps, and the lighter-weight verifier they enable, are a big performance win during class loading with no sacrifice in safety.
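To make that concrete, here is a hand-written sketch (illustrative only, not real javap output) of the invariant involved: every path into a control-flow merge point must arrive with the same stack shape, and since Java 6 the StackMapTable attribute records the expected frame at each such target, so the verifier can check frames in one linear pass instead of iterating a fixpoint:

```
// Java source: int r = cond ? 1 : 2;
    iload_1         // push cond
    ifeq  ELSE      // pop it; the stack is now empty on both paths
    iconst_1        // then-branch pushes exactly one int
    goto  MERGE
ELSE:               // stack map frame here: empty stack
    iconst_2        // else-branch must also push exactly one int
MERGE:              // stack map frame here: one int on the stack, on every path
    istore_2
```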
On the `if` branch you push something, then on the other (`else`) branch you also must push an equivalent number of things, in the final account. When the control flow merges (at the end of `if`...`else`), all branches must have the same stack depth. I think there are some additional checks on what gets pushed too (types-wise), but I don't recall the details on that right now. – Chemisorb

The people who use them to obfuscate bytecode explain why:
The `jsr`-`ret` construct is particularly difficult to handle when dealing with typing issues because each subroutine can be called from multiple places, requiring that type information be merged and therefore a more conservative estimate [need be produced]. Also, decompilers will usually expect to find a specific `ret` for every `jsr`.
The last sentence is only relevant for obfuscators (like the soot-based JBCO) which don't even emit a `ret` but `pop` the return address, emulating a goto. That's still effective enough ~15 years later against some 'modern' decompilers:
org.benf.cfr.reader.util.ConfusedCFRException: Missing node tying up JSR block
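For concreteness, the pattern looks roughly like this (my own reconstruction of the described trick, not actual JBCO output):

```
// Canonical form, as old compilers emitted for finally blocks:
    jsr   SUB       // push the return address and jump to the subroutine
    ...             // execution resumes here after ret
SUB:
    astore_3        // save the return address in a local
    ...             // subroutine body (e.g. the finally code)
    ret   3         // jump back through the saved return address

// Obfuscated form: the jsr has no matching ret
    jsr   SUB2
BACK:
    ...
SUB2:
    pop             // discard the return address instead of saving it
    ...             // subroutine body
    goto  BACK      // hard-coded jump back; this is what trips up
                    // decompilers hunting for the jsr's ret
```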
That (relatively simple) trick aside, the first part of the quote says that (even) if used 'as originally designed', `jsr`s cause (dataflow) analysis slowdowns. A bytecode verifier is a dataflow analyzer; see Leroy for an in-depth discussion. I should probably stop before I namedrop abstract interpretation here, although that's also [conceptually] involved in bytecode verification...
The first JVM bytecode verification algorithm is due to Gosling and Yellin at Sun [...]. Almost all existing bytecode verifiers implement this algorithm. It can be summarized as a dataflow analysis applied to a type-level abstract interpretation of the virtual machine.
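A minimal sketch of what that quote means, with all names hypothetical and everything heavily simplified: "execute" each instruction over types rather than values, and merge abstract frames at join points until a fixpoint is reached. A real verifier also tracks local-variable types, uninitialized-object types, exception handlers, and so on.

```java
import java.util.*;

// Hypothetical sketch of dataflow analysis over a type-level
// abstract interpretation; not a real verifier.
class TypeVerifierSketch {

    // Abstract state at a program point: the types on the operand stack.
    static List<String> merge(List<String> a, List<String> b) {
        if (a.size() != b.size())
            throw new VerifyError("inconsistent stack depth at merge point");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < a.size(); i++)
            out.add(lub(a.get(i), b.get(i)));  // least upper bound in the type lattice
        return out;
    }

    // Placeholder: a real verifier walks the class hierarchy for the
    // common supertype, with 'top' (unusable) as the last resort.
    static String lub(String t1, String t2) {
        return t1.equals(t2) ? t1 : "top";
    }

    // Classic worklist fixpoint: interpret each instruction over types and
    // propagate the resulting frame to its successors until nothing changes.
    void verify(int codeLength) {
        Map<Integer, List<String>> frameAt = new HashMap<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        frameAt.put(0, new ArrayList<>());     // method entry: empty stack
        worklist.push(0);
        while (!worklist.isEmpty()) {
            int pc = worklist.pop();
            List<String> out = interpret(pc, frameAt.get(pc));
            for (int succ : successors(pc)) {
                List<String> old = frameAt.get(succ);
                List<String> merged = (old == null) ? out : merge(old, out);
                if (!merged.equals(old)) {     // frame changed: re-analyze successor
                    frameAt.put(succ, merged);
                    worklist.push(succ);
                }
            }
        }
    }

    // Stubs standing in for the real abstract interpreter:
    List<String> interpret(int pc, List<String> in) { return in; }
    List<Integer> successors(int pc) { return List.of(); }
}
```

The `merge` step is exactly where `jsr` hurts: a subroutine has many callers, so either all their frames get merged into one conservative estimate (the quote's complaint), or the verifier must keep sets of frames per point, which is the complexity Coglio's conclusion below refers to.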
But, in more detail in Leroy, there's this complication that `jsr` introduced:
While any must-alias analysis can be used, Sun's verifier uses a fairly simple analysis, whereas an uninitialized object is identified by the position (program counter value) of the `new` instruction that created it. More precisely, the type algebra is enriched by the types Cp denoting an uninitialized instance of class C created by a `new` instruction at PC p. [...]

Subroutines [meaning jsr-ret] complicate the verification of object initialization. As discovered by Freund and Mitchell [15], a `new` instruction inside a subroutine can result in distinct uninitialized objects having the same static type Cp, thus fooling Sun's verifier into believing that all of them become initialized after invoking an initialization method on one of them. The solution is to prohibit or set to 'top' [= uninitialized] all registers and stack locations that have type Cp across a subroutine call.

Coglio [9] observes that Sun's restriction on backward branches as well as Freund and Mitchell's restriction on `new` are unnecessary for a bytecode verifier based on monovariant dataflow analysis. More precisely, [9, section 5.8.2] shows that, in the absence of subroutines, a register or stack location cannot have the type Cp just before a program point p containing a `new` C instruction. Thus, the only program points where uninitialized object types in stack types or register types must be prohibited (or turned into 'top') are subroutine calls.
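The shape of the Freund-Mitchell problem, as a rough hand-written sketch (see [15] for the real construction; this is not their example):

```
    jsr   SUB       // 1st call: SUB leaves an uninitialized C on the stack
    astore_2        // stash uninitialized object #1 in a local
    jsr   SUB       // 2nd call: a *distinct* uninitialized C, yet with the
                    // same static type Cp (same new instruction at PC p)
    invokespecial C.<init>()  // initializes object #2 only...
    aload_2         // ...but a verifier keyed on Cp alone now believes
                    // object #1 is initialized as well
    ...
SUB:
    astore_1        // save the return address
    new   C         // the single new at PC p; every call mints type Cp
    ret   1
```

The fix Leroy describes kills exactly the value stashed by `astore_2`: anything of type Cp is set to 'top' across the second `jsr`, so the later `aload_2` no longer type-checks as a usable C.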
The relevant citations there being:
[15] Stephen N. Freund and John C. Mitchell. A type system for object initialization in the Java bytecode language. ACM Transactions on Programming Languages and Systems, 21(6):1196–1250, 1999.
[9] Alessandro Coglio. Improving the official specification of Java bytecode verification. Concurrency and Computation: Practice and Experience, 15(2):155–179, 2003.
The conclusions of the latter (2003) paper:
As evidenced by the above discussions, subroutines are a major source of complexity for bytecode verification. Even though the approach described in Section 5.9.5 is quite simple, it impacts on the whole verification algorithm, requiring the use of sets of type assignments. If subroutines did not exist, single type assignments would be sufficient; moreover, the harmful interaction with object initialization described in Section 5.8.3 would not happen.
Subroutines were introduced in Java bytecode to avoid code duplication when compiling finally blocks, but it has been found that very little space is actually saved in mundane code [21,27]. It is widely conjectured that it might have been better not to introduce subroutines in the first place. While future Java compilers could simply avoid the generation of subroutines, future versions of the JVM must be able to accept previously compiled code that may have subroutines. In other words, the need for backward compatibility prevents the total elimination of subroutines.
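Concretely, for a method like the following (my example with made-up helpers, not from the paper), older versions of javac emitted the `finally` body once as a `jsr` subroutine shared by every exit path, whereas modern compilers simply duplicate it on the normal path and again in the catch-all exception handler; the duplication is real but, per the quote, usually small:

```java
static int f() {
    try {
        return g();       // exit path 1: normal return
    } finally {
        cleanup();        // old javac: one copy, reached via jsr/ret from
                          // every exit; modern javac: inlined on the return
                          // path and in the exception handler
    }
}

// Hypothetical helpers so the example is self-contained:
static int g() { return 42; }
static void cleanup() { }
```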
A subsequent (2004) paper by Coglio notes (on p. 666) that most Java bytecode verifier implementations violated the JVM spec and rejected some valid (spec-wise) programs involving subroutines.
Another titbit/criticism from Leroy:
While effective in practice, Sun’s approach to subroutine verification raises a challenging issue: determining the subroutine structure is difficult. Not only are subroutines not syntactically delimited, but return addresses are stored in general-purpose registers rather than on a subroutine-specific stack, which makes tracking return addresses and matching ret/jsr pairs more difficult. To facilitate the determination of the subroutine structure, the JVM specification states a number of restrictions on correct JVM code, such as “two different subroutines cannot ‘merge’ their execution to a single ret instruction” [33, section 4.9.6]. These restrictions seem rather ad-hoc and specific to the particular subroutine labeling algorithm that Sun’s verifier uses. Moreover, the description of subroutine labeling given in the JVM specification is very informal and incomplete.
Coglio's 2004 paper also noted that the CLR has a built-in `endfinally` at its VM opcode level, which avoids some of the issues with jsr/ret. It looks like that's because in the CLR you can't freely jump into or out of such blocks by means of other instructions, while in the JVM you can `goto` into or out of a 'subroutine', which has no specific/enforced boundaries as such, and that complicated matters.