Why are JSR/RET deprecated Java bytecodes?

Does anyone know why the JSR/RET bytecode pair is deprecated in Java 6?

The only meaningful explanation I found on the net was that they made code analysis by the runtime harder and slower to perform. Does anyone know another reason?

Skinner answered 3/5, 2011 at 14:37 Comment(4)
Do you mean deprecated by the JVM, or simply not used anymore by Oracle's Java compiler? I could not find a deprecation notice in the JVMS 7: docs.oracle.com/javase/specs/jvms/se7/html/… – Gabriel
@CiroSantilli巴拿馬文件六四事件法轮功: After quite a lot of searching, I found the rule against these instructions in Java 7 classes (class file format 51.0). It's in §4.9.1 of the JVMS. See my Q&A on this for details. – Vannoy
FWIW: sable.mcgill.ca/publications/papers/#sas2000: "By decoupling the type inference problem from the low level bytecode representation, and abstracting it into a constraint system, we show that there exists verifiable bytecode that cannot be statically typed. Further, we show that, without transforming the program, the static typing problem is NP-hard. In order to develop a practical approach we have developed an algorithm that works efficiently for the usual cases and then applies efficient program transformations to simplify the hard cases." – Chemisorb
BTW, why does there need to be "another reason"? Aren't bytecode verification difficulties enough? – Chemisorb

JSR and RET make bytecode verification a lot more difficult than it would otherwise be, because they relax some of the normal bytecode constraints (such as requiring a consistent stack shape on entry to a jsr target). The upside is very minor (potentially slightly smaller methods in some cases), while the verifier's continuing difficulties with odd JSR/RET patterns, the potential security vulnerabilities, and the associated runtime cost of full verification make the pair not worth keeping.
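
To make that concrete, here is a sketch of my own (offsets and the exception-table range are illustrative, mnemonics as javap prints them) of how pre-1.4.2 javac compiled a try/finally with a shared jsr subroutine; the finally body appears exactly once, which is where the "slightly smaller methods" upside comes from:

    // source being compiled (work/cleanup are hypothetical helpers):
    //     try { work(); } finally { cleanup(); }
     0: invokestatic  work()V
     3: jsr           15          // normal path: run the finally subroutine
     6: goto          21          // then leave the statement
     9: astore_1                  // any-exception handler: save the pending throwable
    10: jsr           15          // exceptional path: same shared subroutine
    13: aload_1
    14: athrow                    // rethrow once finally has run
    15: astore_2                  // subroutine entry: stash the returnAddress
    16: invokestatic  cleanup()V  // the finally body, present only once
    19: ret           2           // jump back through the stashed returnAddress
    21: return
    // Exception table: [0, 3) -> handler 9, catch type: any

Note that the subroutine at 15 is entered from two different call sites (3 and 10); that multiple-entry merging is exactly what the verifier finds painful.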

Stack maps, and the lighter-weight verifier they enable, are a big performance win during class loading, with no sacrifice in safety.
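
As an illustration of what that lighter-weight verifier consumes, here is a hedged sketch of a StackMapTable entry roughly as javap -v would print it, for an exception handler at offset 7 (the exact frame type and offsets depend on the method):

    StackMapTable: number_of_entries = 1
      frame_type = 71 /* same_locals_1_stack_item, offset_delta = 7 */
        // at the handler, locals are unchanged and the operand stack
        // holds exactly one item: the thrown exception
        stack = [ class java/lang/Throwable ]

With frames like this recorded at every branch target, the verifier only has to check each instruction once against the declared types, instead of iterating a dataflow analysis to a fixpoint.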

Subject answered 3/5, 2011 at 16:54 Comment(4)
For the record: the security vulnerabilities are not only potential; there was a verifier bug in an older Java version where using the SWAP bytecode to swap two return addresses on the stack (the JSR/JSR/SWAP/RET vulnerability) caused type confusion. – Innards
What does having a consistent stack shape on entry mean? – Pazpaza
The JVM uses a stack to hold operands for operations, e.g. the iadd bytecode expects 2 integers on the stack, which it pops, and then pushes the result of adding them together. So a consistent shape means that at any given bytecode position in the method, the stack is always the same depth and each element on the stack is of the right type (i.e. int vs reference, etc.). – Subject
@KodeWarrior: it checks, among other things, that all branching code paths produce the same stack depth. So if on the if branch you push something, then on the other (else) branch you must, on balance, also push an equivalent number of things. When the control flow merges (at the end of if...else), all branches must have the same stack depth. I think there are some additional checks on what gets pushed too (types-wise), but I don't recall the details right now. – Chemisorb
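
A tiny sketch of the merge rule the two comments above describe (offsets illustrative):

    // both arms push exactly one int, so every path reaches the merge
    // point with the same stack shape: depth 1, top-of-stack type int
    0: iload_0
    1: ifeq     8        // branch on an int parameter
    4: iconst_1          // then-arm: push an int
    5: goto     9
    8: iconst_2          // else-arm: push an int too
    9: istore_1          // merge point: identical depth and types required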

The people who use them to obfuscate bytecode explain why:

The jsr - ret construct is particularly difficult to handle when dealing with typing issues because each subroutine can be called from multiple places, requiring that type information be merged and therefore a more conservative estimate [need be produced]. Also, decompilers will usually expect to find a specific ret for every jsr.

The last sentence is only relevant for obfuscators (like the Soot-based JBCO) which don't even emit a ret but instead pop the return address, emulating a goto. That is still effective enough ~15 years later against some 'modern' decompilers:

org.benf.cfr.reader.util.ConfusedCFRException: Missing node tying up JSR block
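
A minimal sketch of that trick as I understand it (offsets and the branch target are illustrative): the returnAddress pushed by jsr is simply discarded, so the jsr degenerates into an opaque goto and there is no matching ret for a decompiler to tie the block to:

    0: jsr   4           // pushes a returnAddress and jumps to 4
    3: nop               // dead code: control never comes back here
    4: pop               // discard the returnAddress instead of astore/ret
    5: goto  ...         // carry on as if this were a plain branch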

That (relatively simple) trick aside, the first part of the quote says that even jsrs used 'as originally designed' cause (dataflow) analysis slowdowns. A bytecode verifier is a dataflow analyzer; see Leroy for an in-depth discussion. (I should probably stop before I namedrop abstract interpretation here, although that too is [conceptually] involved in bytecode verification...)

The first JVM bytecode verification algorithm is due to Gosling and Yellin at Sun [...]. Almost all existing bytecode verifiers implement this algorithm. It can be summarized as a dataflow analysis applied to a type-level abstract interpretation of the virtual machine.

Leroy goes into more detail on a complication that jsr introduced:

While any must-alias analysis can be used, Sun’s verifier uses a fairly simple analysis, whereby an uninitialized object is identified by the position (program counter value) of the new instruction that created it. More precisely, the type algebra is enriched by the types Cp denoting an uninitialized instance of class C created by a new instruction at PC p. [...]

Subroutines [meaning jsr-ret] complicate the verification of object initialization. As discovered by Freund and Mitchell [15], a new instruction inside a subroutine can result in distinct uninitialized objects having the same static type Cp, thus fooling Sun’s verifier into believing that all of them become initialized after invoking an initialization method on one of them. The solution is to prohibit or set to 'top' [= uninitialized] all registers and stack locations that have type Cp across a subroutine call.

Coglio [9] observes that Sun’s restriction on backward branches as well as Freund and Mitchell’s restriction on new are unnecessary for a bytecode verifier based on monovariant dataflow analysis. More precisely, [9, section 5.8.2] shows that, in the absence of subroutines, a register or stack location cannot have the type Cp just before a program point p containing a new C instruction. Thus, the only program points where uninitialized object types in stack types or register types must be prohibited (or turned into 'top') are subroutine calls.
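
To make the Freund-Mitchell interaction concrete, here is my reconstruction of the shape they describe (offsets illustrative); a verifier with their fix rejects it by invalidating every Cp-typed slot across the second jsr:

    // both jsr calls reach the same 'new' at program point p = 14,
    // so two distinct raw objects end up sharing the static type Cp
     0: jsr   13          // first call: leaves an uninitialized C in local 1
     3: aload_1
     4: astore_2          // stash the first raw object in local 2
     5: jsr   13          // second call: a second raw object, same type Cp
     8: aload_1
     9: invokespecial C.<init>()V  // initializes only the second object, but
                                   // judging by type Cp alone, the stashed
                                   // one now looks initialized as well
    12: ...                        // type confusion: local 2 is still raw
    13: astore_0          // subroutine entry: stash the returnAddress
    14: new   C           // <- program point p; result has type Cp
    17: astore_1
    18: ret   0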

The relevant citations there being:

[15] Stephen N. Freund and John C. Mitchell. A type system for object initialization in the Java bytecode language. ACM Transactions on Programming Languages and Systems, 21(6):1196–1250, 1999.

[9] Alessandro Coglio. Improving the official specification of Java bytecode verification. Concurrency and Computation: Practice and Experience, 15(2):155–179, 2003.

The conclusions of the latter (2003) paper:

As evidenced by the above discussions, subroutines are a major source of complexity for bytecode verification. Even though the approach described in Section 5.9.5 is quite simple, it impacts on the whole verification algorithm, requiring the use of sets of type assignments. If subroutines did not exist, single type assignments would be sufficient; moreover, the harmful interaction with object initialization described in Section 5.8.3 would not happen.

Subroutines were introduced in Java bytecode to avoid code duplication when compiling finally blocks, but it has been found that very little space is actually saved in mundane code [21,27]. It is widely conjectured that it might have been better not to introduce subroutines in the first place. While future Java compilers could simply avoid the generation of subroutines, future versions of the JVM must be able to accept previously compiled code that may have subroutines. In other words, the need for backward compatibility prevents the total elimination of subroutines.
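
For contrast with the jsr-based layout sketched earlier, here is a hedged sketch (offsets illustrative) of how post-1.4.2 compilers handle the same try { work(); } finally { cleanup(); }: the finally body is inlined, i.e. duplicated on the normal and the exceptional path, trading those few saved bytes for code the verifier can check with plain monovariant dataflow analysis:

     0: invokestatic  work()V
     3: invokestatic  cleanup()V   // inlined finally, normal path
     6: return
     7: astore_0                   // any-exception handler
     8: invokestatic  cleanup()V   // inlined finally, exceptional path (the duplicate)
    11: aload_0
    12: athrow
    // Exception table: [0, 3) -> handler 7, catch type: any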

A subsequent (2004) paper by Coglio notes (on p. 666) that most Java bytecode verifier implementations violated the JVM spec and rejected some valid (spec-wise) programs involving subroutines.

Another titbit/criticism from Leroy:

While effective in practice, Sun’s approach to subroutine verification raises a challenging issue: determining the subroutine structure is difficult. Not only are subroutines not syntactically delimited, but return addresses are stored in general-purpose registers rather than on a subroutine-specific stack, which makes tracking return addresses and matching ret/jsr pairs more difficult. To facilitate the determination of the subroutine structure, the JVM specification states a number of restrictions on correct JVM code, such as “two different subroutines cannot ‘merge’ their execution to a single ret instruction” [33, section 4.9.6]. These restrictions seem rather ad-hoc and specific to the particular subroutine labeling algorithm that Sun’s verifier uses. Moreover, the description of subroutine labeling given in the JVM specification is very informal and incomplete.

Coglio's 2004 paper also notes that the CLR has a built-in endfinally at the VM opcode level, which avoids some of the issues with jsr/ret. That seems to be because in the CLR you cannot freely jump into or out of such blocks with other instructions, whereas in the JVM you can goto into or out of a 'subroutine', which has no specific/enforced boundaries; that is what complicated matters.

Burlington answered 6/7 at 6:44 Comment(1)
The relevant bit in the JVM spec was in 4.9.6 as cited by Coglio, but later (Java 8+) it is in 4.10.2.5 (up to Java 21; I didn't check any newer). Not much has changed text-wise, though, other than the added remark that the section applies to class file versions <= 50 (Java 7). – Chemisorb
