Why not remove type erasure from the next JVM?

Java introduced type erasure with generics in Java 5 so they would work on old versions of Java. It was a tradeoff for compatibility. We've since lost that compatibility[1] [2] [3]--bytecode can be run on later versions of the JVM but not earlier ones. This looks like the worst possible choice: we've lost type information and we still can't run bytecode compiled for newer versions of the JVM on older versions. What happened?

Specifically I'm asking if there are any technical reasons why type erasure couldn't be removed in the next version of the JVM (assuming, like previous releases, its bytecode won't be able to run on the previous version anyway).

[3]: Type erasure could be backported in a manner similar to retrolambda for those who really like it.

Edit: I think the discussion of the definition of backwards vs. forwards compatibility is obscuring the question.

Carencarena answered 27/6, 2016 at 13:56 Comment(7)
Backwards compatibility was never lost. An old Java 1.1 program will most probably still run smoothly on a JRE 8 VM, which is what the linked posts say.Heterophyllous
@Heterophyllous correct me if I'm wrong, but that's forwards compatibility: your code will run on future JVMs. If we remove type erasure in JVM version X, it'll still be removed in JVM X+1, so our code will still run.Carencarena
Not exactly, refer also to #4693126Heterophyllous
@Heterophyllous I'm aware of that answer and I think we're confusing terms rather than ideas. It's patently obvious that certain Java 8 features will not work on JVM 7, hence the existence of projects like retrolambda. Here's a list of incompatibilities from Oracle: oracle.com/technetwork/java/javase/… . Quoting it: "Class files built with the Java SE 8 compiler will not run on earlier releases of Java SE".Carencarena
@Carencarena - but that is NOT what backwards compatibility means in the context of Java.Serranid
Java 8 is backwards compatible with code written for older versions of Java. You may think of your code being forwards compatible with future JREs, but it is actually those future JREs that are backwards compatible with your old code. Your question shows a lack of understanding what backwards compatible means in Java.Alfred
@Carencarena you are correct, the other posters are wrong. The code you write is forward compatible: it compiles and runs on JVM 8, 9, 10, 11... The generated bytecode is not backward compatible, i.e. you cannot run it on JVM 5, 4, 3, 2. The JVM itself is backward compatible because it can run JVM 5, 4, 3, 2 code, and so is the JDK because it supports compilation of older code. Type erasure is a known problem, and research and discussion are ongoing to either change it or remove it.Clavicorn

To some extent erasure will be removed in the future with Project Valhalla, to enable specialized implementations for value types.

Or to put it more accurately, type erasure really means the absence of type specialization for generics, and Valhalla will introduce specialization over primitives.

Specifically I'm asking if there are any technical reasons why type erasure couldn't be removed in the next version of the JVM

Performance. With erasure you don't have to generate specialized code for every combination of generic types; instances and generated classes don't have to carry type tags; polymorphic inline caches and runtime type checks (compiler-generated instanceof checks) stay simple; and we still get most of the type safety through compile-time checks.
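To make the "no specialized code" point concrete, here is a minimal sketch (the class name OneErasedClass is mine) showing that every parameterization of ArrayList shares one runtime class, so the JIT only ever deals with a single ArrayList:

```java
import java.util.ArrayList;
import java.util.List;

public class OneErasedClass {
    // One erased class serves every parameterization, so the JIT compiles
    // (and inline-caches) a single ArrayList, not one per type argument.
    static boolean sameClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameClass()); // prints true
    }
}
```

Removing erasure would mean the runtime has to distinguish those two classes, which is exactly the specialization cost described above.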

Of course there are also plenty of downsides, but the tradeoff has already been made, and the question is what would motivate the JVM developers to change that tradeoff.

And it might also be a compatibility thing: there could be code that performs unchecked casts to abuse generic collections, relying on type erasure, that would break if the type constraints were enforced at runtime.
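A minimal sketch of such code (class and method names are mine): a raw-type cast smuggles a StringBuilder into a List&lt;String&gt;, which succeeds silently today and only fails at the read site, but would fail at the add under enforced runtime constraints:

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureAbuse {
    // A raw-type cast lets us smuggle a StringBuilder into a List<String>:
    // under erasure there is nothing at runtime to stop the add.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static void pollute(List<String> strings) {
        List raw = strings;                  // at runtime this is just a List
        raw.add(new StringBuilder("oops"));  // no runtime type check today
    }

    // Returns a short description of where the failure surfaces.
    static String demo() {
        List<String> strings = new ArrayList<>();
        pollute(strings);                    // succeeds silently under erasure
        try {
            String s = strings.get(0);       // compiler-inserted checkcast fails
            return "read succeeded: " + s;
        } catch (ClassCastException e) {
            return "CCE only at the read site";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "CCE only at the read site"
    }
}
```

Code that deliberately relies on this window between the write and the read is exactly what runtime enforcement would break.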

Ammoniate answered 27/6, 2016 at 15:31 Comment(3)
Why does having values of generic type parameters at runtime imply specialization?Diplopia
Your last paragraph is very insightful. I've heard that type erasure breaks encapsulation, but I haven't yet found an example of that. Maybe by abusing type erasure by casting to a raw collection it would be possible to create a type and then cast the object to that in such a way so one could get access to otherwise encapsulated object state. Or maybe using reflection. I don't know.Bernie
@IgorDonin to name a real life example, when you do streamOfStrings.collect(Collectors .groupingBy(someFunction, Collectors.joining())), the underlying implementation will use the same HashMap which is eventually returned as Map<…, String> to store the intermediate StringBuilder instances, which are converted to the result strings in a final step. This is a violation of the generic type system, to avoid having to copy the entire Map<…,StringBuilder> to another Map<…,String> at the end. This is the kind of code that would break if generic types were suddenly enforced at runtime.Coeternal

Type erasure is more than just a byte code feature that you can turn on or off.

It affects the way the entire runtime environment works. If you want to be able to query the generic type of every instance of a generic class, it implies that meta information, comparable to a runtime Class representation, is created for each object instantiation of a generic class.

If you write new ArrayList<String>(); new ArrayList<Number>(); new ArrayList<Object>() you are not only creating three objects, you are potentially creating three additional meta objects reflecting the types, ArrayList<String>, ArrayList<Number>, and ArrayList<Object>, if they didn’t exist before.

Consider that there are thousands of different List signatures in use in a typical application, most of them never used in a place where the availability of such Reflection is required (given the absence of this feature, we can conclude that currently all of them work without such Reflection).

This, of course, multiplies: thousands of different generic list types imply thousands of different generic iterator types, thousands of spliterator and Stream incarnations, not even counting the internal classes of the implementation.

And it even affects places without an object allocation which currently exploit type erasure under the hood: e.g. Collections.emptyList(), Function.identity() or Comparator.naturalOrder(), etc. return the same instance each time they are invoked. If you insist on having the particular captured generic type reflectively inspectable, this won't work anymore. So if you write

List<String> strings = Collections.emptyList();
List<Number> numbers = Collections.emptyList();

you would have to receive two distinct instances, each of them reporting a different type on getClass() or the future equivalent.
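The current sharing is easy to observe; a small sketch (class and method names are mine) confirming that both calls return the one shared instance on today's JVMs:

```java
import java.util.Collections;
import java.util.List;

public class ErasedSingletons {
    // Under erasure, Collections.emptyList() hands out one shared immutable
    // instance for every element type; reified generics would force distinct
    // instances per type argument.
    static boolean sharedInstance() {
        List<String> strings = Collections.emptyList();
        List<Number> numbers = Collections.emptyList();
        return (Object) strings == (Object) numbers;
    }

    public static void main(String[] args) {
        System.out.println(sharedInstance()); // prints true
    }
}
```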


It seems that people wishing for this ability have a narrow view of their particular method, where it would be great if they could reflectively find out whether one particular parameter is actually one of two or three types, but never think about the weight of carrying meta information about potentially hundreds or thousands of generic instantiations of thousands of generic classes.

This is the place where we have to ask what we gain in return: the ability to support a questionable coding style (which is what altering the code's behavior based on information found via Reflection amounts to).


The answer so far only addressed the easy aspect of removing type erasure, the desire to introspect the type of an actual instance. An actual instance has a concrete type, which could be reported. As mentioned in this comment from the user the8472, the demand for removal of type erasure often also implies the wish to be able to cast to (T), create an array via new T[], or access the type of a type variable via T.class.
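For contrast, here is how code copes today: since new T[] is illegal under erasure, callers pass an explicit Class token and fall back on reflective array creation (a sketch of the common pattern; the helper name newArray is mine):

```java
import java.lang.reflect.Array;

public class GenericArrayDemo {
    // new T[length] is illegal under erasure, so the usual workaround is an
    // explicit Class<T> token plus java.lang.reflect.Array.newInstance.
    @SuppressWarnings("unchecked")
    static <T> T[] newArray(Class<T> type, int length) {
        return (T[]) Array.newInstance(type, length);
    }

    public static void main(String[] args) {
        String[] arr = newArray(String.class, 3);
        System.out.println(arr.length); // prints 3
    }
}
```

Making new T[] work directly, without the token, is what would require the runtime to resolve the type variable at every such call, including the wildcarded cases discussed next.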

This would raise the true nightmare. A type variable is a different beast than the actual type of a concrete instance. A type variable could resolve to, e.g., ? extends Comparator<? super Number>, to name one (rather simple) example. Providing the necessary meta information would imply that not only would object allocation become much more expensive, but every single method invocation could impose this additional cost, to an even bigger extent, as we are now not only talking about the combination of generic classes with actual classes, but also about every possible wildcarded combination, even of nested generic types.

Keep in mind that the actual type of a type parameter could also refer to other type parameters, turning the type checking into a very complex process which has to be repeated not only for every type cast: if you allow creating an array out of it, every storage operation has to repeat it.

Besides the heavy performance issue, the complexity raises another problem. If you look at the bug tracking list of javac or related questions on Stack Overflow, you may notice that the process is not only complex, but also error prone. Currently, every minor version of javac contains changes and fixes regarding generic type signature matching, affecting what will be accepted or rejected. I'm quite sure you don't want intrinsic JVM operations like type casts, variable assignments or array stores to become victims of this complexity, having a different idea of what is legal in every version, or suddenly rejecting at runtime what javac accepted at compile time due to mismatched rules.

Coeternal answered 27/6, 2016 at 17:38 Comment(17)
you are assuming that it would only be used via reflection. But non-erased generics would mean that the type variables in classes (e.g. T) would be real, i.e. casting via (T) would be an actual cast and provide fail-fast behavior, new T[] would create an actual array, and T.class could give you a class object. Also, while you're right that the metadata objects would need to be created, they would still only be a constant factor over the number of classes, since the generic signatures are ultimately driven by allocation call sites, of which there is only a finite amount per class.Ammoniate
@the8472: I focused on the easy part, as what you are describing makes things much worse. Since there are also generic methods, the features you describe would imply that each invocation can bear that overhead, not only object allocation sites. I expanded my answer to address some of the related issues. Of course, the number of call sites is finite as well, but the number of stars in our universe might be finite too…Coeternal
good point, distinguishing between type bounds and concrete types adds another source of complexity.Ammoniate
This answer is not convincing. Meta information is available at runtime (through getGenericType(), so it's already managed; we are just not allowed to write (non-reflective) code that uses it. Regarding performance, you'll have to compare the compound runtime with all the nasty workarounds that exist to get nice designs out of Java generics, not with a world where you don't use them (or don't try to work around the limitations).Diplopia
@Diplopia getGenericType() will only provide you the declaration, not the actual type of an instantiation. You can use Reflection to find out that the declaration of the type java.util.List is List<T>; that doesn’t say anything about the actual parameterization of the thousands of list instances. You can query the generic type of variables, i.e. fields and parameters (not local variables), but the same object can be referenced by dozens of different variables of different type; the object itself does not have a generic type. As shown by example in the answer.Coeternal
@Coeternal Ah, I misunderstood. Thanks.Diplopia
@Coeternal getGenericType() will only provide you the declaration, not the actual type of an instantiation that's a great point here. I don't understand why people say that type erasure erases everything, while public static int size(List<Integer> list) { return list.size(); } compiles to public static int size(java.util.List<java.lang.Integer>);. So type erasure works at call-sites, really, in my understanding at least...Weighty
@Weighty First of all, Generics do not alter the way compiled code works at all. Letting Reflection aside, you can run compiled generic code on a JVM which doesn’t know anything about Generics at all. That’s why for int size(List<Integer> list), the generic signature will be stored, but you still can’t have a method int size(List<String> list) in the same class, as having two int size(List) methods in one class is forbidden.Coeternal
@Coeternal I understand that (I think). my point was that the generic declaration is stored after compilation in the byte code, it is erased from the byte code at all call-sites that use that. And that example with size(List<String> list) could work if the return type would be different and a different compiler, not javacWeighty
I don't accept these tradeoffs as either necessary or inevitable. It seems to me a slightly intelligent system could easily be implemented where, as long as the class in the generic collection was a final type, the type was remembered. Even if that were the only case implemented, it would provide immense benefits: casting to List<String>, for instance, instead of List<?>, which is so useless as to be worthless.Basque
@ggb667 A type system that works only in a few cases, is not worth the effort. When I write List<String> l = List.of("foo", "bar");, I’m invoking a generic method whose implementation code instantiates a generic List, without a hint that the type is the final type String. What if I write List<Object> l = List.of("foo", "bar"); instead? Exactly the same code, but now, the code within the of method should magically determine that this is not a list with a final element type? How many levels should the magic propagate? stream.map(f).collect(toList()), final type or not?Coeternal
Is a system that works in NONE of the cases better? The benefit desired (IMHO) are methods which when passed different types are handled intelligently, not that they are generic intrinsically. Casts defeat the entire purpose. What we have is (effectively) List<?> - which is what many dislike intensely.Basque
@ggb667 You don’t have List<?>, you have a tool for compile-time checking of the correctness of generic code. The compiled code only has a List. You want some kind of uber-reflection feature, which generics were never meant for. Implementing such a feature would be a lot of heavy work, which the Java core developers won’t do just for the sake of handling a few corner cases. There is no point in making further attempts to convince me, I’m just telling the status quo.Coeternal
Given that C# reifies generics, presumably it must also deal with the overhead of "hundreds or thousands generic instantiations of thousands of generic classes". Yet, AFAIK, its performance and memory usage are comparable to Java. So perhaps this concern is overstated?Openhanded
@PaulCarey Does it have an equivalent of Function.identity()? When I do Function<String,String> f1 = Function.identity(); Function<Integer,Integer> f2 = Function.identity(); Function.identity().getType(); what will it provide?Coeternal
@Coeternal I have no real knowledge of C#; I'm making a black-box type observation that the presence of reified generics in a widely used language suggests that the runtime overhead objection may be successfully surmounted.Openhanded
Also, if you've ever used Jackson's TypeReference<T>, you should know that its sole purpose is to cheat type erasure, making the generic type present at runtime by creating - whoopsie! - not an object with meta information about the type, but a completely new type. And every lambda does the same, by the way.Priggish

Your understanding of backwards compatibility is wrong.

The desired goal is for new JVMs to be able to run old library code correctly and unchanged, even alongside new code. This allows users to reliably upgrade their Java version, even to versions much newer than the code was written for.

Pga answered 27/6, 2016 at 14:10 Comment(4)
How would removing type erasure hinder this? Old code wouldn't rely on type information (because it doesn't exist) and new code would only be run on the new JVM. Everyone could upgrade seamlessly.Carencarena
@Carencarena - It wouldn't. However you use your bogus idea of backwards compatibility as the justification for throwing out >>real<< backwards compatibility. Also, your bold assertion that "everyone could seamlessly upgrade" requires a lot more justification. I for one DO NOT believe it.Serranid
@StephenC type erasure is subtractive--if your program assumes type information has been erased, it will run fine so long as there is a java.util.List[Object] in the classpath. Adding more type information might mean something like adding specializations, so the JVM would be aware of e.g. java.util.List[Integer] and could avoid runtime type-checking. It is possible to get this information at runtime by using reified types (as scala does, see docs.scala-lang.org/overviews/reflection/…). Just like how you can always get information available at compile...Carencarena
time at runtime if you are determined enough (in the case of scala, by having the compiler add it all in for you). I'm not sure this would be a good idea, but my question was about if there were any technical barriers to doing it. And the answers given thus far have been enlightening in that respect.Carencarena
