You are correct that, at the bytecode level, much information gets lost when you define and interact with generic types. Type erasure was nice for preserving compatibility: if you mostly enforce type safety at compile time, you don't need to do much at runtime, so you can reduce generic types to their 'raw' equivalents.
And that's the key: compile-time verification. If you want the flexibility and type safety of generics, your compiler has to know a lot about the generic types you interact with. In many cases, you won't have the source code for those classes, so it has to get the information from somewhere. And it does: metadata. Embedded in the .class file alongside the bytecode is a wealth of information: everything the compiler needs to verify that you're using generic library types safely. So what kind of generics information gets preserved?
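To make the runtime half of this concrete, here is a minimal sketch (the class and method names are hypothetical, chosen for illustration). Two different parameterizations of ArrayList share a single runtime class, and a raw reference can slip an Integer into a List<String> without any runtime check, because the type safety was only ever enforced by the compiler.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // Two parameterizations of the same generic class.
    static boolean erasedAtRuntime() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // Erasure leaves one runtime class, java.util.ArrayList, for both.
        return strings.getClass() == numbers.getClass();
    }

    // A raw view of a List<String> accepts an Integer at runtime;
    // only the compiler's checks (bypassed here via the raw type) would have stopped it.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int heapPollutionSize() {
        List<String> strings = new ArrayList<>();
        List raw = strings;
        raw.add(42);
        return strings.size();
    }

    public static void main(String[] args) {
        System.out.println(erasedAtRuntime());   // true
        System.out.println(heapPollutionSize()); // 1
    }
}
```

Nothing fails here until someone reads the Integer back out as a String, which is exactly why the compiler needs the generic metadata described below.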
Type variables and constraints
The most basic thing a compiler needs to know in order to consume a generic type is the list of type variables. For any generic type or generic method, the names and positions of the type variables are preserved. Moreover, any bounds get included as well: upper bounds on the type variables themselves (T extends Number), along with the upper or lower bounds of any wildcards inside them (T extends Comparable<? super T>).
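One way to see that this information really lives in the class file is to read it back through reflection, which parses the same Signature metadata a compiler would use. The sketch below is only an illustration; SortedBox, pick and describe are hypothetical names, not part of any library.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.ArrayList;
import java.util.List;

public class TypeVariableDemo {
    // Hypothetical generic class whose type variable has a bound containing a wildcard.
    static class SortedBox<T extends Comparable<? super T>> {
        // Hypothetical generic method with its own bounded type variable.
        static <N extends Number> N pick(N a, N b) { return a; }
    }

    // Renders a type variable as "Name extends Bound1 & Bound2".
    static String describe(TypeVariable<?> tv) {
        List<String> bounds = new ArrayList<>();
        for (Type bound : tv.getBounds()) {
            bounds.add(bound.getTypeName());
        }
        return tv.getName() + " extends " + String.join(" & ", bounds);
    }

    static String classTypeVariable() {
        return describe(SortedBox.class.getTypeParameters()[0]);
    }

    static String methodTypeVariable() throws NoSuchMethodException {
        // The erased parameter types of pick(N, N) are (Number, Number).
        Method m = SortedBox.class.getDeclaredMethod("pick", Number.class, Number.class);
        return describe(m.getTypeParameters()[0]);
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println(classTypeVariable());  // T extends java.lang.Comparable<? super T>
        System.out.println(methodTypeVariable()); // N extends java.lang.Number
    }
}
```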
Generic supertype signatures
Sometimes you write a class that extends a generic class or implements a generic interface. If you write a StringList that extends ArrayList<String>, you inherit a lot of functionality. If someone wants to use your StringList as intended and without the source code, it's not enough for the compiler to know that you extended ArrayList; it has to know you extended ArrayList<String>. This applies transitively up the hierarchy: it has to know that ArrayList<E> extends AbstractList<E>, and so on. So this information gets preserved. Your class file will include the complete generic signatures of any generic supertypes (classes or interfaces).
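A minimal StringList like the one described can be inspected the same way. This sketch assumes nothing beyond the standard java.lang.reflect API; getGenericSuperclass() returns a ParameterizedType built from the superclass signature stored in the class file, so the String type argument is still there even though the runtime superclass is plain ArrayList.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;

public class SupertypeDemo {
    // A non-generic subclass of ArrayList<String>, as in the text.
    static class StringList extends ArrayList<String> {}

    // Reads the type argument recorded in StringList's generic superclass signature.
    static Type elementType() {
        ParameterizedType superType = (ParameterizedType) StringList.class.getGenericSuperclass();
        return superType.getActualTypeArguments()[0];
    }

    // One level further up: ArrayList<E>'s own generic superclass.
    static String arrayListSuperclass() {
        return ArrayList.class.getGenericSuperclass().getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(elementType() == String.class); // true
        System.out.println(arrayListSuperclass());
    }
}
```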
Member signatures
The compiler can't verify that you're using a generic type correctly if it doesn't know the full generic types of fields, method parameters and return types. So, you guessed it: that information gets included. If any part of a class member contains a generic type, wildcard, or type variable, that member will get its signature information saved in the metadata.
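As a sketch of what "signature information" means for members, the hypothetical Registry class below has a field and a method whose types involve generics and wildcards. The plain getType() accessor returns only the erased type from the descriptor, while getGenericType() and friends return the full generic type from the saved signature.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;

public class MemberSignatureDemo {
    // Hypothetical class whose members involve generic types and wildcards.
    static class Registry {
        Map<String, List<Integer>> index;
        List<? extends Number> lookup(Map<String, ? super Integer> filter) { return null; }
    }

    // Erased type, as recorded in the field descriptor.
    static String erasedFieldType() throws NoSuchFieldException {
        return Registry.class.getDeclaredField("index").getType().getName();
    }

    // Full generic type, as recorded in the field's signature metadata.
    static String fieldType() throws NoSuchFieldException {
        Field f = Registry.class.getDeclaredField("index");
        return f.getGenericType().getTypeName();
    }

    static String returnType() throws NoSuchMethodException {
        Method m = Registry.class.getDeclaredMethod("lookup", Map.class);
        return m.getGenericReturnType().getTypeName();
    }

    static String parameterType() throws NoSuchMethodException {
        Method m = Registry.class.getDeclaredMethod("lookup", Map.class);
        return m.getGenericParameterTypes()[0].getTypeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(erasedFieldType()); // java.util.Map
        System.out.println(fieldType());
        System.out.println(returnType());
        System.out.println(parameterType());
    }
}
```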
Local variables
It's not necessary to preserve information about local variable types in order to consume a type. It can be useful for debugging, but that's about it. There are metadata tables (LocalVariableTable, plus LocalVariableTypeTable for generic types) that can record the names and types of local variables and the bytecode ranges in which they exist. Depending on the compiler, they may or may not be written by default. You can force javac to emit them by passing -g:vars (or -g); by default, javac only emits line number and source file information, so they're omitted.
Call sites
One of the biggest issues for decompilers, mostly affecting generic inference within method bodies, is that call sites invoking generic methods retain no information about type arguments. That creates huge headaches for APIs like Java 8 Streams, where generic operators get chained together, each one accepting anonymously typed lambdas (which may be contravariant in their argument types and covariant in their return types). That's a type inference nightmare, but it's an issue for any code that happens to interact with generics. That kind of code doesn't become substantially harder to decompile simply because it exists within a generic type.
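A small illustration of the call-site problem, using a hypothetical helper named twice. Both calls in main compile to the same invokestatic instruction with the erased descriptor (Ljava/lang/Object;)Ljava/util/List;, so neither String nor Integer is recorded at the call itself. Reflection on the declaration can only report the type variable T. The nearest surviving clues are incidental, such as the checkcast javac inserts when an element is used as a String, or the boxing of 42.

```java
import java.util.ArrayList;
import java.util.List;

public class CallSiteDemo {
    // Hypothetical generic helper, called below with two different type arguments.
    static <T> List<T> twice(T value) {
        List<T> out = new ArrayList<>();
        out.add(value);
        out.add(value);
        return out;
    }

    // Only the declaration's type variable is visible, never a call-site argument.
    static String declaredReturnType() throws NoSuchMethodException {
        return CallSiteDemo.class.getDeclaredMethod("twice", Object.class)
                .getGenericReturnType().getTypeName();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        List<String> words = twice("hi");   // T inferred as String
        List<Integer> numbers = twice(42);  // T inferred as Integer
        String w = words.get(0);            // javac inserts a checkcast to String here
        System.out.println(w + " " + numbers.size()); // hi 2
        System.out.println(declaredReturnType());     // java.util.List<T>
    }
}
```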
How this affects decompilation
Modern Java decompilers like Procyon and CFR should be able to reconstruct generic types reasonably well. If the local variable metadata is available, the results should be pretty close to the original code. If not, they'll have to try to infer generic type arguments in method bodies based on data flow analysis. Essentially, the decompiler must look at what data flows in and out of generic instantiations, and use what it knows about the type of that data to guess the type arguments. Sometimes it works really well; other times, not so much (see earlier comment about Java 8 Streams).
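To give a feel for that data flow analysis, the sketch below pairs a method as it might have been written with a hand-written approximation of the same body once local generic types are gone. This is not actual Procyon or CFR output, just an illustration. The parameter keeps List<String> because member signatures survive; the locals become raw, and the (String) casts stand in for the checkcast instructions a decompiler can follow to guess that upper was a List<String>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class InferenceDemo {
    // The method as it might have been written, with generic local variables.
    static int totalLength(List<String> words) {
        List<String> upper = new ArrayList<>();
        for (String w : words) {
            upper.add(w.toUpperCase());
        }
        int total = 0;
        for (String u : upper) {
            total += u.length();
        }
        return total;
    }

    // Hand-written approximation with the local type arguments erased.
    // The casts are the clues: Strings flow into 'upper' and come back out as Strings.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int totalLengthWithoutLocalSignatures(List<String> words) {
        List upper = new ArrayList();
        Iterator it = words.iterator();
        while (it.hasNext()) {
            String w = (String) it.next();
            upper.add(w.toUpperCase());
        }
        int total = 0;
        Iterator it2 = upper.iterator();
        while (it2.hasNext()) {
            String u = (String) it2.next();
            total += u.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("alpha", "beta");
        System.out.println(totalLength(words));                       // 9
        System.out.println(totalLengthWithoutLocalSignatures(words)); // 9
    }
}
```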
At the API level, though (type and member signatures), the results should be spot-on.
Caveats
Strictly speaking, all of the metadata described here is optional: it's only needed at compile time (or decompile time). If someone has run their compiled classes through an obfuscator, optimizer, or some other utility, all of this information could get stripped out. It won't make a difference at runtime.
tl;dr: Conclusion
Yes, it is certainly possible to decompile generic types and methods with their type parameters intact. Assuming the required metadata is present, getting the type and member signatures right is the 'easy' part. Correctly inferring the type arguments of generic instances and method invocations is the tricky bit, but that's a problem for any code that happens to interact with generics.
As mentioned, Procyon and CFR should both do a pretty decent job of restoring generic types and methods.