If you do a git blame on that source file, you will find that this code was introduced as part of the commit 343ecd806bb050, whose commit message
8231800: Better listing of arrays
references the issue JDK-8231800 in the JDK bugtracker. Alas, this issue is not visible to the public, but googling it reveals it has been shipped with security updates to all Java versions supported at the time, providing strong evidence that this was related to a security issue (which also explains why the details of the issue haven't been made public, and why the commit message is not descriptive).
However, googling also finds an enhancement request filed by the autor of that code, which does describe the challenges this piece of code is intended to solve:
Consider new ArrayList(Collection arg). This calls arg.toArray() and makes a defensive copy of it for use use as the internal array to the ArrayList. This copy is necessary in case the arg's toArray() implementation produces an array whose class is something other than Object[].class or if it retains a reference to the returned array. (Both of these are spec violations, but they're unenforceable.)
That is, ArrayList needs its array to
- admit elements of type
E
- not be accessible (and therefore modifiable) by other code
Implementations of Collection.toArray
are specified to satisfy this:
Object[] toArray()
Returns an array containing all of the elements in this collection. If this collection makes any guarantees as to what order its elements are returned by its iterator, this method must return the elements in the same order. The returned array's runtime component type is Object.
The returned array will be "safe" in that no references to it are maintained by this collection. (In other words, this method must allocate a new array even if this collection is backed by an array). The caller is thus free to modify the returned array.
however, we can't be sure that an implementation actually does this. For instance, an implementation might do:
class StringList implements Collection<String> {
String[] elements = new String[10];
int size = 0;
// other methods omitted
@Override
Object[] toArray() {
return Arrays.copyOf(elements, size);
}
}
which looks totally innocent, right? But this implementation actually violates the spec, because copyOf
will return a String[]
rather than an Object[]
, so if somebody did:
var strings = new StringList("Hello", "World");
var objects = new ArrayList<Object>(strings);
objects.add(42);
That code is totally correct as far as the compile time type system is concerned, but would fail at runtime with an ArrayStoreException
if ArrayList
reused the array returned by StringList.toArray()
.
The second issue is more nefarious. Suppose you have code like this:
class AccessManager {
final List<Permission> permissions;
public AccessManager(List<Permission> requestedPermissions) {
permissions = new ArrayList<>(requestedPermissions);
verifyPermissions();
}
private verifyPermissions() {
for (var p : permissions) {
if (!currentUser.has(p)) {
throw new SecurityException();
}
}
}
boolean checkAccess(Permission p) {
return permissions.contains(p);
}
}
and some nefarious hacker declared:
class NefariousList implements Collection<Permission> {
Object[] permissions = { new InnocentPermission() };
@Override
public Object[] toArray() {
return permissions;
}
}
and then did:
var nefarious = new NefariousList();
var accessManager = new AccessManager(nefarious);
var service = new Service(accessManager);
nefarious.permissions[0] = new AdminPermission();
service.deleteDatabase();
they could successfully delete the database, because AccessManager
finds an AdminPermission
in the list of verified permissions ...
(if you believe such hand-written security code with reference sharing bugs to be contrived, behold this real world example from a recent stackoverflow question)
Since core classes of the Java Language are required to work correctly even if untrusted code executes within the same JVM, ArrayList
can not, in general, rely on Collection.toArray
to be implemented correctly unless it has verified that the implementation is, in fact, safe, because it comes from a trustworthy JDK class.
ArrayList.toArray();
does create a new array, while other implementations ofCollection
don't necessarily do that, but might return the internal working array. So it assumes that it can simply use that newly created array without having to copy it, but in the other cases it has to copy it, because sharing a read-write data source in multipleCollection
s usually bears great risk. But old code in Java is usually a mess, with 100s of classes (mostly inner ones) being unnecessary rewrites of already existing ones, and sometimes horribly slow and convoluted code. – Thaliac.getClass()
could return a sub-class ofArrayList
(The classArrayList
is not final and users can subclass it). – Nyctophobia