Is there a good reason to split in two different cases the following ArrayList constructor in the OpenJDK code?
Asked Answered
I

2

12

When reading through the OpenJDK code for class ArrayList, for instance, in JDK17

(https://github.com/openjdk/jdk17/blob/master/src/java.base/share/classes/java/util/ArrayList.java)

I stumbled upon the following constructor:

public ArrayList(Collection<? extends E> c) {
    Object[] a = c.toArray();
    if ((size = a.length) != 0) {
        if (c.getClass() == ArrayList.class) {
            elementData = a;
        } else {
            elementData = Arrays.copyOf(a, size, Object[].class);
        }
    } else {
        // replace with empty array.
        elementData = EMPTY_ELEMENTDATA;
    }
}

What is the reason behind the distinction whether c.getClass() is an ArrayList.class or not? Is this case splitting necessary at all?

(I was just trying to understand Java code that is part of the OpenJDK distribution for class ArrayList.)

Indiction answered 29/11, 2023 at 22:55 Comment(2)
The code assumes that ArrayList.toArray(); does create a new array, while other implementations of Collection don't necessarily do that, but might return the internal working array. So it assumes that it can simply use that newly created array without having to copy it, but in the other cases it has to copy it, because sharing a read-write data source in multiple Collections usually bears great risk. But old code in Java is usually a mess, with 100s of classes (mostly inner ones) being unnecessary rewrites of already existing ones, and sometimes horribly slow and convoluted code.Thalia
c.getClass() could return a sub-class of ArrayList (The class ArrayList is not final and users can subclass it).Nyctophobia
P
6

Disclaimer: I was not involved in writing the code for java.util.ArrayList, nor know anyone who was.


The constructor you're looking at is a "copy constructor"; it is meant to create a shallow copy of the given collection c. That means the elements of c must be copied into the new collection, but the new collection's "structure" (the array in the case of ArrayList) must be distinct. Otherwise, structural modifications to c would also be visible via the new collection and vice versa.

This part:

if (c.getClass() == ArrayList.class) {
    elementData = a;
} else {
    elementData = Arrays.copyOf(a, size, Object[].class);
}

Ensures the new ArrayList and c do not share the same array. But ArrayList knows its own implementation of toArray() returns a distinct array. So, when c is an ArrayList, it avoids copying the array, thus avoiding unecessary work. Otherwise, if c is an instance of any other collection type, including a subtype of ArrayList (which could override toArray()), then that knowledge of toArray() can no longer be relied upon, so a defensive copy of the array is made.

That said, the contract of Collection#toArray() states:

The returned array will be "safe" in that no references to it are maintained by this collection. (In other words, this method must allocate a new array even if this collection is backed by an array). The caller is thus free to modify the returned array.

So, it would appear copying the array is not necessary, making the above code seem superfluous. But I can think of one reason why the array should be copied:

  • Security. A malicious implementation of Collection could break the contract of toArray() and keep a reference to the array, allowing an actor to unexpectedly view and/or modify the new ArrayList. Since ArrayList is a very widely used class, it's important this sort of thing is guarded against.

  • Bonus reason (from meriton's answer): Ensure the component type of the array is Object. The component type of elementData must be Object in order to work correctly with the generic nature of ArrayList. A bad implementation of Collection, malicious or otherwise, could force an ArrayStoreException by having toArray() return an array with the wrong component type. By passing Object[].class to Arrays.copyOf, the above code avoids this problem.

In both cases, however, ArrayList still knows that its own implementation of toArray() returns a distinct array with a component type of Object, so copying the array can still be avoided when c.getClass() == ArrayList.class.


Demonstration

Here is some code demonstrating the problems described by the two points above when the array is not copied correctly.

Main.java:

import java.util.Arrays;
import java.util.Objects;

public class Main {

    public static void main(String[] args) {
        MaliciousStringCollection malicious = new MaliciousStringCollection();

        MyArrayList<Object> objects = new MyArrayList<>(malicious);

        System.out.println("Current elements of 'objects':");
        for (int i = 0; i < objects.size(); i++) {
            System.out.printf("    objects[%d] = %s%n", i, objects.get(i));
        }
        System.out.println();

        // Demonstrate how not copying the array can lead to the list's array being
        // modified unexpectedly.
        System.out.println("Changing elements of 'objects' via 'malicious'. New elements of 'objects':");
        malicious.changeData();
        for (int i = 0; i < objects.size(); i++) {
            System.out.printf("    objects[%d] = %s%n", i, objects.get(i));
        }
        System.out.println();

        // Demonstrate how not ensuring the component type of the backing array is `Object` can
        // lead to an ArrayStoreException.
        System.out.println("Adding '42' to 'objects'.");
        objects.add(42); // throws ArrayStoreException
    }

    public interface MyCollection<E> {

        /**
         * @return a new array, with a component type of {@code Object}, that contains
         *         the elements of this collection in encounter order
         */
        Object[] toArray();

        /**
         * @return the size of this collection
         */
        int size();
    }

    public static class MyArrayList<E> implements MyCollection<E> {

        private Object[] data;
        private int size;

        public MyArrayList(MyCollection<? extends E> col) {
            size = col.size();
            /*
             * Fails two things:
             * 
             *     1. To ensure the component type of 'data' is 'Object'
             *     2. To ensure the array is not shared with other code.
             * 
             * Both things can be fixed by changing the code to:
             * 
             *     data = Arrays.copyOf(col.toArray(), size, Object[].class);
             */
            data = col.toArray();
        }

        public void add(E e) {
            if (data.length == size) {
                int newLength = data.length + Math.max(data.length / 2, 1);
                data = Arrays.copyOf(data, newLength);
            }
            data[size++] = e;
        }

        @SuppressWarnings("unchecked")
        public E get(int index) {
            Objects.checkIndex(index, size);
            return (E) data[index];
        }

        @Override
        public int size() {
            return size;
        }

        @Override
        public Object[] toArray() {
            return Arrays.copyOf(data, size);
        }
    }

    public static class MaliciousStringCollection implements MyCollection<String> {

        private final String[] data = {"Hello", "World"};

        @Override
        public int size() {
            return data.length;
        }

        public void changeData() {
            data[0] = "Goodbye";
        }
        
        @Override
        public Object[] toArray() {
            /*
             * Malicious implementation:
             * 
             *     1. Does not return a new, distinct array.
             *     2. Returns an array whose component type is 'String', not 'Object'.
             */
            return data;
        }
    }
}

Output:

Current elements of 'objects':
    objects[0] = Hello
    objects[1] = World

Changing elements of 'objects' via 'malicious'. New elements of 'objects':
    objects[0] = Goodbye
    objects[1] = World

Adding '42' to 'objects'.
Exception in thread "main" java.lang.ArrayStoreException: java.lang.Integer
        at Main$MyArrayList.add(Main.java:71)
        at Main.main(Main.java:29)
Prudential answered 30/11, 2023 at 3:37 Comment(6)
I think you got it right when you say that the code knows that the array a is not shared with the original Collection if it comes from another ArrayList, and otherwise it prefers to make an explicit copy of it, by means of Arrays.copyOf. On the contrary, I think that even if toArray() would return an array of wrong component type, since we have assigned it to a, which is of type Object[], that would not be an issue anymore. Probably this is what the comment in link refers to.Indiction
Arrays know their component type at run-time. Assigning a String[], for example, to an Object[] reference does not change the component type; the array is still a String[]. I added some code to my answer demonstrating this. Note that Arrays#copyOf(T[],int) will return a copy of the array with the same component type as the source array, whereas Arrays#copyOf(T[],int,Class) will return a copy of the array with the component type specified by the third argument.Prudential
I do not agree that the contract of toArray() allows to return an array of a different size than the collection. Even the number of null elements in the returned array should match the number of null elements in the collection. “Nothing in the documentation… says anything about the length of the returned array.” should be read as “Nothing in the documentation … says that returning an array of a different length was allowed.” And well, the size used in the Arrays.copyOf(…) step has been initialized from the array length anyway.Carping
@Carping I agree it's common sense that a.length == c.size(). But I find the lack of requiring that explicitly to be jarring, given how specific the documentation is in most other respects (including the fact that any null elements in the collection must be included in the array). But you're right, that point doesn't really matter here, because size is taken from a.length instead of c.size() (I overlooked that part; it's also not how I would have implemented it). I'll remove that from my answer.Prudential
The way it has been implemented is the most robust. It only requires the incoming collection to implement a single operation correctly; it even works with concurrent collections having updates while the constructor is executed, as long as the toArray operation returns a valid result, either atomically or weakly consistent.Carping
@Carping Thanks for pointing out the need to consider concurrent collections. I hadn't considered that.Prudential
D
5

If you do a git blame on that source file, you will find that this code was introduced as part of the commit 343ecd806bb050, whose commit message

8231800: Better listing of arrays

references the issue JDK-8231800 in the JDK bugtracker. Alas, this issue is not visible to the public, but googling it reveals it has been shipped with security updates to all Java versions supported at the time, providing strong evidence that this was related to a security issue (which also explains why the details of the issue haven't been made public, and why the commit message is not descriptive).

However, googling also finds an enhancement request filed by the autor of that code, which does describe the challenges this piece of code is intended to solve:

Consider new ArrayList(Collection arg). This calls arg.toArray() and makes a defensive copy of it for use use as the internal array to the ArrayList. This copy is necessary in case the arg's toArray() implementation produces an array whose class is something other than Object[].class or if it retains a reference to the returned array. (Both of these are spec violations, but they're unenforceable.)

That is, ArrayList needs its array to

  • admit elements of type E
  • not be accessible (and therefore modifiable) by other code

Implementations of Collection.toArray are specified to satisfy this:

Object[] toArray()

Returns an array containing all of the elements in this collection. If this collection makes any guarantees as to what order its elements are returned by its iterator, this method must return the elements in the same order. The returned array's runtime component type is Object.

The returned array will be "safe" in that no references to it are maintained by this collection. (In other words, this method must allocate a new array even if this collection is backed by an array). The caller is thus free to modify the returned array.

however, we can't be sure that an implementation actually does this. For instance, an implementation might do:

class StringList implements Collection<String> {
    String[] elements = new String[10];
    int size = 0;
    
    // other methods omitted

    @Override 
    Object[] toArray() {
        return Arrays.copyOf(elements, size);
    }
}

which looks totally innocent, right? But this implementation actually violates the spec, because copyOf will return a String[] rather than an Object[], so if somebody did:

    var strings = new StringList("Hello", "World");
    var objects = new ArrayList<Object>(strings);
    objects.add(42); 

That code is totally correct as far as the compile time type system is concerned, but would fail at runtime with an ArrayStoreException if ArrayList reused the array returned by StringList.toArray().

The second issue is more nefarious. Suppose you have code like this:

class AccessManager {
    final List<Permission> permissions;

    public AccessManager(List<Permission> requestedPermissions) {
        permissions = new ArrayList<>(requestedPermissions);
        verifyPermissions();
    }

    private verifyPermissions() { 
        for (var p : permissions) {
            if (!currentUser.has(p)) {
                throw new SecurityException();
            }
        }
    }

    boolean checkAccess(Permission p) {
        return permissions.contains(p);
    }
}

and some nefarious hacker declared:

class NefariousList implements Collection<Permission> {
    Object[] permissions = { new InnocentPermission() };

    @Override
    public Object[] toArray() {
        return permissions;
    }
}

and then did:

var nefarious = new NefariousList();
var accessManager = new AccessManager(nefarious);
var service = new Service(accessManager);

nefarious.permissions[0] = new AdminPermission();
service.deleteDatabase();

they could successfully delete the database, because AccessManager finds an AdminPermission in the list of verified permissions ...

(if you believe such hand-written security code with reference sharing bugs to be contrived, behold this real world example from a recent stackoverflow question)

Since core classes of the Java Language are required to work correctly even if untrusted code executes within the same JVM, ArrayList can not, in general, rely on Collection.toArray to be implemented correctly unless it has verified that the implementation is, in fact, safe, because it comes from a trustworthy JDK class.

Denison answered 30/11, 2023 at 5:44 Comment(1)
Note, in old implementations Arrays.toList("foo").toArray().getClass() == String[].class.Wanting

© 2022 - 2025 — McMap. All rights reserved.