Sorting ArrayList can fail when using list in comparator. Is this documented?
Asked Answered
A

1

9

ArrayLists seem to be sorted with TimSort where the underlying list is not always consistent during sorting. It can happen that list entries disappear or appear twice upon calling the comparator.

In our comparator we are comparing keys for which we are using a function to get a value to compare for this key. As this function is used in other contexts we have a test whether the key is actually present in the list (something that wouldn't be necessary in the sorting):

        if (keys.contains(itemId)) {
          ...

As keys is the list we are sorting it can happen in the comparator that a key is not found in the list due to the internal mechanics of TimSort.

Question: Is this mentioned somewhere in Javadoc (couldn't find it) that you are not supposed to access underlying list in Comparator? Is this a poor implementation of TimSort that should sort a copy? Or was it a stupid idea in the first place to access underlying list in comparator?


The program below, provided by T.J. Crowder, demonstrates that the contents of the underlying list may not be consistent during calls to the Comparator. (This program demonstrates the phenomenon in question, but it isn't representative of the actual application that's affected by the issue.)

import java.util.*;

public class Example {
    private static String[] chars = {
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"
    };

    private List<String> list;
    private String[] entries;

    private Example() {
        this.entries = new String[1000];
        for (int n = 0; n < 1000; ++n) {
            this.entries[n] = chars[n % chars.length] + n;
        }
        // Ensure it's an ArrayList, specifically
        this.list = new ArrayList<String>(Arrays.asList(this.entries));
    }

    public static void main(String[] args) {
        (new Example()).run();
    }

    class ListComparator implements Comparator<String> {
        public int compare(String a, String b) {
            for (String s : entries) {
                int i1 = Example.this.list.indexOf(s);
                if (i1 == -1) {
                    System.out.println(s + ": Missing");
                } else {
                    int i2 = Example.this.list.lastIndexOf(s);
                    if (i2 != i1) {
                        System.out.println(s + ": Duplicated, at " + i1 + " and " + i2);
                    }
                }
            }
            return a.compareTo(b);
        }
    }

    private void run() {
        this.list.sort(new ListComparator());
    }
}

Here are the first few lines of output from a run:

b1: Missing
a52: Duplicated, at 2 and 32
b27: Missing
a52: Duplicated, at 2 and 32
c2: Missing
a52: Duplicated, at 2 and 32
c2: Missing
c28: Missing
a52: Duplicated, at 2 and 32
b53: Duplicated, at 5 and 33
c28: Missing
d29: Missing
a52: Duplicated, at 2 and 32
b53: Duplicated, at 5 and 33
d3: Missing
d29: Missing
a52: Duplicated, at 2 and 32
b53: Duplicated, at 5 and 33
d3: Missing
d29: Missing
e30: Missing
Azikiwe answered 6/1, 2019 at 11:37 Comment(9)
Your question is not clear. Can you post the code in which you are sorting the ArrayList, as well as explain how is the keys.contains(itemId) statement related to the ArrayList you are trying to sort?Mauriac
The sort performed thanks to a Comparator will never delete elements in a List but if you explicitly remove them in the Comparator implementation. While it could be the case in a TreeSet for objects with the same rank in terms of comparison. If you don't understand the behavior, please provide a minimal, short reproducible example.Tempera
Using the list you are trying to sort in the comparator you are using to sort it just sounds like a bad idea anyway. Even if the behaviour you describe didn't happen, I can imagine it leading to all sorts of other inconsistencies in the total ordering.Keeling
It's hard to prove a negative. I don't see anything in sort that says "don't access the list from a comparator" (nor in Comparator), but sort does document that it uses TimSort, and links to a description of it, and if you read that closely enough it does seem to suggest that runs are sorted and then merged in, which could lead to temporary inconsistency.Ambulate
But as @Andy points out above: accessing the list during the sort just feels wrong. (Which doesn't mean it shouldn't be documented that if you do, it may be inconsistent.)Ambulate
@T.J.Crowder I think if you read between the lines it kinda does. sort sorts according to the Comparator, and Comparator imposes a total ordering. If you change that total ordering relation part-way through, it is reasonable to assume that has the potential to invalidate any work already done. But maybe this is simply a post-hoc reading on my part, because I think you shouldn't be doing this.Keeling
@T.J.Crowder Thanks for adding a sample.Azikiwe
@AndyTurner I mean it would certainly be a bad idea to change the ordering or the content of the underlying list but just to access its elements and assuming they are all still there isn't too far fetched in my opinion or the documentation should at least say so. To sort keys with which you need to access values is kind of a legitimate scenario.Azikiwe
If I were documenting the JDK, I would definitely add a something along the lines of "The sort algorithm may temporarily create an inconsistent list while running. Accessing the list during the sort (for instance, from a comparator) is not recommended as doing so may observe the inconsistent state, in the form of apparently-duplicated or -missing entries."Ambulate
M
3

A bit of history here: in JDK 7, the TimSort algorithm replaced the previous "legacy merge sort" algorithm. In JDK 8, Collections.sort() delegates to the new default method List.sort(). This default method is overridden by ArrayList, which does the sort in-place. The previous Collections.sort() implementation would copy the list to a temporary array, perform the sort on that temporary array, and then copy the elements from the temporary array back to the original list.

If the sort comparator looks in the list being sorted, then its behavior will certainly be affected by the new in-place sorting behavior of ArrayList introduced in JDK 8. The change from the "legacy merge sort" to TimSort in JDK 7 probably has no impact in this case, since JDK 7 still did the sorting on a temporary copy.

The copy-sort-copyback behavior of List.sort() is described in an "Implementation Requirements" section which specifies the behavior of the default method implementation, but which isn't part of the interface's contract that is imposed on all implementations. Thus, ArrayList (and other subclasses) are free to change this behavior. I note that there is no documentation for the overriding implementation ArrayList.sort(). I suppose it would be a small improvement if some documentation were added that specified the in-place sorting behavior.

If the in-place sorting of ArrayList is a problem, you could copy the list before sorting it:

List<Key> list = ... ;
List<Key> newList = new ArrayList<>(list);
newList.sort(keyComparator); // uses the old list
list = newList;

Alternatively, perhaps you could give some more details about the way the comparator works, and we might be able to figure out a way to rewrite it so that it doesn't need to look in the list being sorted. (I'd suggest asking another question for this.)

Melisandra answered 8/1, 2019 at 1:44 Comment(8)
Interesting points. The in-place sorting wouldn't be a problem if all elements remained in the list. As mentioned above one shouldn't of course change the order or the content of the list in the comparator but accessing the elements can be necessary if we are sorting keys rather than values. Once you know about the behavior it's easy to fix. As you suggested too I just copy the list before sorting.Azikiwe
@ThomasM I’m curious; what does sorting keys instead of values entail? Also, the contains() method does a linear search so it would seem like it would make the comparator, and thus the sort, run extremely slowly. I guess I’m still confused about what this comparator does.Melisandra
We are sorting columns of a listbox so after the sorting the listbox is refreshed based on the newly ordered list of keys. The comparator is instantiated with the name of the column and the sort order. When comparing the compare function gets two keys and the comparator gets the values of the respective listbox column to compare. Does that make sense now?Azikiwe
@ThomasM Thanks. It's starting to make sense what you're trying to do. But I'm still sort of guessing about why the comparator needs to look into the original list in order to perform the comparison. Is a key some kind of unique ID for an entry in the listbox? Isn't there some kind of row data structure that maintains the association between the key and the various column values?Melisandra
It is a generic viewing web solution for various archiving systems (IBM OnDemand, Oracle WCC, CSV files, generic JDBC databases, etc.). For all these systems there are so called adapters which are also responsible for holding the data. The core viewer does not know how the data is stored and is not making a local copy. When the user wants to view certain data he gets a dynamically created search form. The search is executed within the adapter which returns a list of opaque keys. Whenever the core needs data it uses these opaque keys to obtain the data from the adapter.Azikiwe
@ThomasM Sorry, I'm still not quite getting it. You need to sort a list of keys, and there's some way of getting the column data corresponding to a particular key. The typical way to do that is for the comparator to be something like Comparator.comparing(key -> key.getDataFor(columnName)). I don't see where the need to go back into the original list arises. That said, if you're satisified with the copying workaround, great. If you want to avoid this workaround by improving the comparator, let me know.Melisandra
you're right. The problem lies in using a function for obtaining the value that is used in another context too. So the function is like doc = container.getDoc(key); value = doc.getValue(property); The getDoc(key) function is used in another context too where it seemed appropriate for getDoc to first test whether the key is even in the (original) list. Something that is unnecessary for our sorting. So the whole problem comes from that test.Azikiwe
@ThomasM Ah! OK, I get it now, thanks. You had said as much in the original question but I didn't really understand what that meant until now. Well, I hope you get the opportunity to do some refactoring at some point.Melisandra

© 2022 - 2024 — McMap. All rights reserved.