ArrayLists seem to be sorted with TimSort where the underlying list is not always consistent during sorting. It can happen that list entries disappear or appear twice upon calling the comparator.
In our comparator we are comparing keys for which we are using a function to get a value to compare for this key. As this function is used in other contexts we have a test whether the key is actually present in the list (something that wouldn't be necessary in the sorting):
if (keys.contains(itemId)) {
...
As keys is the list we are sorting it can happen in the comparator that a key is not found in the list due to the internal mechanics of TimSort.
Question: Is this mentioned somewhere in Javadoc (couldn't find it) that you are not supposed to access underlying list in Comparator? Is this a poor implementation of TimSort that should sort a copy? Or was it a stupid idea in the first place to access underlying list in comparator?
The program below, provided by T.J. Crowder, demonstrates that the contents of the underlying list may not be consistent during calls to the Comparator. (This program demonstrates the phenomenon in question, but it isn't representative of the actual application that's affected by the issue.)
import java.util.*;
public class Example {
private static String[] chars = {
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"
};
private List<String> list;
private String[] entries;
private Example() {
this.entries = new String[1000];
for (int n = 0; n < 1000; ++n) {
this.entries[n] = chars[n % chars.length] + n;
}
// Ensure it's an ArrayList, specifically
this.list = new ArrayList<String>(Arrays.asList(this.entries));
}
public static void main(String[] args) {
(new Example()).run();
}
class ListComparator implements Comparator<String> {
public int compare(String a, String b) {
for (String s : entries) {
int i1 = Example.this.list.indexOf(s);
if (i1 == -1) {
System.out.println(s + ": Missing");
} else {
int i2 = Example.this.list.lastIndexOf(s);
if (i2 != i1) {
System.out.println(s + ": Duplicated, at " + i1 + " and " + i2);
}
}
}
return a.compareTo(b);
}
}
private void run() {
this.list.sort(new ListComparator());
}
}
Here are the first few lines of output from a run:
b1: Missing a52: Duplicated, at 2 and 32 b27: Missing a52: Duplicated, at 2 and 32 c2: Missing a52: Duplicated, at 2 and 32 c2: Missing c28: Missing a52: Duplicated, at 2 and 32 b53: Duplicated, at 5 and 33 c28: Missing d29: Missing a52: Duplicated, at 2 and 32 b53: Duplicated, at 5 and 33 d3: Missing d29: Missing a52: Duplicated, at 2 and 32 b53: Duplicated, at 5 and 33 d3: Missing d29: Missing e30: Missing
keys.contains(itemId)
statement related to the ArrayList you are trying to sort? – MauriacComparator
will never delete elements in a List but if you explicitly remove them in the Comparator implementation. While it could be the case in aTreeSet
for objects with the same rank in terms of comparison. If you don't understand the behavior, please provide a minimal, short reproducible example. – Temperasort
that says "don't access the list from a comparator" (nor inComparator
), butsort
does document that it uses TimSort, and links to a description of it, and if you read that closely enough it does seem to suggest that runs are sorted and then merged in, which could lead to temporary inconsistency. – Ambulatesort
sorts according to theComparator
, andComparator
imposes a total ordering. If you change that total ordering relation part-way through, it is reasonable to assume that has the potential to invalidate any work already done. But maybe this is simply a post-hoc reading on my part, because I think you shouldn't be doing this. – Keelingsort
algorithm may temporarily create an inconsistent list while running. Accessing the list during the sort (for instance, from a comparator) is not recommended as doing so may observe the inconsistent state, in the form of apparently-duplicated or -missing entries." – Ambulate