Iteration order of HashSet
Asked Answered
L

9

24

If every object added to a java.util.HashSet implements Object.equals() and Object.hashCode() in a deterministic fashion, is the iteration order over the HashSet guaranteed to be identical for every identical set of elements added, irrespective of the order in which they were added?

Bonus question: what if the insertion order is identical as well?

(Assuming Sun JDK6 with same HashSet initialization.)

Edit: My original question was not clear. It is not about the general contract of HashSet, but what Sun's implementation of HashSet in JDK6 offers as guarantees concerning determinism. Is it inherently non-deterministic? What influences the order used by its Iterator?

Lyontine answered 24/4, 2010 at 13:26 Comment(3)
I think Michael Borgwardt nails it: insertion-order will effect collision behaviour. Péter Török's point about initialization (e.g. size and load-factor) are important, too. Other than that it's going to be deterministic. Same JVM, same initialization, same order? How could it possibly NOT be deterministic? I've looked at the JDK6 code and it's clearly deterministic - no use of Math.random() in there!!!Dysthymia
It's possible to write deterministic programs which use Math.random(). Same holds true for non-deterministic programs which do not use Math.random().Bias
Despite your Edit, the question that we should all want to have answered is "What behavior does Java guarantee HashSet for iteration order?", or, more specifically, "Given a specific set of elements, does Java guarantee a deterministic iteration order for HashSet?" For the OP, this is important because you cannot guarantee that your code will always be run by a specific Java JDK.Fragmentary
K
21

Absolutely not.

The insertion order directly influences the iteration order whenever you have a bucket collision:

When two elements end up in the same bucket, the first one that was inserted will also be the first one returned during iteration, at least if the implementation of collision handling and iteration is straightforward (and the one in Sun's java.util.HashMap is)

Kirman answered 24/4, 2010 at 13:39 Comment(3)
Great answer. I edited in a small bonus question: what if the insertion order remains identical as well? In other words: is there anything inherently non-deterministic in the implementation of the "standard" java.util.HashMap?Lyontine
@eljenso: I'm pretty sure there isn't - but I don't see how to prove that conclusively.Kirman
@Lyontine if there isn't today, there might be tomorrow, if the spec (Hashmap doc) doesn't say otherwise.Evanthe
P
12

There is no "official" guarantee for anything like this. I would say it is most probably true for instances of the same HashSet implementation, initialized the same way. But I have seen cases for the iteration order being different between Java 5 and 6, for example.

Also, it may be different for instances of the same HashSet implementation, initialized with different size, due to rehashing. I.e. if you have 100 elements and two sets, one initialized with a size greater than 100, the other with a much smaller size, the second one will get reallocated and its elements rehashed several times while filling up. This may result in elements mapped to the same bucket being added (and thus iterated over) in different order.

In Java4 and later, you have LinkedHashSet which guarantees that the iteration order will be the order in which its elements were inserted.

Ponzo answered 24/4, 2010 at 13:30 Comment(0)
A
10

Wanted to confirm / upvote earlier comments. In short, Do Not Rely on HashSet iteration in consistent order. This can and will introduce bugs in your system.

We just found and fixed a bug where the iteration order was inconsistent in HashSet even with:

  • Identical insertion order.
  • Objects of a class with a valid equals() and hashCode() method.

And fixed it by using LinkedHashSet.

Thanks to the earlier posters :)

Apocalypse answered 5/11, 2010 at 21:2 Comment(2)
Here's a further discussion indicating that the GC being in a separate thread can introduce unpredictability even in completely "deterministic" scenarios: #4419396Rogatory
+1 for commenting on results even when using identical insertion order, good addition to the discussionPeriodic
B
9

As per the javadoc:

This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. [...] The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created

And the method iterator:

Returns an iterator over the elements in this set. The elements are returned in no particular order.

So I don't think you can make such an assumption.

Breadnut answered 24/4, 2010 at 13:29 Comment(1)
My original question was not clear. Sorry for that. Your answer is correct though in a general sense.Lyontine
R
4

Never ever make assumptions about the iteration order of anything you put into a HashSet because its contract explicitly says that you can't count on it in any way. Use LinkedHashSet if you want to maintain insertion order or TreeSet if you want to maintain a natural sorting order.

Respectable answered 24/4, 2010 at 13:42 Comment(0)
N
2

The order objects appear will depend on the final number of buckets of the HashSet. By changing the load factor and/or initial capacity you can change the order the elements end up in.

In the following example, you can see these confirguations each result in a different order.

public static void main(String...args) throws IOException {
    printOrdersFor(8, 2);
    printOrdersFor(8, 1);
    printOrdersFor(8, 0.5f);
    printOrdersFor(32, 1f);
    printOrdersFor(64, 1f);
    printOrdersFor(128, 1f);
}

public static void printOrdersFor(int size, float loadFactor) {
    Set<Integer> set = new HashSet<Integer>(size, loadFactor);
    for(int i=0;i<=100;i+=10) set.add(i);
    System.out.println("new HashSet<Integer>("+size+", "+loadFactor+") adding 0,10, ... 100 => "+set);
}

prints

new HashSet<Integer>(8, 2.0) adding 0,10, ... 100 => [0, 50, 100, 70, 40, 10, 80, 20, 90, 60, 30]
new HashSet<Integer>(8, 1.0) adding 0,10, ... 100 => [0, 50, 100, 70, 20, 80, 10, 40, 90, 30, 60]
new HashSet<Integer>(8, 0.5) adding 0,10, ... 100 => [0, 100, 70, 40, 10, 50, 20, 80, 90, 30, 60]
new HashSet<Integer>(32, 1.0) adding 0,10, ... 100 => [0, 100, 70, 40, 10, 50, 80, 20, 90, 60, 30]
new HashSet<Integer>(64, 1.0) adding 0,10, ... 100 => [0, 70, 10, 80, 20, 90, 30, 100, 40, 50, 60]
new HashSet<Integer>(128, 1.0) adding 0,10, ... 100 => [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
Neomineomycin answered 5/11, 2010 at 21:37 Comment(0)
W
1

No, this is not guaranteed.

First, different JVM may implement the HashSet algorithm differently (as long as it complies with the HashSet specification) so you will get different results on different JVMs.

Second, the algorithm may rely on non-deterministic factors when it builds the different buckets (part of the hash-table algorithm).

Weywadt answered 24/4, 2010 at 13:29 Comment(1)
I'm using the same JVM. I specifically mention that all hash codes are deterministic (i.e. Object.hashCode() is always overrided in a meaningful and deterministic way).Lyontine
S
0

I am sure that the Java developers want you to assume the answer is "no". In particular, for hash tables, why would they make it slower for everyone else who doesn't need this property to guarantee that objects whose hashes clash (identical hashCode % size) are observed in the same order regardless of the order in which they were put in?

Sherikasherill answered 24/4, 2010 at 13:31 Comment(0)
L
0

Such assumption cannot be made. The javadoc says that:

This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.

The closest you can get is to use a LinkedHashSet, which maintains the insertion order.

Lowpressure answered 24/4, 2010 at 13:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.