This is more a response to Python 3.41 A set before it was closed as a duplicate.
The others are right: don't rely on the order. Don't even pretend there is one.
That said, there is one thing you can rely on:
list(myset) == list(myset)
That is, the order is stable.
Understanding why there is a perceived order requires understanding a few things:
From the top:
A hash set is a method of storing random data with really fast lookup times.
It has a backing array:
# A C array; items may be NULL,
# a pointer to an object, or a
# special dummy object
_ _ 4 _ _ 2 _ _ 6
We shall ignore the special dummy object, which exists only to make removes easier to deal with, because we won't be removing from these sets.
In order to have really fast lookup, you do some magic to calculate a hash from an object. The only rule is that two objects which are equal have the same hash. (But if two objects have the same hash they can be unequal.)
You then make in index by taking the modulus by the array length:
hash(4) % len(storage) = index 2
This makes it really fast to access elements.
Hashes are only most of the story, as hash(n) % len(storage)
and hash(m) % len(storage)
can result in the same number. In that case, several different strategies can try and resolve the conflict. CPython uses "linear probing" 9 times before doing pseudorandom probing, so it will look to the right of the slot for up to 9 places before looking elsewhere.
CPython's hash sets are stored like this:
A hash set can be no more than 60% full (note: this load factor was previously 66% it was reduced in Python 3.7). If there are 20 elements and the backing array is 30 elements long, the backing store will resize to be larger. This is because you get collisions more often with small backing stores, and collisions slow everything down.
When the backing store becomes too full, it will be automatically resized to increase the ratio of unused space (a higher ratio of unused space means it's faster to find a slot when handling a hash collision). For small sets, storage will be quadrupled in size, and for large sets (>50,000) it will be doubled in size (source).
So when you create an array the backing store is length 8. Once it is 4 full and you add an element, it will contain 5 elements. 5 > ³⁄₅·8
so this triggers a resize, and the backing store quadruples to size 32.
>>> import sys
>>> s = set()
>>> for i in range(10):
... print(len(s), sys.getsizeof(s))
... s.add(i)
...
0 216
1 216
2 216
3 216
4 216
5 728
6 728
7 728
8 728
9 728
Finally, hash(n)
just returns n
for integers (except for hash(-1)
which returns -2
because the value -1
is reserved for another usage).
So, let's look at the first one:
v_set = {88,11,1,33,21,3,7,55,37,8}
len(v_set)
is 10, so the backing store is at least 15(+1) after all items have been added. The relevant power of 2 is 32. So the backing store is:
__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __
We have
hash(88) % 32 = 24
hash(11) % 32 = 11
hash(1) % 32 = 1
hash(33) % 32 = 1
hash(21) % 32 = 21
hash(3) % 32 = 3
hash(7) % 32 = 7
hash(55) % 32 = 23
hash(37) % 32 = 5
hash(8) % 32 = 8
so these insert as:
__ 1 __ 3 __ 37 __ 7 8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __
33 ← Can't also be where 1 is;
either 1 or 33 has to move
So we would expect an order like
{[1 or 33], 3, 37, 7, 8, 11, 21, 55, 88}
with the 1 or 33 that isn't at the start somewhere else. This will use linear probing, so we will either have:
↓
__ 1 33 3 __ 37 __ 7 8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __
or
↓
__ 33 1 3 __ 37 __ 7 8 __ __ 11 __ __ __ __ __ __ __ __ __ 21 __ 55 88 __ __ __ __ __ __ __
You might expect the 33 to be the one that's displaced because the 1 was already there, but due to the resizing that happens as the set is being built, this isn't actually the case. Every time the set gets rebuilt, the items already added are effectively reordered.
Now you can see why
{7,5,11,1,4,13,55,12,2,3,6,20,9,10}
might be in order. There are 14 elements, so the backing store is at least 21+1, which means 32:
__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __
1 to 13 hash in the first 13 slots. 20 goes in slot 20.
__ 1 2 3 4 5 6 7 8 9 10 11 12 13 __ __ __ __ __ __ 20 __ __ __ __ __ __ __ __ __ __ __
55 goes in slot hash(55) % 32
which is 23:
__ 1 2 3 4 5 6 7 8 9 10 11 12 13 __ __ __ __ __ __ 20 __ __ 55 __ __ __ __ __ __ __ __
If we chose 50 instead, we'd expect
__ 1 2 3 4 5 6 7 8 9 10 11 12 13 __ __ __ __ 50 __ 20 __ __ __ __ __ __ __ __ __ __ __
And lo and behold:
>>> {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 20, 50}
{1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 50, 20}
pop
is implemented quite simply by the looks of things: it traverses the underlying array and pops the first element, skipping over unused slots and "dummy" entries (tombstone markers from removed elements).
This is all implementation detail.
dict
s anyway.set
s remain arbitrarily ordered (and there are no plans to change that; the use cases fordict
s made the design that gets insertion-ordering more useful, and the costs are paid less frequently; forset
s it's the other way around, so insertion-orderedset
s are unlikely to ever become a thing). – Escobar