Why is set order maintained sometimes?
Asked Answered
H

2

5

When running this code the results change as expected since a set is unordered:

my_set_1 = {'a','b','c',}
print([i for i in my_set_1])

That is, multiple runs would give different lists, e.g.

['a', 'c', 'b']
['b', 'a', 'c']
['a', 'c', 'b']
['c', 'b', 'a']

etc.

(Note: You might be getting the same result instead, if you don't have PYTHONHASHSEED=random, as suggested in the comments. Also, if you are using the console to replicate it, make sure you Rerun the console every time you run the code.)


However, when placing the above code in a for loop the results are rather surprising:

for i in range(10):
    my_set_1 = {'a','b','c',}
    print([i for i in my_set_1])
# Prints: 
# ['a', 'c', 'b']
# ['a', 'c', 'b']
# ['a', 'c', 'b']
# ....

A single run of the for loop will print the same list. Rerunning the for loop can print a different list (e.g. ['c', 'b', 'a']) but it will still be printed 10 times without changing.

Why doesn't it change?

Hubble answered 23/8, 2015 at 8:58 Comment(10)
My tests seem to show that set does keep order between runs (same items, unmodified), it just doesn't guarantee a specific order. Can not reproduce your first statement of set yielding different ordering across list comprehension runs.Piles
@ReutSharabani Are you running that code in the console? If so try this code inside a module and run the module.Hubble
The order in set is arbitrary.Indiscreet
Can you provide one piece of code that shows the problem?Piles
@ReutSharabani What do you mean? Isn't the code above sufficient to reproduce the problem? Perhaps I didn't explain it correctly; if so, let me know so that i can rephrase it. Also, did you insert this code in a module and run it? Didn't it produce the described results?Hubble
They don't have PYTHONHASHSEED=random, that's why they can't reproduce. Probably you can note it in your answer.Camarena
I did as suggested and ran it in the REPL as well as in a module. This did not reproduce your problem. Post a single piece of code demonstrating the problem. Currently there are two separate pieces and both act like the second.Piles
Python guarantees that the order be consistent within each run, but not between. Other's may happen to see consistency between runs, but that only by coincidence (another version or another environment).Vanover
@ReutSharabani On Python 3.3 and greater, hash randomization is turned on by default. Could that be the reason you cant reproduce? If so, let me know so that I can edit the question accordingly.Hubble
I'm on python3.4 / windows. It may be random, but the probability of the same order across 10 runs is around (1/(setlength!))^10.Piles
C
7

@ReblochonMasque has a correct point: set is based on the hash table and if hashes calculated between runs are the same, you'll have the same order between runs. However such behaviour is vulnerable to attacks.

To prevent these attacks special variable PYTHONHASHSEED was introduced. When it is set to random each run Python will generate different hashes for the same items. That's why you get different order.

To check this, you can run your program with PYTHONHASHSEED set to the same number. Order will be the same among runs.

$ export PYTHONHASHSEED=random
$ python t.py
['a', 'b', 'c']
$ python t.py
['a', 'c', 'b']
$ python t.py
['c', 'b', 'a']
$ export PYTHONHASHSEED=4
$ python t.py
['a', 'b', 'c']
$ python t.py
['a', 'b', 'c']
$ python t.py
['a', 'b', 'c']

If you look at the object.__hash__(). There is a note at the bottom (exactly about your case):

Note By default, the __hash__() values of str, bytes and datetime objects are "salted" with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

Camarena answered 23/8, 2015 at 9:18 Comment(0)
H
4

You shall not expect that the order of a set will change; sets are unordered in a sense that the order is not an invariant i/e there is no guarantee that it will not change.

The implementation is in the form of a hash table (dictionary); as long as there are no collisions of keys, the order will likely not change, but there is no telling. It is also not possible to predict if or when that will occur.

Be careful with drawing conclusions from your experimentation: the results you get cannot be predicted and will depend on the state of your system at the moment you run. They also won't hold across platforms, versions of python, etc...

Hogarth answered 23/8, 2015 at 9:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.