I was tinkering around with Python's set and frozenset collection types. Initially, I assumed that frozenset would offer better lookup performance than set, since it is immutable and could therefore exploit the structure of the stored items. However, that does not seem to be the case, as the following experiment shows:
import random
import time
import sys
def main(n):
    numbers = []
    for _ in xrange(n):
        numbers.append(random.randint(0, sys.maxint))

    set_ = set(numbers)
    frozenset_ = frozenset(set_)

    start = time.time()
    for number in numbers:
        number in set_
    set_duration = time.time() - start

    start = time.time()
    for number in numbers:
        number in frozenset_
    frozenset_duration = time.time() - start

    print "set      : %.3f" % set_duration
    print "frozenset: %.3f" % frozenset_duration

if __name__ == "__main__":
    n = int(sys.argv[1])
    main(n)
I executed this code using both CPython and PyPy, which gave the following results:
> pypy set.py 100000000
set : 6.156
frozenset: 6.166
> python set.py 100000000
set : 16.824
frozenset: 17.248
It seems that frozenset is actually slower for lookups, in both CPython and PyPy. Does anybody have an idea why this is the case? I did not look into the implementations.
… set has too. – Compaction

Use the timeit module for proper timing experiments. Try with numbers that are not in either set, too.

frozenset and set share the same implementation, so the timing differences you see are entirely local to your test. – Effable