A supposed optimization made the code over twice as slow.
I counted how often a value x occurs in a sorted list a by finding the range where it occurs:
from bisect import bisect_left, bisect_right

def count(a, x):
    start = bisect_left(a, x)
    stop = bisect_right(a, x)
    return stop - start
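For example, on a small hand-made list (just an illustration, not part of the benchmark):

count([10, 20, 20, 20, 30], 20)  # -> 3 (bisect_left finds index 1, bisect_right finds index 4)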
But hey, it can't stop before it starts, so we can optimize the second search by passing start as the lo argument, leaving out the part before it (doc):
def count(a, x):
    start = bisect_left(a, x)
    stop = bisect_right(a, x, start)
    return stop - start
But when I benchmarked, the optimized version took over twice as long:
254 ms ± 1 ms original
525 ms ± 2 ms optimized
Why?
The benchmark builds a sorted list of ten million random ints from 0 to 99999 and then counts all the different ints (just for benchmarking, so no need to point out Counter) (Try it online!):
import random
from bisect import bisect_left, bisect_right
from timeit import repeat
from statistics import mean, stdev

def original(a, x):
    start = bisect_left(a, x)
    stop = bisect_right(a, x)
    return stop - start

def optimized(a, x):
    start = bisect_left(a, x)
    stop = bisect_right(a, x, start)
    return stop - start

# ten million random ints from 0 to 99999, sorted
a = sorted(random.choices(range(100_000), k=10_000_000))
unique = set(a)

def count_all():
    # counts every distinct value with whichever function is currently bound to `count`
    for x in unique:
        count(a, x)

for count in original, optimized:
    # five timed runs; report mean ± stdev of the three fastest, in milliseconds
    times = repeat(count_all, number=1)
    ts = [t * 1e3 for t in sorted(times)[:3]]
    print(f'{round(mean(ts)):4} ms ± {round(stdev(ts)):2} ms ', count.__name__)
I also tried a version with stop = bisect_right(a, x, start-start) so it would pass an explicit second argument. That made it a tiny bit slower, but still nothing like optimized. – Skimmer
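For reference, that control could look like the sketch below (the name control is my own label, not from the benchmark):

def control(a, x):
    start = bisect_left(a, x)
    # the explicit second argument is always 0, so the searched range is unchanged
    stop = bisect_right(a, x, start-start)
    return stop - start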
One way to reduce the impact of the cache is to make sure that ints with the same value are the same object, and that they're generated in order all at once (so they're likely mostly contiguous in memory). All you do is change range(100_000) to tuple(range(100_000)). Doing that improves "original" by ~34%, and "optimized" by ~61%. Reducing the impact of the cache improves optimized more, implying it's in fact a cache miss issue. – Haynor
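A minimal sketch of that change, assuming only the line that builds the benchmark list is swapped:

# equal ints are now the same object, created in order all at once
a = sorted(random.choices(tuple(range(100_000)), k=10_000_000))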