Get average run time from `%timeit` ipython magic

Asked 12/2, 2016 at 20:14 Answered 12/2, 2016 at 20:59

Trying to time different random functions to see fastest way to choose random item from list. %timeit wants to give me "best of 3" fastest times, but because the runs are random, there's high variance in access times (grab from back of list, will be slow; grab from front, will be fast).

How do I get average across all loops, not best of?

a = [0,6,3,1,3,9,4,3,2,6]

%timeit random.choice(a)
%timeit a[random.randint(0,len(a)-1)]
%timeit a[np.random.randint(0,len(a)-1)]
%timeit np.random.choice(a,1)[0]

Currently output (acknowledging variance in timing):

%timeit random.choice(a)
The slowest run took 9.87 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.23 µs per loop

Update: a kludge approach:

%time for i in range(100000): random.choice(a)
%time for i in range(100000): a[random.randint(0,len(a)-1)]
%time for i in range(100000): a[np.random.randint(0,len(a)-1)]
%time for i in range(100000): np.random.choice(a,1)[0]

Fingerboard answered 12/2, 2016 at 20:14 Comment(4)

"Trying to time different random functions to see fastest way to choose random item from list. " -- This kind of (probably) preemptive micro-optimization will get you nowhere. – Cleanshaven 12/2, 2016 at 20:24

@Kay I'm simulating random walks on a network of tens of millions of nodes. I promise -- even small differences will matter a lot. At the moment, the random draws are 60% of my running time. (And no, it's not pre-emptive -- I'm profiling like crazy) – Fingerboard 12/2, 2016 at 20:29

Have you tried drawing from a numpy array rather than a list? I think that np.random.choice casts a to an array, which can be quite expensive. I see about a factor of 6 difference for a len(10) list vs array. – Joselyn 12/2, 2016 at 20:55

Hmm -- Don't think I can do that. I have a c-compiled function giving me the list as input in the actual program, I just created the a list as an example. But thanks for thought! – Fingerboard 12/2, 2016 at 21:3

You could use timeit.repeat:

import timeit
import numpy as np

reps = timeit.repeat(repeat=3, n=10000,
                     stmt="np.random.choice(a)",
                     setup="import numpy as np; a=[0,6,3,1,3,9,4,3,2,6]")

# taking the median might be better, since I suspect the distribution of times will
# be heavily skewed
avg = np.mean(reps)

One potential issue is that you are likely to run into caching effects that could render your timings less meaningful (see here). You might, for example, want to generate a new random list on each iteration using the setup= argument.

Joselyn answered 12/2, 2016 at 20:59 Comment(0)

How fast is

random_fd = open('/dev/urandom', 'rb')

a = array.array('I')
a.read(random_fd, 10**8)

get_next_rand = iter(a).next

for you? I'd just generate a huge amount of random numbers at once if that is your bottleneck.

In my aged PC:

%timeit array.array('I').read(open('/dev/urandom', 'rb'), 10**6)
1 loop, best of 3: 200 ms per loop

Cleanshaven answered 12/2, 2016 at 20:51 Comment(2)

That's a nice idea. I'll need to tweak them after -- each list I sample from is a different length -- but may still be more efficient to generate random floats from 0-1, multiple by length, and round to an integer. Thanks! – Fingerboard 12/2, 2016 at 21:5

You're welcome! Just make sure not to introduce a bias when you use modulo: https://mcmap.net/q/15753/-why-do-people-say-there-is-modulo-bias-when-using-a-random-number-generator/416224 – Cleanshaven 12/2, 2016 at 21:12

Recommended topics

Hot tags