IPython %timeit: what do loop and iteration mean in the options?

I am wondering about the %timeit command in IPython

From the docs:

%timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] setup_code

Options:

-n<N>: execute the given statement <N> times in a loop. If this value is not given, a fitting value is chosen.

-r<R>: repeat the loop iteration <R> times and take the best result. Default: 3

For example, if I write:

%timeit -n 250 -r 2 [i+1 for i in range(5000)]

So, -n 250 executes [i+1 for i in range(5000)] 250 times? Then what does -r 2 do?

Spinoff answered 5/9, 2017 at 0:22 Comment(4)
It does two runs of 250. - Hearn
Why run the 250 loops twice? I don't understand the logic behind why these options are provided. - Spinoff
What is unclear? - Hearn
@Spinoff After reading this for a while (and MSeifert's link, which is very detailed), I think the most straightforward answer is that you need r for the std dev. If r is 1, you only get the average run time (total time / n), and the std dev is 0. If r > 1, you still get the average run time (now total time / (n*r)), but you also get the std dev of r1, r2, r3, r4, where r1 = average run time of run 1 = total time of run 1 / n, r2 is the same for run 2, etc. - Prefabricate

-r specifies the number of repeats. The repeats are used to determine the average and its standard deviation. For example:

%timeit -n 250 a = 2
# 61.9 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 250 loops each)

%timeit -n 250 -r 2 a = 2
# 62.6 ns ± 0 ns per loop (mean ± std. dev. of 2 runs, 250 loops each)

The total number of executions is n * r. The statistics are computed over the repeats (r), while the number of "loops" within each repeat is set by n.
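The relationship between n and r can be sketched outside IPython with the standard-library timeit module. This is only an approximation of what %timeit reports, not IPython's actual implementation; the helper name time_per_loop is made up for illustration, and the use of the population standard deviation is an assumption.

```python
import statistics
import timeit


def time_per_loop(stmt, n, r):
    """Return (mean, std_dev, per_loop_times) for r repeats of n loops each."""
    # timeit.repeat returns r totals; each total covers n executions of stmt.
    totals = timeit.repeat(stmt, number=n, repeat=r)
    # Dividing each total by n gives one "per loop" time per repeat.
    per_loop = [total / n for total in totals]
    mean = statistics.fmean(per_loop)
    # Assumption: population standard deviation over the r per-loop times.
    std_dev = statistics.pstdev(per_loop)
    return mean, std_dev, per_loop


mean, std_dev, per_loop = time_per_loop("[i + 1 for i in range(5000)]", n=250, r=2)
print(f"{mean * 1e6:.1f} us +- {std_dev * 1e6:.1f} us per loop "
      f"(mean +- std. dev. of {len(per_loop)} runs, 250 loops each)")
```

With r=1 there is only one per-loop time, so the standard deviation is necessarily 0. A spread can only be estimated once r is greater than 1.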

Basically, you need a large enough n so that the timing of each run of n loops is accurate "enough" to represent the fastest possible execution time, but you also need a large enough r to get accurate "statistics" on how trustworthy that "fastest possible execution time" measurement is (especially if you suspect that some caching could be happening).

For superficial timings you should always use an r of 3, 5 or 7 (in most cases that's large enough) and choose n as high as possible - but not too high, you probably want it to finish in a reasonable time :-)

Substantial answered 5/9, 2017 at 0:37 Comment(2)
I come back to this answer every few months and I still have no idea what r is for; that's too vague. - Sovran
@Sovran I answered a similar question in more detail here. Let me know if that's less vague. :) - Substantial
timeit -n 250 <statement>

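The statement will get executed 3 * 250 = 750 times, assuming -r has the default value of 3 given in the docs quoted in the question. Newer IPython versions default to 7 repeats, which gives 7 * 250 = 1750 executions.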

timeit -n 250 -r 4 <statement>

The statement will get executed 4 * 250 = 1000 times

-r: how many times to repeat the timer (in the examples above, each time the timer is called with -n 250, which means 250 executions)
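The n * r arithmetic can be checked with a counter. This sketch uses the standard-library timeit.repeat rather than the %timeit magic; the statement function and the calls counter are hypothetical names used only for illustration.

```python
import timeit

calls = 0


def statement():
    """Stand-in for the timed statement; counts how often it is executed."""
    global calls
    calls += 1


# Equivalent in spirit to: %timeit -n 250 -r 4 <statement>
totals = timeit.repeat(statement, number=250, repeat=4)

print(len(totals))  # 4 timings, one per repeat
print(calls)        # 1000 executions in total (4 * 250)
```

Each entry in totals is the time for one repeat, i.e. for 250 executions, so the number of timings equals r while the number of executions equals n * r.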

Encouragement answered 5/9, 2017 at 0:44 Comment(0)

A more statistical way to explain it is as a bootstrap estimate of the distribution of some statistic (specifically, its mean and standard deviation). In that context, "r" can be seen as the number of samples and "n" as the size of each sample.

Nazler answered 2/6, 2020 at 12:33 Comment(1)
Are you implying that taking a standard deviation of the samples (each consisting of n runs) would yield a more accurate estimate of the standard deviation than just taking a sample standard deviation over all n*r of the runs? Otherwise I don't see why one would want to split the results into r samples, rather than just basing the inference on all n*r runs. It seems to me that the real reason is the resolution of the timer itself, but let me know if there's another, statistical reason. - Psychobiology
