Python timeit module execution confusion
Asked Answered
A

2

6

I'm trying to use the timeit module in Python (EDIT: We are using Python 3) to decide between a couple of different code flows. In our code, we have a series of if-statements that test for the existence of a character code in a string, and if it's there replace it like this:

if "<substring>" in str_var:
    str_var = str_var.replace("<substring>", "<new_substring>")

We do this a number of times for different substrings. We're debating between that and using just the replace like this:

str_var = str_var.replace("<substring>", "<new_substring>")

We tried to use timeit to determine which one was faster. If the first code-block above is "stmt1" and the second is "stmt2", and our setup string looks like

str_var = '<string><substring><more_string>',

our timeit statements will look like this:

timeit.timeit(stmt=stmt1, setup=setup)

and

timeit.timeit(stmt=stmt2, setup=setup)

Now, running it just like that, on 2 of our laptops (same hardware, similar processing load) stmt1 (the statement with the if-statement) runs faster even after multiple runs (3-4 hundredths of a second vs. about a quarter of a second for stmt2).

However, if we define functions to do both things (including the setup creating the variable) like so:

def foo():
    str_var = '<string><substring><more_string>'
    if "<substring>" in str_var:
        str_var = str_var.replace("<substring>", "<new_substring>")

and

def foo2():
    str_var = '<string><substring><more_string>'
    str_var = str_var.replace("<substring>", "<new_substring>")

and run timeit like:

timeit.timeit("foo()", setup="from __main__ import foo")
timeit.timeit("foo2()", setup="from __main__ import foo2")

the statement without the if-statement (foo2) runs faster, contradicting the non-functioned results.

Are we missing something about how Timeit works? Or how Python handles a case like this?

edit here is our actual code:

>>> def foo():
    s = "hi 1 2 3"
    s = s.replace('1','5')

>>> def foo2():
    s = "hi 1 2 3"
    if '1' in s:
        s = s.replace('1','5')


>>> timeit.timeit(foo, "from __main__ import foo")
0.4094226634183542
>>> timeit.timeit(foo2, "from __main__ import foo2")
0.4815539780738618

vs this code:

>>> timeit.timeit("""s = s.replace("1","5")""", setup="s = 'hi 1 2 3'")
0.18738432400277816
>>> timeit.timeit("""if '1' in s: s = s.replace('1','5')""", setup="s = 'hi 1 2 3'")
0.02985000199987553
Assonance answered 6/1, 2014 at 22:17 Comment(10)
Can you post your exact testing code?Ballyrag
@Ballyrag I've added our exact testing code in my post above.Assonance
That's odd. For me, the first is slower than the second for both tests. Which version of Python are you using?Ballyrag
I rearranged things a little bit to keep the tests the same. I may have done that before you tried testing things. If that's not the case, then I'm not sure what's going on!Assonance
I see basically the same thing as the OP. When not using the foo function method, the if statement method is always around 0.06, but the non-if method is around 0.3. When I do use the foo functions, then, in that case, the if statement method is around 0.61 and the non-if method is around 0.53. (Those are the means from using timeit 10 times for each of the four possibilities.) I'm on a pretty fast Desktop machine using IPython with Python 2.7.3.Calctufa
I should mention this is Python 3. I will add that to the question, I forgot to include it.Assonance
I would note that the run time goes up for both with the function version. Is there some kind of function parsing overhead going on here? Since the second function is longer, could greater parsing overhead explain it? Out of curiosity, try changing your functions so that they take the string as a parameter instead of setting it to a variable internally and see what happens.Erida
foo2 still ends up taking longer, though it is down to a far smaller difference (.311 for foo vs .326 for foo2).Assonance
Hm. Now that I think about it, the really weird thing is that the if one is shorter at all. It's actually doing more work than the replace alone since the if test is True (so the replace runs in both cases). The only way the if can save you time is if the string does not contain the string-to-replace, thereby saving you some time in the replace call itself, and that should only be true if in has quicker time order than replace. I don't know why it would make much of a difference, but you're using a different string-to-replace in the non-function with if version.Erida
@Erida I've used the same string as well. I should have copied the versions that use use all the same string, but even with the same strings the not-function versions get the same results. I've changed the statements so they use the same string in the question.Assonance
E
5

I think I've got it.

Look at this code:

timeit.timeit("""if '1' in s: s = s.replace('1','5')""", setup="s = 'hi 1 2 3'")

In this code, setup is run exactly once. That means that s becomes a "global". As a result, it gets modified to hi 5 2 3 in the first iteration and in now returns False for all successive iterations.

See this code:

timeit.timeit("""if '1' in s: s = s.replace('1','5'); print(s)""", setup="s = 'hi 1 2 3'")

This will print out hi 5 2 3 a single time because the print is part of the if statement. Contrast this, which will fill up your screen with a ton of hi 5 2 3s:

timeit.timeit("""s = s.replace("1","5"); print(s)""", setup="s = 'hi 1 2 3'")

So the problem here is that the non-function with if test is flawed and is giving you false timings, unless repeated calls on an already processed string is what you were trying to test. (If it is what you were trying to test, your function versions are flawed.) The reason the function with if doesn't fair better is because it's running the replace on a fresh copy of the string for each iteration.

The following test does what I believe you intended since it doesn't re-assign the result of the replace back to s, leaving it unmodified for each iteration:

>>> timeit.timeit("""if '1' in s: s.replace('1','5')""", setup="s = 'hi 1 2 3'"
0.3221409016812231
>>> timeit.timeit("""s.replace('1','5')""", setup="s = 'hi 1 2 3'")
0.28558505721252914

This change adds a lot of time to the if test and adds a little bit of time to the non-if test for me, but I'm using Python 2.7. If the Python 3 results are consistent, though, these results suggest that in saves a lot of time when the strings rarely need any replacing. If they usually do require replacement, it appears in costs a little bit of time.

Erida answered 7/1, 2014 at 1:50 Comment(1)
Thank you! I think this makes the most sense. We are actually trying to test the assignment operator as well (as taking the replaced value and running the following operations on it is important to how we handle the data). I ran the following: >>> s = """ s = 'hi 1 2 3' if '1' in s: s = s.replace('1','5') """ >>> timeit.timeit(s) 0.26153747701027896 >>> s2 = """ s = 'hi 1 2 3' s = s.replace('1','5') """ >>> timeit.timeit(s2) 0.24757718200271484 and it seems to confirm your findings and our previous findings. Thanks!Assonance
C
0

Made even weirder by looking at the disassembled code. The second block has the if version (which clocks in faster for me using timeit just as in the OP's example).

Yet, by looking at the op codes, it purely appears to have 7 extra op codes, starting with the first BUILD_MAP and also involving one extra POP_JUMP_IF_TRUE (presumably for the if statement check itself). Before and after that, all codes are the same.

This would suggest that building and performing the check in the if statement somehow reduces the computation time for then checking within the call to replace. How can we see specific timing information for the different op codes?

In [55]: dis.disassemble_string("s='HI 1 2 3'; s = s.replace('1','4')")
          0 POP_JUMP_IF_TRUE 10045
          3 PRINT_NEWLINE
          4 PRINT_ITEM_TO
          5 SLICE+2
          6 <49>
          7 SLICE+2
          8 DELETE_SLICE+0
          9 SLICE+2
         10 DELETE_SLICE+1
         11 <39>
         12 INPLACE_MODULO
         13 SLICE+2
         14 POP_JUMP_IF_TRUE 15648
         17 SLICE+2
         18 POP_JUMP_IF_TRUE 29230
         21 LOAD_NAME       27760 (27760)
         24 STORE_GLOBAL    25955 (25955)
         27 STORE_SLICE+0
         28 <39>
         29 <49>
         30 <39>
         31 <44>
         32 <39>
         33 DELETE_SLICE+2
         34 <39>
         35 STORE_SLICE+1

In [56]: dis.disassemble_string("s='HI 1 2 3'; if '1' in s: s = s.replace('1','4')")
          0 POP_JUMP_IF_TRUE 10045
          3 PRINT_NEWLINE
          4 PRINT_ITEM_TO
          5 SLICE+2
          6 <49>
          7 SLICE+2
          8 DELETE_SLICE+0
          9 SLICE+2
         10 DELETE_SLICE+1
         11 <39>
         12 INPLACE_MODULO
         13 SLICE+2
         14 BUILD_MAP        8294
         17 <39>
         18 <49>
         19 <39>
         20 SLICE+2
         21 BUILD_MAP        8302
         24 POP_JUMP_IF_TRUE  8250
         27 POP_JUMP_IF_TRUE 15648
         30 SLICE+2
         31 POP_JUMP_IF_TRUE 29230
         34 LOAD_NAME       27760 (27760)
         37 STORE_GLOBAL    25955 (25955)
         40 STORE_SLICE+0
         41 <39>
         42 <49>
         43 <39>
         44 <44>
         45 <39>
         46 DELETE_SLICE+2
         47 <39>
         48 STORE_SLICE+1
Calctufa answered 6/1, 2014 at 23:27 Comment(2)
If I'm reading this right, then the only difference is the insertion of lines 14 to 24 in the second instruction set. Is that right?Erida
Yes. But it looks like the issue was with the way that the setup feature was being used with timeit, and not with the actual code itself.Calctufa

© 2022 - 2024 — McMap. All rights reserved.