Python: any() unexpected performance

Asked 28/6, 2017 at 12:38 Answered 28/6, 2017 at 22:51

Solved python python-2.7 performance python-3.x any

I am comparing the performance of the any() built-in function with the actual implementation the docs suggest:

I am looking for an element greater than 0 in the following list:

lst = [0 for _ in range(1000000)] + [1]

This is the supposedly equivalent function:

def gt_0(lst):
    for elm in lst:
        if elm > 0:
            return True
    return False

And these are the results of the performance tests:

>> %timeit any(elm > 0 for elm in lst)
>> 10 loops, best of 3: 35.9 ms per loop

>> %timeit gt_0(lst)
>> 100 loops, best of 3: 16 ms per loop

I would expect both of the to have the exact same performance, however any() if two times slower. Why?

Anallise answered 28/6, 2017 at 12:38 Comment(5)

Have you tried with a more heterogeneouslst that doesn't start with 0? – Underhill 28/6, 2017 at 12:44

A more equivalent version would be: %timeit any(True for elm in lst if elm > 0). – Picul 28/6, 2017 at 12:46

Also is the actual implementation of any() is in Python or is that just the equivalent Python syntax? – Underhill 28/6, 2017 at 12:47

@Underhill I assume it is just the equivalent syntax? I would expect a built-in function to be implemented in C or whatever. – Anallise 28/6, 2017 at 13:53

@AshwiniChaudhary how is that different from any(elm > 0 for elm in lst)? – Anallise 28/6, 2017 at 13:53

The reason is that you've passed a generator expression to the any() function. Python needs to convert your generator expression to a generator function and that's why it performs slower. Because a generator function needs to call the __next__() method each time for generating the item and passing it to the any. This is while in a manual defined function you are passing the whole list to your function which has all the items prepared already.

You can see the difference better by using a list comprehension rather than a generator expression:

In [4]: %timeit any(elm > 0 for elm in lst)
10 loops, best of 3: 66.8 ms per loop

In [6]: test_list = [elm > 0 for elm in lst]

In [7]: %timeit any(test_list)
100 loops, best of 3: 4.93 ms per loop

Also another bottleneck in your code which has more cost than extra calls on next is the way you do the comparison. As mentioned in comment the better equivalent of your manual function is:

any(True for elm in lst if elm > 0)

In this case you're doing the comparison with the generator expression and it'll perform almost in an equal time as your manual defined function (the slightest difference is because of the generator, I guess.) For a deeper understanding of the underlying reasons read the Ashwini's answer.

Raycher answered 28/6, 2017 at 12:54 Comment(16)

in gt_0 the OP still does the comparisons in the function though – Underhill 28/6, 2017 at 13:21

I think your data is misleading. You cannot just compare %timeit any(elm > 0 for elm in lst) with %timeit any(test_list), you also need to consider the time it takes to build up the test_list. These are my results: %timeit test_list = [elm > 0 for elm in lst]; any(test_list); outputs 52.5 ms per loop, while %timeit any(elm > 0 for elm in lst) reports 38.4 ms per loop. So the generator expression is actually better. – Anallise 28/6, 2017 at 13:49

@Anallise That's not the point I'm trying to make. Of course creating the list and passing it to any is slower than just passing the generator expression to it. The point is the difference between your timings. You're passing the list to your manual function while for any you're using the generator expression. – Raycher 29/6, 2017 at 8:19

@Kasramvd oh so your point basically is that generating a new element from the generator expression with next() is more costly than simply iterating an already built list? – Anallise 29/6, 2017 at 8:21

@Anallise Yes. You can see the difference using the following example %timeit sum(i for i in lst) and %timeit sum(lst) while lst = list(range(100)). – Raycher 29/6, 2017 at 8:28

@Kasramvd ok I get it now, but I don't get why it is like that. Generating a next element with the for elm in lst generator expression is simply iterating lst which is ready to be iterated. Why is this slower than iterating it in the for loop of the gt_0 function? Isn't that precisely what a for loop does, use next()? – Anallise 29/6, 2017 at 8:41

@Anallise Yes, that's not what it seems to be. As a matter of fact these are all abstraction problems. i.e. when you're dealing with built-in functions you need to be careful of combining python codes with them. As they say a chain is as powerful as its weakest link. It means that it doesn't matter if your chain (code) is consist of a lot of strong links (C codes) the performance will be defined by its weakest links (the Python codes). – Raycher 29/6, 2017 at 9:2

%timeit sum(i for i in lst) and %timeit sum(lst) are not valid comparisons in context of this question. sum(list) basically runs at C level with 1 or 2 byte-steps at Python level. While in OP's code examples there is no such case where everything is delegated to C, in both cases we have explicit loops at Python level. – Picul 29/6, 2017 at 9:17

@AshwiniChaudhary Yes, I just mentioned that example for demonstrating the difference between performing a function on generator expression than a predefined list. – Raycher 29/6, 2017 at 9:19

@Kasramvd yeah I didn't understand your response. What I meant to ask is how is the generator expression for elm in lst any different than a for loop for elm in lst:? Why does the first have a worse performance than the second? – Anallise 29/6, 2017 at 11:46

@Anallise I didn't mention that a regular loop has a better performance than a loop in generator expression. You're mixing those things altogether. If you want to compare such thing you better to benchmark a generator expression with a regular generator function, which I don't think if there's a significant difference. The reason that passing a generator expression rather than a list to your function have different performance is because if you are passing them to a built-in function the first one's iteration will performs in C level whereas in a gen-exp it's a combination of Python and C. – Raycher 29/6, 2017 at 12:14

"I don't think if there's an extra comparison here because in both cases we are comparing the items". Where does it says it has an extra comparison? It actually prevents an extra comparison. Not sure what you're trying to explain here? And in most cases it is slightly faster than the gt_0 version. And what different C and Python calls are happening between this and any(elm > 0 for elm in lst) and that makes the latter so slow? – Picul 29/6, 2017 at 18:0

@AshwiniChaudhary The reason for mentioning the extra comparison was actually because of the miss understanding that might happens when one thinks of the structure of those two any() approaches, and your answer also had such confusion in it. I mean this part

Now we have reduced the work done by any() by adding the condition and it will have to check truthy twice only once when [...]

and what I said is that to me it doesn't seem any reduction of truthiness check in both cases because when you pass elm > 0 to the any it will evaluates the condition then calls PyObject_IsTrue. – Raycher 29/6, 2017 at 21:33

While in second case it will evaluates the condition within any() so the condition gets evaluated anyway, the difference is where it gets evaluated. I don't think if your answer is wrong though, I just tried to made some clarifications here. – Raycher 29/6, 2017 at 21:35

@Kasramvd Like I mentioned in the last section of my answer PyObject_IsTrue is called only once for each if condition because POP_JUMP_IF_FALSE receives either PyTrue and PyFalse and can decide based on that right away without calling PyObject_IsTrue on it again. Only the True will result in another PyObject_IsTrue call. So, if a list has N 0 items then any with if condition will result in N calls to PyObject_IsTrue, while OP's version results in 2N calls. – Picul 29/6, 2017 at 21:49

@AshwiniChaudhary Yes, now it makes sense and is more reasonable. Thanks for response and updating the answer. – Raycher 29/6, 2017 at 21:54

Surely a loop over a generator expression is slower compared to a list. But in this case the iteration within the generator is basically a loop over the list itself, hence the next() calls on generator basically delegate to list's next() method.

For example in this case there is no 2x performance difference.

>>> lst = list(range(10**5))

>>> %%timeit
... sum(x for x in lst)
...
100 loops, best of 3: 6.39 ms per loop

>>> %%timeit
... c = 0
... for x in lst: c += x
...

100 loops, best of 3: 6.69 ms per loop

First let's check the byte codes of both the approaches:

def gt_0(lst):
    for elm in lst:
        if elm > 0:
            return True
    return False


def any_with_ge(lst):
    return any(elm > 0 for elm in lst)

Bytecodes:

>>> dis.dis(gt_0)
 10           0 SETUP_LOOP              30 (to 33)
              3 LOAD_FAST                0 (lst)
              6 GET_ITER
        >>    7 FOR_ITER                22 (to 32)
             10 STORE_FAST               1 (elm)

 11          13 LOAD_FAST                1 (elm)
             16 LOAD_CONST               1 (0)
             19 COMPARE_OP               4 (>)
             22 POP_JUMP_IF_FALSE        7

 12          25 LOAD_GLOBAL              0 (True)
             28 RETURN_VALUE
             29 JUMP_ABSOLUTE            7
        >>   32 POP_BLOCK

 13     >>   33 LOAD_GLOBAL              1 (False)
             36 RETURN_VALUE
>>> dis.dis(any_with_ge.func_code.co_consts[1])
 17           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                17 (to 23)
              6 STORE_FAST               1 (elm)
              9 LOAD_FAST                1 (elm)
             12 LOAD_CONST               0 (0)
             15 COMPARE_OP               4 (>)
             18 YIELD_VALUE
             19 POP_TOP
             20 JUMP_ABSOLUTE            3
        >>   23 LOAD_CONST               1 (None)
             26 RETURN_VALUE

As you can see there's no jump condition in the any() version, it basically gets the value of the > comparison and then again checks its truthy value using PyObject_IsTrue again. On the other hand the gt_0 checks the truthy value of the condition once and returns True or False based on that.

Now let's add another any() based version that has an if-condition like in the for-loop.

def any_with_ge_and_condition(lst):
    return any(True for elm in lst if elm > 0)

Bytecode:

>>> dis.dis(any_with_ge_and_condition.func_code.co_consts[1])
 21           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                23 (to 29)
              6 STORE_FAST               1 (elm)
              9 LOAD_FAST                1 (elm)
             12 LOAD_CONST               0 (0)
             15 COMPARE_OP               4 (>)
             18 POP_JUMP_IF_FALSE        3
             21 LOAD_GLOBAL              0 (True)
             24 YIELD_VALUE
             25 POP_TOP
             26 JUMP_ABSOLUTE            3
        >>   29 LOAD_CONST               1 (None)
             32 RETURN_VALUE

Now we have reduced the work done by any() by adding the condition(check last section for more details) and it will have to check truthy twice only once when the condition is going to be True, else it will basically skip to next item.

Now let's compare the timings of these 3:

>>> %timeit gt_0(lst)
10 loops, best of 3: 26.1 ms per loop
>>> %timeit any_with_ge(lst)
10 loops, best of 3: 57.7 ms per loop
>>> %timeit any_with_ge_and_condition(lst)
10 loops, best of 3: 26.8 ms per loop

Let's modify gt_0 to include two checks as in the simple any() version and check its timing.

from operator import truth
# This calls `PyObject_IsTrue` internally
# https://github.com/python/cpython/blob/master/Modules/_operator.c#L30


def gt_0_truth(lst, truth=truth): # truth=truth to prevent global lookups
    for elm in lst:
        condition = elm > 0
        if truth(condition):
            return True
    return False

Timing:

>>> %timeit gt_0_truth(lst)
10 loops, best of 3: 56.6 ms per loop

Now, let's see what happens when we try to check truthy value of an item twice using operator.truth.

>> %%timeit t=truth
... [t(i) for i in xrange(10**5)]
...
100 loops, best of 3: 5.45 ms per loop
>>> %%timeit t=truth
[t(t(i)) for i in xrange(10**5)]
...
100 loops, best of 3: 9.06 ms per loop
>>> %%timeit t=truth
[t(i) for i in xrange(10**6)]
...
10 loops, best of 3: 58.8 ms per loop
>>> %%timeit t=truth
[t(t(i)) for i in xrange(10**6)]
...
10 loops, best of 3: 87.8 ms per loop

That's quite a difference even though we are simply calling truth()(i.e PyObject_IsTrue) on an already boolean object, I guess that sort of explains the slowness of basic any() version.

You may argue that if condition in any() will also result in two truthiness check, but that's not the case when the comparison operation returns Py_True or Py_False. POP_JUMP_IF_FALSE simply jumps to the next OP code and no call to PyObject_IsTrue is made.

Picul answered 28/6, 2017 at 22:51 Comment(6)

I don't think if there are extra comparisons here, because all we can see is that in both we are performing one comparison and in both any follows it's regular procedure. It seems that the difference is because of the ways that python evaluates this comparison (in python and/or delegating it to the built-in function). When we add the extra condition = elm > 0 to the manual function it'll preforms in Python layer not in a compiled code object as in the generator expression. I think that's what happens when we pass elm > 0 to the any instead of a bool object. – Raycher 29/6, 2017 at 9:7

@Kasramvd I didn't say extra comparisons, it's just that elm > 0 is first getting converted to a boolean value and then any() is checking its truthiness again every time. – Picul 29/6, 2017 at 9:10

Sorry that's the way I interpreted your answer. Because in second case if we do the comparison within the generator expression there will be still creating a boolean value and again checking the truthiness. I think the reason that we see an extra POP_JUMP_IF_FALSE in second case's byte-code is because the any is encountering to a bool object rather than a comparison, and in both cases the number of operations are the same, it seems that the difference is in the way that python evaluates that comparison. – Raycher 29/6, 2017 at 9:15

@Kasramvd If there's an if condition in any() and a falsy value comes in then True won't even come into picture, it is evaluated only and only when the if condition is True and that would be only once in case of any(), as it will short-circuit after the first truthy value.(This is related to any(True for elm in lst if elm > 0)). Hence, I am not sure why you think there would be same number of operations. – Picul 29/6, 2017 at 9:35

Secondly POP_JUMP_IF_FALSE simply jumps to next op code when it receives Py_True or Py_False(will be provided by the comparison in this case), no call to PyObject_IsTrue in that case. – Picul 29/6, 2017 at 9:44

I agree with your second comment, but regarding the first one you're saying that when we are passing the elm > 0 to any it will stop the iteration right after it evaluates as True, (evaluating the condition then PyObject_IsTrue ) and in other case it will evaluates the condition, if it's True it will returns the value but since it's passing True or False to any it will escape the PyObject_IsTrue so in second case we don't have PyObject_IsTrue right? – Raycher 29/6, 2017 at 21:45

print(timeit('any(True for elm in lst if elm > 0)',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))
print(timeit('any([elm > 0 for elm in lst])',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))
print(timeit('any(elm > 0 for elm in lst)',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))

produces:

2.1382904349993623
3.1172365920028824
4.580027656000311

As explained by Kasramvd, the last version is slowest because it is using a generator expression; a list comprehension is a bit faster, but - surprisingly - using a generator expression with a conditional clause as proposed by Ashwini Chaudhary is even faster.

Butyraceous answered 28/6, 2017 at 12:58 Comment(1)

I am not getting those results. I am getting 17.4 ms, 39.1 ms and 52.4 ms. It makes sense that the list comprehension is the slowest because it needs to build an entirely new list, while generator expressions are faster, and also stop when the condition is satisfied. Here that last thing does not have a big impact because we know the condition is not met until the last element, but watch out if I move the 1 to the beginning of the list: 47 ms per loop with list comprehension and 430 ns with a generator expression. Yeah, nanoseconds. Massive difference. – Anallise 28/6, 2017 at 14:4

The main chunk of performance boils down to the for loops.

In your any, there are two for loops: for elm in lst and the for loop carried out by any. So, any iterates over a generator that looks like False, False, False, ..., True

In your gt_0, there is only one for loop.

If you change it to check if the element is truthy at all, so they both only loop once:

def _any(lst):
    for elm in lst:
        if elm:
            return True
    return False

_any(lst)

any(lst)

There is a clear winner:

$ python2 -m timeit "from test import lst, _any" "any(lst)"
100 loops, best of 3: 5.68 msec per loop

$ python2 -m timeit "from test import lst, _any" "_any(lst)"
10 loops, best of 3: 17 msec per loop

Odine answered 28/6, 2017 at 12:54 Comment(1)

I don't see how there are two for loops as you say. – Anallise 28/6, 2017 at 13:59

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags