Why is the max() function so much slower at comparing 2 elements than a direct comparison with an if statement?
Running the code below, the direct comparison with an if statement is almost 4x faster than using the max() function.

I am trying to understand the reason behind this.

comparison : 0.63s, max : 2.3s

import time

if __name__ == '__main__':
    sim = 10**7

    s = time.time()
    for _ in range(sim):
        if 1 > 2:
            pass
    res1 = time.time()-s

    s = time.time()
    for _ in range(sim):
        max(1, 2)
    res2 = time.time()-s

    print('comparison : {:.2}s, max : {:.2}s'.format(res1, res2))
Bisayas answered 23/5, 2019 at 19:17 Comment(2)
Interestingly, pylint suggests using the latter (the max() function): https://pylint.pycqa.org/en/latest/user_guide/messages/refactor/consider-using-max-builtin.html – Tracitracie
Consider using the timeit module for benchmarking code; there's no need to manually calculate differences of time.time(). E.g. timeit.timeit('if 1>2: pass') vs timeit.timeit('max(1,2)'). – Tracitracie
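A minimal sketch of that timeit approach from the comment above (the output formatting is just illustrative):

import timeit

# Each statement runs 1,000,000 times (timeit's default); results are total seconds.
t_if = timeit.timeit('if 1 > 2: pass')
t_max = timeit.timeit('max(1, 2)')
print('if statement: {:.3f}s, max(): {:.3f}s'.format(t_if, t_max))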

Because max involves a dictionary lookup for the function name, then a function call, whereas the direct comparison operator does not.
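You can see the difference by disassembling both forms (a minimal sketch; the exact opcode names vary across CPython versions):

import dis

def with_operator(a, b):
    if a > b:
        pass

def with_max(a, b):
    max(a, b)

# with_operator compiles down to loading a and b plus a COMPARE_OP, while
# with_max also needs a LOAD_GLOBAL for the name "max" followed by a call opcode.
dis.dis(with_operator)
dis.dis(with_max)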

max starts to become interesting speed-wise when you have more elements to compare.

Related / same speed difference:

Bungalow answered 23/5, 2019 at 19:19 Comment(8)
Two dictionary lookups actually; one in module globals, and when that fails, a second one in builtin scope. If you're calling max a lot in a given function, caching it (e.g. via a keyword-only argument, à la def myfunc(arg1, arg2, *, max=max):) is a cheap hack to remove the lookup costs (replacing the two dict lookups of LOAD_GLOBAL with a single C array indexing operation from LOAD_FAST). The function call overhead is the larger cost though (generic function dispatch has a lot more overhead than anything that goes straight for specialized slotted functions, e.g. <). – Arri
Yes, dictionary lookup is O(1). Thanks for the details though, interesting. – Pedroza
Well, it's all basically O(1) (aside from "how many items were passed to max"). dict lookup is O(1) average case, but with a higher constant overhead than arrays, and frequently one or two bucket collisions before it finds the right item. On my machine, a function that loads, but doesn't call, max 1000 times takes about 45 μs to run without caching in the arguments, about 25.5 μs with the max=max caching, and 21.5 μs if the 1000-iteration loop body is pass instead of max. So loading from the local scope 1000x costs ~4 μs, while loading from builtin scope costs ~23.5 μs, nearly 6x higher overhead. – Arri
Though like I said, that overhead pales in comparison to actually calling it (having it call max(1, 2) increases the time to 206 μs uncached and 173 μs cached, so the benefit of caching is dwarfed by the overhead of calling the function). – Arri
Now my answer feels undocumented :) Also, O(1) doesn't mean it isn't costly for super-long function names, because hashing takes longer to compute (not the case for max, though). If you want speed, use a compiled language, or a framework like numpy or pandas where you pass one request for a lot of data, not many small requests. – Pedroza
That also reminds me of a <= b < c versus b in range(a,c). – Pedroza
Well, ultra-detailed now, but CPython (and I'd assume most alternate interpreters, because it makes sense) interns all names during compilation, which also precomputes & caches the hash codes, so the name length doesn't matter for hashing. And being interned in all the dicts, the comparison operation ends up being identity-based (assuming you haven't manually inserted an uninterned str into the dicts that implement globals and builtins), so the contents of the string are never actually read for comparison. And yeah, range testing is definitely similar to this case. – Arri
All this information in comments seems wasteful to me (because comments are, you know, comments and can easily be discarded). It would be better in a real answer (even if the OP didn't ask for that much detail). Thanks, very instructive anyway. – Pedroza
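The caching hack from the comments above, as a minimal self-contained sketch (the function names are illustrative, and the absolute timings will differ per machine):

from timeit import timeit

def uncached(n):
    for _ in range(n):
        max(1, 2)            # LOAD_GLOBAL: miss in module globals, hit in builtins

def cached(n, *, max=max):   # bind the builtin once, when the function is defined
    for _ in range(n):
        max(1, 2)            # LOAD_FAST: a plain index into the frame's local slots

print(timeit('uncached(1000)', globals=globals(), number=1000))
print(timeit('cached(1000)', globals=globals(), number=1000))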

I have a hypothesis, but I'm not entirely sure of it. Here's what I tried:

from time import time
from random import random

factor = 100

def lesser(a,b):
    if a < b:
        return a
    return b

if __name__ == "__main__":

    start = time()

    for _ in range(10000000):
        a = random() * factor
        b = random() * factor
        l = lesser(a,b)

    end = time()
    print(end-start)


    start = time()

    for _ in range(10000000):
        a = random() * factor
        b = random() * factor
        l = min(a,b)

    end = time()
    print(end-start)

When I run this on my laptop I get:

5.072136402130127
7.319249391555786

So what explains this? The name lookup for the function call? I'm not sure, because mine is a function call too. Is the global vs. builtin name lookup really that different?

But here's an idea. The Python min builtin does more than mine: it takes an arbitrary number of arguments. Look at the code for the builtin in CPython's Python/bltinmodule.c: min is defined around line 1915, and it immediately calls a more general helper shared by both max and min around line 1796. That helper has a lot more going on than my short function. Maybe that extra work takes longer.
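As a rough illustration of that extra generality, here is a simplified pure-Python sketch of the work the builtin has to do before it ever compares two values (this is not the actual C implementation):

def min_like(*args, key=None):
    # Like the builtin, accept either several arguments or a single iterable.
    if len(args) == 1:
        args = tuple(args[0])
    it = iter(args)
    best = next(it)
    best_key = best if key is None else key(best)
    for item in it:
        item_key = item if key is None else key(item)
        if item_key < best_key:
            best, best_key = item, item_key
    return best

# min_like(a, b) still gives the right answer, but it pays for argument
# unpacking, the iterable check, and the key handling on every call.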

Universalize answered 30/9, 2024 at 19:46 Comment(0)
