Faster to assign dictionary value to variable, or continually access?

I have a dictionary such as:

d = {'a':'a-val', 'b':'b-val', 'c':'c-long-val'*1000}

And I need to repeatedly access d['c'] as in:

print('value of c is', d['c'])
x_queue.put(d['c'])
some_function(d['c'])

But I'm wondering if it would be faster to assign d['c'] to a variable and use it each time:

c_value = d['c']  # is this assignment "worth it"?
print('value of c is', c_value)
x_queue.put(c_value)
some_function(c_value)

My hunch is it may depend on

  • number of elements in d (finding key is more costly with bigger d)
  • size of d['c'] (assignment is more costly with bigger d['c'])

But I'm really not sure if one of those options (or another?) is faster or more pythonic.

Lathy asked 6/9, 2019 at 16:2 Comment(6)
Just try, measure and decide... – Overslaugh
Thanks, I could (and will) try and measure for my specific case, but I still would like to know why one turns out better - if it does. Are my assumptions of the cost tradeoffs above correct? – Lathy
In almost all languages, getting a value from a variable is usually faster than retrieving it from a container. – Deadradeadweight
One just reads from a memory location, the other has to perform some extra calculation. – Deadradeadweight
There are some exceptions, like variables in closures, which need to go through activation records. – Deadradeadweight
For hunch #1: the size of the dict doesn't really matter, the only thing that can make access slower would be collisions in the hash table. For hunch #2: No copy takes place, your variable is just a new name for the object it refers to, so again the size doesn't matter. For #1, you can have a look at #43690691 – Overslaugh
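
To make the "no copy" point concrete, here is a minimal sketch (using the dict from the question; sys.getsizeof and the identity check are only there for illustration) showing that the assignment just binds a new name to the same object, so the size of d['c'] does not affect its cost:

import sys

d = {'a': 'a-val', 'b': 'b-val', 'c': 'c-long-val' * 1000}
c_value = d['c']                  # binds a new name; does not copy the 10,000-character string

print(c_value is d['c'])          # True: both names refer to the very same object
print(sys.getsizeof(d))           # the dict itself stays small regardless of how big its values are
print(sys.getsizeof(c_value))     # the big string exists only once, however it is reached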

Although accessing a dict value by key has an average time complexity of O(1), its worst case is O(n) (when many keys collide), and every lookup also pays the overhead of hashing the key. Assigning the value to a variable once and then reading that variable, on the other hand, takes minimal, constant time, so avoid re-evaluating a dict value by key whenever possible.
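
As a rough illustration of that difference, a minimal sketch with the standard timeit module (the dict is the one from the question; absolute numbers will vary by machine and Python version):

import timeit

setup = "d = {'a': 'a-val', 'b': 'b-val', 'c': 'c-long-val' * 1000}; c_value = d['c']"

# Each iteration hashes the key 'c' and probes the hash table.
print(timeit.timeit("d['c']", setup=setup, number=10_000_000))

# Each iteration only reads a name that is already bound to the value.
print(timeit.timeit("c_value", setup=setup, number=10_000_000))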

Plinth answered 6/9, 2019 at 16:13 Comment(4)
They're both O(1), but hashing has a larger coefficient. – Deadradeadweight
Again, when I mentioned O(n) I was talking about the worst-case time complexity of retrieving a value from the dict after the hash value has been calculated. – Plinth
In other words, you're confusing hashing (which takes O(1)) with hash collision resolution (which takes O(n) in the worst case). – Plinth
Big-O notation is the wrong thing to worry about here. Ignoring collision resolution, calculating a hash value is still much slower than reading a variable. – Deadradeadweight

The keys in a dictionary are hashed. So when you ask for d['c'], the hash of 'c' is computed first and then the corresponding value is looked up. That is a lot more steps than a simple variable read.
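
Those extra steps are visible in the bytecode CPython generates; a minimal sketch (the helper functions are only for illustration, and exact opcode names differ between CPython versions):

import dis

def lookup_each_time(d):
    return d['c'], d['c']       # every access hashes 'c' and indexes the dict

def lookup_once(d):
    c_value = d['c']            # one hash + lookup ...
    return c_value, c_value     # ... then plain local-variable reads

dis.dis(lookup_each_time)       # shows a subscript operation per access
dis.dis(lookup_once)            # shows one subscript, then cheap LOAD_FAST reads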

Succor answered 6/9, 2019 at 16:9 Comment(0)

Note: Before you spend your time (a.k.a. your employer's money) on shaving nano- or milliseconds off of your program, make sure you actually need to optimize it.

Read: Premature optimization is the root of all evil – Donald Knuth

TL;DR

Yes, reading directly from a variable will always be faster than reading from a dictionary, which requires computing a hash.

In-Depth

Both operations have a complexity of O(1); however, a dictionary lookup has to compute a hash and then read an address in memory, whereas a variable read simply fetches a value from memory.

hash + read > read

This raises the question "When is it worth assigning the value to a variable instead of re-reading the dictionary?" or "How large is the difference between the two operations?"

Quick and dirty benchmark

If we trust timeit, then it is always faster to cache the value, even when we read it only twice. Tested in JupyterLab:

Initialize a test dictionary:

asdf = dict(a=1, b=2, c=3)

1st test

Read the same dictionary value twice:

%timeit (asdf['a'], asdf['a'])
82.5 ns ± 7.6 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Read from dictionary once, then cache it:

%timeit (u := asdf['a'], u)
62.2 ns ± 2.54 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

2nd test

If you have to read a value a hundred and one times, you can speed up your code by roughly 5x:

%timeit [asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a']]
2.85 µs ± 269 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

And here is the faster variant:

%timeit [u := asdf['a'], u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u]
528 ns ± 29.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
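
Outside IPython/Jupyter, roughly the same comparison can be reproduced with the standard timeit module; a minimal sketch (the 101-element lists mirror the benchmark above, and the function names are just for illustration):

import timeit

asdf = dict(a=1, b=2, c=3)

def read_dict_each_time(n=101):
    return [asdf['a'] for _ in range(n)]   # n separate dict lookups

def cache_in_local(n=101):
    u = asdf['a']                          # single lookup ...
    return [u for _ in range(n)]           # ... then n plain reads of the cached value

print(timeit.timeit(read_dict_each_time, number=100_000))
print(timeit.timeit(cache_in_local, number=100_000))
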
Patellate answered 15/7, 2024 at 15:23 Comment(0)
