Faster to assign dictionary value to variable, or continually access?

I have a dictionary such as:

d = {'a':'a-val', 'b':'b-val', 'c':'c-long-val'*1000}

And I need to repeatedly access d['c'] as in:

print('value of c is', d['c'])
x_queue.put(d['c'])
some_function(d['c'])

But I'm wondering if it would be faster to assign d['c'] to a variable and use it each time:

c_value = d['c']  # is this assignment "worth it"?
print('value of c is', c_value)
x_queue.put(c_value)
some_function(c_value)

My hunch is it may depend on

  • number of elements in d (finding key is more costly with bigger d)
  • size of d['c'] (assignment is more costly with bigger d['c'])

But I'm really not sure if one of those options (or another?) is faster or more pythonic.

Lathy asked 6/9, 2019 at 16:2 Comment(6)
Just try, measure and decide... – Overslaugh
Thanks, I could (and will) try and measure for my specific case, but I still would like to know why one turns out better - if it does. Are my assumptions of the cost tradeoffs above correct? – Lathy
In almost all languages, getting a value from a variable is usually faster than retrieving it from a container. – Deadradeadweight
One just reads from a memory location, the other has to perform some extra calculation. – Deadradeadweight
There are some exceptions, like variables in closures, which need to go through activation records. – Deadradeadweight
For hunch #1: the size of the dict doesn't really matter, the only thing that can make access slower would be collisions in the hash table. For hunch #2: No copy takes place, your variable is just a new name for the object it refers to, so again the size doesn't matter. For #1, you can have a look at #43690691 – Overslaugh
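
To make the "no copy" point concrete, here is a minimal sketch (using the dict from the question; sys.getsizeof and the identity check are only there for illustration) showing that the assignment just binds a new name to the same object, so the size of d['c'] does not affect its cost:

import sys

d = {'a': 'a-val', 'b': 'b-val', 'c': 'c-long-val' * 1000}
c_value = d['c']                  # binds a new name; does not copy the 10,000-character string

print(c_value is d['c'])          # True: both names refer to the very same object
print(sys.getsizeof(d))           # the dict itself stays small regardless of how big its values are
print(sys.getsizeof(c_value))     # the big string exists only once, however it is reached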

Although accessing a dict value by key has an average time complexity of O(1), its worst case is O(n) (when many keys collide), and every lookup also pays the overhead of hashing the key. Assigning the value to a variable once and then reading that variable, on the other hand, takes minimal, constant time, so avoid re-evaluating a dict value by key whenever possible.
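
As a rough illustration of that difference, a minimal sketch with the standard timeit module (the dict is the one from the question; absolute numbers will vary by machine and Python version):

import timeit

setup = "d = {'a': 'a-val', 'b': 'b-val', 'c': 'c-long-val' * 1000}; c_value = d['c']"

# Each iteration hashes the key 'c' and probes the hash table.
print(timeit.timeit("d['c']", setup=setup, number=10_000_000))

# Each iteration only reads a name that is already bound to the value.
print(timeit.timeit("c_value", setup=setup, number=10_000_000))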

Plinth answered 6/9, 2019 at 16:13 Comment(4)
They're both O(1), but hashing has a larger coefficient. – Deadradeadweight
Again, when I mentioned O(n) I was talking about the worst-case time complexity of retrieving a value from the dict after the hash value has been calculated. – Plinth
In other words, you're confusing hashing (which takes O(1)) with hash collision resolution (which takes O(n) in the worst case). – Plinth
Big-O notation is the wrong thing to worry about here. Ignoring collision resolution, calculating a hash value is still much slower than reading a variable. – Deadradeadweight

The keys in a dictionary are hashed. So when you ask for d['c'], the hash of 'c' is computed first and then the corresponding value is looked up. That is a lot more steps than a simple variable read.
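
Those extra steps are visible in the bytecode CPython generates; a minimal sketch (the helper functions are only for illustration, and exact opcode names differ between CPython versions):

import dis

def lookup_each_time(d):
    return d['c'], d['c']       # every access hashes 'c' and indexes the dict

def lookup_once(d):
    c_value = d['c']            # one hash + lookup ...
    return c_value, c_value     # ... then plain local-variable reads

dis.dis(lookup_each_time)       # shows a subscript operation per access
dis.dis(lookup_once)            # shows one subscript, then cheap LOAD_FAST reads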

Succor answered 6/9, 2019 at 16:9 Comment(0)

Note: Before you spend your time (a.k.a. your employer's money) on shaving nano- or milliseconds off of your program, make sure you actually need to optimize it.

Read: Premature optimization is the root of all evil – Donald Knuth

TL;DR

Yes, reading directly from a variable will always be faster than reading from a dictionary, which requires computing a hash.

In-Depth

Both operations have a complexity of O(1); however, a dictionary lookup has to compute a hash and then read an address in memory, whereas a variable read simply fetches a value from memory.

hash + read > read

This raises the question "When is it worth assigning the value to a variable instead of re-reading the dictionary?" or "How large is the difference between the two operations?"

Quick and dirty benchmark

If we trust timeit, then it is always faster to cache the value, even when we read it only twice. Tested in JupyterLab:

Initialize a test dictionary:

asdf = dict(a=1, b=2, c=3)

1st test

Read the same dictionary value twice:

%timeit (asdf['a'], asdf['a'])
82.5 ns ± 7.6 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Read from dictionary once, then cache it:

%timeit (u := asdf['a'], u)
62.2 ns ± 2.54 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

2nd test

If you have to read a value a hundred and one times, you can speed up your code by roughly 5x:

%timeit [asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a'], asdf['a']]
2.85 µs ± 269 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

And here is the faster variant:

%timeit [u := asdf['a'], u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u, u]
528 ns ± 29.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
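
Outside IPython/Jupyter, roughly the same comparison can be reproduced with the standard timeit module; a minimal sketch (the 101-element lists mirror the benchmark above, and the function names are just for illustration):

import timeit

asdf = dict(a=1, b=2, c=3)

def read_dict_each_time(n=101):
    return [asdf['a'] for _ in range(n)]   # n separate dict lookups

def cache_in_local(n=101):
    u = asdf['a']                          # single lookup ...
    return [u for _ in range(n)]           # ... then n plain reads of the cached value

print(timeit.timeit(read_dict_each_time, number=100_000))
print(timeit.timeit(cache_in_local, number=100_000))
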
Patellate answered 15/7, 2024 at 15:23 Comment(0)
