Maximum size of a dictionary in Python?

I'm using a dictionary to hold a large number of objects, and have a string name for each of them. To be specific, here is my code:

from itertools import product
for (i,j,k) in product(range(N),range(M),range(K)):
    var_name='x_'+'_'+str(i)+str(j)+'_'+str(k)
    var_dict[var_name] = f(var_name,other_params)
print len(var_dict)

f(...) returns an object. In my code N=363, M=500, and K=2. So I expect 363000 entries in the dictionary. But when I check the length of var_dict, it is 330860!

(Pdb) len(var_dict)
330860

Here are my questions:

  1. Is there any explanation for that? E.g., is there a limit on the number of items that Python's built-in hash table can address?

  2. What can I do to solve this problem?

Guimond answered 8/5, 2014 at 3:55 Comment(2)
As a suggestion, when seemingly strange things like this occur you should verify (such as by incrementing a counter in the loop) that you are actually putting as many items into the container as you expect.Haggle
This was obviously caused by a typo; 'x_'+'_'+str(i)+str(j)+'_'+str(k) makes no sense as there would be no reason to use two consecutive string literals, and it would be strange to specify a double underscore followed by two variable quantities. Surely OP meant to write 'x_'+str(i)+'_'+str(j)+'_'+str(k) instead, although of course there are better ways to format a string.Sessions

The problem is here:

str(i)+str(j)

This does not produce unique identifiers. For example, the value set when i=1 and j=11 will be overwritten by the value set when i=11 and j=1 (there are many more instances as well).

You can fix the problem by inserting some delimiter character between the two numbers (such as an underscore like you have between j and k).
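For illustration, here is a minimal sketch of the collision with concrete values, using the naming scheme from your code:

# Two different (i, j) pairs collapse onto the same key when there is
# no delimiter between them:
i1, j1, k = 1, 11, 0
i2, j2 = 11, 1

name_a = 'x_' + '_' + str(i1) + str(j1) + '_' + str(k)  # 'x__111_0'
name_b = 'x_' + '_' + str(i2) + str(j2) + '_' + str(k)  # 'x__111_0'
print(name_a == name_b)  # True -- so var_dict[name_b] overwrites var_dict[name_a]

# With a delimiter between i and j the keys stay distinct:
fixed_a = 'x_' + str(i1) + '_' + str(j1) + '_' + str(k)  # 'x_1_11_0'
fixed_b = 'x_' + str(i2) + '_' + str(j2) + '_' + str(k)  # 'x_11_1_0'
print(fixed_a == fixed_b)  # False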

Haggle answered 8/5, 2014 at 4:4 Comment(1)
Thanks! I was really confused :)Guimond

Access time for a string key in a Python dictionary is on the order of 1 microsecond (1 s / 1,000 / 1,000).

The time taken increases slightly with the number of entries in the dictionary, possibly with something like log(N) scaling.

Performance degrades significantly for dictionaries larger than 2^26 = 67,108,864. It takes 30x longer to read from a dictionary of size 2^27 = 134,217,728, and 9000x longer for a dictionary of size 2^28 = 268,435,456. My computer ran out of memory before reaching 2^29.

Therefore, the practical answer to your question about the maximum size of a dictionary in Python is:

2^26 = 67,108,864

>>> import math, sys, time
>>> d = {}
>>> for i in range(1,sys.maxsize):
...   key = str(i)
...   d[key] = key
...   if math.log2(i) % 1 == 0: 
...     time_start = time.perf_counter()
...     value = d[key]
...     time_taken = time.perf_counter() - time_start
...     print(time_taken*1000*1000, i)
... 
0.682000063534360 1
0.521999936609063 2
0.394000153391971 4
0.365999994755839 8
0.424000063503626 16
0.380000074073905 32
0.365000005331239 64
0.447000047643086 128
0.413999941883957 256
0.481999904877739 512
0.641000042378436 1024
0.906999957805965 2048
0.616000079389778 4096
0.995999926090007 8192
1.115000031859381 16384
1.142999963121838 32768
1.144999941971036 65536
1.156000053015304 131072
1.231999931405880 262144
1.225999994858284 524288
1.196000084746629 1048576
1.308000037170131 2097152
1.232000158779556 4194304
1.314999963142327 8388608
1.178000047730165 16777216
1.179000037154764 33554432
1.669000084802974 67108864
33.22600014143973 134217728
9655.005000013261 268435456
Killed: 9
Gaius answered 8/6, 2020 at 16:26 Comment(3)
"possibly with something like a log(N) scaling."? Why guess at something like this? Dicts are O(1) for key access, unless you've built a pathological dict. Before you run out of memory, you might be seeing a paging penalty, which would explain the slow-down you saw.Fleawort
For the record I just ran this to the same point with no catastrophic slow down. though most of the times were around 4. gist.github.com/altendky/ecf9690dda339501099c6d3e862ddf05Synaeresis
Access time to a Python dict is constant in general. The increase that you see in the last two lines is because of RAM swapping. You probably had around 16GB of RAM on the computer you ran it on, and it ended when the dict had 134217728 entries.Nembutal

You don't have a delimiter between i and j in your constructed strings, so tuples like (12, 1, 0) and (1, 21, 0) produce the same name. If possible, don't make names for these things at all; just use the numbers directly:

var_dict[i, j, k] = f(i, j, k, other_params)

If f really needs to take a string, change the name construction to put a delimiter between i and j:

var_name = 'x_{}_{}_{}'.format(i, j, k)

and if possible, use the tuple as a dict key even if f needs a string:

var_dict[i, j, k] = f(var_name, other_params)
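
Putting this together, a sketch of the loop from the question with tuple keys and an unambiguous name (assuming N, M, K, f and other_params are defined as in the question, and that f still takes the string name):

from itertools import product

var_dict = {}
for (i, j, k) in product(range(N), range(M), range(K)):
    var_name = 'x_{}_{}_{}'.format(i, j, k)        # delimiter between every index
    var_dict[i, j, k] = f(var_name, other_params)  # tuple key: no string collisions possible
print(len(var_dict))  # N * M * K = 363 * 500 * 2 = 363000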
Carpophagous answered 8/5, 2014 at 4:6 Comment(1)
Yuppers! Simple fix, but a tired brain after many hours of coding can't see such minor things ;)Guimond

There is no size limitation on a dict:

d = {}
for i in xrange(999999):
    d[i] = i
print len(d)

It prints

999999
Docia answered 8/5, 2014 at 4:9 Comment(1)
there is just no size limit under 999999Elyse
