Memory-usage of dictionary in Python?
29
I am slightly confused by the getsizeof function in the sys module when applied to dictionaries. Below I have created a simple dictionary containing two strings. Each string is clearly larger than the dictionary itself. The dictionary size is probably the dictionary overhead only, i.e., it doesn't take the actual data into account. What is the best way to figure out the memory usage of the whole dictionary (keys, values, and dictionary overhead)?

>>> from sys import getsizeof
>>> first = 'abc'*1000
>>> second = 'def'*1000
>>> my_dictionary = {'first': first, 'second': second}
>>> getsizeof(first)
3021
>>> getsizeof(second)
3021
>>> getsizeof(my_dictionary)
140
Amphidiploid answered 5/7, 2011 at 8:23 Comment(0)
17

From the Python docs:

See recursive sizeof recipe for an example of using getsizeof() recursively to find the size of containers and all their contents.

So getsizeof() only counts the container's own overhead, but you can use the recipe linked above to calculate the full size of containers like dicts.
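
A minimal sketch of such a recursive helper (my own illustration of the idea, not the exact recipe from the docs) could look like this:

```python
import sys

def total_size(obj, seen=None):
    """Recursively sum sys.getsizeof over an object and everything it
    references, counting each distinct object only once (so shared or
    interned objects are not double-counted)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # already counted (e.g. an interned string)
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size
```

For the dictionary in the question, total_size(my_dictionary) then includes the dict overhead, both keys, and both 3000-character values.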

Rozek answered 5/7, 2011 at 8:30 Comment(2)
It is not working for a simple dict like d = dict(a="4", b="4", c="4", d="4"); it is skipping the values corresponding to b, c, d. – Selfsupporting
Sorry, I guess your code is correct; in the above case Python is reusing the object "4". – Selfsupporting
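
The reuse the commenter describes can be verified directly: short strings like "4" are interned, so all four values are the same object, and a naive getsizeof sum over the values would count that one object four times:

```python
import sys

d = dict(a="4", b="4", c="4", d="4")

# All four values refer to one and the same interned string object.
print(all(v is d["a"] for v in d.values()))   # True

# A naive sum counts that single object four times; deduplicating by
# object identity counts it once.
naive = sum(map(sys.getsizeof, d.values()))
deduped = sum({id(v): sys.getsizeof(v) for v in d.values()}.values())
print(naive, deduped)
```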
9

The recursive getsizeof approach gets the actual size, but if you have multiple layers of dictionaries and only want a rough estimate, json comes in handy:

>>> first = 'abc'*1000
>>> second = 'def'*1000
>>> my_dictionary = {'first': first, 'second': second}
>>> getsizeof(first)
3049
>>> getsizeof(second)
3049
>>> getsizeof(my_dictionary)
288
>>> getsizeof(json.dumps(my_dictionary))
6076
>>> size = getsizeof(my_dictionary)
>>> size += sum(map(getsizeof, my_dictionary.values())) + sum(map(getsizeof, my_dictionary.keys()))
>>> size
6495
Vallation answered 7/6, 2016 at 19:35 Comment(1)
Definitely points for creativity, but it needs everything to be serializable, it is slower, and as you say it's an approximation... – Esma
4

Well, dictionaries don't store the actual strings inside them; they store references, a bit like C/C++ pointers, so you only pay a constant overhead in the dictionary for every element.

The total size is

size = getsizeof(d)
size += sum(map(getsizeof, d.values())) + sum(map(getsizeof, d.keys()))
(In Python 2, use d.itervalues() and d.iterkeys() instead.)
Karlynkarma answered 5/7, 2011 at 8:28 Comment(1)
To be pedantic, if any of the values is a container (rather than a scalar), you need to drill down into that container as well. – Idalla
3

Method: serialise the dictionary into a string, then get the size of the string.

I suggest using dumps from the pickle or json library. It serialises the dictionary into a string, and you can then get the size of that string, like this:

getsizeof(pickle.dumps(my_dictionary))

or

getsizeof(json.dumps(my_dictionary))

If there are ndarrays in the dictionary, use pickle, because json can't serialise ndarrays.

Here is your modified example:

from sys import getsizeof
import json
import pickle

first = 'abc'*1000
second = 'def'*1000
my_dictionary = {'first': first, 'second': second}

print('first:', getsizeof(first))
print('second', getsizeof(second))
print('dict_:', getsizeof(my_dictionary))

print('size of json dumps my_dictionary: ', getsizeof(json.dumps(my_dictionary)))
print('size of pickle dumps my_dictionary: ', getsizeof(pickle.dumps(my_dictionary)))

results:

first: 3049
second 3049
dict_: 232
size of json dumps my_dictionary:  6076
size of pickle dumps my_dictionary:  6078
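
The json-versus-pickle caveat above can be checked without NumPy: a set is another built-in type that pickle handles but json rejects (a small illustration of mine, not part of the original answer):

```python
import json
import pickle
from sys import getsizeof

d = {'data': {1, 2, 3}}              # a set: picklable, but not JSON-serialisable

print(getsizeof(pickle.dumps(d)))    # pickle works fine

try:
    json.dumps(d)
except TypeError as exc:
    print('json failed:', exc)       # json.dumps raises TypeError on sets
```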
Fatimafatimah answered 16/6, 2023 at 7:57 Comment(1)
You could even just use sys.getsizeof(str(my_dictionary)), which gives the same result as json.dumps. – Buggs

© 2022 - 2024 — McMap. All rights reserved.