I'm creating a dictionary d of one million items, each of which is a tuple, and ideally I'd like to access them with:
d[1634].id # or d[1634]['id']
d[1634].name # or d[1634]['name']
d[1634].isvalid # or d[1634]['isvalid']
rather than d[1634][0], d[1634][1], d[1634][2], which is less explicit.
According to my test:
import os, psutil, time, collections, typing
Tri = collections.namedtuple('Tri', 'id,name,isvalid')
Tri2 = typing.NamedTuple("Tri2", [('id', int), ('name', str), ('isvalid', bool)])
t0 = time.time()
# uncomment only one of these 4 next lines:
d = {i: (i+1, 'hello', True) for i in range(1000000)} # tuple
# d = {i: {'id': i+1, 'name': 'hello', 'isvalid': True} for i in range(1000000)} # dict
# d = {i: Tri(id=i+1, name='hello', isvalid=True) for i in range(1000000)} # namedtuple
# d = {i: Tri2(id=i+1, name='hello', isvalid=True) for i in range(1000000)} # NamedTuple
print('%.3f s %.1f MB' % (time.time()-t0, psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2))
"""
tuple: 0.257 s 193.3 MB
dict: 0.329 s 363.6 MB
namedtuple: 1.253 s 193.3 MB (collections)
NamedTuple: 1.250 s 193.5 MB (typing)
"""
- using a dict doubles the RAM usage, compared to a tuple
- using a namedtuple or NamedTuple multiplies the time spent by 5, compared to a tuple!
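To see how much of that gap comes from creating the records rather than from filling the dictionary, the construction cost can be timed in isolation with timeit. This is only a minimal sketch and not part of the test above; the helper name time_construction and the loop count are assumptions chosen for illustration, and the numbers will vary by machine and Python version.

```python
import collections
import timeit

Tri = collections.namedtuple('Tri', 'id,name,isvalid')


def time_construction(number=100_000):
    """Return the seconds spent creating `number` records, per construction style.

    `time_construction` is a hypothetical helper, not something from the question.
    """
    candidates = {
        # i is a default argument so the tuple cannot be folded into a constant
        'tuple': lambda i=1: (i + 1, 'hello', True),
        'namedtuple (keywords)': lambda i=1: Tri(id=i + 1, name='hello', isvalid=True),
        'namedtuple (positional)': lambda i=1: Tri(i + 1, 'hello', True),
    }
    return {label: timeit.timeit(fn, number=number)
            for label, fn in candidates.items()}


if __name__ == '__main__':
    for label, seconds in time_construction().items():
        print('%-25s %.3f s' % (label, seconds))
```

Comparing the keyword and positional rows gives a rough idea of whether the way the namedtuple is called matters for the total.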
Question: is there a tuple-like data structure in Python 3 that allows accessing the data with x.id, x.name, etc. and is also RAM- and CPU-efficient?
Notes:
- in my real use case, the tuple is something like a C-struct of type (uint64, uint64, bool).
- I've also tried with:
  - slots (to avoid the internal object's __dict__, see Usage of __slots__?)
  - dataclass: @dataclasses.dataclass class Tri3: id: int ...
  - ctypes.Structure: class Tri7(ctypes.Structure): _fields_ = [("id", ctypes.c_int), ...]
  but it was not better (all of them ~ 1.2 sec.), nothing close to a genuine tuple in terms of performance
- Here are other options: C-like structures in Python
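For reference, the three variants mentioned in the notes could be written out roughly as follows. This is a sketch under assumptions: the field names id, name and isvalid are taken from the namedtuple in the test, the class name TriSlots is made up here, and the ctypes field types follow the (uint64, uint64, bool) layout described above rather than the abbreviated snippet, which only shows ctypes.c_int for id.

```python
import ctypes
import dataclasses


class TriSlots:
    """Plain class; __slots__ suppresses the per-instance __dict__."""
    __slots__ = ('id', 'name', 'isvalid')

    def __init__(self, id, name, isvalid):
        self.id = id
        self.name = name
        self.isvalid = isvalid


@dataclasses.dataclass
class Tri3:
    id: int
    name: str
    isvalid: bool


class Tri7(ctypes.Structure):
    # Assumed layout: (uint64, uint64, bool), as in the real use case.
    _fields_ = [('id', ctypes.c_uint64),
                ('name', ctypes.c_uint64),
                ('isvalid', ctypes.c_bool)]


# All three give attribute access (x.id, x.name, x.isvalid).
records = [TriSlots(1, 'hello', True), Tri3(1, 'hello', True), Tri7(1, 2, True)]
for r in records:
    print(type(r).__name__, r.id, r.name, r.isvalid)
```

Each of these can be dropped into the dictionary comprehension of the test in place of Tri to reproduce the comparison.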
The psutil method you use gives you just a snapshot of the current memory usage. It is notoriously inaccurate in the face of garbage collection etc. In fact, multiple runs will show you how it fluctuates. – Counterweight
@dataclass (slower than your namedtuple), and making a larger container at once such as a pandas.DataFrame (also slower, but quite compact). The latter also wouldn't directly give you a dot accessor; instead you'd have to do: df.iloc[1634].name. – Counterweight
d when the RAM usage is printed with psutil, so here it's rather accurate. – Lackaday
slots may be of interest: https://mcmap.net/q/18891/-usage-of-__slots__/… – Impolite
rss is also the total RAM consumed by the whole process, not just what is used by your structure. – Counterweight
struct. – Lackaday
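Since the comments point out that rss covers the whole process, one alternative is to count only the allocations made while the container is being built, using tracemalloc from the standard library. The following is a minimal sketch: the helper name measure, the builders mapping and the reduced item count are assumptions for illustration, and tracemalloc slows the program down, so it is suited to comparing memory and not timing.

```python
import collections
import tracemalloc

Tri = collections.namedtuple('Tri', 'id,name,isvalid')


def measure(build, n=100_000):
    """Return the bytes still allocated by the container that build(n) returns.

    `measure` is a hypothetical helper, not something from the question.
    """
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        d = build(n)  # keep a reference so the container stays alive
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


builders = {
    'tuple': lambda n: {i: (i + 1, 'hello', True) for i in range(n)},
    'dict': lambda n: {i: {'id': i + 1, 'name': 'hello', 'isvalid': True}
                       for i in range(n)},
    'namedtuple': lambda n: {i: Tri(id=i + 1, name='hello', isvalid=True)
                             for i in range(n)},
}

if __name__ == '__main__':
    for label, build in builders.items():
        print('%-12s %.1f MB' % (label, measure(build) / 1024 ** 2))
```

Because only allocations made between the two get_traced_memory() calls are counted, interpreter start-up and imported modules do not inflate the figure.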