Why is `dataclasses.asdict(obj)` > 10x slower than `obj.__dict__`
I am using Python 3.6 and the dataclasses backport package from ericvsmith.

It seems that calling dataclasses.asdict(my_dataclass) is ~10x slower than calling my_dataclass.__dict__:

In [172]: @dataclass
     ...: class MyDataClass:
     ...:     a: int
     ...:     b: int
     ...:     c: str
     ...: 

In [173]: %%time
     ...: _ = [MyDataClass(1, 2, "A" * 1000).__dict__ for _ in range(1_000_000)]
     ...: 
CPU times: user 631 ms, sys: 249 ms, total: 880 ms
Wall time: 880 ms

In [175]: %%time
     ...: _ = [dataclasses.asdict(MyDataClass(1, 2, "A" * 1000)) for _ in range(1_000_000)]
     ...: 
CPU times: user 11.3 s, sys: 328 ms, total: 11.6 s
Wall time: 11.7 s

Is this expected behavior? In what cases should I use dataclasses.asdict(obj) instead of obj.__dict__?


Edit: Using __dict__.copy() does not make a big difference:

In [176]: %%time
     ...: _ = [MyDataClass(1, 2, "A" * 1000).__dict__.copy() for _ in range(1_000_000)]
     ...: 
CPU times: user 922 ms, sys: 48 ms, total: 970 ms
Wall time: 970 ms
Numerable answered 7/9, 2018 at 20:53 Comment(1)
Well, for starters, asdict will create and return a new dict object, and recursively convert any nested dataclass instances into dicts, whereas __dict__ simply returns a reference to the object's namespace, something you probably don't want to mutate, for example... Macdougall
In most cases where you would have used __dict__ without dataclasses, you should probably keep using __dict__, maybe with a copy call. asdict does a lot of extra work that you may not actually want. Here's what it does.


First, from the docs:

Each dataclass is converted to a dict of its fields, as name: value pairs. dataclasses, dicts, lists, and tuples are recursed into. For example:

from dataclasses import dataclass, asdict
from typing import List

@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: List[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
assert asdict(c) == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

So if you want recursive dataclass dictification, use asdict. If you don't want it, then all the overhead that goes into providing it is wasted. Particularly, if you use asdict, then changing the implementation of contained objects to use dataclass will change the result of asdict on outer objects.
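
To illustrate that last point, here is a small sketch. The class names PlainTag, DataTag, and Post are made up for this example. The only difference between the two calls is whether the contained object is a dataclass:

```python
from dataclasses import dataclass, asdict

class PlainTag:
    """Ordinary class: asdict deep-copies it instead of converting it."""
    def __init__(self, label):
        self.label = label

@dataclass
class DataTag:
    label: str

@dataclass
class Post:
    title: str
    tag: object

plain = asdict(Post("hello", PlainTag("news")))
converted = asdict(Post("hello", DataTag("news")))

print(type(plain["tag"]).__name__)  # PlainTag
print(converted["tag"])             # {'label': 'news'}
```

Code that consumes the outer dict sees a different shape after the inner class is turned into a dataclass, even though Post itself did not change.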

The recursive logic also has no handling for circular references. If you're using dataclasses to represent, say, a graph, or any other data structure with circular references, asdict will crash:

import dataclasses

@dataclasses.dataclass
class GraphNode:
    name: str
    neighbors: list['GraphNode']

x = GraphNode('x', [])
y = GraphNode('y', [])
x.neighbors.append(y)
y.neighbors.append(x)

dataclasses.asdict(x)  # raises RecursionError

The asdict call in this example hits a RecursionError: maximum recursion depth exceeded while calling a Python object.


Aside from that, asdict builds a new dict, while __dict__ simply accesses the object's attribute dict directly. The return value of asdict will not be affected by reassignment of the original object's fields. Also, asdict uses fields, so if you add attributes to a dataclass instance that don't correspond to declared fields, asdict won't include them.
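
A short sketch of those differences, using a hypothetical Config class that is not part of the question:

```python
from dataclasses import dataclass, asdict

@dataclass
class Config:
    host: str
    port: int

cfg = Config("localhost", 8080)
live = cfg.__dict__      # the instance's own attribute dict, not a copy
snapshot = asdict(cfg)   # a new dict built from the declared fields

cfg.port = 9090          # reassign a declared field
cfg.debug = True         # add an attribute that is not a declared field

print(live)         # {'host': 'localhost', 'port': 9090, 'debug': True}
print(snapshot)     # {'host': 'localhost', 'port': 8080}
print(asdict(cfg))  # {'host': 'localhost', 'port': 9090}
```

The dict obtained through __dict__ tracks every later change to the instance, while the asdict result is frozen at the time of the call and never contains debug.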

Finally, the docs didn't mention it at all when this was written, but asdict will call deepcopy on everything that isn't a dataclass object, dict, list, or tuple:

else:
    return copy.deepcopy(obj)

(Dataclass objects, dicts, lists, and tuples go through the recursive logic, which also builds a copy, just with recursive dictification applied.)

deepcopy is really expensive on its own, and the lack of any memo handling means that asdict is likely to create multiple copies of shared objects in nontrivial object graphs. Watch out for that:

>>> from dataclasses import dataclass, asdict
>>> @dataclass
... class Foo:
...     x: object
...     y: object
... 
>>> a = object()
>>> b = Foo(a, a)
>>> c = asdict(b)
>>> b.x is b.y
True
>>> c['x'] is c['y']
False
>>> c['x'] is b.x
False
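
If all you need is a new dict of the declared fields, without recursion or deep copies, one option is to build it from dataclasses.fields yourself. This is only a minimal sketch, and shallow_asdict is a name made up here, not a function in the dataclasses module:

```python
from dataclasses import dataclass, fields

def shallow_asdict(obj):
    """Return a new dict of obj's declared fields without copying the values."""
    return {f.name: getattr(obj, f.name) for f in fields(obj)}

@dataclass
class Foo:
    x: object
    y: object

a = object()
b = Foo(a, a)
c = shallow_asdict(b)

print(c["x"] is c["y"])  # True
print(c["x"] is b.x)     # True
print(c is b.__dict__)   # False
```

The result is a separate dict, so adding or removing keys does not touch the instance, but the values are shared with it: mutating a mutable value through the dict is visible on the original object too.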
Wharf answered 7/9, 2018 at 20:58 Comment(4)
OK, this makes sense. I guess I was just looking for a "pythonic" way to convert a dataclass into a dictionary, without relying on dunder attributes like __dict__. I guess vars(my_data_class) accomplishes that.Numerable
@Numerable don't do that. It doesn't convert to a dict; it returns the object's namespace dict.Macdougall
My use-case is df = pd.DataFrame([vars(dc) for dc in dcs]), so a copy is made eventually. But in general, yes, you are right.Numerable
@RickyLevi: That doesn't make sense. It sounds like you screwed up something else somewhere.Wharf
They do not do the same thing:

import dataclasses


@dataclasses.dataclass
class Item:
    o: str


@dataclasses.dataclass
class Obj:
    o: list[Item]


print(Obj(o=[Item(o="1")]).__dict__)  # {'o': [Item(o='1')]}

print(dataclasses.asdict(Obj(o=[Item(o="1")])))  # {'o': [{'o': '1'}]}
Freshwater answered 4/5, 2024 at 19:16 Comment(0)
