Make the Python json encoder support Python's new dataclasses

Starting with Python 3.7, there is something called a dataclass:

from dataclasses import dataclass

@dataclass
class Foo:
    x: str

However, the following fails:

>>> import json
>>> foo = Foo(x="bar")
>>> json.dumps(foo)
TypeError: Object of type Foo is not JSON serializable

How can I make json.dumps() encode instances of Foo as JSON objects?

Ounce answered 11/7, 2018 at 13:29 Comment(3)
I'd be curious to know how to do the reverse, if anyone knows: what if I have the data as a JSON file and want to load it back into my dataclass object?Cog
@CharlieParker, if it's a simple dataclass with primitive types, I suppose you could do json_file_as_dict = json.load(open(path_to_json_file)) and then Foo(**json_file_as_dict).Waitress
related: https://mcmap.net/q/41907/-how-to-make-a-class-json-serializable Sanborn

Much like you can add support to the JSON encoder for datetime objects or Decimals, you can also provide a custom encoder subclass to serialize dataclasses:

import dataclasses, json

class EnhancedJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if dataclasses.is_dataclass(o):
            return dataclasses.asdict(o)
        return super().default(o)

json.dumps(foo, cls=EnhancedJSONEncoder)
Ounce answered 11/7, 2018 at 13:29 Comment(10)
It is important to note that, for a dataclass called Foo and an instance foo_instance = Foo(...), both dataclasses.is_dataclass(Foo) and dataclasses.is_dataclass(foo_instance) evaluate to True, leading to a TypeError from dataclasses.asdict(o) if o is the dataclass itself and not an instance of it.Casaba
How about nesting - e.g. serializing d={ 'a': Foo(1,2), 'b': Foo(3,4)}Biggin
How do you handle an object of type ndarray with this?Fancie
Using a JSON encoder like this will handle nesting just fine - the JSON serializer will use it to transform the sub-elements recursively.Bernadette
As for handling ndarray, something like if isinstance(obj, np.ndarray) and obj.ndim == 1: return obj.tolist() will do the trick. Though keep in mind that while Python will happily serialize things like NaN or Infinity, that isn't legal according to the JSON spec and many decoders will reject it, so handling those in your encoder is possibly advisable (a sketch follows these comments).Bernadette
what's wrong with dataclasses.asdict()?Cog
also, what about the reverse? what if I have the data as a json file and want to load it back to my data class object?Cog
curious, why did you choose dumps vs dump?Cog
I really, really wish the Python JSON serialiser looked for an asdict() method and used it if it's there.Karlene
@CharlieParker It doesn't handle nesting very well. If you have a dataclass with dataclass fields, the fields are recursively converted to dicts instead of being handled by the dataclass serialization code. That makes it impossible to identify the concrete class of the field during deserialization. I prefer `.__dict__()`Lindeberg
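
Tying the comments above together, here is a hedged sketch of an extended encoder (the name ExtendedJSONEncoder and the NumPy handling are illustrative additions, not part of the original answer): the isinstance(o, type) guard skips bare dataclass types so only instances are converted, and the ndarray branch assumes NumPy is installed.

import dataclasses, json
import numpy as np

class ExtendedJSONEncoder(json.JSONEncoder):
    def default(self, o):
        # is_dataclass() is also True for the class object itself,
        # so only convert actual instances
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            return dataclasses.asdict(o)
        # NumPy arrays aren't JSON serializable; fall back to plain lists
        if isinstance(o, np.ndarray):
            return o.tolist()
        return super().default(o)

json.dumps({"obj": [Foo(x="bar"), np.arange(3)]}, cls=ExtendedJSONEncoder)
# '{"obj": [{"x": "bar"}, [0, 1, 2]]}'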

Can't you just use the dataclasses.asdict() function to convert the dataclass to a dict? Something like:

>>> @dataclass
... class Foo:
...     a: int
...     b: int
...     
>>> x = Foo(1,2)
>>> json.dumps(dataclasses.asdict(x))
'{"a": 1, "b": 2}'
Boat answered 10/1, 2019 at 0:59 Comment(5)
The dataclass may be a deeply nested part of a larger structure. By using a custom encoder, you can do json.dumps({"obj": [something_that_may_or_may_not_contain_a_dataclass]}).Ounce
asdict() will correctly process all nested dataclasses; as a result, we get nested dictionaries that dump to a JSON string normally (but! e.g., the datetime type must be processed additionally before dumping to a string; see the sketch below)Luisaluise
what about the reverse? what if I have the data as a json file and want to load it back to my data class object?Cog
curious, why did you choose dumps vs dump?Cog
Or, perhaps json.dumps(x, default=vars) could work? Ah, wait, here's from the asdict() docs: "... dataclasses, dicts, lists, and tuples are recursed into. Other objects are copied with copy.deepcopy()."Sanborn
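
As a minimal sketch of the datetime point above (the Event class and its field names are made up for illustration), asdict() handles the nesting and a default hook handles anything json can't serialize natively:

import dataclasses, datetime, json
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    when: datetime.datetime

event = Event(name="release", when=datetime.datetime(2018, 7, 11, 13, 29))

# asdict() recurses into nested dataclasses; the default hook covers datetime
json.dumps(dataclasses.asdict(event), default=lambda o: o.isoformat())
# '{"name": "release", "when": "2018-07-11T13:29:00"}'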

Ways of getting a JSONified dataclass instance

There are a couple of options to accomplish that goal; picking one involves analyzing which approach suits your needs best:

Standard library: dataclasses.asdict

import dataclasses
import json


@dataclasses.dataclass
class Foo:
    x: str

foo = Foo(x='1')
json_foo = json.dumps(dataclasses.asdict(foo)) # '{"x": "1"}'

Getting it back to a dataclass instance isn't trivial, so you may want to visit this answer: https://mcmap.net/q/82365/-python-dataclass-from-a-nested-dict
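
For the flat Foo above, though, a simple unpack round-trips the snippet, a minimal sketch continuing the example:

# continuing the snippet above: works for flat dataclasses only;
# nested dataclass fields would come back as plain dicts
Foo(**json.loads(json_foo))  # Foo(x='1')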

Marshmallow Dataclass

from dataclasses import field
from marshmallow_dataclass import dataclass


@dataclass
class Foo:
    x: int = field(metadata={"required": True})

foo = Foo(x='1') # Foo(x='1')
json_foo = foo.Schema().dumps(foo) # '{"x": "1"}'

# Back to class instance.
Foo.Schema().loads(json_foo) # Foo(x=1)

As a bonus, with marshmallow_dataclass you may declare validation on the field itself; that validation is applied when someone deserializes the object from JSON using that schema.
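
For example, a hedged sketch (the Bar class and the Range validator are illustrative, assuming marshmallow_dataclass forwards the "validate" metadata key to the generated marshmallow field, as its documentation describes):

import marshmallow.validate
from dataclasses import field
from marshmallow_dataclass import dataclass


@dataclass
class Bar:
    # the validator runs when data is loaded through the generated schema
    x: int = field(metadata={"validate": marshmallow.validate.Range(min=0)})

Bar.Schema().load({"x": 1})   # Bar(x=1)
Bar.Schema().load({"x": -1})  # raises marshmallow.exceptions.ValidationError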

Dataclasses Json

from dataclasses import dataclass
from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class Foo:
    x: int

foo = Foo(x='1')
json_foo = foo.to_json() # '{"x": "1"}'
# Back to class instance
Foo.from_json(json_foo) # Foo(x='1')

Also note that marshmallow_dataclass did the type conversion for you, whereas dataclasses-json (ver. 0.5.1) ignores it.

Write Custom Encoder

Follow the accepted answer by miracle2k and reuse the custom JSON encoder.

Subulate answered 3/7, 2020 at 13:41 Comment(4)
Thank you for marshmallow_dataclass it is really a good way to get a real object from JSON with validation (and even YAML).Michelemichelina
@Michelemichelina you're welcome :) btw, I haven't mentioned that it also works on Python 3.6 (it has a dataclasses backport under the hood)Subulate
Coming from years of C# + Newtonsoft, dataclasses_json was the most "natural" - thanks!Easley
If you are interested, I would also take a look at dataclass-wizard. It's very similar to dataclasses-json and slightly more efficient. It also performs type conversions in most basic cases. Disclaimer: I am the creator of this library.Parsimonious

If you are ok with using a library for that, you can use dataclasses-json. Here is an example:

from dataclasses import dataclass

from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class Foo:
    x: str


foo = Foo(x="some-string")
foo_json = foo.to_json()

It also supports nested dataclasses - if your dataclass has a field typed as another dataclass - provided all dataclasses involved have the @dataclass_json decorator.
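
For instance, a hedged sketch of the nested case (Inner and Outer are made-up names), with the decorator applied to every dataclass involved:

from dataclasses import dataclass
from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class Inner:
    y: int


@dataclass_json
@dataclass
class Outer:
    x: str
    inner: Inner


Outer(x="a", inner=Inner(y=1)).to_json()  # expected: '{"x": "a", "inner": {"y": 1}}'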

Corky answered 10/1, 2020 at 19:47 Comment(1)
I tried this with an embedded dataclass and it does not workBiggin

You can also wrap asdict and json.dumps inside the class itself. In that case it isn't necessary to call json.dumps in other parts of your project:


from typing import List
from dataclasses import dataclass, asdict, field
from json import dumps


@dataclass
class TestDataClass:
    """
    Data Class for TestDataClass
    """
    id: int
    name: str
    tested: bool = False
    test_list: List[str] = field(default_factory=list)

    @property
    def __dict__(self):
        """
        get a python dictionary
        """
        return asdict(self)

    @property
    def json(self):
        """
        get the JSON formatted string
        """
        return dumps(self.__dict__)


test_object_1 = TestDataClass(id=1, name="Hi")
print(test_object_1.__dict__)
print(test_object_1.json)

Output:

{'id': 1, 'name': 'Hi', 'tested': False, 'test_list': []}
{"id": 1, "name": "Hi", "tested": false, "test_list": []}

You can also create a parent class to inherit the methods:

from typing import List
from dataclasses import dataclass, asdict, field
from json import dumps


@dataclass
class SuperTestDataClass:

    @property
    def __dict__(self):
        """
        get a python dictionary
        """
        return asdict(self)

    @property
    def json(self):
        """
        get the JSON formatted string
        """
        return dumps(self.__dict__)


@dataclass
class TestDataClass(SuperTestDataClass):
    """
    Data Class for TestDataClass
    """
    id: int
    name: str
    tested: bool = False
    test_list: List[str] = field(default_factory=list)


test_object_1 = TestDataClass(id=1, name="Hi")
print(test_object_1.__dict__)
print(test_object_1.json)


Joshuajoshuah answered 30/11, 2021 at 8:30 Comment(0)

The simplest way to encode dataclass and SimpleNamespace objects is to provide a default function to json.dumps() that gets called for objects that can't otherwise be serialized and returns the object's __dict__:

json.dumps(foo, default=lambda o: o.__dict__)
Psychologism answered 27/11, 2021 at 11:21 Comment(1)
This is a good idea, and should work in general for serializing nested models with simple types. The only cases which I guess this wouldn't support, would be complex Python types like Enum or datetime, or edge cases where you'd have a dataclass within a Union type, like A | B. This approach should nevertheless work for a simple case in general, as outlined in the original question.Parsimonious
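
Building on that comment, a hedged sketch of a default function that also covers Enum and datetime before falling back to __dict__ (Item, Color, and fallback are illustrative names, not part of the original answer):

import datetime, enum, json
from dataclasses import dataclass

class Color(enum.Enum):
    RED = "red"

@dataclass
class Item:
    name: str
    color: Color
    created: datetime.datetime

def fallback(o):
    # handle the types json can't serialize before the generic __dict__ fallback
    if isinstance(o, enum.Enum):
        return o.value
    if isinstance(o, datetime.datetime):
        return o.isoformat()
    return o.__dict__

item = Item("widget", Color.RED, datetime.datetime(2021, 11, 27))
json.dumps(item, default=fallback)
# '{"name": "widget", "color": "red", "created": "2021-11-27T00:00:00"}'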

I'd suggest creating a parent class for your dataclasses with a to_json() method:

import json
from dataclasses import dataclass, asdict

@dataclass
class Dataclass:
    def to_json(self) -> str:
        return json.dumps(asdict(self))

@dataclass
class YourDataclass(Dataclass):
    a: int
    b: int

x = YourDataclass(a=1, b=2)
x.to_json()  # '{"a": 1, "b": 2}'

This is especially useful if you have other functionality to add to all your dataclasses.
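
For instance, one could extend the same parent class with a from_json() counterpart, a minimal sketch that only covers flat dataclasses (nested fields would need extra handling):

import json
from dataclasses import dataclass, asdict

@dataclass
class Dataclass:
    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str):
        # flat fields only; nested dataclasses would stay plain dicts
        return cls(**json.loads(payload))

@dataclass
class YourDataclass(Dataclass):
    a: int
    b: int

YourDataclass.from_json('{"a": 1, "b": 2}')  # YourDataclass(a=1, b=2)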

Writeoff answered 30/12, 2020 at 9:15 Comment(0)

pydantic

With pydantic models you get a dataclasses-like experience and full support for dict and JSON conversions (and much more).

Python 3.9 and above:

from typing import Optional
from pydantic import BaseModel, parse_obj_as, parse_raw_as


class Foo(BaseModel):
    count: int
    size: Optional[float] = None


f1 = Foo(count=10)

# Parse to dict
print(f1.dict())
# OUT: {'count': 10, 'size': None}

# Load from dict
f2 = Foo.parse_obj({"count": 20})

# Parse to json
print(f2.json())
# OUT: {"count": 20, "size": null}

More options:

# Load from json string
f3 = Foo.parse_raw('{"count": 30}')

# Load from json file
f4 = Foo.parse_file("path/to/data.json")

# Load from list of dicts
f_list1 = parse_obj_as(list[Foo], [{"count": 110}, {"count": 120}])
print(f_list1)
# OUT: [Foo(count=110, size=None), Foo(count=120, size=None)]

# Load from list in json string
f_list2 = parse_raw_as(list[Foo], '[{"count": 130}, {"count": 140}]')
print(f_list2)
# OUT: [Foo(count=130, size=None), Foo(count=140, size=None)]

Complex hierarchical data structures

class Bar(BaseModel):
    apple = "x"
    banana = "y"


class Spam(BaseModel):
    foo: Foo
    bars: list[Bar]


m = Spam(foo={"count": 4}, bars=[{"apple": "x1"}, {"apple": "x2"}])
print(m)
# OUT: foo=Foo(count=4, size=None) bars=[Bar(apple='x1', banana='y'), Bar(apple='x2', banana='y')]

print(m.dict())
# OUT:
# {
#     'foo': {'count': 4, 'size': None},
#     'bars': [
#         {'apple': 'x1', 'banana': 'y'},
#         {'apple': 'x2', 'banana': 'y'},
#     ],
# }

Pydantic supports many standard types (like datetime) and special commonly used types (like EmailStr and HttpUrl):

from datetime import datetime
from pydantic import HttpUrl


class User(BaseModel):
    name = "John Doe"
    signup_ts: datetime = None
    url: HttpUrl = None


u1 = User(signup_ts="2017-07-14 00:00:00")
print(u1)
# OUT: signup_ts=datetime.datetime(2017, 7, 14, 0, 0) url=None name='John Doe'

u2 = User(url="http://example.com")
print(u2)
# OUT: signup_ts=None url=HttpUrl('http://example.com', ) name='John Doe'

u3 = User(url="ht://example.com")
# OUT:
#  ValidationError: 1 validation error for User
#  url
#    URL scheme not permitted (type=value_error.url.scheme; allowed_schemes={'http', 'https'})

If you really need to use json.dumps, write a Custom Encoder:

import json


class EnhancedJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, BaseModel):
            return o.dict()
        return super().default(o)


foo = Foo(count=20)
json.dumps([{"foo": foo}], cls=EnhancedJSONEncoder)
# OUT: '[{"foo": {"count": 20, "size": null}}]'
Porscheporsena answered 10/1, 2023 at 18:55 Comment(1)
Note that it's different in Pydantic V2. See migration guide: docs.pydantic.dev/2.4/migrationPorscheporsena

dataclass-wizard is a modern option that can work for you. It supports complex types such as date and time, most generics from the typing module, and also nested dataclass structures.

The "new style" annotations introduced in PEPs 585 and 604 can be ported back to Python 3.7 via a __future__ import as shown below.

from __future__ import annotations  # This can be removed in Python 3.10
from dataclasses import dataclass, field
from dataclass_wizard import JSONWizard


@dataclass
class MyClass(JSONWizard):
    my_str: str | None
    is_active_tuple: tuple[bool, ...]
    list_of_int: list[int] = field(default_factory=list)


string = """
{
  "my_str": 20,
  "ListOfInt": ["1", "2", 3],
  "isActiveTuple": ["true", false, 1]
}
"""

instance = MyClass.from_json(string)
print(repr(instance))
# MyClass(my_str='20', is_active_tuple=(True, False, True), list_of_int=[1, 2, 3])

print(instance.to_json())
# '{"myStr": "20", "isActiveTuple": [true, false, true], "listOfInt": [1, 2, 3]}'

# True
assert instance == MyClass.from_json(instance.to_json())

You can install the Dataclass Wizard with pip:

$ pip install dataclass-wizard

A bit of background info:

For serialization, it uses a slightly modified (a bit more efficient) implementation of dataclasses.asdict. When de-serializing JSON to a dataclass instance, it iterates over the dataclass fields the first time and generates a parser for each annotated type, which makes the de-serialization process more efficient when it is run multiple times.

Disclaimer: I am the creator (and maintainer) of this library.

Parsimonious answered 30/8, 2021 at 15:29 Comment(0)

Simply use orjson.

import orjson
foo = Foo(x="bar")
orjson.dumps(foo).decode('utf-8')
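
For context, orjson serializes dataclasses (and types such as datetime) natively and returns bytes, hence the decode() call. A small sketch with a made-up Event class:

import datetime
import orjson
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    when: datetime.datetime

# orjson handles the dataclass and the datetime without a custom encoder
orjson.dumps(Event("release", datetime.datetime(2023, 8, 20))).decode("utf-8")
# '{"name":"release","when":"2023-08-20T00:00:00"}'
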
Anyplace answered 20/8, 2023 at 10:46 Comment(0)

A much simpler answer for the reverse direction (loading a dict back into a dataclass) can be found on Reddit, using dictionary unpacking:

>>> from dataclasses import dataclass
>>> @dataclass
... class MyData:
...   prop1: int
...   prop2: str
...   prop3: int
...
>>> d = {'prop1': 5, 'prop2': 'hi', 'prop3': 100}
>>> my_data = MyData(**d)
>>> my_data
MyData(prop1=5, prop2='hi', prop3=100)
Draughts answered 5/9, 2020 at 14:41 Comment(4)
This doesn't support nested dataclasses.Liking
This can introduce a bug if d comes from a JSON object. We often assume that adding a field to a payload is not a breaking change, but here, if you add a field to d, it will break with "got an unexpected keyword argument 'prop4'" (a workaround sketch follows below).Deduction
As far as regarding the original question, this is the simplest correct answer.Eldridge
@Eldridge it doesn't even address the original question, which is how to serialize a dataclass to JSON.Somatotype
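
To address the extra-field concern from the comments, a hedged sketch of a hypothetical helper that filters the dict down to declared fields before unpacking (using the MyData class defined above):

import dataclasses

def from_dict_ignoring_extras(cls, data):
    # keep only keys matching declared fields so extra payload keys don't raise TypeError
    names = {f.name for f in dataclasses.fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in names})

d = {'prop1': 5, 'prop2': 'hi', 'prop3': 100, 'prop4': 'extra'}
from_dict_ignoring_extras(MyData, d)  # MyData(prop1=5, prop2='hi', prop3=100)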

A dataclass providing a JSON formatting method:

import json
from dataclasses import dataclass

@dataclass
class Foo:
    x: str

    def to_json(self):
        return json.dumps(self.__dict__)

Foo("bar").to_json()
# '{"x": "bar"}'
Tourer answered 30/9, 2022 at 15:22 Comment(0)

Okay, so here is what I did when I was in a similar situation.

  1. Create a custom dictionary factory; asdict() already converts nested dataclasses into dictionaries, and this factory additionally drops fields whose value is None:

    def myfactory(data): return dict(x for x in data if x[1] is not None)

  2. If foo is an instance of your @dataclass, pass your dictionary factory to asdict():

    from dataclasses import asdict
    fooDict = asdict(foo, dict_factory=myfactory)

  3. Convert fooDict to JSON:

    fooJson = json.dumps(fooDict)

This should work!
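
Putting the three steps together, a consolidated sketch (the Inner/Outer dataclasses are made up to show the nested, None-dropping behaviour):

import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Inner:
    a: int
    b: Optional[str] = None

@dataclass
class Outer:
    name: str
    inner: Inner

def myfactory(data):
    # drop None-valued fields at every nesting level
    return dict(x for x in data if x[1] is not None)

foo = Outer(name="demo", inner=Inner(a=1))
fooJson = json.dumps(asdict(foo, dict_factory=myfactory))
# '{"name": "demo", "inner": {"a": 1}}'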

Assyriology answered 29/7, 2021 at 16:6 Comment(0)
