Making object JSON serializable with regular encoder

The regular way of JSON-serializing custom non-serializable objects is to subclass json.JSONEncoder and then pass a custom encoder to json.dumps().

It usually looks like this:

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Foo):
            return obj.to_json()

        return json.JSONEncoder.default(self, obj)

print(json.dumps(obj, cls=CustomEncoder))

What I'm trying to do is make something serializable with the default encoder. I looked around but couldn't find anything. My thought was that there would be some field the encoder looks at to determine the JSON encoding, something similar to __str__. Perhaps a __json__ field. Is there something like this in Python?

I want one class of a module I'm writing to be JSON serializable for everyone who uses the package, without them having to implement their own [trivial] custom encoders.

Proulx answered 28/8, 2013 at 2:4 Comment(2)
I don't see anything like that in the source of the json module's encoder.py file.Tungstic
For those wondering why something like __json__ isn't already supported, see e.g. these discussions: cpython issue #79292, cpython issue #71549Gerianne

As I said in a comment to your question, after looking at the json module's source code I concluded that it does not lend itself to doing what you want. However, the goal can be achieved by what is known as monkey-patching (see the question What is a monkey patch?). This could be done in your package's __init__.py initialization script and would affect all subsequent json module serialization, since modules are generally only loaded once and the result is cached in sys.modules.

The patch changes the default json encoder's default method—the default default().

Here's an example implemented as a standalone module for simplicity's sake:

Module: make_json_serializable.py

""" Module that monkey-patches json module when it's imported so
JSONEncoder.default() automatically checks for a special "to_json()"
method and uses it to encode the object if found.
"""
from json import JSONEncoder

def _default(self, obj):
    return getattr(obj.__class__, "to_json", _default.default)(obj)

_default.default = JSONEncoder.default  # Save unmodified default.
JSONEncoder.default = _default # Replace it.

Using it is trivial since the patch is applied by simply importing the module.

Sample client script:

import json
import make_json_serializable  # apply monkey-patch

class Foo(object):
    def __init__(self, name):
        self.name = name
    def to_json(self):  # New special method.
        """ Convert to JSON format string representation. """
        return '{"name": "%s"}' % self.name

foo = Foo('sazpaz')
print(json.dumps(foo))  # -> "{\"name\": \"sazpaz\"}"

To retain the object type information, the special method can also include it in the string returned:

        return ('{"type": "%s", "name": "%s"}' %
                 (self.__class__.__name__, self.name))

Which produces the following JSON that now includes the class name:

"{\"type\": \"Foo\", \"name\": \"sazpaz\"}"

Magick Lies Here

Even better than having the replacement default() look for a specially named method, would be for it to be able to serialize most Python objects automatically, including user-defined class instances, without needing to add a special method. After researching a number of alternatives, the following — based on an answer by @Raymond Hettinger to another question — which uses the pickle module, seemed closest to that ideal to me:

Module: make_json_serializable2.py

""" Module that imports the json module and monkey-patches it so
JSONEncoder.default() automatically pickles any Python objects
encountered that aren't standard JSON data types.
"""
from json import JSONEncoder
import pickle

def _default(self, obj):
    return {'_python_object': pickle.dumps(obj)}

JSONEncoder.default = _default  # Replace with the above.

Of course not everything can be pickled—extension types, for example. However, there are ways defined to handle them via the pickle protocol by writing special methods—similar to what you suggested and I described earlier—but doing that would likely be necessary in far fewer cases.
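For instance, a class holding an unpicklable resource (such as an open file) can opt into pickling via the __reduce__ method of the pickle protocol. A minimal sketch; the Handle class and its attributes are made up for illustration:

```python
import pickle
import tempfile

class Handle:
    """Hypothetical wrapper around an open file (file objects can't be pickled)."""
    def __init__(self, path):
        self.path = path
        self._fh = open(path)

    def __reduce__(self):
        # Tell pickle to recreate the object from its path,
        # skipping the unpicklable file handle entirely.
        return (Handle, (self.path,))

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    pass  # just need a real file on disk for the sketch

original = Handle(tmp.name)
clone = pickle.loads(pickle.dumps(original))  # works despite the open file
```

Here __reduce__ returns a callable and an argument tuple, so pickle stores "call Handle(path) to rebuild this" rather than trying to serialize the file handle itself.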

Deserializing

Regardless, using the pickle protocol also means it is fairly easy to reconstruct the original Python object by providing a custom object_hook function argument to any json.loads() call, which checks the dictionary passed in for a '_python_object' key. Something like:

def as_python_object(dct):
    try:
        return pickle.loads(str(dct['_python_object']))
    except KeyError:
        return dct

pyobj = json.loads(json_str, object_hook=as_python_object)

If this has to be done in many places, it might be worthwhile to define a wrapper function that automatically supplies the extra keyword argument:

import functools

json_pkloads = functools.partial(json.loads, object_hook=as_python_object)

pyobj = json_pkloads(json_str)

Naturally, this could be monkey-patched into the json module as well, making the function the default object_hook (instead of None).
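A minimal sketch of that patch, relying on the (private, CPython-specific) detail that json.loads() delegates to a module-level json._default_decoder when called without keyword arguments; the latin1 decoding matches the Python 3 variant discussed further down:

```python
import json
import pickle

def as_python_object(dct):
    # Rebuild any pickled object; pass other dicts through untouched.
    if '_python_object' in dct:
        return pickle.loads(dct['_python_object'].encode('latin1'))
    return dct

# Replacing the private default decoder changes the default object_hook
# for every json.loads() call that doesn't supply its own.
json._default_decoder = json.JSONDecoder(object_hook=as_python_object)

s = json.dumps({'_python_object': pickle.dumps([1, 2, 3]).decode('latin1')})
restored = json.loads(s)  # no object_hook argument needed now
```

Because _default_decoder is an implementation detail, this is even more fragile than patching JSONEncoder.default, but it illustrates the idea.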

I got the idea for using pickle from an answer by Raymond Hettinger to another JSON serialization question, someone I consider exceptionally credible as well as something of an official source (he is a Python core developer).

Portability to Python 3

The code above does not work as shown in Python 3 because pickle.dumps() returns a bytes object, which the JSONEncoder can't handle. However, the approach is still valid. A simple workaround is to latin1-"decode" the value returned from pickle.dumps() and then "encode" it back to latin1 before passing it on to pickle.loads() in the as_python_object() function. This works because arbitrary binary strings are valid latin1, which can always be decoded to Unicode and then encoded back to the original bytes again (as pointed out in this answer by Sven Marnach).

(Although the following works fine in Python 2, the latin1 decoding and encoding it does is superfluous there.)

import json
import pickle
from decimal import Decimal

class PythonObjectEncoder(json.JSONEncoder):
    def default(self, obj):
        return {'_python_object': pickle.dumps(obj).decode('latin1')}


def as_python_object(dct):
    try:
        return pickle.loads(dct['_python_object'].encode('latin1'))
    except KeyError:
        return dct


class Foo(object):  # Some user-defined class.
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if type(other) is type(self):  # Instances of same class?
            return self.name == other.name
        return NotImplemented

    __hash__ = None


data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'},
        Foo('Bar'), Decimal('3.141592653589793238462643383279502884197169')]
j = json.dumps(data, cls=PythonObjectEncoder, indent=4)
data2 = json.loads(j, object_hook=as_python_object)
assert data == data2  # both should be same
Tungstic answered 1/9, 2013 at 17:35 Comment(5)
This is clearly a good (the?) solution. But it induces one (unavoidable) limitation: the patch must be imported in order to load such serialized data. So what rules should be followed so that the returned content is still loadable by standard json (perhaps not completely, but at least without failing)?Clemen
@Juh_: As long as the string returned by the to_json() method is valid JSON, it will be loadable by a standard json parser, regardless of whether the patch itself has been imported or not.Tungstic
Thank you @Tungstic for this code. By the way, do you think it's possible to subclass JSONEncoder to do this shortly : https://mcmap.net/q/41906/-pretty-print-json-dumps ?Endearment
@Basj: No, I don't think it's feasible to subclass JSONEncoder to pretty-print JSON dumps that way shown in your question.Tungstic
Thanks for a great answer.Breach

You can extend the dict class like so:

#!/usr/local/bin/python3
import json

class Serializable(dict):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # hack to fix _json.so make_encoder serialize properly
        self.__setitem__('dummy', 1)

    def _myattrs(self):
        return [
            (x, self._repr(getattr(self, x))) 
            for x in self.__dir__() 
            if x not in Serializable().__dir__()
        ]

    def _repr(self, value):
        if isinstance(value, (str, int, float, list, tuple, dict)):
            return value
        else:
            return repr(value)

    def __repr__(self):
        return '<%s.%s object at %s>' % (
            self.__class__.__module__,
            self.__class__.__name__,
            hex(id(self))
        )

    def keys(self):
        return iter([x[0] for x in self._myattrs()])

    def values(self):
        return iter([x[1] for x in self._myattrs()])

    def items(self):
        return iter(self._myattrs())

Now to make your classes serializable with the regular encoder, extend 'Serializable':

class MySerializableClass(Serializable):

    attr_1 = 'first attribute'
    attr_2 = 23

    def my_function(self):
        print('do something here')


obj = MySerializableClass()

print(obj) will print something like:

<__main__.MySerializableClass object at 0x1073525e8>

print(json.dumps(obj, indent=4)) will print something like:

{
    "attr_1": "first attribute",
    "attr_2": 23,
    "my_function": "<bound method MySerializableClass.my_function of <__main__.MySerializableClass object at 0x1073525e8>>"
}
Delacourt answered 31/7, 2015 at 9:8 Comment(3)
# hack to fix _json.so make_encoder serialize properly <-- Whoa how does this work? Also if I set class members to other serializable classes, then the repr() value of the class is emitted instead of its dictLogographic
@Logographic Sorry, just saw this. I wrote this a long time back, and I haven't worked with python in quite a bit. As for the # hack ... I remember looking at the json.dumps implementation in the library code. If i remember correctly, It was using a native implementation _json.so. If that could not be used, it would fall back to the python implementation (which is possibly slower). _json.so would not serialize the class if there were no values in the underlying dict. So when I added the dummy value, it called items() or whatever like it should.Delacourt
and regarding your second problem, you may need to play around with __repr__ (or remove it altogether) or check if a nested value is a dict on serializing. Like I said, I'm a little rusty, I need to take a look at how json.dumps() works againDelacourt

I suggest putting the hack into the class definition. This way, once the class is defined, it supports JSON. Example:

import json

class MyClass( object ):

    def _jsonSupport( *args ):
        def default( self, xObject ):
            return { 'type': 'MyClass', 'name': xObject.name() }

        def objectHook( obj ):
            if 'type' not in obj:
                return obj
            if obj[ 'type' ] != 'MyClass':
                return obj
            return MyClass( obj[ 'name' ] )
        json.JSONEncoder.default = default
        json._default_decoder = json.JSONDecoder( object_hook = objectHook )

    _jsonSupport()

    def __init__( self, name ):
        self._name = name

    def name( self ):
        return self._name

    def __repr__( self ):
        return '<MyClass(name=%s)>' % self._name

myObject = MyClass( 'Magneto' )
jsonString = json.dumps( [ myObject, 'some', { 'other': 'objects' } ] )
print("json representation:", jsonString)

decoded = json.loads( jsonString )
print("after decoding, our object is the first in the list", decoded[ 0 ])
Autoicous answered 5/9, 2013 at 23:38 Comment(1)
A notable limitation of this approach is that, as currently written, it doesn't play well with others: it wouldn't work to have more than one class using this approach at a time, because they would step on each other's JSON support code. Even if it did work in such a situation, similar support code would have to be duplicated in each class. It might be possible to fix both of these issues, however.Tungstic
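One possible fix for both issues, sketched here as an illustration (the registry and all names are made up, not part of the original answer): install the patch once and dispatch through a per-class registry, so any number of classes can register their own encode/decode pair without stepping on each other:

```python
import json

_encoders = {}  # class -> function returning a JSON-ready dict
_decoders = {}  # type name -> function rebuilding the object

def register_json_support(cls, encode, decode):
    """Register one class's JSON support without touching other classes'."""
    _encoders[cls] = encode
    _decoders[cls.__name__] = decode

def _default(self, obj):
    encode = _encoders.get(type(obj))
    if encode is not None:
        payload = encode(obj)
        payload['type'] = type(obj).__name__  # tag for the decoder
        return payload
    return _original_default(self, obj)

def _object_hook(dct):
    decode = _decoders.get(dct.get('type'))
    return decode(dct) if decode is not None else dct

_original_default = json.JSONEncoder.default
json.JSONEncoder.default = _default
json._default_decoder = json.JSONDecoder(object_hook=_object_hook)

class MyClass:
    def __init__(self, name):
        self.name = name

register_json_support(MyClass,
                      encode=lambda o: {'name': o.name},
                      decode=lambda d: MyClass(d['name']))

round_tripped = json.loads(json.dumps(MyClass('Magneto')))
```

Each class contributes only its two small lambdas; the shared patch is installed exactly once.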

The problem with overriding JSONEncoder().default is that you can only do it once, and you may stumble upon a special data type that does not work with that pattern (for example, if you use a strange encoding). With the pattern below, you can always make your class JSON serializable, provided that the class field you want to serialize is itself serializable (and can be added to a Python list, which almost anything can). Otherwise, you have to apply the same pattern recursively to your JSON field (or extract the serializable data from it):

# base class that will make all derivatives JSON serializable:
class JSONSerializable(list): # need to derive from a serializable class.

  def __init__(self, value = None):
    # 'self = [ value ]' would only rebind the local name, leaving the
    # list untouched, so initialize the underlying list instead:
    super().__init__([ value ])

  def setJSONSerializableValue(self, value):
    self[:] = [ value ]  # replace the contents in place

  def getJSONSerializableValue(self):
    return self[0] if len(self) else None


# derive  your classes from JSONSerializable:
class MyJSONSerializableObject(JSONSerializable):

  def __init__(self): # or any other function
    # .... 
    # suppose your__json__field is the class member to be serialized. 
    # it has to be serializable itself. 
    # Every time you want to set it, call this function:
    self.setJSONSerializableValue(your__json__field)
    # ... 
    # ... and when you need access to it,  get this way:
    do_something_with_your__json__field(self.getJSONSerializableValue())


# now you have a JSON default-serializable class:
a = MyJSONSerializableObject()
print(json.dumps(a))
Malatya answered 29/2, 2016 at 10:55 Comment(0)

I don't understand why you can't write a serialize function for your own class. You implement the custom encoding inside the class itself and let "people" call the serialize function, which essentially returns self.__dict__ with functions stripped out.
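A minimal sketch of that idea (the class and attribute names are made up for illustration):

```python
import json

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def serialize(self):
        # __dict__ holds only instance attributes, so methods
        # are excluded automatically.
        return self.__dict__

payload = json.dumps(Point(1, 2).serialize())
```

Callers serialize explicitly with obj.serialize(), so no encoder subclass or monkey-patch is needed.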

edit:

This question agrees with me that the simplest way is to write your own method and return the JSON-serialized data that you want. They also recommend trying jsonpickle, but now you're adding an additional dependency for beauty when the correct solution comes built in.

Actinomycosis answered 31/8, 2013 at 21:36 Comment(2)
I suspect the reason is because if the stock json.dumps() method (or json.JSONEncoder) were smarter one or the other would automatically look for a special object method, then it would be unnecessary to pass it a special encoder -- something which is not always possible. This is how the print statement/function works. It looks for a __str__() object method and uses it if one is found. This makes it very easy to print instances of classes, even when they're inside something else like a list or dict.Tungstic
@Tungstic he'll probably have to submit a change for the json module in the standard library to do it the way he wants. Looking at the source for json it's just not there for this.Actinomycosis

For a production environment, prefer preparing your own json wrapper module with your own custom encoder, to make it clear that you are overriding something. Monkey-patching is not recommended, but you can monkey-patch in your test environment.

For example,

class JSONDatetimeAndPhonesEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.date().isoformat()
        elif isinstance(obj, datetime.date):
            # date objects have no .date() method, so format them directly
            return obj.isoformat()
        elif isinstance(obj, str):
            try:
                number = phonenumbers.parse(obj)
            except phonenumbers.NumberParseException:
                return json.JSONEncoder.default(self, obj)
            else:
                return phonenumbers.format_number(number, phonenumbers.PhoneNumberFormat.NATIONAL)
        else:
            return json.JSONEncoder.default(self, obj)

you want:

payload = json.dumps(your_data, cls=JSONDatetimeAndPhonesEncoder)

or:

payload = your_dumps(your_data)

or:

payload = your_json.dumps(your_data)

However, in a testing environment, go ahead:

@pytest.fixture(scope='session', autouse=True)
def testenv_monkey_patching():
    json._default_encoder = JSONDatetimeAndPhonesEncoder()

which will apply your encoder to all json.dumps occurrences.

Stargell answered 15/1, 2019 at 15:40 Comment(2)
Unfortunately, this approach will not work for some standard classes like str, dict, tuple, etc.. It is not easy to understand it from the documentation: json.JSONEncoder [skip] Supports the following objects and types by default. [skip] To extend this to recognize **other** objects, subclass and implement a default() method. What a disappointment..Nippy
@NikO'Lai: I've added custom recognition for 2 types, as an example. Any other type will work the default way, with the default encoder, which is json.JSONEncoder.Selimah
