SimpleJSON and NumPy array

What is the most efficient way of serializing a numpy array using simplejson?

Astrahan asked 15/8/2010 at 19:42. Comments:
Related and simple solution: explicitly pass a default handler for non-serializable objects. – Volant
Yet another answer here: #26646862 – Contrasty

I'd use simplejson.dumps(somearray.tolist()) as the most convenient approach (if I was still using simplejson at all, which implies being stuck with Python 2.5 or earlier; 2.6 and later have a standard library module json which works the same way, so of course I'd use that if the Python release in use supported it;-).

In a quest for greater efficiency, you could subclass json.JSONEncoder (in json; I don't know if the older simplejson already offered such customization possibilities) and, in the default method, special-case instances of numpy.ndarray by turning them into lists or tuples "just in time". I kind of doubt you'd gain enough by such an approach, in terms of performance, to justify the effort, though.
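
For concreteness, here is a minimal sketch of the subclassing approach described above (the class name ArrayAwareEncoder is illustrative, not part of the original answer):

import json
import numpy as np

class ArrayAwareEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()  # "just in time" conversion to a plain list
        return json.JSONEncoder.default(self, obj)  # anything else raises TypeError

print(json.dumps({"data": np.arange(3)}, cls=ArrayAwareEncoder))
# {"data": [0, 1, 2]}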

Parasol answered 15/8/2010 at 20:00. Comments:
JSONEncoder's default method must return a serializable object, so it will be the same as returning somearray.tolist(). If you want something faster, you have to encode it yourself, element by element. – Sb

In order to keep dtype and dimension, try this:

import base64
import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):

    def default(self, obj):
        """If input object is an ndarray it will be converted into a dict 
        holding dtype, shape and the data, base64 encoded.
        """
        if isinstance(obj, np.ndarray):
            if obj.flags['C_CONTIGUOUS']:
                obj_data = obj.data
            else:
                cont_obj = np.ascontiguousarray(obj)
                assert(cont_obj.flags['C_CONTIGUOUS'])
                obj_data = cont_obj.data
            data_b64 = base64.b64encode(obj_data)
            return dict(__ndarray__=data_b64,
                        dtype=str(obj.dtype),
                        shape=obj.shape)
        # Let the base class default method raise the TypeError
        super(NumpyEncoder, self).default(obj)


def json_numpy_obj_hook(dct):
    """Decodes a previously encoded numpy ndarray with proper shape and dtype.

    :param dct: (dict) json encoded ndarray
    :return: (ndarray) if input was an encoded ndarray
    """
    if isinstance(dct, dict) and '__ndarray__' in dct:
        data = base64.b64decode(dct['__ndarray__'])
        return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
    return dct

expected = np.arange(100, dtype=float)
dumped = json.dumps(expected, cls=NumpyEncoder)
result = json.loads(dumped, object_hook=json_numpy_obj_hook)


# None of the following assertions will be broken.
assert result.dtype == expected.dtype, "Wrong Type"
assert result.shape == expected.shape, "Wrong Shape"
assert np.allclose(expected, result), "Wrong Values"
Manful answered 23/6/2014 at 21:46. Comments:
Agreed, this solution works in general for nested arrays, i.e. a dictionary of dictionaries of arrays. #27910158 – Frig
Can you adapt this to work with recarrays? dtype=str(obj.dtype) truncates the recarray's dtype list into a string, which cannot be correctly recovered upon reconstruction; without the conversion to string (i.e. dtype=obj.dtype) I get a circular reference exception :-( – Wernher
This encodes the values of the array safely, which is good. However, if you want the values in the resulting JSON to be human-readable, you can consider leaving out the base64 library and simply converting to a list: do data_json = cont_obj.tolist() in the encoder and np.array(dct['__ndarray__'], dct['dtype']).reshape(dct['shape']) in the decoder (see the sketch after this comment thread). – Prowel
Hey, this fails on Python 3 with "RuntimeError: maximum recursion depth exceeded". Does anyone know why? – Warfore
@Warfore I was getting the recursion error until I added the check for np.generic suggested by the answer from ankostis: https://mcmap.net/q/378014/-simplejson-and-numpy-array – Sworn
@Community This was edited for C_CONTIGUOUS, similar to my answer at https://mcmap.net/q/390168/-json-encoder-and-decoder-for-complex-numpy-arrays. When I looked at this, I thought np.ascontiguousarray() was a no-op for C_CONTIGUOUS arrays, making the if/else check unnecessary compared to simply always calling np.ascontiguousarray(). Am I correct? – Sworn
^ Yep, you are correct. As stated in the NumPy docs (docs.scipy.org/doc/numpy-1.10.0/reference/generated/…), the result is C_CONTIGUOUS afterwards and the call has no effect if the array already was, so obj_data = np.ascontiguousarray(obj) would also be fine. Thanks for that hint. – Manful
To fix the infinite recursion problem I changed return json.JSONEncoder(self, obj) to super(JsonNumpy, self).default(obj). – Encephalitis
@Encephalitis But what is JsonNumpy? – Paperhanger
@Paperhanger Cut-and-paste typo. It should be NumpyEncoder. – Encephalitis
I encounter an issue with decoding the following dict: {"-0.1186": {"__ndarray__": ... When it falls through to the super statement, the code logs an error: NumpyEncoder fell through for type <class 'bytes'>. 08April2019_10:02:11 GenerateControlTable.py ERROR: Exception writing JSON file: Inappropriate argument type. Traceback (most recent call last): TypeError: Object of type 'bytes' is not JSON serializable – Eruptive
NumpyEncoder is encoding obj.data as data_b64. From the documentation, base64.b64encode(s, altchars=None): Encode the bytes-like object s using Base64 and return the encoded bytes. I'm using Python 3.6. The encoder.py seems to expect a strfloat() conversion somewhere in the procedure. – Eruptive
data_b64 = repr(base64.b64encode(np.ascontiguousarray(obj).data)) seems to work, but I'm not sure I can get the array elements back from the file with the hook as written: {"-0.1186": {"__ndarray__": "b'mpmZmZmZuT/dtYR8... – Eruptive
Nope, data = base64.b64decode(dct['__ndarray__']) is not decoding the serialized bytes correctly. First three elements: data[0] = 110, data[1] = 106, data[3] = 102; they should be 0.1, 0.1004, 0.1009. – Eruptive
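
For reference, here is a sketch of the human-readable variant Prowel describes above, swapping base64 for nested lists (the names ReadableNumpyEncoder and readable_numpy_hook are illustrative, not part of the answer):

import json
import numpy as np

class ReadableNumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            # Nested lists instead of base64, so the JSON stays human-readable.
            return dict(__ndarray__=obj.tolist(),
                        dtype=str(obj.dtype),
                        shape=obj.shape)
        return json.JSONEncoder.default(self, obj)

def readable_numpy_hook(dct):
    if isinstance(dct, dict) and '__ndarray__' in dct:
        return np.array(dct['__ndarray__'], dct['dtype']).reshape(dct['shape'])
    return dct

expected = np.arange(6, dtype=np.float64).reshape(2, 3)
dumped = json.dumps(expected, cls=ReadableNumpyEncoder)
result = json.loads(dumped, object_hook=readable_numpy_hook)
assert np.array_equal(expected, result)

Note that this variant does not handle structured (compound) dtypes, which is where the base64-based approach above still helps.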

I found this json.JSONEncoder subclass for serializing one-dimensional numpy arrays within a dictionary. I tried it and it works for me.

import json
import numpy

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, numpy.ndarray) and obj.ndim == 1:
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

My dictionary is 'results'. Here's how I write to the file "data.json":

j = json.dumps(results, cls=NumpyAwareJSONEncoder)
with open("data.json", "w") as f:
    f.write(j)
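
To read the file back (not covered in the original answer), note that json.load returns plain Python lists, so any arrays have to be rebuilt explicitly. A minimal sketch, assuming every list value in results should become an array again:

import json
import numpy

with open("data.json") as f:
    loaded = json.load(f)

# Arrays were written as lists; convert list values back to numpy arrays.
restored = {k: numpy.array(v) if isinstance(v, list) else v
            for k, v in loaded.items()}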
Funereal answered 28/4/2012 at 20:51. Comments:
This approach also works when you have a numpy array nested inside a dict. This answer (I think) implied what I just said, but it's an important point. – Bucky
This did not work for me. I had to use return obj.tolist() instead of return [x for x in obj]. – Marisamariscal
I prefer using numpy's tolist(): it should be faster to have numpy build the list than to have Python iterate through the array. – Gardy
What's the point of and obj.ndim == 1? This works even without that constraint. – Chadbourne

This shows how to convert from a 1D NumPy array to JSON and back to an array:

try:
    import json
except ImportError:
    import simplejson as json
import numpy as np

def arr2json(arr):
    return json.dumps(arr.tolist())
def json2arr(astr,dtype):
    return np.fromiter(json.loads(astr),dtype)

arr=np.arange(10)
astr=arr2json(arr)
print(repr(astr))
# '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
dt=np.int32
arr=json2arr(astr,dt)
print(repr(arr))
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Building on tlausch's answer, here is a way to JSON-encode a NumPy array while preserving its shape and dtype -- including structured (compound) dtypes.

import base64
import io

class NDArrayEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            output = io.BytesIO()
            np.savez_compressed(output, obj=obj)
            return {'b64npz' : base64.b64encode(output.getvalue())}
        return json.JSONEncoder.default(self, obj)


def ndarray_decoder(dct):
    if isinstance(dct, dict) and 'b64npz' in dct:
        output = io.BytesIO(base64.b64decode(dct['b64npz']))
        output.seek(0)
        return np.load(output)['obj']
    return dct

# Make expected non-contiguous structured array:
expected = np.arange(10)[::2]
expected = expected.view('<i4,<f4')

dumped = json.dumps(expected, cls=NDArrayEncoder)
result = json.loads(dumped, object_hook=ndarray_decoder)

assert result.dtype == expected.dtype, "Wrong Type"
assert result.shape == expected.shape, "Wrong Shape"
assert np.array_equal(expected, result), "Wrong Values"
Numerous answered 15/8/2010 at 20:22

I just discovered tlausch's answer to this question and realized it almost solves my problem, but at least for me it does not work in Python 3.5, because of two errors: (1) infinite recursion and (2) the data being saved as None.

Since I cannot directly comment on the original answer yet, here is my version:

import base64
import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        """If input object is an ndarray it will be converted into a dict
        holding dtype, shape and the data, base64 encoded.
        """
        if isinstance(obj, np.ndarray):
            if obj.flags['C_CONTIGUOUS']:
                obj_data = obj.data
            else:
                cont_obj = np.ascontiguousarray(obj)
                assert(cont_obj.flags['C_CONTIGUOUS'])
                obj_data = cont_obj.data
            data_b64 = base64.b64encode(obj_data)
            return dict(__ndarray__=data_b64.decode('utf-8'),
                        dtype=str(obj.dtype),
                        shape=obj.shape)


def json_numpy_obj_hook(dct):
    """Decodes a previously encoded numpy ndarray with proper shape and dtype.

    :param dct: (dict) json encoded ndarray
    :return: (ndarray) if input was an encoded ndarray
    """
    if isinstance(dct, dict) and '__ndarray__' in dct:
        data = base64.b64decode(dct['__ndarray__'])
        return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
    return dct

expected = np.arange(100, dtype=float)
dumped = json.dumps(expected, cls=NumpyEncoder)
result = json.loads(dumped, object_hook=json_numpy_obj_hook)


# None of the following assertions will be broken.
assert result.dtype == expected.dtype, "Wrong Type"
assert result.shape == expected.shape, "Wrong Shape"
assert np.allclose(expected, result), "Wrong Values"    
Poncho answered 27/6/2017 at 8:52. Comments:
The solution worked for me after replacing result = json.loads(dumped, object_hook=json_numpy_obj_hook) with result = json.load(dumped, object_hook=NumpyEncoder.json_numpy_obj_hook). – Jubbah

If you want to apply Russ's method to n-dimensional numpy arrays, you can try this:

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, numpy.ndarray):
            if obj.ndim == 1:
                return obj.tolist()
            else:
                return [self.default(obj[i]) for i in range(obj.shape[0])]
        return json.JSONEncoder.default(self, obj)

This will simply turn an n-dimensional array into a list of lists of depth n. To cast such a list back into a numpy array, my_nparray = numpy.array(my_list) will work regardless of the list depth.
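
A quick round trip using the NumpyAwareJSONEncoder defined above (a small illustrative example, not from the original answer):

import json
import numpy

a = numpy.arange(6).reshape(2, 3)
dumped = json.dumps(a, cls=NumpyAwareJSONEncoder)   # '[[0, 1, 2], [3, 4, 5]]'
restored = numpy.array(json.loads(dumped))          # back to shape (2, 3)
assert numpy.array_equal(a, restored)

Note the dtype itself is not preserved by this scheme; numpy simply infers it again from the decoded values.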

Diamond answered 16/12/2014 at 19:34

Improving on Russ's answer, I would also include np.generic scalars:

import json
import numpy as np

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray) and obj.ndim == 1:
            return obj.tolist()
        elif isinstance(obj, np.generic):
            return obj.item()
        return json.JSONEncoder.default(self, obj)
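
For example, a payload mixing a 1D array and a numpy scalar now serializes cleanly with this encoder (a small illustrative usage, not from the original answer):

payload = {"values": np.arange(3), "total": np.float32(3.0)}
print(json.dumps(payload, cls=NumpyAwareJSONEncoder))
# {"values": [0, 1, 2], "total": 3.0}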
Worms answered 24/1/2014 at 1:44

You can also handle this with just a function passed to json.dumps via its default parameter:

json.dumps(np.array([1, 2, 3]), default=json_numpy_serializer)

With

import numpy as np

def json_numpy_serializer(o):
    """ Serialize numpy types for json

    Parameters:
        o (object): any python object which fails to be serialized by json

    Example:

        >>> import json
        >>> a = np.array([1, 2, 3])
        >>> json.dumps(a, default=json_numpy_serializer)

    """
    numpy_types = (
        np.bool_,
        # np.bytes_, -- python `bytes` class is not json serializable     
        # np.complex64,  -- python `complex` class is not json serializable  
        # np.complex128,  -- python `complex` class is not json serializable
        # np.complex256,  -- special handling below
        # np.datetime64,  -- python `datetime.datetime` class is not json serializable
        np.float16,
        np.float32,
        np.float64,
        # np.float128,  -- special handling below
        np.int8,
        np.int16,
        np.int32,
        np.int64,
        # np.object_  -- should already be evaluated as python native
        np.str_,
        np.timedelta64,
        np.uint8,
        np.uint16,
        np.uint32,
        np.uint64,
        np.void,
    )

    if isinstance(o, np.ndarray):
        return o.tolist()
    elif isinstance(o, numpy_types):        
        return o.item()
    elif isinstance(o, np.float128):
        return o.astype(np.float64).item()
    # elif isinstance(o, np.complex256): -- no python native for np.complex256
    #     return o.astype(np.complex128).item() -- python `complex` class is not json serializable 
    else:
        raise TypeError("{} of type {} is not JSON serializable".format(repr(o), type(o)))

Validated with:

needs_additional_json_handling = (
    np.bytes_,
    np.complex64,  
    np.complex128, 
    np.complex256, 
    np.datetime64,
    np.float128,
)


numpy_types = tuple(set(np.typeDict.values()))

for numpy_type in numpy_types:
    print(numpy_type)

    if numpy_type == np.void:
        # structured (compound) dtypes evaluate as np.void, e.g.
        numpy_type = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
    elif numpy_type in needs_additional_json_handling:
        print('python native can not be json serialized')
        continue

    a = np.ones(1, dtype=numpy_type)
    json.dumps(a, default=json_numpy_serialzer)
Vervet answered 14/9/2016 at 13:10

One fast, though not truly optimal, way is to use Pandas:

import pandas as pd
pd.Series(your_array).to_json(orient='values')
Klong answered 26/6/2017 at 1:54. Comments:
This only seems to work for 1D arrays; however, pd.DataFrame(your_array).to_json(orient='values') seems to work for 2D arrays. – Recognizee
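
A sketch of the full round trip, including the 2D DataFrame variant mentioned in the comment (decoding simply goes back through json.loads and np.asarray; dtype and index/column metadata are not preserved):

import json
import numpy as np
import pandas as pd

arr_1d = np.arange(5)
s1 = pd.Series(arr_1d).to_json(orient='values')      # '[0,1,2,3,4]'
back_1d = np.asarray(json.loads(s1))

arr_2d = np.arange(6).reshape(2, 3)
s2 = pd.DataFrame(arr_2d).to_json(orient='values')   # '[[0,1,2],[3,4,5]]'
back_2d = np.asarray(json.loads(s2))

assert np.array_equal(arr_1d, back_1d)
assert np.array_equal(arr_2d, back_2d)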
