Preserve Python tuples with JSON

6

66

I'm still a little new to this, so I might not know all the conventional terms for things:

Is it possible to preserve Python tuples when encoding with JSON? Right now json.loads(json.dumps(tuple)) gives me a list back. I don't want to convert my tuples to lists, but I want to use JSON. So, are there options?

The reason why: I'm creating an app that uses multi-dimensional arrays, not always the same shape. I've got some class methods that use recursion to probe the arrays and cast the endpoints as a string or int. I recently realized that (based on how my recursion works) I can use tuples to prevent deeper recursive searching of arrays (Python rawks). This could come in handy in situations where I know for sure I won't need to be probing any deeper into my data structures.

Electrotype answered 30/3, 2013 at 17:28 Comment(0)
46

You can write a highly specialized encoder and a decoder hook:

import json

class MultiDimensionalArrayEncoder(json.JSONEncoder):
    def encode(self, obj):
        def hint_tuples(item):
            if isinstance(item, tuple):
                # Hint the elements too, so nested tuples such as ((1, 2), 3) survive.
                return {'__tuple__': True, 'items': [hint_tuples(e) for e in item]}
            if isinstance(item, list):
                return [hint_tuples(e) for e in item]
            if isinstance(item, dict):
                return {key: hint_tuples(value) for key, value in item.items()}
            else:
                return item

        return super(MultiDimensionalArrayEncoder, self).encode(hint_tuples(obj))

def hinted_tuple_hook(obj):
    if '__tuple__' in obj:
        return tuple(obj['items'])
    else:
        return obj


enc = MultiDimensionalArrayEncoder()
jsonstring =  enc.encode([1, 2, (3, 4), [5, 6, (7, 8)]])

print(jsonstring)

# [1, 2, {"__tuple__": true, "items": [3, 4]}, [5, 6, {"__tuple__": true, "items": [7, 8]}]]

print(json.loads(jsonstring, object_hook=hinted_tuple_hook))

# [1, 2, (3, 4), [5, 6, (7, 8)]]
Goatsucker answered 30/3, 2013 at 17:58 Comment(7)
Nice. Quite similar to what pymongo does. To be complete, there should also be a dict branch in encode. - Pascoe
That's why it's specialized :) OP's arrays don't seem to have dicts in them. - Goatsucker
Thanks! It took me a minute to read the code, but I get it and that's exactly what I needed. That's the same way that I'm doing recursion on the multi-d arrays. I'm doing hooks "outside of" json right now though, so maybe I should read up on object_hooks. - Electrotype
Doesn't seem to work on ((1, 2), 3) for instance. Replace return {'__tuple__': True, 'items': item} with return {'__tuple__': True, 'items': tuple(hint_tuples(e) for e in item)} maybe? - Vannie
Does this mean that if you insert a tuple inside a dictionary and then try to save it as json, it won't work? - Odette
Shouldn't the "tuple hinting" line be return {'__tuple__': True, 'items': [hint_tuples(e) for e in item]} ? - Singband
@Singband Yes, I just checked. Without that, this will not serialize nested tuples. - Roseannaroseanne
30

Nope, it's not possible. There is no concept of a tuple in the JSON format (see here for a concise breakdown of what types exist in JSON). Python's json module converts Python tuples to JSON lists because that's the closest thing in JSON to a tuple.
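
A quick way to see this for yourself, using only the standard library (a minimal sketch; the variable names are just for illustration):

```python
import json

point = (1, 2, (3, 4))          # a tuple containing a nested tuple
encoded = json.dumps(point)     # tuples are written out as JSON arrays
decoded = json.loads(encoded)   # JSON arrays come back as Python lists

print(encoded)
print(decoded)
print(type(decoded).__name__)
```

Both the outer and the inner tuple come back as lists, because the JSON text carries no record of which arrays started out as tuples.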

You haven't given much detail of your use case here, but if you need to store string representations of data structures that include tuples, a few possibilities immediately come to mind, which may or may not be appropriate depending upon your situation:

  1. Create your own encoding and decoding functions
  2. Use pickle (careful; pickle.loads isn't safe to use on user-provided input).
  3. Use repr and ast.literal_eval instead of json.dumps and json.loads. repr will give you output reasonably similar in appearance to json.dumps, but repr will not convert tuples to lists. ast.literal_eval is a less powerful, more secure version of eval which will only decode strings, numbers, tuples, lists, dicts, booleans, and None.

Option 3 is probably the easiest and simplest solution for you.
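
A minimal sketch of option 3, assuming the data holds only plain literals (the sample dict below is made up for illustration):

```python
import ast

# Hypothetical sample data: tuples nested inside a dict and a list.
data = {'shape': (2, 3), 'rows': [(1, 2, 3), (4, 5, 6)], 'label': None}

text = repr(data)                  # plain-text form that keeps tuples as tuples
restored = ast.literal_eval(text)  # parses Python literals only; runs no code

print(text)
print(restored == data)
print(type(restored['shape']).__name__)
```

Keep in mind that the resulting text is Python literal syntax rather than JSON, so this only helps when the reader on the other end is also Python.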

Stickup answered 30/3, 2013 at 17:32 Comment(2)
The repr/ast.literal_eval seems like such a good solution, I'm surprised it doesn't have some recognised shorthand - such a tuple-enabled dialect deserves its own name - pyson (pson) perhaps? - Meacham
@ThomasKimber Heh. Given that JSON is short for "JavaScript object notation", PyON (short for "Python Object Notation") seems like it'd be a better name - your "PSON" suggestion makes it sound like there's some language named PythonScript. If I could snap my fingers and make PyON a recognised term, I would, but unfortunately I do not have the same influence as thought leaders like Crockford. ;) - Stickup
7

The principal difference between python lists and tuples is mutability, which is irrelevant to JSON representations, as long as you're not contemplating modifying the internal members of the JSON list while it's in text form. You can just turn the lists you get back into tuples. If you're not using any custom object decoders, the only structured datatypes you have to consider are JSON objects and arrays, which come out as python dicts and lists.

def tuplify(listything):
    if isinstance(listything, list): return tuple(map(tuplify, listything))
    if isinstance(listything, dict): return {k:tuplify(v) for k,v in listything.items()}
    return listything
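
For instance, a round trip through json followed by tuplify might look like the sketch below. The function is restated so the snippet runs on its own, and the sample data is made up. Note that this approach turns every list into a tuple, including ones that were lists before encoding.

```python
import json

def tuplify(value):
    # Same idea as above: lists become tuples, dicts are walked recursively.
    if isinstance(value, list):
        return tuple(tuplify(v) for v in value)
    if isinstance(value, dict):
        return {k: tuplify(v) for k, v in value.items()}
    return value

# Hypothetical sample data that uses tuples only.
original = {'grid': ((1, 2), (3, 4)), 'name': 'demo'}

restored = tuplify(json.loads(json.dumps(original)))
print(restored)
print(restored == original)
```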

If you are specializing the decoding, or want some JSON arrays to be python lists and others to be python tuples, you'll need to wrap data items in a dict or tuple that annotates type information. This in itself is a better way to influence an algorithm's control flow than branching based on whether something is a list or tuple (or some other iterable type).

Hepler answered 13/8, 2014 at 19:41 Comment(1)
"The principle difference" --> "The principal difference" "while its in text form" --> "while it's in text form" - Vinaya
4

It is possible with simplejson:

import simplejson

def _to_json(python_object):
    # simplejson calls this for objects it does not serialize itself.
    if isinstance(python_object, tuple):
        return {'__class__': 'tuple',
                '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')

def _from_json(json_object):
    # Use .get() so ordinary dicts without a '__class__' key pass through
    # unchanged instead of raising KeyError.
    if json_object.get('__class__') == 'tuple':
        return tuple(json_object['__value__'])
    return json_object


jsn = simplejson.dumps((1,2,3), 
                       default=_to_json, 
                       tuple_as_array=False)

tpl = simplejson.loads(jsn, object_hook=_from_json)
Unhopedfor answered 28/4, 2015 at 2:9 Comment(0)
1

Pavel Anossov answered the question well, and his code works for encoding values such as tuples. Tuples are also useful as Python dict keys, though, and the code above does not handle that case. To manage tuples as keys, a boolean flag can signal that a tuple is a dict key, in which case the tuple is wrapped in an extra layer of json.dumps(...) output; during decoding, recursion takes care of the nested JSON.

This makes it possible to pass around data structures that map tuples to values, which makes hashing easier. A Python __hash__ method frequently returns the hash of a tuple of the object's items, and sometimes it is useful to have simpler data structures that are not wrapped in classes.

  1. hint_tuples can take a named argument dict_key, a flag marking the tuple as a dict key. A Python dict cannot be a dict key, so it is better to turn the hint into a string with json.dumps(...); during decoding this string is restored to a dict, and recursion takes care of turning it back into a tuple.
  2. Optionally, __tuple__ can be obfuscated so that a string __tuple__ that someone encodes as part of a dict key still passes through the encoder / decoder.

The code below is what I came up with to take care of encoding tuples in Python dict keys. A couple of basic tests are included under __main__ to demonstrate the solution. Readability of the encoded output is sacrificed to increase the number of cases the solution handles.

    # Pavel Anossov's solution hinted this:
    
    import json
    tuple_signifier = '__tuple__s_i_g_n_i_f_i_e_r__'
    
    class StreamTuple(dict):
         def __hash__(self):
             return hash(str(self))
    
    class MultiDimensionalArrayEncoder(json.JSONEncoder):
        def encode(self, obj):
            def hint_tuples(item, dict_key=False):
                global tuple_signifier
                ret_val = None
                if isinstance(item, tuple):
                    if dict_key:
                        ret_val = json.dumps(dict(
                            [(
                                tuple_signifier,
                                json.dumps(hint_tuples(list(item))),
                            ),],
                        ))
                    else:
                        ret_val = dict(
                            [(
                                tuple_signifier,
                                json.dumps(hint_tuples(list(item))),
                            ),],
                        )
    
                elif isinstance(item, list):
                    ret_val = [hint_tuples(e) for e in item]
                elif isinstance(item, dict):
                    ret_val = dict([
                        (hint_tuples(key, dict_key=True), hint_tuples(value))
                        for key, value in item.items()
                    ])
                else:
                    ret_val = item
                return ret_val
            return super(MultiDimensionalArrayEncoder, self).\
                         encode(hint_tuples(obj))
    
    
    def hinted_tuple_hook(obj):
        global tuple_signifier
    
        ret_val = {}
        if tuple_signifier in obj:
            ret_val = tuple(json.loads(obj[tuple_signifier], object_hook=hinted_tuple_hook,))
        else:
            for k, v in obj.items():
                inner_k = k
                inner_v = v
                if isinstance(k, str) and tuple_signifier in k:
                    inner_k = json.loads(k, object_hook=hinted_tuple_hook,)
                if isinstance(v, str) and tuple_signifier in v:
                    inner_v = json.loads(v, object_hook=hinted_tuple_hook,)
                ret_val[inner_k] = inner_v
        return ret_val
    
    #
    # Some tests that show how to use the above hinted tuple hook to encode 
    # / decode Python tuples.
    #
    if __name__ == '__main__':
        enc = MultiDimensionalArrayEncoder()
        test_input_1 = (2,)
        test_input_2 = {(2,): 'a'}
        test_input_3 = {'a': {(2,): {1:'a'}}}
        print('test_input_1 encoded:', enc.encode(test_input_1), test_input_1)
        print('test_input_1 decoded:',
            json.loads(enc.encode(test_input_1),
                object_hook=hinted_tuple_hook,)
        )
    #"""
        print('test_input_2 encoded:', enc.encode(test_input_2))
        print('test_input_2 decoded:',
            json.loads(enc.encode(test_input_2),
                object_hook=hinted_tuple_hook,)
        )
    
        print('\n' * 3)
        print('test_input_3 encoded:', enc.encode(test_input_3))
        print('test_input_3 decoded:',
            json.loads(enc.encode(test_input_3),
                object_hook=hinted_tuple_hook,)
        )
    
        print('\n' * 3)
        test_input_4 = {'a': 'b'}
        print('test_input_4  encoded:', enc.encode(test_input_4))
        print('test_input_4 decoded:',
            json.loads(enc.encode(test_input_4),
                object_hook=hinted_tuple_hook,)
        )
    
        #"""
Crandell answered 12/9, 2020 at 12:2 Comment(1)
This is great, but I think the point of using json is for output to be consistent and human readable. This makes for pretty unreadable json. - Tarr
0

How about this example:

import json

def serialize(obj):
    if isinstance(obj, tuple):
        # Serialize the elements too, so nested tuples keep their hints.
        return {'__tuple__': True, 'items': [serialize(item) for item in obj]}
    elif isinstance(obj, list):
        return [serialize(item) for item in obj]
    elif isinstance(obj, dict):
        return {key: serialize(value) for key, value in obj.items()}
    else:
        return obj

def deserialize(obj):
    if isinstance(obj, list):
        return [deserialize(item) for item in obj]
    elif isinstance(obj, dict):
        if '__tuple__' in obj:
            # Rebuild the elements first, then the tuple itself.
            return tuple(deserialize(item) for item in obj['items'])
        else:
            return {key: deserialize(value) for key, value in obj.items()}
    else:
        return obj

original_dict = {'tuple_key': [(1, 2, 3), 4], 'nested': {'key': (5, 6, 7)}}

json_data = json.dumps(serialize(original_dict))

decoded_dict = deserialize(json.loads(json_data))

# Print the results
print('Original Dictionary:', original_dict)
# Original Dictionary: {'tuple_key': [(1, 2, 3), 4], 'nested': {'key': (5, 6, 7)}}
print('Serialized JSON Data:', json_data)
# Serialized JSON Data: {"tuple_key": [{"__tuple__": true, "items": [1, 2, 3]}, 4], "nested": {"key": {"__tuple__": true, "items": [5, 6, 7]}}}
print('Decoded Dictionary:', decoded_dict)
# Decoded Dictionary: {'tuple_key': [(1, 2, 3), 4], 'nested': {'key': (5, 6, 7)}}
Fleshly answered 21/12, 2023 at 11:47 Comment(0)
