Deserializing a huge JSON string to Python objects

def _object_hook(dct):
    if '@CLASS' in dct:
        # The server tags domain objects with @CLASS.
        clsname = dct['@CLASS']
        # This is like Class.forName: it imports the module and returns the class.
        cls = get_class(clsname)
        # The server is written in Java, so attribute names are converted
        # to Python naming conventions.
        dct = dict((convert_java_name_to_python(k), dct[k]) for k in dct.keys())
        if cls != None:
            obj_key = None
            if "@uuid" in dct:
                obj_key = dct["@uuid"]
                del(dct["@uuid"])
            else:
                info("Class missing uuid: " + clsname)
            dct.pop("@CLASS", None)
            # This is the most time-consuming step. The domain object's
            # __init__ sets all attributes from the kwargs passed in.
            obj = cls(**dct)
            if obj_key is not None:
                # Keep every uuid with its object in shared_objs, which is
                # used later to replace references.
                shared_objs[obj_key] = obj
        else:
            warning("class not found: " + clsname)
            obj = dct
        return obj
    else:
        return dct

{"@CLASS":"sample.counter","@UUID":"86f26a0a-1a58-4429-a762-9b1778a99c82","val1":"ABC","val2":1131,"val3":1754095,"value4":{"@CLASS":"sample.nestedClass","@UUID":"f7bb298c-fd0b-4d87-bed8-74d5eb1d6517","id":1754095,"name":"XYZ","abbreviation":"ABC"}}
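For context, a hook like the one above is normally passed to json.loads through its object_hook parameter, and the decoder calls it once per JSON object, innermost objects first. The snippet below is a minimal sketch of that call order. The tracing_hook function and the shortened payload are made up for illustration and are not part of the original code.

```python
import json

# Shortened, hypothetical version of the sample payload.
payload = (
    '{"@CLASS": "sample.counter", "val1": "ABC", '
    '"value4": {"@CLASS": "sample.nestedClass", "name": "XYZ"}}'
)

seen = []  # order in which the hook receives each JSON object


def tracing_hook(dct):
    # Record the class tag, then hand the dict back unchanged.
    seen.append(dct.get("@CLASS"))
    return dct


result = json.loads(payload, object_hook=tracing_hook)
print(seen)  # ['sample.nestedClass', 'sample.counter']
```

Because nested objects are decoded first, by the time the hook sees the outer object, its "value4" entry already holds whatever the hook returned for the inner one.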

I don't know of any framework that offers what you're looking for out of the box, but you can apply a few optimizations to the way your class instances are set up.

Since unpacking the dictionary into keyword arguments and assigning them to instance attributes takes the bulk of the time, consider passing dct directly to your class's __init__ and assigning it to the instance dictionary, self.__dict__:

Trial 1

In [1]: data = {"name": "yolanda", "age": 4}

In [2]: class Person:
   ...:     def __init__(self, name, age):
   ...:         self.name = name
   ...:         self.age = age
   ...:
In [3]: %%timeit
   ...: Person(**data)
   ...:
1000000 loops, best of 3: 926 ns per loop

Trial 2

In [4]: data = {"name": "yolanda", "age": 4}

In [5]: class Person2:
   ....:     def __init__(self, data):
   ....:         self.__dict__ = data
   ....:
In [6]: %%timeit
   ....: Person2(data)
   ....:
1000000 loops, best of 3: 541 ns per loop

You don't need to worry about self.__dict__ being modified through another reference, since the hook's own reference to dct is lost when _object_hook returns.

This will of course mean changing the setup of your __init__, so that the attributes of your class depend strictly on the items in dct. Whether that trade-off is acceptable is up to you.
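Applied to the hook itself, the idea might look like the sketch below. It is only an illustration under stated assumptions: DomainObject, Counter and REGISTRY are hypothetical stand-ins (REGISTRY replaces get_class), the Java-to-Python name conversion is left out, and the "@UUID" key is spelled as in the sample JSON.

```python
import json


class DomainObject:
    """Hypothetical base class that adopts the decoded dict as its state."""

    def __init__(self, data):
        # No per-attribute assignment: the dict becomes the instance dict.
        self.__dict__ = data


class Counter(DomainObject):
    pass


REGISTRY = {"sample.counter": Counter}  # stand-in for get_class()
shared_objs = {}  # uuid -> object, as in the original hook


def object_hook(dct):
    clsname = dct.get("@CLASS")
    if clsname is None:
        return dct  # plain JSON object, not a tagged domain object
    cls = REGISTRY.get(clsname)
    if cls is None:
        return dct  # unknown class: fall back to the raw dict
    del dct["@CLASS"]
    obj_key = dct.pop("@UUID", None)
    obj = cls(dct)  # pass the dict itself instead of cls(**dct)
    if obj_key is not None:
        shared_objs[obj_key] = obj
    return obj


text = '{"@CLASS": "sample.counter", "@UUID": "u1", "val1": "ABC", "val2": 1131}'
counter = json.loads(text, object_hook=object_hook)
print(counter.val1, counter.val2)
```

The trade-off is that __init__ no longer validates or defaults anything: whatever keys arrive in the JSON become attributes.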

You may also replace cls != None with cls is not None (there is only one None object, so an identity check is more Pythonic, and it skips the comparison-method dispatch that != goes through):

Trial 1

In [38]: cls = 5
In [39]: %%timeit
   ....: cls != None
   ....:
10000000 loops, best of 3: 85.8 ns per loop

Trial 2

In [40]: %%timeit
   ....: cls is not None
   ....:
10000000 loops, best of 3: 57.8 ns per loop
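Speed aside, the two checks are not always equivalent, because != can be overridden by the object being compared while is not cannot. The class below is a contrived, hypothetical example written only to show the difference.

```python
class AlwaysEqual:
    """Hypothetical class whose comparisons claim equality with everything."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False


obj = AlwaysEqual()

print(obj != None)      # False: the overridden __ne__ decides the result
print(obj is not None)  # True: identity checks cannot be overridden
```

For ordinary classes the two give the same answer, so in the hook this is mainly a matter of style plus the small timing gain shown above.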

You can also collapse these two lines into one:

obj_key = dct["@uuid"]
del(dct["@uuid"])

becoming:

obj_key = dct.pop('@uuid') # Not an optimization; this behaves the same as the two lines above
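One thing to keep in mind is that dict.pop without a default raises KeyError when the key is absent, so the one-line form still needs the surrounding membership check. Passing a default removes that need. The sketch below shows both behaviours. The take_uuid helper is a hypothetical name used only for this illustration.

```python
def take_uuid(dct):
    # Remove and return the uuid, or None when the key is absent.
    return dct.pop("@uuid", None)


with_uuid = {"@uuid": "86f26a0a", "val1": "ABC"}
without_uuid = {"val1": "ABC"}

print(take_uuid(with_uuid))     # 86f26a0a
print(with_uuid)                # {'val1': 'ABC'}
print(take_uuid(without_uuid))  # None

# Without a default, a missing key raises KeyError.
try:
    without_uuid.pop("@uuid")
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

If you still want to log objects that arrive without a uuid, check the returned value for None instead of testing membership first.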

At a scale of 800K domain objects, these small per-object savings add up and should noticeably reduce the time the object_hook spends creating your objects.
