How to convert to a Python datetime object with json.loads?
Q

13

51

I have a string representation of a JSON object.

dumped_dict = '{"debug": false, "created_at": "2020-08-09T11:24:20"}'

When I call json.loads on this string:

json.loads(dumped_dict)

I get:

{'created_at': '2020-08-09T11:24:20', 'debug': False}

There is nothing wrong here. However, I want to know if there is a way to have json.loads convert the above object to something like this:

{'created_at': datetime.datetime(2020, 8, 9, 11, 24, 20), 'debug': False}

In short, can we convert datetime strings to actual datetime.datetime objects while calling json.loads?

Quadriceps answered 9/1, 2012 at 18:43 Comment(2)
I also get dates as strings (in double quotes). Is that because JSON does not have a date data type, so the dates come as strings?Unbroken
@Unbroken just saw this, but yes, JSON is actually rather primitive. It knows only strings, numbers, booleans, null, arrays, and objects (maps). So yes, all dates/times must be transmitted as strings, although they can also come as numbers, in the form of timestamps.Achromatism
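To illustrate the comment above, here is a minimal sketch, assuming a hypothetical payload in which the same instant is sent once as an ISO 8601 string and once as a Unix timestamp in seconds (the `created_ts` key is made up for this example). In both cases json.loads returns a plain str or int, and the conversion to datetime is done by your own code:

```python
import json
from datetime import datetime, timezone

# Hypothetical payload: "created_ts" is an assumed field holding epoch seconds (UTC).
payload = '{"created_at": "2020-08-09T11:24:20", "created_ts": 1596972260}'
data = json.loads(payload)

# json.loads only ever produces str, int, float, bool, None, list and dict.
assert isinstance(data["created_at"], str)
assert isinstance(data["created_ts"], int)

# Explicit conversions after decoding.
from_string = datetime.fromisoformat(data["created_at"])  # naive datetime
from_number = datetime.fromtimestamp(data["created_ts"], tz=timezone.utc)  # aware datetime
```

Note that the string yields a naive datetime while the timestamp conversion here yields a timezone-aware one, so the two cannot be compared directly without normalizing.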
D
33

My solution so far:

>>> json_string = '{"last_updated": {"$gte": "Thu, 1 Mar 2012 10:00:49 UTC"}}'
>>> dct = json.loads(json_string, object_hook=datetime_parser)
>>> dct
{u'last_updated': {u'$gte': datetime.datetime(2012, 3, 1, 10, 0, 49)}}


def datetime_parser(dct):
    for k, v in dct.items():
        if isinstance(v, basestring) and re.search("\ UTC", v):
            try:
                dct[k] = datetime.datetime.strptime(v, DATE_FORMAT)
            except:
                pass
    return dct

For further reference on the use of object_hook: JSON encoder and decoder

In my case the json string is coming from a GET request to my REST API. This solution allows me to 'get the date right' transparently, without forcing clients and users into hardcoding prefixes like __date__ into the JSON, as long as the input string conforms to DATE_FORMAT which is:

DATE_FORMAT = '%a, %d %b %Y %H:%M:%S UTC'

The regex pattern should probably be further refined

PS: in case you are wondering, the json_string is a MongoDB/PyMongo query.

Delivery answered 24/5, 2012 at 8:58 Comment(6)
Please provide some feedback/suggestions other than a plain -1, so I can learn something at least :)Delivery
@NicolaIarocci looks like an awesome solution, however isn't this also forcing clients to hardcode a suffix " UTC" into their json?Dermot
You can remove the UTC test if you don't want that. I just didn't want to attempt a date conversion on every string in the payload (since I have many). Whether it is faster to re.search or to perform the date conversion remains to be seen though.Delivery
Note that this only works for dicts. If you have dates in a list, this won't work.Elaineelam
@NicolaIarocci You could just use .endswith(" UTC"), I haven't tested but I'm guessing that'd be faster than a regex. It'll only work if it's at the end of the string, but I'd consider that an advantage: I wouldn't want to decode a story which mentions a time into a datetime object.Coraleecoralie
isinstance(v, basestring) doesn't work in Python 3. But otherwise looks good.Opal
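Pulling the comments above together, here is a rough Python 3 sketch (not the answerer's code): it uses str instead of basestring, .endswith(" UTC") instead of a regex, and also converts strings that sit inside lists. The helper name `convert` and the sample query are assumptions made for illustration.

```python
import json
from datetime import datetime

DATE_FORMAT = '%a, %d %b %Y %H:%M:%S UTC'

def convert(value):
    """Convert one value; recurse into lists, leave everything else alone."""
    if isinstance(value, list):
        return [convert(item) for item in value]
    if isinstance(value, str) and value.endswith(' UTC'):
        try:
            return datetime.strptime(value, DATE_FORMAT)
        except ValueError:
            return value  # ends with " UTC" but is not a date in DATE_FORMAT
    return value

def datetime_parser(dct):
    # object_hook is called once per decoded JSON object (dict), innermost first,
    # so nested dicts have already been processed when they show up as values here.
    return {k: convert(v) for k, v in dct.items()}

# Hypothetical query with the dates inside a list.
json_string = '{"last_updated": {"$in": ["Thu, 1 Mar 2012 10:00:49 UTC", "n/a"]}}'
result = json.loads(json_string, object_hook=datetime_parser)
```

Keep in mind that %a and %b are locale-dependent in strptime, so this assumes English day and month names.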
I
28

You need to pass an object_hook. From the documentation:

object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict.

Like this:

import datetime
import json

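def date_hook(json_dict):
    for (key, value) in json_dict.items():
        try:
            json_dict[key] = datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
        except (TypeError, ValueError):
            # TypeError: value is not a string; ValueError: string is not in this format
            pass
    return json_dict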

dumped_dict = '{"debug": false, "created_at": "2020-08-09T11:24:20"}'
loaded_dict = json.loads(dumped_dict, object_hook=date_hook)

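If you also want to handle timezones, this fixed format will not match strings that carry an offset; you'll have to use dateutil (or a format that includes %z) instead of this strptime call.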
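As a hedged, standard-library alternative for the timezone case: assuming Python 3.7 or later and strings that carry an ISO 8601 offset such as +02:00, datetime.fromisoformat can parse them without dateutil. The hook name `iso_date_hook` and the sample payload are made up for this sketch.

```python
import json
from datetime import datetime, timedelta, timezone

def iso_date_hook(json_dict):
    for key, value in json_dict.items():
        if not isinstance(value, str):
            continue  # only strings can be ISO 8601 datetimes
        try:
            json_dict[key] = datetime.fromisoformat(value)
        except ValueError:
            pass  # not an ISO 8601 datetime; keep the original string
    return json_dict

dumped = '{"debug": false, "created_at": "2020-08-09T11:24:20+02:00", "note": "hello"}'
loaded = json.loads(dumped, object_hook=iso_date_hook)
```

One caveat: fromisoformat also accepts date-only strings such as "2020-08-09", and newer Python versions accept more ISO 8601 variants (including a trailing Z), so this hook may convert more strings than a strict strptime format would.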

Incorrigible answered 30/4, 2013 at 23:47 Comment(2)
Using try/catch as a control structure is not ideal.Outcry
Just want to point out that this edits the passed-in json_dict in place, i.e. the dict the hook returns is the same object it received. For json.loads, which builds that dict fresh from the string, this is harmless, and the hook still has to return it because json.loads uses the return value. If you call the function on a dict that you want to keep unchanged, construct a new dictionary or deepcopy the function arg json_dict before editing. try/except is fine as a control structure btw; there are heated debates about "look before you leap" vs "ask forgiveness not permission", yet both are valid.Dorking
A
9

Although it technically works to just pass an object hook function, I recommend using a proper subclass of JSONDecoder, as intended by the framework developers:

import datetime
import json

class _JSONDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        # Route every decoded JSON object through self.object_hook.
        super().__init__(*args, object_hook=self.object_hook, **kwargs)

    def object_hook(self, obj):
        ret = {}
        for key, value in obj.items():
            # Only these keys are expected to hold ISO 8601 strings.
            if key in {'timestamp', 'whatever'}:
                ret[key] = datetime.datetime.fromisoformat(value)
            else:
                ret[key] = value
        return ret

For the sake of completeness, here is the counterpart to the decoder, the custom JSONEncoder:

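import pandas as pd  # only needed for pd.Timestamp

class _JSONEncoder(json.JSONEncoder):
    def default(self, obj):
        # pd.Timestamp subclasses datetime.datetime; it is listed for clarity.
        if isinstance(obj, (datetime.date, datetime.datetime, pd.Timestamp)):
            return obj.isoformat()
        # Let the base class raise TypeError for unsupported types.
        return super().default(obj)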

Both in action look like:

json_str = json.dumps({'timestamp': datetime.datetime.now()}, cls=_JSONEncoder)
d = json.loads(json_str, cls=_JSONDecoder)
Alar answered 3/9, 2021 at 8:17 Comment(1)
I believe in your JSONDecoder the *args must come before objecthook=self.object_hook, else you'll get an error with args coming after a kwarg.Generate
H
7

I would do the same as Nicola suggested with 2 changes:

  1. Use dateutil.parser instead of datetime.datetime.strptime
  2. Define explicitly which exceptions I want to catch. I generally recommend avoiding at all cost having an empty except:

Or in code:

import json

import dateutil.parser

def datetime_parser(json_dict):
    for (key, value) in json_dict.items():
        try:
            json_dict[key] = dateutil.parser.parse(value)
        except (ValueError, TypeError, AttributeError, OverflowError):
            # Not a parseable date string (or not a string at all); keep as is.
            pass
    return json_dict

json_str = "{...}"  # Some JSON with date
obj = json.loads(json_str, object_hook=datetime_parser)
print(obj)
Handrail answered 8/11, 2016 at 14:44 Comment(2)
Interesting direction to try. But looks a bit slow to run a datetime parse on every item in the json. Most items are not going to be datetime values.Opal
dateutil.parser.parse is indeed quite slow and should be a last resort for date hunting in unstructured data. If you know the format of your date, datetime.strptime is the way to go.Aerospace
W
3

The way your question is put, nothing tells json that the string is a date value. This differs from the json documentation, which has the example string:

'{"__complex__": true, "real": 1, "imag": 2}'

This string has an indicator "__complex__": true that can be used to infer the type of the data, but unless there is such an indicator, a string is just a string, and all you can do is to regexp your way through all strings and decide whether they look like dates.

In your case you should definitely use a schema if one is available for your format.

Whorish answered 9/1, 2012 at 19:15 Comment(2)
Which json documentation exactly proposes using double-underscored names? I have seen __type, for example, but all of those look like conventions with limited use.Erastianism
The example was taken from the json package documentation.Whorish
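For illustration, the same marker idea can be applied to dates. This is a minimal sketch, assuming you control both the encoding and the decoding side; the "__datetime__" and "iso" key names are made up here and are not a json convention.

```python
import json
from datetime import datetime

def encode_datetime(obj):
    # json.dumps calls this only for objects it cannot serialize itself.
    if isinstance(obj, datetime):
        return {"__datetime__": True, "iso": obj.isoformat()}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_datetime(dct):
    # Only dicts carrying the (hypothetical) marker are turned back into datetimes.
    if dct.get("__datetime__") is True:
        return datetime.fromisoformat(dct["iso"])
    return dct

original = {"debug": False, "created_at": datetime(2020, 8, 9, 11, 24, 20)}
text = json.dumps(original, default=encode_datetime)
restored = json.loads(text, object_hook=decode_datetime)
```

With the marker in place, ordinary strings that merely look like dates are left untouched, at the cost of requiring the producer of the JSON to emit the tagged form.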
O
3

You could use regex to determine whether or not you want to convert a certain field to datetime like so:

import datetime
import re

def date_hook(json_dict):
    for (key, value) in json_dict.items():
        if not isinstance(value, str):
            continue
        # %f accepts one to six fractional digits
        if re.match(r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6}$', value):
            json_dict[key] = datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f")
        elif re.match(r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$', value):
            json_dict[key] = datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")

    return json_dict

Then you can reference the date_hook function using the object_hook parameter in your call to json.loads():

json_data = '{"token": "faUIO/389KLDLA", "created_at": "2016-09-15T09:54:20.564"}'
data_dictionary = json.loads(json_data, object_hook=date_hook)
Outcry answered 15/9, 2016 at 14:27 Comment(0)
U
2

This function recursively searches dicts and lists for strings in a date-time format and converts them with dateutil:

import json
from dateutil.parser import parse

def datetime_parser(value):
    if isinstance(value, dict):
        for k, v in value.items():
            value[k] = datetime_parser(v)
    elif isinstance(value, list):
        for index, row in enumerate(value):
            value[index] = datetime_parser(row)
    elif isinstance(value, str) and value:
        try:
            value = parse(value)
        except (ValueError, AttributeError):
            pass
    return value

json_to_dict = json.loads(YOUR_JSON_STRING, object_hook=datetime_parser)
Uzzia answered 4/9, 2019 at 9:28 Comment(1)
Can you please put some explanation. Thanks!Inversion
E
1

As far as I know, there is no out-of-the-box solution for this.

First of all, the solution should take the JSON schema into account to correctly distinguish between strings and datetimes. To some extent you can guess the schema with a JSON schema inferencer (search for "json schema inferencer github") and then fix the places that are really datetimes.

If the schema is known, it should be pretty easy to write a function that parses the JSON and replaces the string representations with datetime objects. Some inspiration for the code could perhaps be found in the validictory package (JSON schema validation could also be a good idea).
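A very small sketch of that idea, assuming the "schema" is nothing more than a hand-written mapping from field names to converter functions (the FIELD_CONVERTERS name and the extra "title" field are hypothetical, and this is not a real JSON Schema implementation):

```python
import json
from datetime import datetime

# Hypothetical stand-in for a schema: field name -> converter.
FIELD_CONVERTERS = {
    "created_at": datetime.fromisoformat,
}

def apply_schema(dct):
    """object_hook that converts only the fields listed in FIELD_CONVERTERS."""
    for key, convert in FIELD_CONVERTERS.items():
        if isinstance(dct.get(key), str):
            dct[key] = convert(dct[key])
    return dct

dumped_dict = (
    '{"debug": false, "created_at": "2020-08-09T11:24:20", '
    '"title": "2020-08-09T11:24:20"}'
)
loaded = json.loads(dumped_dict, object_hook=apply_schema)
```

Because only known fields are converted, a string that merely looks like a date (the "title" value here) stays a string, and no value has to be scanned with a regex.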

Erastianism answered 9/1, 2012 at 18:58 Comment(0)
T
1

Inspired by Nicola's answer and adapted to python3 (str instead of basestring):

import re
from datetime import datetime
datetime_format = "%Y-%m-%dT%H:%M:%S"
datetime_format_regex = re.compile(r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$')


def datetime_parser(dct):
    for k, v in dct.items():
        if isinstance(v, str) and datetime_format_regex.match(v):
            dct[k] = datetime.strptime(v, datetime_format)
    return dct

This avoids using a try/except mechanism. On OP's test code:

>>> import json
>>> json_string = '{"debug": false, "created_at": "2020-08-09T11:24:20"}'
>>> json.loads(json_string, object_hook=datetime_parser)
{'created_at': datetime.datetime(2020, 8, 9, 11, 24, 20), 'debug': False}

The regex and datetime_format variables can be easily adapted to fit other patterns, e.g. without the T in the middle.

To convert a string saved in isoformat (therefore stored with microseconds) back to a datetime object, refer to this question.
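One possible approach for the microseconds case, assuming Python 3.7 or later, is datetime.fromisoformat, which reads back what isoformat() writes whether or not a fractional part is present. A short sketch:

```python
from datetime import datetime

with_micro = datetime(2020, 8, 9, 11, 24, 20, 564000)
without_micro = datetime(2020, 8, 9, 11, 24, 20)

# isoformat() emits the fractional part only when microsecond is non-zero.
s1 = with_micro.isoformat()
s2 = without_micro.isoformat()

# fromisoformat() accepts both shapes, so no separate format strings are needed.
round_trip_1 = datetime.fromisoformat(s1)
round_trip_2 = datetime.fromisoformat(s2)
```

This could replace the strptime call in a hook like the one above if the regex is widened to allow an optional fractional part.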

Tenon answered 6/9, 2017 at 13:11 Comment(0)
S
1

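If you are looking for the Django JSON serializer and date parser:

import json

from django.utils.timezone import now
# DjangoJSONEncoder can be passed as cls= to json.dumps to serialize datetimes directly
from django.core.serializers.json import DjangoJSONEncoder
from django.utils.dateparse import parse_datetime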

dt = now()
sdt = json.dumps(dt.strftime('%Y-%m-%dT%H:%M:%S'))
ndt = parse_datetime(json.loads(sdt))
print(sdt)
# "2022-04-27T12:20:23"
print(ndt)
# 2022-04-27 12:20:23
Schlemiel answered 27/4, 2022 at 12:21 Comment(0)
A
0

The solutions that suggest creating a JSON encoder and decoder are all perfectly valid. The only drawback I can see is a slight performance impact, which can occur if you scan every JSON value to check whether it matches a date/time format.

Here's the approach I would take, using the dataclass-wizard library (note: it is designed to work for API responses actually)

  1. Use the included CLI utility to convert the JSON response to a dataclass schema. Note that the value of debug is encoded as a string here, so I'm passing -f so that it force-resolves to a Python bool type. Otherwise, it should appear as Union[bool, str], which is the default inferred type.

    $ echo '{"debug": "false", "created_at": "2020-08-09T11:24:20"}' | wiz gs -f
    

    Output, including the imports at the top (not shown):

    @dataclass
    class Data(JSONWizard):
        """
        Data dataclass
    
        """
        debug: bool
        created_at: datetime
    
  2. Now we can de-serialize the sample JSON string above into a Data object. Note that created_at should come as datetime type. Similarly with the value for debug, it should be decoded as bool.

    string = """{"debug": "false", "created_at": "2020-08-09T11:24:20"}"""
    
    c = Data.from_json(string)
    
    print(repr(c))
    
  3. Serialize it back to JSON. The datetime object should be converted back
    to a string:

    print(c.to_json())
    # {"debug": false, "createdAt": "2020-08-09T11:24:20"}
    
Achromatism answered 29/9, 2021 at 23:44 Comment(2)
Pydantic also supports this, the parsing library used to help build fastapiStamata
Yep, agreed, pydantic is a good choice if you want data validation also.Achromatism
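To show the underlying idea without any third-party dependency, here is a rough standard-library sketch in which the type annotations of a dataclass decide which fields become datetime. This is neither the dataclass-wizard nor the pydantic API; the from_json method below is hand-written for illustration and, unlike those libraries, does no validation or string-to-bool coercion.

```python
import json
from dataclasses import dataclass, fields
from datetime import datetime

@dataclass
class Data:
    debug: bool
    created_at: datetime

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        kwargs = {}
        for f in fields(cls):
            value = raw[f.name]
            # Convert only fields annotated as datetime; others pass through.
            if f.type is datetime and isinstance(value, str):
                value = datetime.fromisoformat(value)
            kwargs[f.name] = value
        return cls(**kwargs)

c = Data.from_json('{"debug": false, "created_at": "2020-08-09T11:24:20"}')
```

The check `f.type is datetime` relies on the annotations being real types, so it would not work as written with `from __future__ import annotations`.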
H
0

In most cases this is a two-way problem: if you use a custom encoder, you'll probably want a custom decoder as well (and vice versa). The decoder should be able to parse the encoded data and return the original object.

Below is a full exercise that converts Python objects that are not JSON-serializable to JSON, using 2 different strategies:

  1. Patching the JSONEncoder class to serialize any class that implements a "__json__" method.
  2. Using a list of "converter" functions to handle specific Python types.

In the example below, I serialize an Enum class through a custom __json__ method as an {enum.name: enum.value} dict. Here the enum.value objects are types that JSON cannot represent directly (date and tuple); RskJSONEncoder tags them with a "pythontype", and the functions listed in CONVERTERS turn the tagged values back into Python types.

Once encoded, the custom_json_decoder function can be invoked to convert that JSON back to Python types. The example script below is complete and should run "as is":

from enum import Enum
from dateutil.parser import parse as dtparse
from datetime import datetime
from datetime import date
from json import JSONEncoder
from json import loads as json_loads
from json import dumps as json_dumps


def wrapped_default(self, obj):
    json_parser = getattr(obj.__class__, "__json__", lambda x: x.__dict__)
    try:
        return json_parser(obj)
    except Exception:
        return wrapped_default.default(obj)


wrapped_default.default = JSONEncoder().default
JSONEncoder.default = wrapped_default

CONVERTERS = {
    "datetime": dtparse,
    "date": lambda x: datetime.strptime(x, "%Y%m%d").date(),
    "tuple": lambda x: tuple(x),
}


class RskJSONEncoder(JSONEncoder):
    def default(self, obj):
        # datetime is a subclass of date, so it must be checked first
        if isinstance(obj, datetime):
            return {"val": obj.isoformat(), "pythontype": "datetime"}
        elif isinstance(obj, date):
            return {"val": obj.strftime("%Y%m%d"), "pythontype": "date"}
        elif isinstance(obj, tuple):
            return {"val": list(obj), "pythontype": "tuple"}
        return super().default(obj)


def custom_json_decoder(obj):
    def json_hook(json_obj):
        try:
            return CONVERTERS[json_obj.pop("pythontype")](json_obj["val"])
        except Exception:
            res = json_obj
        return res

    return json_loads(obj, object_hook=json_hook)


def custom_json_encoder(obj):
    return json_dumps(obj, cls=RskJSONEncoder)


if __name__ == "__main__":

    class Test(Enum):
        A = date(2021, 1, 1)
        B = ("this", " is", " a", " tuple")

        def __json__(self):
            return {self.name: self.value}

    d = {"enum_date": Test.A, "enum_tuple": Test.B}
    this_is_json = custom_json_encoder(d)
    this_is_python_obj = custom_json_decoder(this_is_json)
    print(f"this is json, type={type(this_is_json)}\n", this_is_json)
    print(
        f"this is python, type={type(this_is_python_obj)}\n",
        this_is_python_obj,
    )
Hematuria answered 10/1, 2022 at 19:35 Comment(0)
F
0

While some time has passed since the initial question, I'd like to provide an alternative solution for the benefit of future visitors.

I've recently introduced a package on PyPI called pyjschema, which offers the capability to work with JSON schemas and convert JSON data into Pythonic types based on those schemas.

In your specific scenario, you can define a schema that specifies the data type for the created_at property. Subsequently, you can parse the JSON data in alignment with this schema:

from pyjschema import loads

schema = {
    'type': 'object', 
    'properties': {
        'created_at': {'type': 'string', 'format': 'date-time'}
    }
}
print(loads('{"debug": false, "created_at": "2020-08-09T11:24:20"}', schema))
# prints: {'created_at': datetime.datetime(2020, 8, 9, 11, 24, 20), 'debug': False}
Falconiform answered 19/9, 2023 at 20:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.