Parsing a date in Python without using a default
M

4

17

I'm using Python's dateutil.parser tool to parse some dates I'm getting from a third-party feed. It allows specifying a default date, which itself defaults to today, for filling in missing elements of the parsed date. While this is helpful in general, there is no sane default for my use case, and I would prefer to treat partial dates as if I had not gotten a date at all (since it almost always means I got garbled data). I've written the following workaround:

from dateutil import parser
import datetime

def parse_no_default(dt_str):
  dt = parser.parse(dt_str, default=datetime.datetime(1900, 1, 1)).date()
  dt2 = parser.parse(dt_str, default=datetime.datetime(1901, 2, 2)).date()
  if dt == dt2:
    return dt
  else:
    return None

(This snippet only looks at the date, as that's all I care about for my application, but similar logic could be extended to include the time component.)

I'm wondering (hoping) whether there's a better way of doing this. Parsing the same string twice just to see if it fills in different defaults seems like a gross waste of resources, to say the least.

Here's the set of tests (using nosetest generators) for the expected behavior:

import datetime
import nose.tools
import lib.tools.date

def check_parse_no_default(sample, expected):
  actual = lib.tools.date.parse_no_default(sample)
  nose.tools.eq_(actual, expected)

def test_parse_no_default():
  cases = ( 
      ('2011-10-12', datetime.date(2011, 10, 12)),
      ('2011-10', None),
      ('2011', None),
      ('10-12', None),
      ('2011-10-12T11:45:30', datetime.date(2011, 10, 12)),
      ('10-12 11:45', None),
      ('', None),
      )   
  for sample, expected in cases:
    yield check_parse_no_default, sample, expected
Meaty answered 8/12, 2011 at 17:13 Comment(1)
Would be great if we could just say default=False... - Melia
E
9

Depending on your domain, the following solution might work:

DEFAULT_DATE = datetime.datetime(datetime.MINYEAR, 1, 1)

def parse_no_default(dt_str):
    dt = parser.parse(dt_str, default=DEFAULT_DATE).date()
    # Compare date with date: a date object never equals a datetime object.
    if dt != DEFAULT_DATE.date():
        return dt
    else:
        return None

Another approach would be to monkey-patch the parser class (this is very hackish, so I wouldn't recommend it if you have other options):

import dateutil.parser as parser
def parse(self, timestr, default=None,
          ignoretz=False, tzinfos=None,
          **kwargs):
    return self._parse(timestr, **kwargs)
parser.parser.parse = parse

You can use it as follows:

>>> ddd = parser.parser().parse('2011-01-02', None)
>>> ddd
_result(year=2011, month=01, day=02)
>>> ddd = parser.parser().parse('2011', None)
>>> ddd
_result(year=2011)

By checking which members are set on the result (ddd), you can decide when to return None. When all the required fields are present, you can convert ddd into a datetime object:

# ddd might have following fields:
# "year", "month", "day", "weekday",
# "hour", "minute", "second", "microsecond",
# "tzname", "tzoffset"
datetime.datetime(ddd.year, ddd.month, ddd.day)
Eunuchize answered 8/12, 2011 at 17:29 Comment(2)
That only solves the empty string case. When I have a partial date, it is still defaulting the fields not specified, but gets a different final date than the default. I've added some unit tests to the question to illustrate the requirements and where this example fails. Thanks for taking a look though! - Meaty
Be careful, apparently in your first example you're comparing a date object with a datetime object. It's always going to be non-equal. - Miffy
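The field-checking idea from this answer can be sketched without monkey patching, by calling the private _parse method directly. This is only a sketch under assumptions: _parse is not public API, and its return value appears to differ between dateutil releases (a bare result object in older ones, a (result, skipped_tokens) tuple in newer ones), so both shapes are handled. The name parse_complete_date is made up for illustration.

```python
import datetime
from dateutil import parser


def parse_complete_date(dt_str):
    """Return a date only if year, month and day were all found in dt_str."""
    try:
        # _parse is private, so its behaviour may change between releases.
        result = parser.parser()._parse(dt_str)
    except (ValueError, OverflowError, TypeError):
        return None
    # Assumption: newer releases return (result, skipped_tokens),
    # older ones return the result object alone.
    if isinstance(result, tuple):
        result = result[0]
    if result is None:
        return None
    # Fields the parser did not find are left as None.
    if None in (result.year, result.month, result.day):
        return None
    try:
        return datetime.date(result.year, result.month, result.day)
    except ValueError:
        # The fields were present but do not form a real calendar date.
        return None
```

Because nothing is filled in from a default, a partial string such as '2011-10' or '10-12' simply yields None.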
R
3

This is probably a "hack", but it looks like dateutil looks at very few attributes out of the default you pass in. You could provide a 'fake' datetime that explodes in the desired way.

>>> import datetime
>>> import dateutil.parser
>>> class NoDefaultDate(object):
...     def replace(self, **fields):
...         if any(f not in fields for f in ('year', 'month', 'day')):
...             return None
...         return datetime.datetime(2000, 1, 1).replace(**fields)
>>> def wrap_parse(v):
...     _actual = dateutil.parser.parse(v, default=NoDefaultDate())
...     return _actual.date() if _actual is not None else None
>>> cases = (
...   ('2011-10-12', datetime.date(2011, 10, 12)),
...   ('2011-10', None),
...   ('2011', None),
...   ('10-12', None),
...   ('2011-10-12T11:45:30', datetime.date(2011, 10, 12)),
...   ('10-12 11:45', None),
...   ('', None),
...   )
>>> all(wrap_parse(test) == expected for test, expected in cases)
True
Rourke answered 14/8, 2013 at 21:29 Comment(3)
Nice, clean hack even if it is a hack! +1 - Panek
Also, by reading the kwargs of the replace function I can find out which date elements were specified in the passed string: only the year, or the year with the month, etc. Exactly what I needed. - Dichlorodifluoromethane
This looked good but does not currently work for me. Modifying the function like this seems to fix it: def wrap_parse(v): try: _actual = ... except AttributeError: _actual = None - Noll
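The fix described in the last comment can be written out in full as follows. It is a sketch, not a tested recipe for every dateutil version: the assumption is that newer releases read attributes such as .month or .day from the default when a field is missing (hence the AttributeError), and that they raise a ValueError subclass for input such as the empty string.

```python
import datetime
import dateutil.parser


class NoDefaultDate(object):
    """Stand-in default that refuses to fill in a missing year, month or day."""

    def replace(self, **fields):
        if any(f not in fields for f in ('year', 'month', 'day')):
            return None
        return datetime.datetime(2000, 1, 1).replace(**fields)


def wrap_parse(v):
    try:
        _actual = dateutil.parser.parse(v, default=NoDefaultDate())
    except (AttributeError, ValueError, OverflowError):
        # AttributeError: the parser tried to read a field from the fake default.
        # ValueError / OverflowError: the string could not be parsed at all.
        return None
    return _actual.date() if _actual is not None else None
```

Catching the exceptions keeps the original return-None path for releases where replace() is the only method called on the default.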
A
0

I ran into the exact same problem with dateutil, so I wrote this function and figured I would post it for posterity's sake. It basically uses the underlying _parse method, as @ILYA Khlopotov suggests:

from dateutil.parser import parser
import datetime

_CURRENT_YEAR = datetime.datetime.now().year

def is_good_date(date):
    try:
        # _parse is private; it accepts a plain string, so StringIO is not needed
        parsed_date = parser()._parse(date)
    except Exception:
        return None
    # Newer dateutil releases return a (result, skipped_tokens) tuple
    if isinstance(parsed_date, tuple):
        parsed_date = parsed_date[0]
    if not parsed_date: return None
    if not parsed_date.year: return None
    if parsed_date.year < 1890 or parsed_date.year > _CURRENT_YEAR: return None
    if not parsed_date.month: return None
    if parsed_date.month < 1 or parsed_date.month > 12: return None
    if not parsed_date.day: return None
    if parsed_date.day < 1 or parsed_date.day > 31: return None
    return parsed_date

The returned object isn't a datetime instance, but it has the .year, .month, and .day attributes, which was good enough for my needs. I suppose you could easily convert it to a datetime instance.

Allege answered 14/8, 2013 at 20:34 Comment(0)
M
0

simple-date does this for you (it does try multiple formats, internally, but not as many as you might think, because the patterns it uses extend python's date patterns with optional parts, like regexps).

see https://github.com/andrewcooke/simple-date - but only python 3.2 and up (sorry).

it's more lenient than what you want by default:

>>> from simpledate import SimpleDate
>>> for date in ('2011-10-12', '2011-10', '2011', '10-12', '2011-10-12T11:45:30', '10-12 11:45', ''):
...   print(date)
...   try: print(SimpleDate(date).naive.datetime)
...   except: print('nope')
... 
2011-10-12
2011-10-12 00:00:00
2011-10
2011-10-01 00:00:00
2011
2011-01-01 00:00:00
10-12
nope
2011-10-12T11:45:30
2011-10-12 11:45:30
10-12 11:45
nope

nope

but you could specify your own format. for example:

>>> from simpledate import SimpleDateParser, invert
>>> parser = SimpleDateParser(invert('Y-m-d(%T| )?(H:M(:S)?)?'))
>>> for date in ('2011-10-12', '2011-10', '2011', '10-12', '2011-10-12T11:45:30', '10-12 11:45', ''):
...   print(date)
...   try: print(SimpleDate(date, date_parser=parser).naive.datetime)
...   except: print('nope')
... 
2011-10-12
2011-10-12 00:00:00
2011-10
nope
2011
nope
10-12
nope
2011-10-12T11:45:30
2011-10-12 11:45:30
10-12 11:45
nope

nope

ps the invert() just switches the presence of % which otherwise become a real mess when specifying complex date patterns. so here only the literal T character needs a % prefix (in standard python date formatting it would be the only alpha-numeric character without a prefix)
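if the feed's formats are known in advance, the same "specify your own format" idea can also be sketched with nothing but the standard library. this swaps simple-date for plain datetime.strptime, so it is a different technique and much less flexible. the ACCEPTED_FORMATS list and the parse_strict name are made up for illustration and would need adjusting to the real feed.

```python
import datetime

# Hypothetical list of formats the feed is expected to send.
ACCEPTED_FORMATS = (
    '%Y-%m-%d',
    '%Y-%m-%dT%H:%M:%S',
    '%Y-%m-%d %H:%M:%S',
    '%Y-%m-%d %H:%M',
)


def parse_strict(dt_str):
    """Return a date if dt_str matches one accepted format exactly, else None."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.datetime.strptime(dt_str, fmt).date()
        except ValueError:
            # Wrong format or leftover characters: try the next one.
            continue
    return None
```

strptime rejects strings that are missing a field the format asks for, so partial dates never pick up default values.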

Maribeth answered 14/8, 2013 at 20:40 Comment(0)
