Mixing datetime.strptime() arguments

It is quite a common mistake to mix up the format string and date string arguments of datetime.strptime(), writing:

datetime.strptime("%B %d, %Y", "January 8, 2014")

instead of the other way around:

datetime.strptime("January 8, 2014", "%B %d, %Y")

Of course, this fails at runtime:

>>> datetime.strptime("%B %d, %Y", "January 8, 2014")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '%B %d, %Y' does not match format 'January 8, 2014'

But, is it possible to catch this problem statically even before actually running the code? Is it something pylint or flake8 can help with?


I've tried the PyCharm code inspection, but neither snippet triggers any warnings, probably because both arguments have the same type: they are both strings, which makes the problem more difficult. A checker would have to actually analyze whether a string is a datetime format string or not. The Language Injections feature of PyCharm/IDEA also looks relevant.

Zingaro answered 1/7, 2016 at 14:26 Comment(1)
alecxe, generally, if we want to convert a string to a datetime, we'd call strptime() on that particular string. Other than strptime(), we could use regular expressions to check whether the given string is in a proper datetime format, but that would take many regular expression patterns. - Badajoz

I claim that this cannot be checked statically in the general case.

Consider the following snippet:

d = datetime.strptime(read_date_from_network(), read_format_from_file())

This code may be completely valid, with both read_date_from_network and read_format_from_file returning strings of the proper format -- or both may return total garbage, such as None or some crap. Either way, that information can only be determined at runtime -- hence, a static checker is powerless.
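Since the values are only known at runtime, the check has to happen there as well. A minimal sketch, assuming a hypothetical wrapper named parse_datetime (not part of the standard library) that retries with the arguments reversed purely to produce a more helpful error message:

```python
from datetime import datetime


def parse_datetime(date_string, fmt):
    """Hypothetical wrapper: like datetime.strptime, but reports swapped arguments."""
    try:
        return datetime.strptime(date_string, fmt)
    except ValueError as original_error:
        # Heuristic: if parsing succeeds with the arguments reversed,
        # the caller most likely passed them in the wrong order.
        try:
            datetime.strptime(fmt, date_string)
        except ValueError:
            # Reversing did not help, so surface the original failure.
            raise original_error
        raise ValueError(
            "arguments look swapped: expected (date_string, format), "
            f"got ({date_string!r}, {fmt!r})"
        ) from original_error
```

This only improves the diagnostic; the mistake is still found at runtime, which is exactly the limitation described above.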


What's more, given the current definition of datetime.strptime, even if we were using a statically typed language, we wouldn't be able to catch this error (except in very specific cases) -- the reason being that the signature of this function doomed us from the start:

classmethod datetime.strptime(date_string, format)

In this definition, date_string and format are both strings, even though they actually have special meaning. Even if we had something analogous in a statically typed language like this:

public DateTime strpTime(String dateString, String format)

The compiler (and linter and everyone else) still only sees:

public DateTime strpTime(String, String)

Which means that none of the following are distinguishable from each other:

strpTime("%B %d, %Y", "January 8, 2014") // strpTime(String, String) CHECK
strpTime("January 8, 2014", "%B %d, %Y") // strpTime(String, String) CHECK
strpTime("cat", "bat") // strpTime(String, String) CHECK

This isn't to say that it can't be done at all -- there do exist some linters for statically typed languages such as Java/C++/etc. that will inspect string literals when you pass them to some specific functions (like printf, etc.), but this can only be done when you're calling that function directly with a literal format string. The same linters become just as helpless in the first case that I presented, because it's simply not yet known if the strings will be the right format.

That is, a linter may be able to warn about this:

// Linter regex-es the first argument, sees %B et al., warns you
strpTime("%B %d, %Y", "January 8, 2014")

but it would not be able to warn about this:

strpTime(scanner.readLine(), scanner.readLine())

Now, the same could be engineered into a Python linter, but I don't believe that it would be very useful because functions are first-class, so I could easily defeat the (hypothetical Python) linter by writing:

f = datetime.strptime
d = f("%B %d, %Y", "January 8, 2014")  # swapped, but the call no longer names strptime

And then we're pretty much hosed again.
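To make the limits concrete, here is a rough sketch of such a literal-only check built on the standard ast module. The function name find_swapped_strptime and the "contains a % directive" heuristic are my own assumptions, not an existing pylint or flake8 rule:

```python
import ast
import re

# Matches strptime-style directives such as %B, %d or %Y.
DIRECTIVE = re.compile(r"%[A-Za-z]")


def _is_str_literal(node):
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


def find_swapped_strptime(source):
    """Return sorted line numbers of direct strptime calls with swapped literals."""
    suspicious = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or len(node.args) != 2:
            continue
        func = node.func
        # Only calls spelled `something.strptime(...)` or `strptime(...)` are seen.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != "strptime":
            continue
        first, second = node.args
        if not (_is_str_literal(first) and _is_str_literal(second)):
            continue  # dynamic values: nothing to inspect statically
        if DIRECTIVE.search(first.value) and not DIRECTIVE.search(second.value):
            suspicious.append(node.lineno)
    return sorted(suspicious)


SAMPLE = "\n".join([
    'datetime.strptime("%B %d, %Y", "January 8, 2014")',  # line 1: flagged
    'datetime.strptime("January 8, 2014", "%B %d, %Y")',  # line 2: correct order
    'f = datetime.strptime',                              # line 3: alias
    'f("%B %d, %Y", "January 8, 2014")',                  # line 4: swapped, missed
    'datetime.strptime(read_date(), read_format())',      # line 5: dynamic, missed
])
print(find_swapped_strptime(SAMPLE))
```

Only the direct call with two literals is reported; the aliased and dynamic calls slip through, which is the weakness described above.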


Bonus: What Went Wrong

The problem here is that datetime.strptime gives implicit meaning to each of these strings, but it doesn't surface that information to the type system. What could have been done was to give the two strings differing types -- then there could have been more safety, albeit at the expense of some ease-of-use.

e.g. (using PEP 484 type annotations, a real thing!):

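from datetime import date

class DateString(str):
    """A string holding a date, e.g. "January 8, 2014"."""

class FormatString(str):
    """A string holding a strptime format, e.g. "%B %d, %Y"."""

class datetime(date):
    ...
    @classmethod
    def strptime(cls, date_string: DateString, format: FormatString) -> "datetime":
        ...  # etc. etc.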

Then it would start to be feasible to provide good linting in the general case -- though the DateString and FormatString classes would need to take care of validating their input, because again, the type system can't do anything at that level.
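A minimal sketch of what that validation might look like. Everything here is hypothetical: parse_date is an illustrative front end rather than a real API, and the rule "a format contains a % directive, a date does not" is a simplifying assumption that real inputs could violate:

```python
import re
from datetime import datetime

# Assumption: any "%" followed by a letter marks a strptime directive.
_DIRECTIVE = re.compile(r"%[A-Za-z]")


class FormatString(str):
    """A string that must contain at least one strptime directive."""

    def __new__(cls, value):
        if not _DIRECTIVE.search(value):
            raise ValueError(f"not a format string: {value!r}")
        return super().__new__(cls, value)


class DateString(str):
    """A string that must not contain strptime directives."""

    def __new__(cls, value):
        if _DIRECTIVE.search(value):
            raise ValueError(f"looks like a format, not a date: {value!r}")
        return super().__new__(cls, value)


def parse_date(date_string: DateString, fmt: FormatString) -> datetime:
    """Hypothetical typed front end for datetime.strptime."""
    if not isinstance(date_string, DateString) or not isinstance(fmt, FormatString):
        raise TypeError("expected (DateString, FormatString)")
    return datetime.strptime(date_string, fmt)


parsed = parse_date(DateString("January 8, 2014"), FormatString("%B %d, %Y"))
```

With distinct types, a checker such as mypy can reject swapped arguments at the call site, and the constructors reject a format wrapped in DateString (or the reverse) when the value is built.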


Afterword:

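Where it applies, I think the best way to deal with this is to avoid the problem by using the strftime method, which is bound to a specific datetime object and takes just a format string argument. That gives us a function signature that doesn't cut us when we hug it. Yay. Note that strftime formats an existing datetime rather than parsing a string, so it only sidesteps the issue when you already have a datetime object; parsing still goes through strptime.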
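For illustration, a short sketch of the bound-method signature; the variable names are my own. Since strftime takes only the format, there is no second string to confuse it with:

```python
from datetime import datetime

moment = datetime(2014, 1, 8)
fmt = "%B %d, %Y"

# strftime is called on a datetime instance, so the only string argument is the format.
formatted = moment.strftime(fmt)

# Parsing the result back still needs strptime and its two string arguments.
round_trip = datetime.strptime(formatted, fmt)
```

Keep in mind that this covers the formatting direction only.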

Heeler answered 6/7, 2016 at 1:51 Comment(0)
