Parsing date with timezone from an email?

I am trying to retrieve date from an email. At first it's easy:

import email.parser

message = email.parser.Parser().parse(file)
date = message['Date']
print(date)

and I receive:

'Mon, 16 Nov 2009 13:32:02 +0100'

But I need a nice datetime object, so I use:

datetime.strptime('Mon, 16 Nov 2009 13:32:02 +0100', '%a, %d %b %Y %H:%M:%S %Z')

which raises ValueError, since %Z doesn't match +0100. But I can't find the proper directive for a numeric time zone offset in the documentation; there is only %Z for the zone. Can someone help me with that?

Underhung answered 24/11, 2009 at 15:27 Comment(0)

email.utils has a parsedate() function for the RFC 2822 format, which as far as I know is not deprecated.

>>> import email.utils
>>> import time
>>> import datetime
>>> email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')
(2009, 11, 16, 13, 32, 2, 0, 1, -1)
>>> time.mktime((2009, 11, 16, 13, 32, 2, 0, 1, -1))
1258378322.0
>>> datetime.datetime.fromtimestamp(1258378322.0)
datetime.datetime(2009, 11, 16, 13, 32, 2)

Please note, however, that the parsedate function does not take the time zone into account, and time.mktime always expects a local time tuple.

>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) ==
... time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100'))
True

So you'll still need to parse out the time zone and take into account the local time difference, too:

>>> REMOTE_TIME_ZONE_OFFSET = +9 * 60 * 60
>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) -
... time.timezone - REMOTE_TIME_ZONE_OFFSET)
1258345922.0
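A time-zone-independent variant, sketched under the assumption that you only need a POSIX timestamp: parsedate_tz() keeps the offset in the last field of its tuple, and calendar.timegm() treats the date fields as UTC, so the local zone never enters the calculation. The helper name rfc2822_to_timestamp is just illustrative.

```python
import calendar
import email.utils

def rfc2822_to_timestamp(date_str):
    """Convert an RFC 2822 date string to a POSIX timestamp (UTC seconds)."""
    tt = email.utils.parsedate_tz(date_str)
    if tt is None:
        raise ValueError("unparseable date: %r" % date_str)
    # tt[9] is the UTC offset in seconds; guard against None just in case
    offset = tt[9] or 0
    # timegm() reads the first six fields as UTC; subtracting the offset gives real UTC
    return calendar.timegm(tt) - offset

print(rfc2822_to_timestamp('Mon, 16 Nov 2009 13:32:02 +0900'))  # 1258345922
print(rfc2822_to_timestamp('Mon, 16 Nov 2009 13:32:02 +0100'))  # 1258374722
```

Unlike the mktime-based version, the result does not change with the machine's time zone or DST rules.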
Tuinal answered 24/11, 2009 at 15:42 Comment(4)
Yep, those functions seem to have been moved to utils, and email is fine to use. Thanks. - Underhung
That won't yield an accurate value. time.mktime assumes a local time tuple, and the parsedate function does not take into account the time zone: time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) == time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')) returns True. Tagging @Underhung in case he's relying on this method. - Gametocyte
mktime + timezone may produce wrong values for past dates or if the timezone has DST transitions (time.timezone != time.altzone). Use tt = parsedate_tz(date_str); timestamp = calendar.timegm(tt) - tt[9] instead. - Dade
In more recent versions of Python (3.3+) you can also use email.utils.parsedate_to_datetime. - Subclass

Use email.utils.parsedate_tz(date):

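import datetime
import email
import email.utils

with open(file_name) as f:
    msg = email.message_from_file(f)
date = None
date_str = msg.get('date')
if date_str:
    date_tuple = email.utils.parsedate_tz(date_str)
    if date_tuple:
        # mktime_tz() applies the offset in date_tuple[9] and returns a UTC timestamp;
        # fromtimestamp() then gives a naive datetime in local time
        date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
if date:
    ...  # valid date found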
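On Python 3 the same steps can produce an aware datetime by passing datetime.timezone.utc as the second argument to fromtimestamp. A minimal sketch, using an inline string instead of reading a message file:

```python
import datetime
import email.utils

date_str = 'Mon, 16 Nov 2009 13:32:02 +0100'
date_tuple = email.utils.parsedate_tz(date_str)
if date_tuple:
    # mktime_tz() honours the offset in date_tuple[9] and returns a UTC timestamp
    timestamp = email.utils.mktime_tz(date_tuple)
    # tz=timezone.utc yields an aware datetime instead of a naive local one
    date = datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc)
    print(date)  # 2009-11-16 12:32:02+00:00
```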
Belayneh answered 24/11, 2009 at 15:27 Comment(3)
mktime_tz may fail on Python before 2.7.4 if the local timezone had a different UTC offset at the time in date_tuple. Use calendar.timegm() directly in this case. - Dade
This returns a naive datetime in local time. To make it aware, you could provide a time zone as the second parameter to fromtimestamp. In Python 3, that's easy: datetime.timezone.utc. In Python 2.7, you'd need to implement a UTC tzinfo class and provide that. - Stateroom
In Python 3.7, parsedate_tz did not account for the tz shift in '2019-03-14 20:43:56 +0300' and just returned a naive '2019-03-14 20:43:56'. However, email.utils.parsedate_to_datetime from @Dade's answer solved the problem and returned a tz-aware object. - Insecurity

For Python 3.3+ you can use the parsedate_to_datetime function:

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 +0100')
datetime.datetime(2009, 11, 16, 13, 32, 2, tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))

Official documentation:

The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding a timezone tzinfo. New in version 3.3.
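Since the documentation describes parsedate_to_datetime() as the inverse of format_datetime(), a quick round trip is one way to check the behaviour. A minimal sketch (Python 3.3+), using an arbitrary +01:00 datetime:

```python
import datetime
from email.utils import format_datetime, parsedate_to_datetime

tz = datetime.timezone(datetime.timedelta(hours=1))
dt = datetime.datetime(2009, 11, 16, 13, 32, 2, tzinfo=tz)

# format_datetime() renders an aware datetime as an RFC 5322 date string
header_value = format_datetime(dt)
print(header_value)  # Mon, 16 Nov 2009 13:32:02 +0100

# parsedate_to_datetime() turns it back into an equal aware datetime
assert parsedate_to_datetime(header_value) == dt
```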

Undercoating answered 17/8, 2017 at 19:1 Comment(0)

In Python 3.3+, the email package can parse the headers for you:

import email
import email.policy

headers = email.message_from_file(file, policy=email.policy.default)
print(headers.get('date').datetime)
# -> 2009-11-16 13:32:02+01:00
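If you only have the raw message text rather than a file, email.message_from_string() accepts the same policy argument. A small sketch with a made-up two-header message:

```python
import datetime
import email
import email.policy

raw = (
    "Date: Mon, 16 Nov 2009 13:32:02 +0100\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

# With policy=email.policy.default, header values are rich objects;
# the Date header exposes the parsed value as a .datetime attribute
msg = email.message_from_string(raw, policy=email.policy.default)
sent = msg['date'].datetime
print(sent)                                    # 2009-11-16 13:32:02+01:00
print(sent.astimezone(datetime.timezone.utc))  # 2009-11-16 12:32:02+00:00
```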

Since Python 3.2, it works if you replace %Z with %z:

>>> from datetime import datetime
>>> datetime.strptime("Mon, 16 Nov 2009 13:32:02 +0100", 
...                   "%a, %d %b %Y %H:%M:%S %z")
datetime.datetime(2009, 11, 16, 13, 32, 2,
                  tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
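Two caveats are worth knowing with the strptime route: %a and %b match English day and month names only under an English/C locale (assumed here), and strptime rejects anything after the offset, such as a trailing "(CET)" comment that some mailers append. Normalising to UTC is then a single astimezone() call:

```python
from datetime import datetime, timezone

s = "Mon, 16 Nov 2009 13:32:02 +0100"
fmt = "%a, %d %b %Y %H:%M:%S %z"

# %z matches a numeric offset such as +0100
dt = datetime.strptime(s, fmt)
utc_dt = dt.astimezone(timezone.utc)
print(utc_dt)  # 2009-11-16 12:32:02+00:00

# Extra text after the offset makes strptime raise ValueError
try:
    datetime.strptime(s + " (CET)", fmt)
except ValueError as exc:
    print("rejected:", exc)
```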

Or using email package (Python 3.3+):

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 +0100")
datetime.datetime(2009, 11, 16, 13, 32, 2,
                  tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))

If the UTC offset is specified as -0000, it returns a naive datetime object that represents time in UTC; otherwise it returns an aware datetime object with the corresponding tzinfo set.
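If downstream code cannot cope with a mix of naive and aware values, one option is to treat the naive -0000 case as UTC, which is what the RFCs mean by it. to_aware_utc is a hypothetical helper name, not part of the email package:

```python
import datetime
from email.utils import parsedate_to_datetime

def to_aware_utc(date_str):
    """Parse an RFC 5322 date and always return an aware datetime in UTC.

    Hypothetical helper: parsedate_to_datetime() returns a naive datetime for
    a "-0000" offset; per the RFCs that value is UTC, so tag it as such.
    """
    dt = parsedate_to_datetime(date_str)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    return dt.astimezone(datetime.timezone.utc)

print(to_aware_utc("Mon, 16 Nov 2009 13:32:02 -0000"))  # 2009-11-16 13:32:02+00:00
print(to_aware_utc("Mon, 16 Nov 2009 13:32:02 +0100"))  # 2009-11-16 12:32:02+00:00
```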

To parse an RFC 5322 date-time string on earlier Python versions (2.6+):

from calendar import timegm
from datetime import datetime, timedelta, tzinfo
from email.utils import parsedate_tz

ZERO = timedelta(0)
time_string = 'Mon, 16 Nov 2009 13:32:02 +0100'
tt = parsedate_tz(time_string)
#NOTE: mktime_tz is broken on Python < 2.7.4,
#  see https://bugs.python.org/issue21267
timestamp = timegm(tt) - tt[9] # local time - utc offset == utc time
naive_utc_dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
aware_utc_dt = naive_utc_dt.replace(tzinfo=FixedOffset(ZERO, 'UTC'))
aware_dt = aware_utc_dt.astimezone(FixedOffset(timedelta(seconds=tt[9])))
print(aware_utc_dt)
print(aware_dt)
# -> 2009-11-16 12:32:02+00:00
# -> 2009-11-16 13:32:02+01:00

where FixedOffset is based on the tzinfo subclass example from the datetime documentation:

class FixedOffset(tzinfo):
    """Fixed UTC offset: `time = utc_time + utc_offset`."""
    def __init__(self, offset, name=None):
        self.__offset = offset
        if name is None:
            seconds = abs(offset).seconds
            assert abs(offset).days == 0
            hours, seconds = divmod(seconds, 3600)
            if offset < ZERO:
                hours = -hours
            minutes, seconds = divmod(seconds, 60)
            assert seconds == 0
            #NOTE: the last part is to remind about deprecated POSIX
            #  GMT+h timezones that have the opposite sign in the
            #  name; the corresponding numeric value is not used e.g.,
            #  no minutes
            self.__name = '<%+03d%02d>GMT%+d' % (hours, minutes, -hours)
        else:
            self.__name = name
    def utcoffset(self, dt=None):
        return self.__offset
    def tzname(self, dt=None):
        return self.__name
    def dst(self, dt=None):
        return ZERO
    def __repr__(self):
        return 'FixedOffset(%r, %r)' % (self.utcoffset(), self.tzname())
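On Python 3.2+ the FixedOffset class isn't needed: datetime.timezone is a built-in fixed-offset tzinfo. A sketch of the same computation using it:

```python
from calendar import timegm
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_tz

time_string = 'Mon, 16 Nov 2009 13:32:02 +0100'
tt = parsedate_tz(time_string)
timestamp = timegm(tt) - tt[9]  # local time - utc offset == utc time

# datetime.timezone is a ready-made fixed-offset tzinfo (Python 3.2+)
aware_utc_dt = datetime.fromtimestamp(timestamp, timezone.utc)
aware_dt = aware_utc_dt.astimezone(timezone(timedelta(seconds=tt[9])))
print(aware_utc_dt)  # 2009-11-16 12:32:02+00:00
print(aware_dt)      # 2009-11-16 13:32:02+01:00
```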
Dade answered 16/4, 2014 at 18:14 Comment(0)

Have you tried

rfc822.parsedate_tz(date) # ?

More on RFC822: http://docs.python.org/library/rfc822.html

It's deprecated, though (parsedate_tz now lives in email.utils.parsedate_tz).
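For reference, email.utils.parsedate_tz() returns a 10-tuple whose last element is the UTC offset in seconds, and email.utils.mktime_tz() converts that tuple to a UTC timestamp. A quick sketch on Python 3:

```python
import email.utils

tt = email.utils.parsedate_tz('Mon, 16 Nov 2009 13:32:02 +0100')
print(tt)  # (2009, 11, 16, 13, 32, 2, 0, 1, -1, 3600)

# mktime_tz() subtracts the offset, giving seconds since the epoch in UTC
print(email.utils.mktime_tz(tt))  # 1258374722
```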

But maybe these answers help:

Crossland answered 24/11, 2009 at 15:32 Comment(2)
Yeah, I've seen it, but it's deprecated. - Underhung
This function is now known as email.utils.parsedate_tz(), FWIW. - Mingmingche
import datetime

# Parses Nginx's format of "01/Jan/1999:13:59:59 +0400" into a naive UTC datetime.
# Unfortunately, strptime on Python 2 doesn't support %z for the UTC offset
# (despite what the docs actually say), hence the need for this function.
def parseDate(dateStr):
    date = datetime.datetime.strptime(dateStr[:-6], "%d/%b/%Y:%H:%M:%S")
    offsetDir = dateStr[-5]
    offsetHours = int(dateStr[-4:-2])
    offsetMins = int(dateStr[-2:])
    if offsetDir == "-":
        offsetHours = -offsetHours
        offsetMins = -offsetMins
    # local time = UTC + offset, so subtract the offset to get UTC
    return date - datetime.timedelta(hours=offsetHours, minutes=offsetMins)
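On Python 3.2+, strptime does accept %z, so the manual offset handling can be replaced. A sketch assuming an English/C locale for %b:

```python
from datetime import datetime, timezone

# %z parses the numeric offset and produces an aware datetime
dt = datetime.strptime("01/Jan/1999:13:59:59 +0400", "%d/%b/%Y:%H:%M:%S %z")
print(dt)                           # 1999-01-01 13:59:59+04:00
print(dt.astimezone(timezone.utc))  # 1999-01-01 09:59:59+00:00
```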
Viticulture answered 14/6, 2016 at 23:4 Comment(0)

For those who want to get the correct local time, here is what I did:

from datetime import datetime
from email.utils import parsedate_to_datetime

mail_time_str = 'Mon, 16 Nov 2009 13:32:02 +0100'

local_time_str = datetime.fromtimestamp(parsedate_to_datetime(mail_time_str).timestamp()).strftime('%Y-%m-%d %H:%M:%S')

print(local_time_str)
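On Python 3.3+, astimezone() called without an argument converts an aware datetime to the system's local time zone, which avoids the round trip through a timestamp. A sketch of the same conversion; the printed value depends on the machine's time zone, so no expected output is shown:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

mail_time_str = 'Mon, 16 Nov 2009 13:32:02 +0100'

# astimezone() with no argument converts to the system's local zone
local_dt = parsedate_to_datetime(mail_time_str).astimezone()
local_time_str = local_dt.strftime('%Y-%m-%d %H:%M:%S')
print(local_time_str)
```

Keeping local_dt around (rather than only the string) preserves the offset, so it can still be compared with other aware datetimes.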
Flyfish answered 16/6, 2018 at 4:14 Comment(0)

ValueError: 'z' is a bad directive in format...

(Note: I have to stick to Python 2.7 in my case.)

I have had a similar problem parsing commit dates from the output of git log --date=iso8601, which actually isn't strict ISO 8601 (hence the addition of --date=iso8601-strict in a later version).

Since I am using Django, I can leverage its utilities.

https://github.com/django/django/blob/master/django/utils/dateparse.py

>>> from django.utils.dateparse import parse_datetime
>>> parse_datetime('2013-07-23T15:10:59.342107+01:00')
datetime.datetime(2013, 7, 23, 15, 10, 59, 342107, tzinfo=+0100)

Instead of strptime you could use your own regular expression.
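A regular-expression sketch for the git log --date=iso8601 output mentioned above, assuming the shape "2013-07-23 15:10:59 +0100" (adjust the pattern if your input differs). It runs on Python 2.7 as well as 3, and the helper name parse_git_iso is hypothetical:

```python
import re
from datetime import datetime, timedelta

# Assumed input shape: "YYYY-MM-DD HH:MM:SS +HHMM"
GIT_ISO_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$")

def parse_git_iso(s):
    """Return (naive UTC datetime, UTC offset in minutes) for s."""
    m = GIT_ISO_RE.match(s)
    if m is None:
        raise ValueError("unrecognised date: %r" % s)
    year, month, day, hour, minute, second = map(int, m.group(1, 2, 3, 4, 5, 6))
    sign = -1 if m.group(7) == "-" else 1
    offset_minutes = sign * (int(m.group(8)) * 60 + int(m.group(9)))
    local = datetime(year, month, day, hour, minute, second)
    # local time = UTC + offset, so UTC = local time - offset
    return local - timedelta(minutes=offset_minutes), offset_minutes

print(parse_git_iso("2013-07-23 15:10:59 +0100"))
```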

Lole answered 16/3, 2015 at 17:4 Comment(3)
It does not answer the question. You use a different time format. Note: the time format in the question is defined in RFC 5322 (and its predecessors); it can be parsed using email.utils.parsedate_tz on Python 2.7. Your format looks like RFC 3339. Both can be parsed using dateutil.parser.parse() on Python 2. See Convert timestamps with offset to datetime obj using strptime. - Dade
@J.F.Sebastian, had you not deleted my answer on one of the duplicate questions, I would not have posted my answer here. My problem was that strptime does not handle the %z format; I believe this is the same problem. - Lole
I can't delete someone else's answer by myself. Could you link to the corresponding question? - Dade
