Parsing of Ordered Timestamps in Local Time (to UTC) While Observing Daylight Saving Time
Asked Answered
A

2

3

I have CSV data files with timestamped records that are in local time. Unfortunately the data files cover the period where daylight saving time changes (Nov 3rd 2013) so the time component of the timestamps for the records go: 12:45, 1:00, 1:15, 1:30, 1:45, 1:00, 1:15, 1:30, 1:45, 2:00. I want to be able to convert and store the values in the database as UTC.

Unfortunately the standard DateTime.Parse() function of .NET will parse as this (all November 3rd 2013):

| Time String | Parsed Local Time | In DST | Parsed Local Time to UTC
|  12:45 am   |    12:45 am       |   Yes  |     4:45 am
| 12:59:59 am |    12:59:59 am    |   Yes  |     4:59:59 am
|  01:00 am   |     1:00 am       |   No   |     6:00 am
|  01:15 am   |     1:15 am       |   No   |     6:15 am

So it never sees the 1:00-1:59:59 am range as being in DST and my parsed timestamps in UTC jumps an hour.

Is there a library or class out there that will allow me to parse the timestamps and take into account the change in DST? Like some instantiatable class that will remember the stream of timestamps it's already received and adjust the parsed timestamp accordingly?

Assumptions about the data that can be made while parsing:

  1. I have the start time of file (timestamp of the first record) in the header section of the file in both local and UTC.
  2. The records are in order by timestamp
  3. All local times are in Eastern Standard
  4. The data could also go the other way: from out of DST into it
  5. The records contain a full timestamp in the format: yyyy/mm/dd HH:mm:ss (2013/11/03 00:45:00)

Note: While my software is in C#, I did not tag C#/.NET specifically as I figured I could use any language's implementation of a solution and recode if necessary.

Agility answered 6/10, 2014 at 13:35 Comment(3)
can successive timestamps in your file go backwards in UTC?Ivoryivorywhite
No, they can be assumed to be in order of UTC.Agility
I was just being lazy in typing the table above. The timestamp field in the data file contains a full date and time. I'll add it to the above assumptions.Agility
L
3

In C#:

// Define the input values.
string[] input =
{
    "2013-11-03 00:45:00",
    "2013-11-03 01:00:00",
    "2013-11-03 01:15:00",
    "2013-11-03 01:30:00",
    "2013-11-03 01:45:00",
    "2013-11-03 01:00:00",
    "2013-11-03 01:15:00",
    "2013-11-03 01:30:00",
    "2013-11-03 01:45:00",
    "2013-11-03 02:00:00",
};

// Get the time zone the input is meant to be interpreted in.
TimeZoneInfo tz = TimeZoneInfo.FindSystemTimeZoneById("Eastern Standard Time");

// Create an array for the output values
DateTimeOffset[] output = new DateTimeOffset[input.Length];

// Start with the assumption that DST is active, as ambiguities occur when moving
// out of daylight time into standard time.
bool dst = true;

// Iterate through the input.
for (int i = 0; i < input.Length; i++)
{
    // Parse the input string as a DateTime with Unspecified kind
    DateTime dt = DateTime.ParseExact(input[i], "yyyy-MM-dd HH:mm:ss",
                                      CultureInfo.InvariantCulture);

    // Determine the offset.
    TimeSpan offset;
    if (tz.IsAmbiguousTime(dt))
    {
        // Get the possible offsets, and use the DST flag and the previous entry
        // to determine if we are past the transition point.  This only works
        // because we have outside knowledge that the items are in sequence.
        TimeSpan[] offsets = tz.GetAmbiguousTimeOffsets(dt);
        offset = dst && (i == 0 || dt >= output[i - 1].DateTime)
                 ? offsets[1] : offsets[0];
    }
    else
    {
        // The value is unambiguous, so just get the single offset it can be.
        offset = tz.GetUtcOffset(dt);
    }

    // Use the determined values to construct a DateTimeOffset
    DateTimeOffset dto = new DateTimeOffset(dt, offset);

    // We can unambiguously check a DateTimeOffset for daylight saving time,
    // which sets up the DST flag for the next iteration.
    dst = tz.IsDaylightSavingTime(dto);

    // Save the DateTimeOffset to the output array.
    output[i] = dto;
}


// Show the output for debugging
foreach (var dto in output)
{
    Console.WriteLine("{0:yyyy-MM-dd HH:mm:ss zzzz} => {1:yyyy-MM-dd HH:mm:ss} UTC",
                       dto, dto.UtcDateTime);
}

Output:

2013-11-03 00:45:00 -04:00 => 2013-11-03 04:45:00 UTC
2013-11-03 01:00:00 -04:00 => 2013-11-03 05:00:00 UTC
2013-11-03 01:15:00 -04:00 => 2013-11-03 05:15:00 UTC
2013-11-03 01:30:00 -04:00 => 2013-11-03 05:30:00 UTC
2013-11-03 01:45:00 -04:00 => 2013-11-03 05:45:00 UTC
2013-11-03 01:00:00 -05:00 => 2013-11-03 06:00:00 UTC
2013-11-03 01:15:00 -05:00 => 2013-11-03 06:15:00 UTC
2013-11-03 01:30:00 -05:00 => 2013-11-03 06:30:00 UTC
2013-11-03 01:45:00 -05:00 => 2013-11-03 06:45:00 UTC
2013-11-03 02:00:00 -05:00 => 2013-11-03 07:00:00 UTC

Note that this assumes that the first time you encounter an ambiguous time like 1:00 that it will be in DST. Say your list was truncated to just the last 5 entries - you wouldn't know that those were in standard time. There's not much you could do in that particular case.

Licensee answered 6/10, 2014 at 18:19 Comment(2)
Dude. Fantastic. I didn't know about TimeZoneInfo and was basically implemented my own IsAmbiguousTime() but didn't have a good way to get the ambiguous offsets. And I was trying to drill too far into the guessing how I should adjust forwards or backwards. THANK YOU! I was able to easily encapsulate this into a class and add some unit tests. It works going into DST too (as long as you have at least one value before the change).Agility
There shouldn't be any requirement to have one value before the change going into DST. In that transition, there is a gap but there is no valid time that is ambiguous. See also the dst tag wiki.Licensee
I
2

If successive timestamps can't go backwards if expressed as time in UTC then this Python script can convert the local time into UTC:

#!/usr/bin/env python3
import sys
from datetime import datetime, timedelta
import pytz  # $ pip install pytz

tz = pytz.timezone('America/New_York' if len(sys.argv) < 2 else sys.argv[1])
previous = None #XXX set it from UTC time: `first_entry_utc.astimezone(tz)`
for line in sys.stdin: # read from stdin
    naive = datetime.strptime(line.strip(), "%Y/%m/%d %H:%M:%S") # no timezone
    try:
        local = tz.localize(naive, is_dst=None) # attach timezone info
    except pytz.AmbiguousTimeError:
        # assume ambiguous time always corresponds to True -> False transition
        local = tz.localize(naive, is_dst=True)
        if previous >= local: # timestamps must be increasing
            local = tz.localize(naive, is_dst=False)
        assert previous < local
    #NOTE: allow NonExistentTimeError to propagate (there shouldn't be
    # invalid local times in the input)
    previous = local
    utc = local.astimezone(pytz.utc)
    timestamp = utc.timestamp()
    time_format = "%Y-%m-%d %H:%M:%S %Z%z"
    print("{local:{time_format}}; {utc:{time_format}}; {timestamp:.0f}"
          .format_map(vars()))

Input

2013/11/03 00:45:00
2013/11/03 01:00:00
2013/11/03 01:15:00
2013/11/03 01:30:00
2013/11/03 01:45:00
2013/11/03 01:00:00
2013/11/03 01:15:00
2013/11/03 01:30:00
2013/11/03 01:45:00
2013/11/03 02:00:00

Output

2013-11-03 00:45:00 EDT-0400; 2013-11-03 04:45:00 UTC+0000; 1383453900
2013-11-03 01:00:00 EDT-0400; 2013-11-03 05:00:00 UTC+0000; 1383454800
2013-11-03 01:15:00 EDT-0400; 2013-11-03 05:15:00 UTC+0000; 1383455700
2013-11-03 01:30:00 EDT-0400; 2013-11-03 05:30:00 UTC+0000; 1383456600
2013-11-03 01:45:00 EDT-0400; 2013-11-03 05:45:00 UTC+0000; 1383457500
2013-11-03 01:00:00 EST-0500; 2013-11-03 06:00:00 UTC+0000; 1383458400
2013-11-03 01:15:00 EST-0500; 2013-11-03 06:15:00 UTC+0000; 1383459300
2013-11-03 01:30:00 EST-0500; 2013-11-03 06:30:00 UTC+0000; 1383460200
2013-11-03 01:45:00 EST-0500; 2013-11-03 06:45:00 UTC+0000; 1383461100
2013-11-03 02:00:00 EST-0500; 2013-11-03 07:00:00 UTC+0000; 1383462000
Ivoryivorywhite answered 6/10, 2014 at 17:5 Comment(9)
Thank you for this, it's great. The other solution was in C#, so more easily applied to my project, but for those using Python this seems pretty straight forward.Agility
@MattJohnson: it is nice but it fails for 2014-10-26 in Europe/Moscow timezone with the latest tz database (dst is 0 before and after the jump).Ivoryivorywhite
Hmmmm... tz.normalize(tz.localize(datetime(2014,10,26,1,0,0)) + timedelta(hours=1)) works. (Though it would be better not to hardcode the 1 hour). But you're right that localize by itself has no way to give the +03:00 offset because is_dst would be false for both.Licensee
@MattJohnson: tz.normalize() fixes nothing here: tz.localize(datetime(2014,10,26,1,0,0) + timedelta(hours=1)) also produces +0300. I don't see it helps to parse the input (replace 2013/11/03 with 2014/10/26 above) in Europe/Moscow timezone.Ivoryivorywhite
The time changes. Without normalize, you get 02:00+3:00. With normalize, you get 01:00+3:00Licensee
@MattJohnson: I know and it doesn't matter in this case. Both do not help parse the input.Ivoryivorywhite
Perhaps you could flip this sub-issue into a new question so others could offer insight? :)Licensee
@MattJohnson: I've added support for UTC transitions where is_dst doesn't change.Ivoryivorywhite
@MattJohnson: I've opened pytz bug about DST transition where dst is the same before/after the transition.Ivoryivorywhite

© 2022 - 2024 — McMap. All rights reserved.