Date string to epoch seconds (UTC)

Question

I want to parse a date-time given as a string (UTC) into seconds since the epoch. Example (see EpochConverter):

2019-01-15 10:00:00 -> 1547546400
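
The expected value can be sanity-checked in the opposite direction using only the standard library, because std::gmtime always breaks a std::time_t down in UTC. The following is a minimal sketch, not part of the original question. The helper name format_utc is hypothetical, and it assumes that std::time_t counts seconds since 1970-01-01 00:00:00 UTC, which mainstream platforms provide but C++17 does not strictly guarantee.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: formats epoch seconds as "YYYY-MM-DD hh:mm:ss" in UTC.
// Assumes std::time_t is a count of seconds since 1970-01-01 00:00:00 UTC.
std::string format_utc(std::time_t t)
{
    // std::gmtime returns a pointer to shared static storage (not thread-safe).
    const std::tm* utc = std::gmtime(&t);
    if (utc == nullptr)
        return {};
    char buf[32];
    const std::size_t n =
        std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", utc);
    return std::string(buf, n);
}
```

Under that assumption, format_utc(1547546400) should yield "2019-01-15 10:00:00", independent of the local time zone of the machine.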

Problem

The straightforward solution, which is also the accepted answer to the closely related question "C++ Converting a time string to seconds from the epoch", goes std::string -> std::tm -> std::time_t using std::get_time and then std::mktime:

std::tm tm;
std::stringstream ss("2019-01-15 10:00:00");
ss >> std::get_time(&tm, "%Y-%m-%d %H:%M:%S");
std::time_t epoch = std::mktime(&tm);
// 1547546400 (expected)
// 1547539200 (actual, 2 hours too early)

But std::mktime seems to mess up the hours because of the time zone. I am executing the code in UTC+01:00, but DST was also applied for that date, so the offset is +2 here.

The tm shows 10 for the hour field after std::get_time. It gets messed up as soon as it enters std::mktime.

So again, the string is to be interpreted as a UTC timestamp; no time zones whatsoever should be involved. But all solutions I came up with seem to interpret it as a local timestamp and add offsets to it.


Restrictions

I have some restrictions for this:

  • C++17
  • platform/compiler independent
  • no environment variable hacking
  • no external libraries (like boost)

Feel free to post answers involving those for the sake of Q&A though, but I won't accept them.


Research

I found various attempts to solve this problem, but none met my requirements:

  • std::mktime (as mentioned above), messes up the time because it assumes local time
  • strptime, not available on my platform, not part of the standard
  • timegm (that's exactly what I would need), not platform independent
  • _mkgmtime, not platform independent
  • boost::posix_time::from_iso_string, is an external library
  • std::chrono::parse (C++20), not available with C++17
  • clear and reset the timezone variable with tzset, uses environment variable hacking
  • manually countering the offset with mktime(localtime(&timestamp)) - mktime(gmtime(&timestamp)), computes the wrong offset since it does not account for DST (1 hour on my platform but it would need to be 2 hours)
Ootid answered 1/8, 2019 at 7:55 Comment(3)
From this std::mktime reference: "If the std::tm object was obtained from std::get_time ..., the value of tm_isdst is indeterminate, and needs to be set explicitly before calling mktime." So start with that. - Calkins
@Someprogrammerdude But that would require me to know whether DST is active or not, which is not the case for a program that I want to run everywhere in the world at any time. And it doesn't sound necessary to me, since I do not want to involve any time zones at all: the date string is in UTC and should also be interpreted in UTC, so no offsets should be involved. That is why I think std::mktime is a bad approach here, as it assumes local time. Thanks for your input though :) - Ootid
Then you need to explicitly set the time zone to "UTC", which before C++20 is going to be platform-dependent. From C++20 there are calendar and time zone functions. - Calkins

Solution prior to C++20: Roll your own.

Given the right documentation, it really is much easier than it sounds, and can even be lightning fast if you don't need much error detection.

The first problem is to parse the numbers without manipulating any of them. You only need to read unsigned values of length 2 and 4 digits, so just do that bare minimum:

int
read2(std::string const& str, int pos)
{
    return (str[pos] - '0')*10 + (str[pos+1]  - '0');
}

int
read4(std::string const& str, int pos)
{
    return (str[pos] - '0')*1000 + (str[pos+1] - '0')*100 +
           (str[pos+2] - '0')*10 + (str[pos+3]  - '0');
}

Now given a string, it is easy to parse out the different values you will need:

// yyyy-mm-dd hh:MM:ss -> count of non-leap seconds since 1970-01-01 00:00:00 UTC
// 0123456789012345678
long long
EpochConverter(std::string const& str)
{
    auto y = read4(str, 0);
    auto m = read2(str, 5);
    auto d = read2(str, 8);
    ...

The part that usually trips people up is how to convert the triple {y, m, d} into a count of days since/prior 1970-01-01. Here is a collection of public domain calendrical algorithms that will help you do this. This is not a 3rd party date/time library. It is a tutorial on the algorithms you will need to write your own date/time library. And these algorithms are efficient. No iteration. No large tables. That makes them very pipeline and cache friendly. And they are unit tested over a span of +/- a million years. So you don't have to worry about hitting any correctness boundaries with them. These algorithms also have a very in-depth derivation if you are interested in how they work.

So just go to the collection of public domain calendrical algorithms, pick out the algorithms you need (and customize them however you want), and roll your own converter.

For example:

#include <cstdint>
#include <limits>
#include <string>

int
days_from_civil(int y, unsigned m, unsigned d) noexcept
{
    static_assert(std::numeric_limits<unsigned>::digits >= 18,
             "This algorithm has not been ported to a 16 bit unsigned integer");
    static_assert(std::numeric_limits<int>::digits >= 20,
             "This algorithm has not been ported to a 16 bit signed integer");
    y -= m <= 2;
    const int era = (y >= 0 ? y : y-399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);      // [0, 399]
    const unsigned doy = (153*(m + (m > 2 ? -3 : 9)) + 2)/5 + d-1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe/4 - yoe/100 + doy;         // [0, 146096]
    return era * 146097 + static_cast<int>(doe) - 719468;
}

int
read2(std::string const& str, int pos)
{
    return (str[pos] - '0')*10 + (str[pos+1]  - '0');
}

int
read4(std::string const& str, int pos)
{
    return (str[pos] - '0')*1000 + (str[pos+1] - '0')*100 +
           (str[pos+2] - '0')*10 + (str[pos+3]  - '0');
}

// yyyy-mm-dd hh:MM:ss -> count of non-leap seconds since 1970-01-01 00:00:00 UTC
// 0123456789012345678
long long
EpochConverter(std::string const& str)
{
    auto y = read4(str, 0);
    auto m = read2(str, 5);
    auto d = read2(str, 8);
    auto h = read2(str, 11);
    auto M = read2(str, 14);
    auto s = read2(str, 17);
    return days_from_civil(y, m, d)*86400LL + h*3600 + M*60 + s;
}

#include <iostream>

int
main()
{
    std::cout << EpochConverter("2019-01-15 10:00:00") << '\n';
}

This just output for me:

1547546400
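
One way to gain some confidence in the arithmetic without any platform-specific function is to cross-check it against std::gmtime, which is standard and always works in UTC. The sketch below repeats the functions from this answer so that it compiles on its own. The helper round_trips is hypothetical, and the check assumes std::time_t counts seconds since 1970-01-01 00:00:00 UTC and that the year has exactly four digits.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Same algorithm as in the answer, repeated so the sketch is self-contained.
int days_from_civil(int y, unsigned m, unsigned d) noexcept
{
    y -= m <= 2;
    const int era = (y >= 0 ? y : y-399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);      // [0, 399]
    const unsigned doy = (153*(m + (m > 2 ? -3 : 9)) + 2)/5 + d-1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe/4 - yoe/100 + doy;         // [0, 146096]
    return era * 146097 + static_cast<int>(doe) - 719468;
}

int read2(std::string const& str, int pos)
{
    return (str[pos] - '0')*10 + (str[pos+1] - '0');
}

int read4(std::string const& str, int pos)
{
    return (str[pos] - '0')*1000 + (str[pos+1] - '0')*100 +
           (str[pos+2] - '0')*10 + (str[pos+3] - '0');
}

long long EpochConverter(std::string const& str)
{
    auto y = read4(str, 0);
    auto m = read2(str, 5);
    auto d = read2(str, 8);
    auto h = read2(str, 11);
    auto M = read2(str, 14);
    auto s = read2(str, 17);
    return days_from_civil(y, m, d)*86400LL + h*3600 + M*60 + s;
}

// Hypothetical check: format t in UTC with the standard library, parse the
// text back with EpochConverter, and compare with the original value.
bool round_trips(std::time_t t)
{
    const std::tm* utc = std::gmtime(&t);
    if (utc == nullptr)
        return false;
    char buf[32];
    const std::size_t n =
        std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", utc);
    if (n != 19)  // only the fixed 19-character layout is supported
        return false;
    return EpochConverter(std::string(buf, n)) == static_cast<long long>(t);
}
```

Calling round_trips over a range of non-negative timestamps, including leap days, is a cheap regression test for any customization made to the algorithm.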

Sprinkle in whatever error detection is appropriate for your application.
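
As an illustration of what such error detection might look like, here is a minimal sketch that checks the layout, the digits, and the field ranges before doing the arithmetic. The names checked_epoch, is_leap, and last_day_of_month are hypothetical and not part of the original answer. It rejects a seconds value of 60, which is an assumption that suits a count of non-leap seconds.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Same algorithm as in the answer, repeated so the sketch is self-contained.
int days_from_civil(int y, unsigned m, unsigned d) noexcept
{
    y -= m <= 2;
    const int era = (y >= 0 ? y : y-399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);      // [0, 399]
    const unsigned doy = (153*(m + (m > 2 ? -3 : 9)) + 2)/5 + d-1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe/4 - yoe/100 + doy;         // [0, 146096]
    return era * 146097 + static_cast<int>(doe) - 719468;
}

// Gregorian leap year rule.
bool is_leap(int y) noexcept
{
    return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
}

// Number of days in month m (1..12) of year y.
unsigned last_day_of_month(int y, unsigned m) noexcept
{
    constexpr unsigned days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (m == 2 && is_leap(y)) ? 29u : days[m - 1];
}

// Hypothetical checked variant: returns std::nullopt for malformed input
// instead of computing a meaningless number.
std::optional<long long> checked_epoch(std::string const& str)
{
    // 'd' marks a digit position; every other character must match exactly.
    static const char layout[] = "dddd-dd-dd dd:dd:dd";
    if (str.size() != 19)
        return std::nullopt;
    for (std::size_t i = 0; i < 19; ++i)
    {
        const char c = str[i];
        if (layout[i] == 'd')
        {
            if (c < '0' || c > '9')
                return std::nullopt;
        }
        else if (c != layout[i])
            return std::nullopt;
    }
    // Reads len decimal digits starting at pos (digits were validated above).
    auto num = [&str](std::size_t pos, std::size_t len) {
        unsigned v = 0;
        for (std::size_t i = 0; i < len; ++i)
            v = v * 10 + static_cast<unsigned>(str[pos + i] - '0');
        return v;
    };
    const int y = static_cast<int>(num(0, 4));
    const unsigned m = num(5, 2), d = num(8, 2);
    const unsigned h = num(11, 2), M = num(14, 2), s = num(17, 2);
    if (m < 1 || m > 12 || d < 1 || d > last_day_of_month(y, m))
        return std::nullopt;
    if (h > 23 || M > 59 || s > 59)
        return std::nullopt;
    return days_from_civil(y, m, d) * 86400LL + h * 3600 + M * 60 + s;
}
```

Returning std::optional keeps the function within C++17 and avoids exceptions; an application that prefers exceptions or error codes could swap that part out.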

Obsolesce answered 1/8, 2019 at 14:35 Comment(3)
The problem I see with the mentioned algorithms is that they do not account for all the weird phenomena of time and time zones. There are countries which switched time zones and countries that have very special offset situations. While this algorithm works perfectly fine for most standard situations, it fails for the weird edge cases around the globe. I will accept it though, since I do get that there is no satisfying solution out there (within my requirements) and this comes close enough and is very well researched. Thanks. - Ootid
The wording in your question led me to believe that you did not wish to involve time zones. My solution is strictly UTC-only. The "epoch" is measured in UTC (1970-01-01 00:00:00 UTC). And you state that the input string is UTC. So no time zones to deal with. - Obsolesce
Oh, you are right. Since we parse the string directly, without any built-in parsers, we do not get any weird offsets added automatically by C++. In that case, your solution solves my problem. Many thanks. - Ootid

I had the same requirement recently. I was disappointed to find that the handling of DST and timezones seemed inconsistent between writing timestamps and parsing them.

The code I came up with was this:

#include <cctype>
#include <chrono>
#include <ctime>
#include <iomanip>
#include <istream>

namespace chrono = std::chrono;
using chrono::system_clock;

void time_point_from_stream(std::istream &is, system_clock::time_point &tp)
{
    std::tm tm {};
    is >> std::get_time(&tm, "%Y-%m-%dT%H:%M:%S");

    // unhappily, mktime thinks it's reading local time with DST adjustments
    auto my_time_t = std::mktime(&tm);
    // tm_gmtoff is a glibc/BSD extension, not a standard std::tm member
    my_time_t += tm.tm_gmtoff;

    if (tm.tm_isdst == 1)
        my_time_t -= 3600;

    tp = system_clock::from_time_t(my_time_t);

    if (not is)
        return;
    auto ch = is.peek();

    if (std::isspace(ch))
        return;

    if (ch == '.')
    {
        double zz;
        is >> zz;
        auto zseconds = chrono::duration< double >(zz);
        tp += chrono::duration_cast< system_clock::duration >(zseconds);
        if (not is)
            return;
        ch = is.peek();
    }

    if (ch == 'Z')
        is.get();
    else if (not std::isspace(ch))
    {
        is.setstate(std::ios::failbit);
    }
}

Essentially, the steps are:

  1. Use std::get_time to fill a tm
  2. Use std::mktime to convert that to a time_t
  3. reverse out timezone and DST adjustments
  4. convert to a std::chrono::system_clock::time_point
  5. Parse the fractional seconds and adjust the result.
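
Steps 4 and 5 do not depend on std::mktime, so they can be sketched separately for the case where the whole seconds since the epoch are already known. This is only an illustration: the helper name to_time_point is hypothetical, the fraction is parsed digit by digit rather than through a double to avoid rounding, and the code assumes that the epoch of std::chrono::system_clock is 1970-01-01 00:00:00 UTC (guaranteed from C++20, the de facto behavior before that).

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a system_clock::time_point from whole epoch
// seconds plus an optional fraction such as ".123".
// Assumes system_clock's epoch is 1970-01-01 00:00:00 UTC.
std::chrono::system_clock::time_point
to_time_point(long long epoch_seconds, std::string const& fraction = "")
{
    using namespace std::chrono;
    long long nanos = 0;
    long long scale = 100000000;  // weight of the first fractional digit, in ns
    // Index 0 is expected to be the '.', so digits start at index 1.
    // At most nine digits are used; parsing stops at the first non-digit.
    for (std::size_t i = 1; i < fraction.size() && scale > 0; ++i)
    {
        const char c = fraction[i];
        if (c < '0' || c > '9')
            break;
        nanos += (c - '0') * scale;
        scale /= 10;
    }
    system_clock::time_point tp{
        duration_cast<system_clock::duration>(seconds(epoch_seconds))};
    // Precision finer than system_clock::duration is truncated here.
    tp += duration_cast<system_clock::duration>(nanoseconds(nanos));
    return tp;
}
```

With this, to_time_point(1547546400, ".123") represents 2019-01-15 10:00:00.123 UTC on platforms where the stated epoch assumption holds.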

I believe C++20 improves on the situation.

Howard Hinnant has also written an improved date/time library. There is also boost::posix_time which I have always found easier to use than the std offering.

Shapely answered 1/8, 2019 at 8:5 Comment(2)
As I mentioned in a comment to the OP, the value of tm_isdst is indeterminate after get_time, so it can't be relied on. - Calkins
@Someprogrammerdude That would explain why I could find no documentation on it, and had to deduce behaviour through empirical analysis. boost posix_time is a much better library. - Shapely
