Best practices with saving datetime & timezone info in database when data is dependant on datetime
Asked Answered
I

2

95

There were quite a few questions about saving datetime & timezones info in DB but more on the overall level. Here I'd like to address a specific case.

System specs

  • We have an Orders system database
  • It is a multi-tenant system where tenants can use arbitrary timezone (it is arbitrary but single timezone per tenant, saved in Tenants table once and never changes)

Business rule needed to be covered in DB

  • When tenant places an Order into the system, order number gets computed based on their local datetime (its not literally a number but some kind of an identifier like ORDR-13432-Year-Month-Day). Precise calculation is not important for the moment, it's just important that it's dependant on tenants local datetime
  • We also do want to be able to select all Orders, on the system level, placed between some UTC datetimes regardless of the tenant (for general system statistics/reporting)

Our initial idea

  • Our initial idea was to save UTC datetime across whole DB and, of course, keep tenants timezone offset relative to UTC and have application that consumes DB always convert datetimes to UTC so that DB itself always operate with UTC.

Approach 1

  • Saving local tenants datetime would be nice per tenant but then we have problem with queries like:

    SELECT * FROM ORDERS WHERE OrderDateTime BETWEEN UTCDateTime1 AND UTCDateTime2
    

It's problematic because OrderDateTime in this query means different moment in time, based on tenant. Of course, this query might include join to Tenants table to get local datetime offset which would then calculate OrderDateTime on the fly to make adjustments. It's possible, but not sure if it's a good way to do it?

Approach 2

  • On the other hand, when saving UTC datetime, then when we do calculation of OrderNumber since the day/month/year in UTC might differ from the one in local datetime

Let's take extreme example; let's say tenant is 6 hours ahead of UTC and his local datetime is 2017-01-01 02:00. UTC would be 2016-12-31 20:00. Order placed at that moment should get OrderNumber 'ORDR-13432-2017-1-1' but if saving UTC it would get ORDR-13432-2016-12-31.

In this case, at the moment of creating Order in DB, we should get UTC datetime, tenants offset and compile OrderNumber based on recalculated tenants localtime but still save DateTime column in UTC.

Questions

  1. What is the preferred way of handling this kind of situation?
  2. Is there a nice solution with saving UTC datetimes because that one would be pretty nice for us because of the system-level reporting?
  3. If going with saving UTC, is Approach 2) good way to handle those cases or is there some better/recommended way?

[UPDATE]

Based on comments from Gerard Ashton and Hugo:

Initial question was not clear with respect to the detail if tenant can change timezone or not and what happens if political authority changes the timezone properties or some terirory's timezone. Of course that is of an extreme importance, but it is not in the center of this question . We might address that in a separate question.

For the sake of this question, lets assume tenant will not change location. Timezone properties or timezone itself for that location might change and those changes will be handled in the system separately from this question.

Ician answered 7/7, 2017 at 7:58 Comment(9)
The assumption that local time zone never changes is risky. If time zone is stored as a standardized name (Eastern Standard Time) then you must also keep track of the start and end dates of daylight saving time throughout the history of the database. If stored as a numerical offset from UTC, it will change in areas affected by daylight saving time. On rare occasions a political subdivision may change from one timezone to another.Dentation
statement that 'local time zone never changes' is per tenant. it just means that once tenant chooses their timezone, they will always use that timezone. with all potential underlying changes to timezone itself.Ician
Maybe the time zone issue is outside your control. But if I were a tenant and the US Congress and the state legislature decided to change the time zone of my office from Mountain Time to Central Time, and the database provider wouldn't let me change the time zone in the database, I'd be unhappy. Are you keeping track of the time zone as a name or a numerical value?Dentation
Even if you use a timezone name, there's no guarantee that a region will be in the same zone forever - a new zone can be created for that region, which means a new name (that didn't exist before) will be used to refer to that region in terms of zone rules (what's the offset from UTC, when DST starts and ends, etc) - a new name is created when the region decides to have different rules and its historical data becomes different to all exisiting zones. It doesn't happen everyday, but you should still consider it.Carbonari
Gerard Ashton, Hugo - thanks for your comments! i've answered your comments in questions updateIcian
The more fundamental problem, in my opinion, is that you're trying to use the datetime in both the Order Date and Order Number, which violates basic second normal form. Fixing that would make this timezone problem easier and would prevent other problems down the line.Broadcloth
@KevinChristopherHenry I couldn't agree more but in this case its a required business rule. Customer wants OrderNumber composed that way. Its not a developers choice.Ician
@daneejela: Fair enough (and condolences).Broadcloth
There's nothing wrong with using an order date as part of the order number. Normalization doesn't apply to that, and is overrated anyway...Comer
C
70

Hugo's answer is mostly correct, but I'll add a few key points:

  • When you're storing the customer's time zone, do NOT store a numerical offset. As others have pointed out, the offset from UTC is only for a single point in time, and can easily change for DST and for other reasons. Instead, you should store a time zone identifier, preferably an IANA time zone identifier as a string, such as "America/Los_Angeles". Read more in the timezone tag wiki.

  • Your OrderDateTime field should absolutely represent the time in UTC. However, depending on your database platform, you have several choices for how to store this.

    • For example, if using Microsoft SQL Server, a good approach is to store the local time in a datetimeoffset column, which preserves the offset from UTC. Note that any index you create on that column will be based on the UTC equivalent, so you will get good query performance when doing your range query.

    • If using other database platforms, you may instead wish to store the UTC value in a timestamp field. Some databases also have timestamp with time zone, but understand that it doesn't mean it stores the time zone or offset, it just means that it can do conversions for you implicitly as you store and retrieve values. If you intend to always represent UTC, then often timestamp (without time zone) or just datetime is more appropriate.

  • Since either of the above methods will store a UTC time, you'll also need to consider how to perform operations that need an index of local time values. For example, you might need to create a daily report, based on the day of the user's time zone. For that, you'd need to group by the local date. If you try to compute that at query time from your UTC value, you'll end up scanning the entire table.

    A good approach to deal with this is to create a separate column for the local date (or perhaps even the local datetime depending on your needs, but not a datetimeoffset or timestamp). This could be a completely isolated column that you populate separately, or it could be a computed/calculated column based on your other column. Use this column in an index so you can filter or group by local date.

  • If you go for the computed-column approach, you'll need to know how to convert between time zones in the database. Some databases have a convert_tz function built-in that understands IANA time zone identifiers.

    If you're using Microsoft SQL Server, you can use the new AT TIME ZONE function in SQL 2016 and Azure SQL DB, but that only works with Microsoft time zone identifiers. To use IANA time zone identifiers, you'll need a third party solution, such as my SQL Server Time Zone Support project.

  • At query time, avoid using the BETWEEN statement. It is fully inclusive. It works ok for whole dates, but when you have time involved you're better off doing a half-open range query, such as:

    ... WHERE OrderDateTime >= @t1 AND OrderDateTime < @t2
    

    For example, if @t1 were the start of today, @t2 would be the start of tomorrow.

Regarding the scenario discussed in comments where the user's time zone has changed:

  • If you choose to calculate the local date in the database, the only scenario you need to worry about is if a location or business switches time zones without a "zone split" occurring. A zone split is when a new time zone identifier is introduced which covers the area that changed, including their old and new rules.

    For example, the latest zone added to the IANA tzdb at the time of writing this is America/Punta_Arenas, which was a zone split when the southern part of Chile decided to stay at UTC-3 when the rest of Chile (America/Santiago) went back to UTC-4 at the end of DST.

    However, if a minor locality on the border of two time zones decides to change which side they follow, and a zone split wasn't warranted, then you'd potentially be using the rules of their new time zone against their old data.

  • If you store the local date separately (computed in the application, not the DB), then you'll have no problems. The user changes their time zone to the new one, all old data is still intact, and new data is stored with the new time zone.

Comer answered 7/7, 2017 at 17:55 Comment(9)
Awesome answer, as usual from you :). Feels like having separated local datetime column is the safest way to go for any kind of reporting that requires local datetime. Pretty much because of the exact synchronization with timezones changes taking effect moment. With separated column its left to the client to deal with what they consider local which sounds very right in server environments.Eucaine
Yep, and like I said, I often just use a date column for this. Unless you're doing hourly filters/grouping by local time, you probably just need the date. It's fairly clear what's going on if you name your columns things like CreatedUTCDateTime and CreatedLocalDate.Comer
I'd guess date is good enough for daily reports but if there are any events to be scheduled in clients timezone, for example, order expiry, payment reminder or any kind of future local events, then full datetime is probably a better choice. And since its more often than not hard to predict what clients feature request might come up down the road, its probably safer to go with full blown datetime from the begining.Eucaine
Sorry, but I disagree with this point "When you're storing the customer's time zone, do NOT store a numerical offset... the offset from UTC is only for a single point in time, and can easily change". Yes, the offset being used for a time zone might change. However, the offset at a particular point in time will not ever change. Even if you got it wrong (say based on the location you should have recorded a +5 offset but you recorded a +6 offset), if the offset is accurate then the time is accurate.Terrier
@Terrier - That's fine if the date, time, and offset are stored together (such as a datetimeoffset field in SQL Server, etc.). However, if it's detached from any particular point in time (such as in a user profile, or office location, etc.), then a time zone identifier must be used rather than an offset.Comer
When you say that the Order DateTime should "represent UTC" and then suggest SQL server storing the DateTimeOffset, do you mean storing the actual local time with offset is a valid representation of UTC as you can always get the UTC value from the field? Or are you suggesting storing multiple fields (one in utc the other at local offset)? I assume the former but I've read this a few times and I'm not absolutely certain that's what you mean.Weft
@Weft - For SQL Server specifically, you can use a datetimeoffset field alone. The value you put into that field can either be the UTC time with a zero offset, or a local time with the corresponding local offset. That's really up to you. Either way, SQL Server will build its indexes by the UTC equivalent. Thus, if you will need to query by local time, then you might want a secondary field for the local time (or perhaps just the local date).Comer
@MattJohnson-Pint then in that case does it make sense to store a combination of DateTimeOffset, local datetime AND the time zone identifier (The latter probably via a joined table, say... Stored on the customer) or does it then become overkill?Weft
@Weft - It all depends on what your needs are. Sometimes, yes. But often it's enough to just store a single datetimeoffset. For example, if you're just timestamping a transaction, then you don't need the time zone id.Comer
C
40

I'd recommend to always use UTC internally, and convert to a timezone only when displaying the date to the user. So I tend to prefer approach 2.

If there's a business rule saying that the tenant's local date/time must be part of the identifier, so be it. But internally, you keep the order date in UTC.

Using your example: a tenant whose timezone is in UTC+06:00, so the tenant's local time is 2017-01-01 02:00, which is equivalent to 2016-12-31 20:00 in UTC.

The order identifier would be ORDR-13432-2017-1-1 and the order date would be UTC 2016-12-31 20:00Z.

To get all orders between 2 dates, this query is straighforward:

SELECT * FROM ORDERS WHERE OrderDateTime BETWEEN UTCDateTime1 AND UTCDateTime2

Because OrderDateTime is in UTC.

If looking for a specific tenant, then you can get the corresponding timezone, convert the date accordingly and search for it. Using the same example above (tenant's timezone is in UTC+06:00), to get all orders made in 2017-01-01 (in tenant's local time):

--get tenant timezone
--startUTC=tenant's local 2017-01-01 00:00 converted to UTC (2016-12-31T18:00Z)
--endUTC=tenant's local 2017-01-01 23:59:59.999 converted to UTC (2017-01-01T17:59:59.999)
SELECT * FROM ORDERS WHERE OrderDateTime between startUTC and endUTC

This will get ORDR-13432-2017-1-1 correctly.


To make queries for multiple tenants in different timezones, both approaches require a join, so none are "better" for this case.

Unless you create an extra column with the tenant's local date/time (the UTC OrderDateTime converted to tenant's timezone). It'll be redundant, but it can help you with queries that searches in more than one timezone. If that's a reasonable trade-off it will depend on how frequent those queries will be made.

Carbonari answered 7/7, 2017 at 14:36 Comment(2)
Very nice answer! And a good point with idea for saving additional column with local date/time. It would be reduntant but it makes life so much easier if you want to make queries like: get top 10 tenants that made the most orders in a day (day being defined as working hours in their local timezones).Eucaine
this approach does not consider DST (Daylight saving time) and could fall into errors to find range of past data with different DST.Tl

© 2022 - 2024 — McMap. All rights reserved.