Perform this hours of operation query in PostgreSQL

I'm on the RoR stack and had to write some actual SQL to complete this query: find all records that are "open", meaning that the current time falls within the specified hours of operation. In the hours_of_operations table, two integer columns opens_on and closes_on store a weekday, and two time fields opens_at and closes_at store the respective time of day.

I made a query that compares the current date and time to the stored values, but I'm wondering: is there a way to cast to some sort of date type and have PostgreSQL do the rest?

The meat of the query is:

WHERE (
  (
    /* Opens in Future */
    (opens_on > 5 OR (opens_on = 5 AND opens_at::time > '2014-03-01 00:27:25.851655'))
    AND (
      (closes_on < opens_on AND closes_on > 5)
      OR ((closes_on = opens_on)
          AND (closes_at::time < opens_at::time AND closes_at::time > '2014-03-01 00:27:25.851655'))
      OR ((closes_on = 5)
          AND (closes_at::time > '2014-03-01 00:27:25.851655' AND closes_at::time < opens_at::time)))
    OR

    /* Opens in Past */
    (opens_on < 5 OR (opens_on = 5 AND opens_at::time < '2014-03-01 00:27:25.851655'))
    AND (closes_on > 5)
    OR ((closes_on = 5)
        AND (closes_at::time > '2014-03-01 00:27:25.851655'))
    OR (closes_on < opens_on)
    OR ((closes_on = opens_on)
        AND (closes_at::time < opens_at::time))
  )
)

The reason for such dense complexity is that a period of operation may wrap around the end of the week, for example, starting at noon on Sunday and running through 6 AM Monday. Since I store values in UTC, there are many cases in which the user's local time could wrap in a very strange way. The query above ensures that you can enter ANY two times of the week and we compensate for the wrapping.

Squire answered 1/3, 2014 at 0:37 Comment(0)

Table layout

Re-design the table to store opening hours (hours of operation) as a set of tsrange (range of timestamp without time zone) values. Requires Postgres 9.2 or later.

Pick a random week to stage your opening hours. I like the week:
1996-01-01 (Monday) to 1996-01-07 (Sunday)
That's the most recent leap year (as of this writing) in which Jan 1st conveniently falls on a Monday. But it can be any week for this purpose. Just be consistent.
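
To double-check that a candidate staging week actually starts on a Monday, you can ask PostgreSQL directly (isodow numbers Monday as 1):

SELECT extract(isodow FROM date '1996-01-01');  -- 1 = Monday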

Install the additional module btree_gist first:

CREATE EXTENSION btree_gist;

Then create the table like this:

CREATE TABLE hoo (
  hoo_id  serial PRIMARY KEY
, shop_id int NOT NULL -- REFERENCES shop(shop_id)     -- reference to shop
, hours   tsrange NOT NULL
, CONSTRAINT hoo_no_overlap EXCLUDE USING gist (shop_id WITH =, hours WITH &&)
, CONSTRAINT hoo_bounds_inclusive CHECK (lower_inc(hours) AND upper_inc(hours))
, CONSTRAINT hoo_standard_week CHECK (hours <@ tsrange '[1996-01-01 0:0, 1996-01-08 0:0]')
);

The one column hours replaces all of your columns:

opens_on, closes_on, opens_at, closes_at

For instance, hours of operation from Wednesday, 18:30 to Thursday, 05:00 UTC are entered as:

'[1996-01-03 18:30, 1996-01-04 05:00]'
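
A minimal sketch of the corresponding INSERT (shop_id 1 is a made-up value):

INSERT INTO hoo (shop_id, hours)
VALUES (1, '[1996-01-03 18:30, 1996-01-04 05:00]');  -- Wed 18:30 to Thu 05:00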

The exclusion constraint hoo_no_overlap prevents overlapping entries per shop. It is implemented with a GiST index, which also happens to support our queries. See the section "Index and Performance" below for indexing strategies.

The check constraint hoo_bounds_inclusive enforces inclusive boundaries for your ranges, with two noteworthy consequences:

  • A point in time falling exactly on the lower or upper boundary is always included.
  • Adjacent entries for the same shop are effectively disallowed. With inclusive bounds, those would "overlap" and the exclusion constraint would raise an exception. Adjacent entries must be merged into a single row instead, except when they wrap around Sunday midnight, in which case they must be split into two rows. The function f_hoo_hours() below takes care of this; see the example right after this list.
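
To illustrate the second point, continuing the sketch above: an adjacent range starting exactly at Thu 05:00 shares a boundary point with the existing row for shop 1, so the exclusion constraint rejects it:

INSERT INTO hoo (shop_id, hours)
VALUES (1, '[1996-01-04 05:00, 1996-01-04 12:00]');
-- ERROR:  conflicting key value violates exclusion constraint "hoo_no_overlap"

Merged into the single row '[1996-01-03 18:30, 1996-01-04 12:00]', it is accepted.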

The check constraint hoo_standard_week enforces the outer bounds of the staging week using the "range is contained by" operator <@.

With inclusive bounds, you have to observe a corner case where the time wraps around at Sunday midnight:

'1996-01-01 00:00+0' = '1996-01-08 00:00+0'
 Mon 00:00 = Sun 24:00 (= next Mon 00:00)

You have to search for both timestamps at once, as sketched below. (A related design with an exclusive upper bound would not exhibit this shortcoming.)
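
A minimal sketch of one way to cover the corner case, extending the simple query from the "Query" section below with an extra check for the equivalent point Sun 24:00 (f_hoo_time() is defined next):

SELECT *
FROM   hoo
WHERE  hours @> f_hoo_time(now())
   OR (f_hoo_time(now()) = timestamp '1996-01-01 00:00'  -- Mon 00:00 equals ...
       AND hours @> timestamp '1996-01-08 00:00');       -- ... Sun 24:00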

Function f_hoo_time(timestamptz)

To "normalize" any given timestamp with time zone:

CREATE OR REPLACE FUNCTION f_hoo_time(timestamptz)
  RETURNS timestamp
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT timestamp '1996-01-01' + ($1 AT TIME ZONE 'UTC' - date_trunc('week', $1 AT TIME ZONE 'UTC'))
$func$;

PARALLEL SAFE only for Postgres 9.6 or later.

The function takes timestamptz and returns timestamp. It adds the elapsed interval of the respective week ($1 - date_trunc('week', $1), computed in UTC) to the starting point of our staging week. (date + interval produces timestamp.)
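
For example, 2014-03-01 00:27 UTC (a Saturday) maps onto the Saturday of the staging week:

SELECT f_hoo_time(timestamptz '2014-03-01 00:27:25+00');
-- 1996-01-06 00:27:25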

Function f_hoo_hours(timestamptz, timestamptz)

To normalize ranges and split those crossing Mon 00:00. This function takes any interval (as two timestamptz) and produces one or two normalized tsrange values. It covers any legal input and disallows the rest:

CREATE OR REPLACE FUNCTION f_hoo_hours(_from timestamptz, _to timestamptz)
  RETURNS TABLE (hoo_hours tsrange)
  LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE COST 500 ROWS 1 AS
$func$
DECLARE
   ts_from timestamp := f_hoo_time(_from);
   ts_to   timestamp := f_hoo_time(_to);
BEGIN
   -- sanity checks (optional)
   IF _to <= _from THEN
      RAISE EXCEPTION '%', '_to must be later than _from!';
   ELSIF _to > _from + interval '1 week' THEN
      RAISE EXCEPTION '%', 'Interval cannot span more than a week!';
   END IF;

   IF ts_from > ts_to THEN  -- split range at Mon 00:00
      RETURN QUERY
      VALUES (tsrange('1996-01-01', ts_to  , '[]'))
           , (tsrange(ts_from, '1996-01-08', '[]'));
   ELSE                     -- simple case: range in standard week
      hoo_hours := tsrange(ts_from, ts_to, '[]');
      RETURN NEXT;
   END IF;

   RETURN;
END
$func$;
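
For example, a range wrapping around Sunday midnight UTC comes back split in two:

SELECT * FROM f_hoo_hours('1996-01-07 22:00+0', '1996-01-08 04:00+0');

-- ["1996-01-01 00:00:00","1996-01-01 04:00:00"]   -- Mon 00:00 to Mon 04:00
-- ["1996-01-07 22:00:00","1996-01-08 00:00:00"]   -- Sun 22:00 to Sun 24:00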

To INSERT a single input row:

INSERT INTO hoo(shop_id, hours)
SELECT 123, f_hoo_hours('2016-01-11 00:00+04', '2016-01-11 08:00+04');

For any number of input rows:

INSERT INTO hoo(shop_id, hours)
SELECT id, f_hoo_hours(f, t)
FROM  (
   VALUES (7, timestamptz '2016-01-11 00:00+0', timestamptz '2016-01-11 08:00+0')
        , (8, '2016-01-11 00:00+1', '2016-01-11 08:00+1')
   ) t(id, f, t);

Each input row can produce two rows in hoo if its range needs splitting at Mon 00:00 UTC.

Query

With the adjusted design, your whole big, complex, expensive query can be replaced with ... this:

SELECT *
FROM hoo
WHERE hours @> f_hoo_time(now());

The query is backed by said GiST index and is fast, even for big tables.


If you want to calculate total opening hours (per shop), a sketch follows.
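
Since ranges never overlap per shop (the exclusion constraint enforces this), summing the width of each range does the job; rows split at Mon 00:00 simply add up:

SELECT shop_id, sum(upper(hours) - lower(hours)) AS total_opening_hours
FROM   hoo
GROUP  BY shop_id;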

Index and Performance

The containment operator for range types can be supported with a GiST or SP-GiST index. Either can be used to implement an exclusion constraint, but only GiST supports multicolumn indexes:

Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes.

And the order of index columns matters:

A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns. Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned. A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.

So we have conflicting interests here. For big tables, there will be many more distinct values for shop_id than for hours.

  • A GiST index with leading shop_id is faster to write and to enforce the exclusion constraint.
  • But we are searching hours in our query. Having that column first would be better.
  • If we need to look up shop_id in other queries, a plain B-tree index is much faster for that.
  • To top it off, I found an SP-GiST index on just hours to be fastest for the query.

Benchmark

New test with Postgres 12 on an old laptop. My script to generate dummy data:

INSERT INTO hoo(shop_id, hours)
SELECT id
     , f_hoo_hours(((date '1996-01-01' + d) + interval  '4h' + interval '15 min' * trunc(32 * random()))            AT TIME ZONE 'UTC'
                 , ((date '1996-01-01' + d) + interval '12h' + interval '15 min' * trunc(64 * random() * random())) AT TIME ZONE 'UTC')
FROM   generate_series(1, 30000) id
JOIN   generate_series(0, 6) d ON random() > .33;

Results in ~ 141k randomly generated rows, ~ 30k distinct shop_id, ~ 12k distinct hours. Table size 8 MB.

I dropped and recreated the exclusion constraint:

ALTER TABLE hoo
  DROP CONSTRAINT hoo_no_overlap
, ADD CONSTRAINT hoo_no_overlap  EXCLUDE USING gist (shop_id WITH =, hours WITH &&);  -- 3.5 sec; index 8 MB
    
ALTER TABLE hoo
  DROP CONSTRAINT hoo_no_overlap
, ADD CONSTRAINT hoo_no_overlap  EXCLUDE USING gist (hours WITH &&, shop_id WITH =);  -- 13.6 sec; index 12 MB

shop_id first is ~ 4x faster for this distribution.

In addition, I tested two more indexes for read performance:

CREATE INDEX hoo_hours_gist_idx   on hoo USING gist (hours);
CREATE INDEX hoo_hours_spgist_idx on hoo USING spgist (hours);  -- !!

After VACUUM FULL ANALYZE hoo;, I ran two queries:

  • Q1: late night, finding only 35 rows
  • Q2: in the afternoon, finding 4547 rows
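
The exact test timestamps aren't reproduced here; both queries had this shape (the literals below are stand-ins):

-- Q1: late night, few shops open (hypothetical timestamp)
SELECT * FROM hoo WHERE hours @> f_hoo_time(timestamptz '1996-01-04 03:00+00');

-- Q2: in the afternoon, many shops open (hypothetical timestamp)
SELECT * FROM hoo WHERE hours @> f_hoo_time(timestamptz '1996-01-04 15:00+00');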

Results

Got an index-only scan for each (except for "no index", of course):

index                 idx size  Q1        Q2
------------------------------------------------
no index                        38.5 ms   38.5 ms 
gist (shop_id, hours)    8MB    17.5 ms   18.4 ms
gist (hours, shop_id)   12MB     0.6 ms    3.4 ms
gist (hours)            11MB     0.3 ms    3.1 ms
spgist (hours)           9MB     0.7 ms    1.8 ms  -- !

  • SP-GiST and GiST are on par for queries finding few results (GiST is even faster for very few).
  • SP-GiST scales better with a growing number of results, and is smaller, too.

If you read a lot more than you write (typical use case), keep the exclusion constraint as suggested at the outset and create an additional SP-GiST index to optimize read performance.

Chaussure answered 1/3, 2014 at 7:29 Comment(8)
Yes, this is incredible. My ONLY question: is there a way to make this design pattern work with wrap-around (where the store has a period of operation that opens on Sunday and closes on Monday)?Squire
@BrianWheeler: Consider the addendum addressing that.Chaussure
Your answer has been tremendously helpful! I've tailored slightly to fit my needs. I've found that by adding a boolean called wrap, I can determine if the hours wrap around the edge of the week. I will post my full Ruby on Rails integration when I'm done building it. Thanks!Squire
@ErwinBrandstetter If the local opening time converts to less than 1996-01-01 00:00 UTC, should a week be added to the opening time so that the date changes from Sunday Dec 31st to Sunday Jan 7th? or does the lower bound need to be changed instead to allow the smallest possible UTC value for "Monday Midnight" in the local time most behind UTCWondering
@FuzzyTree: Excellent question, it pointed out a bug in f_hoo_time(): date_trunc() relies on the current time zone setting, so the time zone has to be SET to UTC explicitly. While being at it, I improved the solution in large parts and added a function to take care of ... everything.Chaussure
@ErwinBrandstetter thanks for the update! my default time zone was already UTC but I think changing the type from tstzrange to tsrange fixes my edge case problem where time conversions at the boundary fall after the 8th or before the 1stWondering
@FuzzyTree: It should actually work with tstzrange as well. It's just simpler to understand with tsrange. You would have to "standardize" the week range as well. This is now all built into the two tools I provided. timestamp with time zone can be confusing ... Also consider my update on indexing. Found SP-GiST to perform better.Chaussure
@ErwinBrandstetter you're right, tstzrange vs tsrange was not the issue. I implemented the changes and it's working swimmingly now. I'll have to test the indexes to see which ones perform better for my application. Thanks again - amazing work!Wondering
