PostgreSQL daterange not using index correctly
Asked Answered
T

1

7

I have a simple table which has a user_birthday field with a type of date (which can be NULL value)

CREATE TABLE users
(
  user_id bigserial NOT NULL,
  user_email text NOT NULL,
  user_password text,
  user_first_name text NOT NULL,
  user_middle_name text,
  user_last_name text NOT NULL,
  user_birthday date,
  CONSTRAINT pk_users PRIMARY KEY (user_id)
)

There's an index (btree) defined on that field, with the rule of NOT user_birthday IS NULL.

CREATE INDEX ix_users_birthday
  ON users
  USING btree
  (user_birthday)
  WHERE NOT user_birthday IS NULL;

Trying to follow up on another idea, I've added the extension btree_gist and created the following index:

CREATE INDEX ix_users_birthday_gist
  ON glances.users
  USING gist
  (user_birthday)
  WHERE NOT user_birthday IS NULL;

But it had no affect either, as from what I could read it is not used for range checking.

The PostgreSQL version is 9.3.4.0 (22) Postgres.app and issue also exists in 9.3.3.0 (21) Postgres.app

I've been intrigued by the following queries:

Query #1:

EXPLAIN ANALYZE SELECT *
FROM users
WHERE user_birthday <@ daterange('[1978-07-15,1983-03-01)')

Query #2:

EXPLAIN ANALYZE SELECT *
FROM users
WHERE user_birthday BETWEEN '1978-07-15'::date AND '1983-03-01'::date

which, at first glance both should have the same execution plan, but for some reason, here are the results:

Query #1:

"Seq Scan on users  (cost=0.00..52314.25 rows=11101 width=241) (actual
time=0.014..478.983 rows=208886 loops=1)"
"  Filter: (user_birthday <@ '[1978-07-15,1983-03-01)'::daterange)"
"  Rows Removed by Filter: 901214"
"Total runtime: 489.584 ms"

Query #2:

"Bitmap Heap Scan on users  (cost=4468.01..46060.53 rows=210301 width=241)
(actual time=57.104..489.785 rows=209019 loops=1)"
"  Recheck Cond: ((user_birthday >= '1978-07-15'::date) AND (user_birthday
<= '1983-03-01'::date))"
"  Rows Removed by Index Recheck: 611375"
"  ->  Bitmap Index Scan on ix_users_birthday  (cost=0.00..4415.44
rows=210301 width=0) (actual time=54.621..54.621 rows=209019 loops=1)"
"        Index Cond: ((user_birthday >= '1978-07-15'::date) AND
(user_birthday <= '1983-03-01'::date))"
"Total runtime: 500.983 ms"

As you can see, the <@ daterange is not utilizing the existing index, while BETWEEN does.

Important to note that the actual use case for this rule is in a more complex query, which doesn't result in the Recheck Cond and Bitmap Heap scan. In the application complex query, the difference between the two methods (with 1.2 million records) is massive: Query #1 at 415ms Query #2 at 84ms.

Is this a bug with daterange? Am I doing something wrong? or datarange <@ is performing as designed?

There's also a discussion in the pgsql-bugs mailing list

Thundery answered 2/4, 2014 at 22:28 Comment(8)
What happens to the execution plan if you run analyze users;, followed by query #1?Stash
That was one of the 1st things I tried when I saw this issue. It had no affect.Thundery
What's the rationale for indexing on NOT user_birthday IS NULL rather than the date itself?Stash
The question should display CREATE INDEX script.Pewee
I'm ignoring records which have no birthday, so I don't need to index them. The rationale is to keep the index smaller. Per @ErwinBrandstetter request, added the CREATE INDEX, although as mentioned - nothing special in it.Thundery
Agreed, it's bit unwieldy if you have range types and need to check for upper / lower bound, excluding / including, NULL, to build the statement. It would be nice if Postgres could do that for us.Pewee
@ShaharHadas it is a long time ago I know, but I am facing same issue today. Have you detected the issue here ?Spitfire
Nope. This project is long gone by now, but since then I kept using BETWEEN as it provided the needed functionality, just in a different syntaxThundery
P
5

BETWEEN includes upper and lower border. Your condition

WHERE user_birthday BETWEEN '1978-07-15'::date AND '1983-03-01'::date

matches

WHERE user_birthday <@ daterange('[1978-07-15,1983-03-01]')

I see you mention a btree index. For that use simple comparison operators.

Detailed manual page on which index is good for which operators.

The range type operators <@ or @> would work with GiST indexes.
Example:
Perform this hours of operation query in PostgreSQL

Pewee answered 2/4, 2014 at 23:3 Comment(12)
Tried that already. postgres automatically converts [] daterange to [) daterange. No affect - still Seq Scan.Thundery
GIST is used when you have an index on a daterange field. here I'm just using a daterange to filter.Thundery
@Sash: Combine <@ with a gist index on a range type - or a functional index. For the btree index use simple comparison operators.Pewee
I don't have a range type. Added the test table creation. I'm trying to create the complex gist index as you suggested, but it seems that in order to do so, I need to create an index which takes the user birthday as the two values of the range.Thundery
Well, I'd rather suggest to use simple comparison operators with the btree index. :) A GiST index might be a good idea for a nearest neighbor search, but your example works better with simple comparison operators and a btree index.Pewee
I've just reported this as a possible bug to postgres. I'll update this question once I hear back. As for the index, after some more googling, it seems the solution is btree_gist extension, but I'm still looking into this.Thundery
@Sash: It's not a bug, the operators are just not defined. Why wouldn't you want to use the cheaper simple operators? There is nothing to gain for you here ...Pewee
Because I already have the range to filter by in another table as daterange. so now I need to use upper and lower to extract those values with COALESCE with infinity and -infinity. so its ugly ... and I don't yet if it adds slowness or not. I'm working on those changes right now.Thundery
regarding the operators - am I missing something? according to this they do exist: postgresql.org/docs/9.3/static/functions-range.html - look at the timestamp example.Thundery
The operators exist - else your query would have failed altogether. What's missing are the index operator classes. btree indexes cannot handle <@ @> in standard Postgres. Consider the link to the manual I added last. It explains more than my first link.Pewee
moving forward on your suggestion, I've added btree_gist, and then added the following gist index, but still it doesn't uses any indexThundery
@Sash: btree_gist allows you to include the listed basic types in a GiST index, not the other way round (implement additional operators for btree indexes). That might be a lot more complicated. I don't know of a module providing what you are after. If you should find something, be sure to post an answer here.Pewee

© 2022 - 2024 — McMap. All rights reserved.