Why does PostgreSQL consider NULL boundaries in range types to be distinct from infinite boundaries?
Asked Answered
D

1

6

Just to preface, I'm not asking what the difference is between a NULL boundary and an infinite boundary - that's covered in this other question. Rather, I'm asking why PostgreSQL makes a distinction between NULL and infinite boundaries when (as far as I can tell) they function exactly the same.

I started using PostgreSQL's range types recently, and I'm a bit confused by what NULL values in range types are supposed to mean. The documentation says:

The lower bound of a range can be omitted, meaning that all values less than the upper bound are included in the range, e.g., (,3]. Likewise, if the upper bound of the range is omitted, then all values greater than the lower bound are included in the range. If both lower and upper bounds are omitted, all values of the element type are considered to be in the range.

This suggests to me that omitted boundaries in a range (which are the equivalent NULL boundaries specified in a range type's constructor) should be considered infinite. However, PostgreSQL makes a distinction between NULL boundaries and infinite boundaries. The documentation continues:

You can think of these missing values [in a range] as +/-infinity, but they are special range type values and are considered to be beyond any range element type's +/-infinity values.

This is puzzling. "beyond infinity" doesn't make sense, as the entire point of infinite values is that nothing can be greater than +infinity or less than -infinity. That doesn't break "element in range"-type checks, but it does introduce an interesting case for primary keys that I think most people wouldn't expect. Or at least, I didn't expect it.

Suppose we create a basic table whose sole field is a daterange, which is also the PK:

CREATE TABLE public.range_test
(
    id daterange NOT NULL,
    PRIMARY KEY (id)
);

Then we can populate it with the following data with no problem:

INSERT INTO range_test VALUES (daterange('-infinity','2021-05-21','[]'));
INSERT INTO range_test VALUES (daterange(NULL,'2021-05-21','[]'));

Selecting all the data reveals we have these two tuples:

[-infinity,2021-05-22)
(,2021-05-22)

So the two tuples are distinct, or there would have been a primary key violation. But again, NULL boundaries and infinite boundaries work exactly the same when we're dealing with the actual elements that make up the range. For example, there is no date value X such that the results of X <@ [-infinity,2021-05-22) returns a different result than X <@ (,2021-05-22). This makes sense because NULL values can't have a type of date, so they can't even be compared to the range (and PostgreSQL even converted the inclusive boundary on the lower NULL bound in daterange(NULL,'2021-05-21','[]') to an exclusive boundary, (,2021-05-22) to be doubly sure). But why are two ranges that are identical in every practical way considered distinct?

When I was still in school, I remember overhearing some discussion about the difference between "unknown" and "doesn't exist" - two people who were smarter than me were talking about that in the context of why NULL values often cause issues, and that replacing the singular NULL with separate "unknown" and "doesn't exist" values might solve those issues, but the discussion was over my head at the time. Thinking about this weird feature made me think of that discussion. So is the distinction between "unknown" and "doesn't exist" the reason why PostgreSQL treats NULL and +-infinity as distinct? If so, why are ranges the only types that allow for that distinction in PostgreSQL? And if not, why does PostgreSQL treat functionally-equivalent values as distinct?

Dupuis answered 19/5, 2021 at 20:4 Comment(3)
stackoverflow.com/questions/15578050 and this mail threadTick
First NULL can be of type date: select NULL::date pg_typeof date. Second +-infinity can be an inclusive or exclusive boundary, whereas a lack of value can only be exclusive.Intraatomic
The oddity is that -infinity is a valid date value, which runs counter to the normal perception that there is no such date. So the difference is the one between an unbounded interval and the one that is bounded by -infinity. This sounds like it should be the same thing, but it isn't. To the database, -infinity is a date value just like any other (it just happens to be less than all the others).Wagonage
S
4

Rather, I'm asking why PostgreSQL makes a distinction between NULL and infinite boundaries when (as far as I can tell) they function exactly the same.

But they do not. NULL is a syntax convenience when used as bound of a range. '-infinity' / 'infinity' are actual values in the domain of the range. Abstract values meaning lesser / greater that any other value, but values nonetheless (which can be included or excluded).

Also, NULL works for any range type, while most data types don't have special values like '-infinity' / 'infinity'. Take integer or int4range for example.

For a better understanding, consider the thread in pgsql-general that a_horse provided:

This makes sense because NULL values can't have a type of date, so they can't even be compared to the range

Every data type can be NULL, even domains that are explicitly NOT NULL. See:

That includes date, of course (like Adrian commented):

test=> SELECT NULL::date, pg_typeof(NULL::date);
 date | pg_typeof 
------+-----------
      | date
(1 row)

But trying to discuss NULL as value (when used as bound of a range) is a misleading approach to begin with. It's not a value.

... (and PostgreSQL even converted the inclusive boundary on the lower NULL bound in daterange(NULL,'2021-05-21','[]') to an exclusive boundary, (,2021-05-22) to be doubly sure).

Again, NULL is not treated as value in the domain of the range. It just serves as convenient syntax to say: "unbounded". No more than that.

Silicate answered 19/5, 2021 at 23:59 Comment(2)
" -infinity / infinity are actual values in the domain of the range. Abstract values [...], but values nonetheless.". Seeing that stated explicitly really helped. When I did my own testing, I saw I could specify infinite boundaries as either inclusive or exclusive, but I never put it together that infinity is an exact value in PostgreSQL, unlike in mathematics. So while you'll never say "The date today is infinity" or "I'm going to rent this tool starting tomorrow, until the day before infinity" IRL, it's totally OK to say in PostgreSQL, even if "infinity" isn't an IRL date.Dupuis
Another interesting thing I noticed after a little more testing: you can create date ranges where element <@ range is never true, even when one boundary is NULL/boundless. For example, with daterange(NULL, '-infinity', '()') and daterange('infinity', NULL, '()'), there are no date values within those ranges even though they're technically unbounded on one end, since infinity and -infinity are the largest possible dates and they're not included in the ranges.Dupuis

© 2022 - 2024 — McMap. All rights reserved.