How to index jsonb integer values
Asked Answered
L

4

11

I'm trying to use the "newish" JSONB type.

I have a documents table with a properties jsonb field, and in that is a field publication_year. I want to find all document records within a year range e.g. 2013-2015. [EDIT: Querying for a range of values is the main challenge here, even though I have used an exact match example below. The requested approach would also apply for, say dollar ranges (price > $20 and price < $40) or timestamp ranges).]

I have tried:

create index test1 on documents using gin ((cast(properties->'announced_on_year' as integer)));

ERROR:  cannot cast type jsonb to integer

as well as:

create index test1 on documents using gin (cast(properties->>'publication_year' as integer));

ERROR:  data type integer has no default operator class for access method "gin"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.`

I saw from this post http://www.postgresql.org/message-id/[email protected] that this should be possible, but I can't figure out the right syntax.

When I just do a straightforward index:

create index test1 on documents using gin ((properties->'publication_year'));

an index is created, but I cannot query it using integer values to get a range going, it says

select count(*) from documents where properties->>'publication_year' = 2015;
ERROR:  operator does not exist: text = integer
LINE 1: ...*) from documents where properties->>'publication_year' = 2015;
                              ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

Any tips and hints highly appreciated. I'm sure others will benefit too. TIA

Letdown answered 22/3, 2015 at 16:12 Comment(0)
L
1

Why don't you define an index for the whole jsonb field, as described in the doc?

create index test1 on documents using gin (properties);
Luteolin answered 22/3, 2015 at 16:47 Comment(2)
Thank you-- I think this is the answer I was seeking. Even though the docs don't specify it, you can with this solution use CAST to get range results, as in: `EXPLAIN ANALYZE SELECT COUNT(*) FROM documents WHERE cast(properties->>'publication_year' AS integer) > 2012 AND cast(properties->>'publication_year' AS integer) < 2016;Letdown
I am not sure, if using it this way will have an impact on performance (which I guess is the goal). According to the doc - the "->>" operator is nott supported by this type of index. Also - you can combine x > A AND x < B into x BETWEEN A AND BLuteolin
P
6

I've found in my experiences that using GIN indexes on JSONB columns was not faster. You can just create a normal index by casting it to an integer

CREATE INDEX test1 ON documents ((properties->>'publication_year')::int);

Also, GIN has some limitations that should be considered before creating one. Even indexing the entire JSONB column can result in massive table-sized indexes.

This is based on my experience and looking through the Postgres documentation.

Pooi answered 21/3, 2017 at 15:28 Comment(0)
P
3

1) There are no GIN indexes for integer (at least not out of the box), use a btree.

create index test1 on documents using btree (cast (properties->>'announced_on_year' as int));

2) The error is pretty self-explanatory, cast the integer as text or use text for the comparison:

select count(*) from documents where properties->>'publication_year' = '2015';
Pooh answered 22/3, 2015 at 16:23 Comment(3)
Thank you, that is helpful, but it does not address my main problem: range querying (perhaps I should have been more explicit). The end goal is something like find all records where publication_year is > 2012 and < 2016.Letdown
@WillKessler You can do that with a b-tree index. Have you tried?Pooh
I had tried, but was not getting the query right. But now with your help and murison, I've got it, see the comment above to murison's response. Thank you for your assistance however.Letdown
S
2

You can cast as integer and use contrib/btree_gin extension.

create extension btree_gin;
create index tt_jb_int_idx on tt using gin( cast (jb->>'price' as int));
explain analyze select * from tt where cast(jb->>'price' as int)  > 3 and cast(jb->>'price' as int) > 'price'::text))::integer > 3) AND (((jb ->> 'price'::text))::integer   Bitmap Index Scan on tt_jb_int_idx  (cost=0.00..28.06 rows=6 width=0) (actual time=0.016..0.016 rows=1 loops=1)
         Index Cond: ((((jb ->> 'price'::text))::integer > 3) AND (((jb ->> 'price'::text))::integer 
Sanctify answered 17/2, 2017 at 13:47 Comment(0)
L
1

Why don't you define an index for the whole jsonb field, as described in the doc?

create index test1 on documents using gin (properties);
Luteolin answered 22/3, 2015 at 16:47 Comment(2)
Thank you-- I think this is the answer I was seeking. Even though the docs don't specify it, you can with this solution use CAST to get range results, as in: `EXPLAIN ANALYZE SELECT COUNT(*) FROM documents WHERE cast(properties->>'publication_year' AS integer) > 2012 AND cast(properties->>'publication_year' AS integer) < 2016;Letdown
I am not sure, if using it this way will have an impact on performance (which I guess is the goal). According to the doc - the "->>" operator is nott supported by this type of index. Also - you can combine x > A AND x < B into x BETWEEN A AND BLuteolin

© 2022 - 2024 — McMap. All rights reserved.