What are some ways I can improve the performance of a regular expression query in PostgreSQL 8?
I'm performing a regular expression match on a column of type character varying(256) in PostgreSQL 8.3.3. The column currently has no indices. I'd like to improve the performance of this query if I can.

Will adding an index help? Are there other things I can try to help improve performance?

Saadi answered 30/3, 2009 at 20:56 Comment(2)
More info on what your regex looks like would be helpful. Also, what's the data like in your columns? – Deservedly
Create a function (and a functional index) that returns whether your field matches your regex; no need to create new fields/tables or triggers. – Seedtime
6

You cannot create an index that will speed up any generic regular expression; however, if you have one or a limited number of regular expressions that you are matching against, you have a few options.

As Paul Tomblin mentions, you can use an extra column or columns to indicate whether or not a given row matches that regex or regexes. That column can be indexed, and queried efficiently.
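
A minimal sketch of that approach, assuming a hypothetical `documents` table with a `body` column and one fixed regex (the names and pattern here are made up):

```sql
-- Precompute the regex match into a boolean column once,
-- then index that column instead of re-running the regex per query.
ALTER TABLE documents ADD COLUMN matches_pattern boolean;

UPDATE documents
SET matches_pattern = (body ~ 'ERROR: .*timeout');

CREATE INDEX documents_matches_pattern_idx
    ON documents (matches_pattern);

-- Queries can now hit the index rather than scanning with the regex:
SELECT * FROM documents WHERE matches_pattern;
```

The column must be kept in sync on inserts and updates, e.g. with a trigger, or it will drift from the actual text.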

If you want to go further than that, this paper discusses an interesting-sounding technique for indexing against regular expressions: extract long literal substrings from the regex and index on whether those substrings appear in the text, which generates candidate matches. That filters down the number of rows you actually need to check the regex against. You could probably implement this using GiST indexes, though it would be a non-trivial amount of work.
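
As a rough illustration of the idea (table name and pattern are hypothetical), a cheap literal-substring test can prefilter rows before the expensive regex runs. Without a dedicated substring index this is still a scan, but the regex is only evaluated on rows that pass the cheaper check:

```sql
-- Prefilter on a long literal substring of the regex, then apply
-- the full regex only to the surviving candidate rows.
SELECT *
FROM documents
WHERE body LIKE '%connection timeout%'                    -- cheap filter
  AND body ~ 'ERROR \\d+: connection timeout after \\d+s'; -- exact check
```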

Cle answered 30/3, 2009 at 22:13 Comment(0)
5

An index can't do anything with a regular expression. You're going to have to do a full table scan.

If at all possible, for instance if you're always querying against the same regex, you could add a column that records whether each row matches that regex, and maintain it on inserts and updates.

Organization answered 30/3, 2009 at 20:58 Comment(0)
0

If you have a limited set of regexes to match against, you could create a side table keyed on your table's primary key, with a field indicating whether the row matches each regex. Maintain it with a trigger and index it. This trades a small decrease in insert and update speed for a probably large speed increase on selects.
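
A sketch of that side-table-plus-trigger setup, assuming a hypothetical `documents(id, body)` table and one fixed regex (PostgreSQL 8.3 syntax, with PL/pgSQL installed):

```sql
-- Side table: one row per document, recording the match result.
CREATE TABLE document_matches (
    document_id integer PRIMARY KEY REFERENCES documents (id),
    matches_error boolean NOT NULL
);

-- Trigger function re-evaluates the regex whenever a row changes.
CREATE OR REPLACE FUNCTION refresh_document_match() RETURNS trigger AS $$
BEGIN
    DELETE FROM document_matches WHERE document_id = NEW.id;
    INSERT INTO document_matches (document_id, matches_error)
    VALUES (NEW.id, NEW.body ~ 'ERROR: .*timeout');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER document_match_refresh
    AFTER INSERT OR UPDATE ON documents
    FOR EACH ROW EXECUTE PROCEDURE refresh_document_match();
```

Selects then join against `document_matches` instead of running the regex.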

Alternatively, you could write a function which compares your field to that regex (or even pass the regex along with the field you are matching to the function), then create a functional index on your table against that function. This also assumes a fixed set of regexes (but you can add new regex matches more easily this way).
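
A sketch of the functional-index variant, with a hypothetical wrapper function (the function must be declared `IMMUTABLE` for the expression index to be allowed):

```sql
-- Wrap the fixed regex in an immutable function...
CREATE FUNCTION matches_error(text) RETURNS boolean AS $$
    SELECT $1 ~ 'ERROR: .*timeout';
$$ LANGUAGE sql IMMUTABLE;

-- ...and index its result for each row.
CREATE INDEX documents_error_match_idx
    ON documents (matches_error(body));

-- A query using the same expression can use the index:
SELECT * FROM documents WHERE matches_error(body);
```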

If the regex is built dynamically from user input, you might have to live with the table scan, or change the user app to produce a simpler search like field LIKE 'value%', which can use an index on field ('%value%' cannot).
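
To illustrate with a hypothetical `documents` table: only the anchored prefix form is index-friendly (and even then only with a C locale or a `*_pattern_ops` index, as noted in another answer):

```sql
SELECT * FROM documents WHERE body LIKE 'ERROR:%';   -- can use a btree index
SELECT * FROM documents WHERE body LIKE '%ERROR:%';  -- forces a full scan
```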

Seedtime answered 30/3, 2009 at 20:56 Comment(0)
0

Regex matches do not perform well on fairly big text columns. Try to accomplish this without the regex, or do the match in application code if the dataset is not large.

Cordilleras answered 30/3, 2009 at 20:59 Comment(0)
0

This may be one of those times when you don't want to use a regex at all. What does your regex look like? Perhaps it can be simplified, and that's a way to speed it up.

Beltane answered 30/3, 2009 at 21:1 Comment(0)
0

If you do manage to reduce your needs to a simple LIKE query, look up indexes with text_pattern_ops to speed those up.
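
A sketch, again assuming a hypothetical `documents` table; since the question's column is `character varying`, the matching operator class is `varchar_pattern_ops` (use `text_pattern_ops` for `text` columns):

```sql
-- Lets a btree index serve LIKE 'prefix%' even in a non-C locale.
CREATE INDEX documents_body_pattern_idx
    ON documents (body varchar_pattern_ops);

-- Anchored-prefix queries can now use the index:
SELECT * FROM documents WHERE body LIKE 'ERROR:%';
```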

Anacreontic answered 1/4, 2009 at 2:41 Comment(0)
-1

I think the accepted answer is outdated. Modern PostgreSQL can use built-in GiST or GIN indexes (via the pg_trgm extension's trigram operator classes) to optimise regex search queries. This Medium article I found should help.
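
A sketch of that approach on a hypothetical `documents(body)` table; since PostgreSQL 9.3, pg_trgm's trigram indexes can accelerate regex matches directly:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index over the text column.
CREATE INDEX documents_body_trgm_idx
    ON documents USING gin (body gin_trgm_ops);

-- Regex queries can now use the trigram index to find candidate rows:
SELECT * FROM documents WHERE body ~ 'ERROR \\d+: timeout';
```

The index narrows the search to candidate rows containing matching trigrams; the regex is then rechecked on those rows, so very short or trigram-poor patterns benefit less.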

Tarry answered 16/8, 2024 at 17:40 Comment(1)
Of course it's outdated, it (and the question) are 15 years old. But please don't just post a link to an external site. Include the relevant parts of the link as blockquotes, or summarize in your own words. In either case, ensure it's clear what the source of the information is. – Prelusive
