Oracle text escaping with curly braces and wildcards
Asked Answered
B

1

26

I want to be able to escape the search criteria in an Oracle text query using contains and combine the escaped criteria with wildcards to have "doubly truncated" criteria. (I know my indexes may not be setup for ideal performance, but that is superfluous). I want to be able to use the curly braces syntax for best readability but this doesn't work. According to the top answer on this related (but not duplicate) question, curly braces define complete tokens. Is there any way to disable or work around this behavior?

Oracle Text: How to sanitize user input

I would rather avoid having to escape every single character in my search criteria (as per the last select in my code) or try to search the string for special characters since reserved words are also considered "special". (Note that I have no stop words) The following demonstrates my problem. (Unfortunately SQLFiddle does not appear to support Oracle text):

create table MY_TABLE(MY_COL varchar2(20));
insert into MY_TABLE(MY_COL) values ('abc');
insert into MY_TABLE(MY_COL) values ('abcd');
insert into MY_TABLE(MY_COL) values ('abcde');
insert into MY_TABLE(MY_COL) values ('bcd');
insert into MY_TABLE(MY_COL) values ('bcde');

create index FTIX on MY_TABLE (MY_COL)
indextype is CTXSYS.CONTEXT
PARAMETERS ('STOPLIST CTXSYS.EMPTY_STOPLIST SYNC (ON COMMIT)');

select * from MY_TABLE where CONTAINS(MY_COL, '%bcd%') > 0; --expected results
select * from MY_TABLE where CONTAINS(MY_COL, '%{bcd}%') > 0; --no results
select * from MY_TABLE where CONTAINS(MY_COL, '{bcd}') > 0; --returns bcd
select * from MY_TABLE where CONTAINS(MY_COL, '{%bcd%}') > 0; --returns bcd
select * from MY_TABLE where CONTAINS(MY_COL, '%\b\c\d%') > 0; --expected results
Beasley answered 12/5, 2015 at 14:53 Comment(2)
Good to see you provided the create and insert statements, something which is not usually seen on Stack Overflow. +1 for that!Kraken
Could you construct a search string in a user function so that you have something like select * from MY_TABLE where CONTAINS(MY_COL, DoubleEscape('bcd')) > 0 where DoubleEscape is the name of your function? In the function, just loop over the string and escape everything.Lionfish
L
3

How about:

select * from MY_TABLE where CONTAINS(MY_COL, regexp_replace('*abc*', '([^*])', '\\\1')) > 0
Lionfish answered 29/5, 2015 at 17:36 Comment(3)
I'll use that over my current approach, however I was looking for something inside the fulltext engine rather than a "pre-processing" hack. I can do the same inside my code or push it off to Oracle just as easily.Beasley
revised again to remove unnecessary discussionLionfish
I was trying to implement an approach with curly braces. Escaping is generally unnecessary when combined with wildcards. Hence, for sea within the earth title, you can do CONTAINS(title, '%ea {within} {the} ear%') > 0;. Input potentially can be sanitized before doing this by removing non-alphabetical chars that are treated as whitespace by BASIC_LEXER. However, this becomes complex / breaks when considering the question of what non-alphabetical chars are. Accented chars are indexed. The proposed approach with escaping every character is actually simpler and would work better.Damek

© 2022 - 2024 — McMap. All rights reserved.