Oracle Text: How to sanitize user input

Asked 20/10, 2008 at 18:56 Answered 22/1, 2017 at 4:17

If anyone has experience using Oracle text (CTXSYS.CONTEXT), I'm wondering how to handle user input when the user wants to search for names that may contain an apostrophe.

Escaping the ' seems to work in some cases, but not for 's at the end of the word - s is in the list of stop words, and so seems to get removed.

We currently change simple query text (i.e. anything that's just letters) to %text%, for example:

contains(field, :text) > 0

A search for O'Neil works, but Joe's doesn't.

Has anyone using Oracle Text dealt with this issue?

Devon answered 20/10, 2008 at 18:56 Comment(0)

Escape all special characters with backslashes. Curly braces won't work with substring searches as they define complete tokens. Eg %{ello}% won't match the token 'Hello'

Escaped space characters will be included in the search token, so the search string '%stay\ near\ me%' will be treated as a literal string "stay near me" and will not invoke the 'near' operator.

If you are indexing short strings (like names, etc ) and you want Oracle Text to behave exactly as the like operator, you must write your own lexer that won't create tokens for individual words. (Unfortunately CATSEARCH does not support substring search...)

It is probably a good idea to change the searches to use oracle text's semantics, with token matching, but for some applications, the wildcard expansion of multiple (short) tokens and numeric tokens will create too many hits for search strings that the users reasonably would expect to work.

Eg, a search for "%I\ AM\ NUMBER\ 9%" will most likely fail if there are a lot of numeric tokens in the indexed data, since all tokens ending with 'I' and starting with '9' must be searched and merged before the result can be returned.

'I' and 'AM' is probably also in the default stoplist and will be totally ignored, so for this hypothetical application, a null stoplist may be used if these tokens are important.

Munch answered 26/1, 2012 at 22:51 Comment(0)

Using PARAMETERS('STOPLIST ctxsys.empty_stoplist') when indexing would include all alphabetical tokens in the index. Accented characters are indexed as well. Non-alphabetical characters are generally treated as whitespace by BASIC_LEXER.

Also, CONTEXT grammar uses a lot of operators that include symbols and reserved words such as WITHIN, NEAR, ABOUT. These all have to be escaped somehow in the input. If you need to search for substrings, the correct approach to escaping is to escape all characters with \. This is an answer to a related question here: Oracle text escaping with curly braces and wildcards. If your requirements is to search for whole terms (names, etc.) you can use simpler {input} escaping.

Ruvolo answered 22/1, 2017 at 4:17 Comment(0)

-2

Forget about sanitizing. Why? Refer to http://en.wikipedia.org/wiki/SQL_injection .

It depends on the kind of database interface API you are using. Perl DBI, ODBC, JDBC support parameterized queries or prepared statements. If you're using a native DBI and it doesn't support it, then God bless you.

Nikolia answered 21/10, 2008 at 0:37 Comment(1)

Not actually related to the question. Oracle Text search arguments are defined using a specific grammar and parsed by the search engine. (A bunch of stored procedures, basically) Prepared statements won't help at all. Special characters and reserved words must be escaped if the api should not be presented to users. – Munch 26/1, 2012 at 22:24

Recommended topics

Hot tags