What is Full Text Search vs LIKE
Asked Answered
Q

5

174

I just read a post mentioning "full text search" in SQL.

I was just wondering what the difference between FTS and LIKE are. I did read a couple of articles but couldn't find anything that explained it well.

Quartile answered 22/10, 2008 at 7:0 Comment(0)
S
207

In general, there is a tradeoff between "precision" and "recall". High precision means that fewer irrelevant results are presented (no false positives), while high recall means that fewer relevant results are missing (no false negatives). Using the LIKE operator gives you 100% precision with no concessions for recall. A full text search facility gives you a lot of flexibility to tune down the precision for better recall.

Most full text search implementations use an "inverted index". This is an index where the keys are individual terms, and the associated values are sets of records that contain the term. Full text search is optimized to compute the intersection, union, etc. of these record sets, and usually provides a ranking algorithm to quantify how strongly a given record matches search keywords.

The SQL LIKE operator can be extremely inefficient. If you apply it to an un-indexed column, a full scan will be used to find matches (just like any query on an un-indexed field). If the column is indexed, matching can be performed against index keys, but with far less efficiency than most index lookups. In the worst case, the LIKE pattern will have leading wildcards that require every index key to be examined. In contrast, many information retrieval systems can enable support for leading wildcards by pre-compiling suffix trees in selected fields.

Other features typical of full-text search are

  • lexical analysis or tokenization—breaking a block of unstructured text into individual words, phrases, and special tokens
  • morphological analysis, or stemming—collapsing variations of a given word into one index term; for example, treating "mice" and "mouse", or "electrification" and "electric" as the same word
  • ranking—measuring the similarity of a matching record to the query string
Suspire answered 22/10, 2008 at 7:8 Comment(1)
ranking is better explained in @VipinJain's answerAntistrophe
W
44

FTS involves indexing the individual words within a text field in order to make searching through many records quick. Using LIKE still requires you to do a string search (linear or the like) within the field.

Wesla answered 22/10, 2008 at 7:4 Comment(0)
T
20

Like uses wildcards only, and isn't all that powerful.

Full text allows much more complex searching, including And, Or, Not, even similar sounding results (SOUNDEX) and many more items.

I would start looking at the SQL CONTAINS() FREETEXT() and related Full Text search items to help get a better understanding of what is available.

Trowbridge answered 22/10, 2008 at 7:5 Comment(1)
I highly recommend everyone checking SOUNDEXCongius
M
14

The real difference is the scanning methodologies. For full-text search, the words (terms) are used as hash keys - each of which is associated with an array of documents the keys (terms) appears in. Its like this:

Document sets = {d1, d2, d3, d4, ... dn}
Term sets = {t1, t2, t3, .. tn}

Now term-document matrix (which term member of which document) can be represented as:

t1 -> {d1, d5, d9,.. dn}
t2 -> {d11, d50, d2,.. dn}
t3 -> {d23, d67, d34,.. dn}
:
tn -> {d90, d87, d57,.. dn}

When the request comes in asking for "Get me all documents containing the word/term t1" - then the document set {d1, d5, d9,.. dn} is returned.

You could hack a de-normalized table schema to store documents - each row in MySQL table will be considered as "document" and a TEXT column could contain a paragraph etc. The inverted index will contain the terms as hash keys and the row-ids as the document ids.

Remember that this SQL query will have more or less O(1) performance. The query will be independent of

  1. Number of words/terms in the TEXT column
  2. The number of rows/documents matching the criteria
  3. The length of the words/terms

For instance this SQL could be fired to extract all rows matching the given word XYZ:

SELECT * 
FROM   my_table 
WHERE  MATCH (my_text_column) against ('XYZ' IN boolean mode) ;

Caveat: If you add ORDER BY to this query, your runtimes will vary based on the several parameters, one of which is the number of matching rows/documents. So beware.

The LIKE however has got nothing of this. It is forced to linearly scan the sentence/string and find all matching terms. Adding wild card adds to the mess. It works great for small length strings, as you can imagine, but will fail miserably for longer sentences. And definitely not comparable when having a paragraph or a whole page of text etc.

Mudlark answered 1/4, 2015 at 5:4 Comment(0)
R
4

FTS is more efficient, powerful (especially for Word Breakers and stemming functionalities) ... but check your requirements because sometimes DBs don't support all languages for example MSSQL doesn't support Greek (check on this page http://msdn.microsoft.com/en-us/library/ms176076(v=sql.110).aspx )

Ruggiero answered 9/2, 2012 at 15:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.