How do you get leading wildcard full-text searches to work in SQL Server?
Asked Answered
B

13

51

Note: I am using SQL's Full-text search capabilities, CONTAINS clauses and all - the * is the wildcard in full-text, % is for LIKE clauses only.

I've read in several places now that "leading wildcard" searches (e.g. using "*overflow" to match "stackoverflow") is not supported in MS SQL. I'm considering using a CLR function to add regex matching, but I'm curious to see what other solutions people might have.

More Info: You can add the asterisk only at the end of the word or phrase. - along with my empirical experience: When matching "myvalue", "my*" works, but "(asterisk)value" returns no match, when doing a query as simple as:

SELECT * FROM TABLENAME WHERE CONTAINS(TextColumn, '"*searchterm"');

Thus, my need for a workaround. I'm only using search in my site on an actual search page - so it needs to work basically the same way that Google works (in the eyes on a Joe Sixpack-type user). Not nearly as complicated, but this sort of match really shouldn't fail.

Bub answered 6/8, 2008 at 13:52 Comment(0)
P
28

Workaround only for leading wildcard:

  • store the text reversed in a different field (or in materialised view)
  • create a full text index on this column
  • find the reversed text with an *

    SELECT * 
    FROM TABLENAME 
    WHERE CONTAINS(TextColumnREV, '"mrethcraes*"');
    

Of course there are many drawbacks, just for quick workaround...

Not to mention CONTAINSTABLE...

Protuberance answered 23/9, 2008 at 23:11 Comment(0)
B
18

The problem with leading Wildcards: They are not sargable, hence you're doing a full table scan.

Barrault answered 6/8, 2008 at 14:7 Comment(1)
Why they cannot be indexed?Spinelli
T
11

It is possible to use the wildcard "*" at the end of the word or phrase (prefix search).

For example, this query will find all "datab", "database", "databases" ...

SELECT * FROM SomeTable WHERE CONTAINS(ColumnName, '"datab*"')

But, unforutnately, it is not possible to search with leading wildcard.

For example, this query will not find "database"

SELECT * FROM SomeTable WHERE CONTAINS(ColumnName, '"*abase"')
Tanganyika answered 26/11, 2008 at 9:22 Comment(1)
I did quite a few searches for this very thing, and sadly, most people are wrong and think they can do a leading wildcard search. A leading wildcard search does not work. Franjo is correct, the wildcard must be at the end of the search phrase. I'm using SQL 2008 R2. It doesn't find it at all (it doesn't do a table or index scan and find it - it won't find it at all)Menorrhagia
C
4

To perhaps add clarity to this thread, from my testing on 2008 R2, Franjo is correct above. When dealing with full text searching, at least when using the CONTAINS phrase, you cannot use a leading , only a trailing functionally. * is the wildcard, not % in full text.

Some have suggested that * is ignored. That does not seem to be the case, my results seem to show that the trailing * functionality does work. I think leading * are ignored by the engine.

My added problem however is that the same query, with a trailing *, that uses full text with wildcards worked relatively fast on 2005(20 seconds), and slowed to 12 minutes after migrating the db to 2008 R2. It seems at least one other user had similar results and he started a forum post which I added to... FREETEXT works fast still, but something "seems" to have changed with the way 2008 processes trailing * in CONTAINS. They give all sorts of warnings in the Upgrade Advisor that they "improved" FULL TEXT so your code may break, but unfortunately they do not give you any specific warnings about certain deprecated code etc. ...just a disclaimer that they changed it, use at your own risk.

http://social.msdn.microsoft.com/Forums/ar-SA/sqlsearch/thread/7e45b7e4-2061-4c89-af68-febd668f346c

Maybe, this is the closest MS hit related to these issues... http://msdn.microsoft.com/en-us/library/ms143709.aspx

Charyl answered 6/10, 2011 at 18:17 Comment(0)
R
3

Note: this was the answer I submitted for the original version #1 of the question before the CONTAINS keyword was introduced in revision #2. It's still factually accurate.

The wildcard character in SQL Server is the % sign and it works just fine, leading, trailing or otherwise.

That said, if you're going to be doing any kind of serious full text searching then I'd consider utilising the Full Text Index capabilities. Using % and _ wild cards will cause your database to take a serious performance hit.

Reiko answered 6/8, 2008 at 13:57 Comment(2)
Given that he used the CONTAINS keyword, he seems to already be using FullText Indexing.Guillemette
@Guillemette - I answered revision #1 of the question before the CONTAINS keyword was mentioned. This was a very long time ago.Reiko
S
3

One thing worth keeping in mind is that leading wildcard queries come at a significant performance premium, compared to other wildcard usages.

Select answered 6/8, 2008 at 14:3 Comment(0)
D
1

Just FYI, Google does not do any substring searches or truncation, right or left. They have a wildcard character * to find unknown words in a phrase, but not a word.

Google, along with most full-text search engines, sets up an inverted index based on the alphabetical order of words, with links to their source documents. Binary search is wicked fast, even for huge indexes. But it's really really hard to do a left-truncation in this case, because it loses the advantage of the index.

Dunigan answered 16/9, 2008 at 0:22 Comment(0)
P
1

As a parameter in a stored procedure you can use it as:

ALTER procedure [dbo].[uspLkp_DrugProductSelectAllByName]
(
    @PROPRIETARY_NAME varchar(10)
)
as
    set nocount on
    declare @PROPRIETARY_NAME2 varchar(10) = '"' + @PROPRIETARY_NAME + '*"'

    select ldp.*, lkp.DRUG_PKG_ID
    from Lkp_DrugProduct ldp
    left outer join Lkp_DrugPackage lkp on ldp.DRUG_PROD_ID = lkp.DRUG_PROD_ID
    where contains(ldp.PROPRIETARY_NAME, @PROPRIETARY_NAME2)
Piffle answered 4/1, 2016 at 23:59 Comment(0)
N
0

When it comes to full-text searching, for my money nothing beats Lucene. There is a .Net port available that is compatible with indexes created with the Java version.

There's a little work involved in that you have to create/maintain the indexes, but the search speed is fantastic and you can create all sorts of interesting queries. Even indexing speed is pretty good - we just completely rebuild our indexes once a day and don't worry about updating them.

As an example, this search functionality is powered by Lucene.Net.

Nabal answered 8/8, 2008 at 15:34 Comment(0)
A
0

Perhaps the following link will provide the final answer to this use of wildcards: Performing FTS Wildcard Searches.

Note the passage that states: "However, if you specify “Chain” or “Chain”, you will not get the expected result. The asterisk will be considered as a normal punctuation mark not a wildcard character. "

Addict answered 18/10, 2016 at 17:34 Comment(0)
S
0

If you have access to the list of words of the full text search engine, you could do a 'like' search on this list and match the database with the words found, e.g. a table 'words' with following words:

    pie
    applepie
    spies
    cherrypie
    dog
    cat

To match all words containing 'pie' in this database on a fts table 'full_text' with field 'text':

    to-match <- SELECT word FROM words WHERE word LIKE '%pie%'
    matcher = ""
    a = ""
    foreach(m, to-match) {
      matcher += a
      matcher += m
      a = " OR "
    }
    SELECT text FROM full_text WHERE text MATCH matcher
Solder answered 2/7, 2018 at 14:20 Comment(0)
S
-2

% Matches any number of characters _ Matches a single character

I've never used Full-Text indexing but you can accomplish rather complex and fast search queries with simply using the build in T-SQL string functions.

Stratocracy answered 6/8, 2008 at 14:0 Comment(0)
H
-2

From SQL Server Books Online:

To write full-text queries in Microsoft SQL Server 2005, you must learn how to use the CONTAINS and FREETEXT Transact-SQL predicates, and the CONTAINSTABLE and FREETEXTTABLE rowset-valued functions.

That means all of the queries written above with the % and _ are not valid full text queries.

Here is a sample of what a query looks like when calling the CONTAINSTABLE function.

SELECT RANK , * FROM TableName , CONTAINSTABLE (TableName, *, ' "*WildCard" ') searchTable WHERE [KEY] = TableName.pk ORDER BY searchTable.RANK DESC

In order for the CONTAINSTABLE function to know that I'm using a wildcard search, I have to wrap it in double quotes. I can use the wildcard character * at the beginning or ending. There are a lot of other things you can do when you're building the search string for the CONTAINSTABLE function. You can search for a word near another word, search for inflectional words (drive = drives, drove, driving, and driven), and search for synonym of another word (metal can have synonyms such as aluminum and steel).

I just created a table, put a full text index on the table and did a couple of test searches and didn't have a problem, so wildcard searching works as intended.

[Update]

I see that you've updated your question and know that you need to use one of the functions.

You can still search with the wildcard at the beginning, but if the word is not a full word following the wildcard, you have to add another wildcard at the end.

Example:  "*ildcar" will look for a single word as long as it ends with "ildcar".

Example:  "*ildcar*" will look for a single word with "ildcar" in the middle, which means it will match "wildcard".  [Just noticed that Markdown removed the wildcard characters from the beginning and ending of my quoted string here.]

[Update #2]

Dave Ward - Using a wildcard with one of the functions shouldn't be a huge perf hit. If I created a search string with just "*", it will not return all rows, in my test case, it returned 0 records.

Housecarl answered 6/8, 2008 at 15:8 Comment(7)
I cannot reproduce this in SQL 2005. Using a * at the front of the search string as shown results in no rows returned.Premonitory
Not sure why this is marked as the answer, because it's not completely accurate. The leading wildcard does not work on a full-text search. Verified this in SQL Server 2008 on a full-text index using containstable function. See Michael Stum's answer/post for an understanding why.Inainability
@Inainability - Provide a better answer then.Bub
Agreed with @Jagd, this should be voted down and unmarked as the answer. OP is clearly talking about FT indexes (hence the * as wildcard) and it's purely and simply incorrect to state that * works as a prefix in a FT index operation. Happy to be proven wrong but don't think I am.Androgen
When I asked this question, it was based on how SQL 2005 dealt with full text indexes, which is why I referenced the 2005 Books Online.Housecarl
a leading wildcard does not work in sqlserver, so this answer is incorrect when viewed from the perspective of the original question.Brevier
What is of note here is the double quotes inside of single quotes. My original function had 'term*' this did NOT work. However '"term*"' does. ThanksBrief

© 2022 - 2024 — McMap. All rights reserved.