SQL Server Full Text Search ampersand (&)
Asked Answered
M

2

11

have problem with ampersand (&)

How to search for the words (or sentences) that contain an ampersand (&).

For example, in the database are:

1: "Johnson & Johnson"
2: "AT&T"
3: "Sample & Sample"

How should I write a full text search query to search for individual records?

SELECT * from Companies c WHERE CONTAINS(c.CompanyName, '"AT&T"')

I know that character (&) is responsible for the logical AND operation. But I do not know how to encode it to search in text with use full text search.

Any idea?

Medwin answered 10/10, 2011 at 15:43 Comment(1)
Hi there, @Medwin . Any chance you would be willing to switch which answer is accepted? I'm not asking so much because the other answer is mine, but because the currently accepted answer is incorrect (as I explain in my answer) and it can be confusing / misleading to some readers to have an incorrect answer be the accepted one (they might not look at the other answer and realize that they can indeed solve this issue). Thanks 😺Mize
T
8

Short version: You can't (or at least you can, but you may get more results than you expected)

Long version: The character '&' is treated as a "word breaker", i.e. when SQL Server encounters an '&' it treats it as the start of a new "word" (i.e. token). What SQL Server Sees when parsing "AT&T" is two tokens, "AT" and "T".

You can check this for yourself using sys.dm_fts_parser:

SELECT * FROM sys.dm_fts_parser('AT&T', 1033, 0, 0)

keyword     group_id    phrase_id   occurrence  special_term  display_term  expansion_type source_term
----------- ----------- ----------- ----------- ------------- ------------- -------------- -----------
0x00610074  1           0           1           Noise Word    at            0              AT
0x0074      2           0           1           Noise Word    t             0              T

This means that search for "AT&T" is pretty much exactly the same as just searching for "AT T".

This is by design, as far as I can see the only way to modify this behaviour would be to install your own word breaker, however this isn't something that I would recommend doing.

Ticket answered 10/10, 2011 at 15:56 Comment(0)
M
6

The accepted answer isn't entirely correct. Enclosing the search term in double-quotes makes the grouping of words a "phrase" match. In this case, the ampsersand ( & ) can be treated as a literal character, such as when surrounded by one or more letters that do not form a known word. Just looking at your "AT&T" example, we see:

DECLARE @Term NVARCHAR(100);
SET @Term = N'"AT&T"';

SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 1);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 1);
GO

Returns:

keyword             group  phrase  occurrence  special      display  expansion  source
                    id     id                  term         term     type       term
0x0061007400260074  1      0       1           Exact Match  at&t     0          AT&T

As you can see, the ampersand presents no problem at all, as long as it is enclosed in double-quotes ( " ) which you are already doing, woo hoo!

However, that doesn't work as cleanly for the "Johnson & Johnson" example:

DECLARE @Term NVARCHAR(100);
SET @Term = N'"Johnson & Johnson"';

SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 1);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 1);
GO

Returns:

keyword                         group  phrase  occurrence  special      display  expansion  source
                                id     id                  term         term     type       term
0x006A006F0068006E0073006F006E  1      0       1           Exact Match  johnson  0          Johnson & Johnson
0x006A006F0068006E0073006F006E  1      0       2           Exact Match  johnson  0          Johnson & Johnson

That would seem to also match a search term of Johnson Johnson, which isn't technically correct.

So, in addition to enclosing in double-quotes, you can also convert the ampersand to be an underscore ( _ ) which is handled differently:

DECLARE @Term NVARCHAR(100);
SET @Term = N'"Johnson _ Johnson"';

SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 1);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 1);
GO

Returns:

keyword                         group  phrase  occurrence  special      display  expansion  source
                                id     id                  term         term     type       term
0x006A006F0068006E0073006F006E  1      0       1           Exact Match  johnson  0          Johnson _ Johnson
0x005F                          1      0       2           Exact Match  _        0          Johnson _ Johnson
0x006A006F0068006E0073006F006E  1      0       3           Exact Match  johnson  0          Johnson _ Johnson

AND, doing that one character translation does not seem to adversely affect the original "AT&T" search:

DECLARE @Term NVARCHAR(100);
SET @Term = N'"AT_T"';

SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 1);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 1);

Returns:

keyword             group  phrase  occurrence  special      display  expansion  source
                    id     id                  term         term     type       term
0x00610074005F0074  1      0       1           Exact Match  at_t     0          AT_T
Mize answered 23/9, 2016 at 20:9 Comment(2)
This should be the accepted answer. But, given that it's five years later, it's understandable that it's not.Outthink
@JacobStamm Thanks! I don't think people are always notified of new answers. Not sure. So, I added a comment to the question asking the O.P. (who appears to still be active according to their profile) to consider switching the acceptance.Mize

© 2022 - 2024 — McMap. All rights reserved.