Using SQLServer contains for partial words

Asked 7/6, 2016 at 10:45 Answered 22/8, 2020 at 10:38

sql-server azure-sql-database contains sql-like

We are running many products search on a huge catalog with partially matched barcodes.

We started with a simple like query

select * from products where barcode like '%2345%'

But that takes way too long since it requires a full table scan. We thought a fulltext search will be able to help us here using contains.

select * from products where contains(barcode, '2345')

But, it seems like contains doesn't support finding words that partially contains a text but, only full a word match or a prefix. (But in this example we're looking for '123456').

Interpolate answered 7/6, 2016 at 10:45 Comment(8)

FTS is language oriented - words and phrases. – Trillby 7/6, 2016 at 10:53

If you want to find 123456 with full-text search using CONTAINS you can use 123456 or "123*" its equivalent of =123456 and LIKE '123%'. It is how it works by design. – Hagiographer 7/6, 2016 at 10:54

@Hagiographer thanks, but we need a real contains not just prefix. – Interpolate 7/6, 2016 at 10:58

@Trillby does it mean I can find "low fat milk" when sending contains(name, 'il')? – Interpolate 7/6, 2016 at 10:59

@GuyKorland It is a REAL CONTAINS in full-text search. There is no equivalent to LIKE. – Hagiographer 7/6, 2016 at 11:6

There really isn't going to be a better option than LIKE for the problem as described. But I'm curious, if these are barcodes then they must conform to a certain set of rules. There may be other ways to get creative here and solve the problem using a different approach. Why exactly are you looking for a partial match? What is the end goal here? – Hogan 6/7, 2016 at 14:38

Well, you could always use PATINDEX then...PATINDEX returns a number or the position of the string (not strings), and since it is a Boolean statement, you at least stay lean. – Lynea 8/7, 2016 at 7:32

Have you tried columnstore index? – Honshu 13/7, 2016 at 12:48

My answer is: @DenisReznik was right :)

ok, let's take a look.
I have worked with barcodes and big catalogs for many years and I was curious about this question.

So I have made some tests on my own.

I have created a table to store test data:

CREATE TABLE [like_test](
    [N] [int] NOT NULL PRIMARY KEY,
    [barcode] [varchar](40) NULL
)

I know that there are many types of barcodes, some contains only numbers, other contains also letters, and other can be even much complex.

Let's assume our barcode is a random string.
I have filled it with 10 millions records of random alfanumeric data:

insert into like_test
select (select count(*) from like_test)+n, REPLACE(convert(varchar(40), NEWID()), '-', '') barcode 
from FN_NUMBERS(10000000)

FN_NUMBERS() is just a function I use in my DBs (a sort of tally_table) to get records quick.

I got 10 million records like that:

N   barcode
1   1C333262C2D74E11B688281636FAF0FB
2   3680E11436FC4CBA826E684C0E96E365
3   7763D29BD09F48C58232C7D33551E6C9

Let's declare a var to search for:

declare @s varchar(20) = 'D34F15' -- random alfanumeric string

Let's take a base try with LIKE to compare results to:

select * from like_test where barcode like '%'+@s+'%'

On my workstation it takes 24.4 secs for a full clustered index scan. Very slow.

SSMS suggests to add an index on barcode column:

CREATE NONCLUSTERED INDEX [ix_barcode] ON [like_test] ([barcode]) INCLUDE ([N])

500Mb of index, I retry the select, this time 24.0 secs for the non clustered index seek.. less than 2% better, almost the same result. Very far from the 75% supposed by SSMS. It seems to me this index really doesn't worth it. Maybe my SSD Samsung 840 is making the difference..
For the moment I let the index active.

Let's try the CHARINDEX solution:

select * from like_test where charindex(@s, barcode) > 0

This time it took 23.5 second to complete, not really so much better than LIKE.

Now let's check the @DenisReznik 's suggestion that using the Binary Collation should speed up things.

select * from like_test
where barcode collate Latin1_General_BIN like '%'+@s+'%' collate Latin1_General_BIN

WOW, it seems to work! Only 4.5 secs this is impressive! 5 times better..
So, what about CHARINDEX and Collation toghether? Let's try it:

select * from like_test
where charindex(@s collate Latin1_General_BIN, barcode collate Latin1_General_BIN)>0

Unbelivable! 2.4 secs, 10 times better..

Ok, so far I have realized that CHARINDEX is better than LIKE, and that Binary Collation is better than normal string collation, so from now on I will go on only with CHARINDEX and Collation.

Now, can we do anything else to get even better results? Maybe we can try reduce our very long strings.. a scan is always a scan..

First try, a logical string cut using SUBSTRING to virtually works on barcodes of 8 chars:

select * from like_test
where charindex(
        @s collate Latin1_General_BIN, 
        SUBSTRING(barcode, 12, 8) collate Latin1_General_BIN
      )>0

Fantastic! 1.8 seconds.. I have tried both SUBSTRING(barcode, 1, 8) (head of the string) and SUBSTRING(barcode, 12, 8) (middle of the string) with same results.

Then I have tried to phisically reduce the size of the barcode column, almost no difference than using SUBSTRING()

Finally I have tried to drop the index on barcode column and repeated ALL above tests... I was very surprised to get almost same results, with very little differences.
Index performs 3-5% better, but at cost of 500Mb of disk space and and maintenance cost if the catalog is updated.

Naturally, for a direct key lookup like where barcode = @s with the index it takes 20-50 millisecs, without index we can't get less than 1.1 secs using Collation syntax where barcode collate Latin1_General_BIN = @s collate Latin1_General_BIN

This was interesting.
I hope this helps

Dehydrate answered 12/7, 2016 at 12:57 Comment(4)

It worked in the keyword you searched, but when the keyword is shorted, for example, "44" (only 2 characters, and in this case the result consists of 1.9M rows return), the execution time with and without collation cast are the same. – Knuckleduster 30/3, 2021 at 3:41

This doesn't work for logical operations. AND, OR, AND NOT and etc. – Juror 22/8, 2022 at 7:20

Latin1_General_BIN is a case sensitive collation. This is not what I needed for my use case. It may also be part of the reason for a speed increase over a case-insensitive collation. – Inflatable 6/4, 2023 at 2:39

@Inflatable here we are talking about to find a barcode with a partial match in a very huge catalog and barcodes are rarely case sensitive. – Dehydrate 14/4, 2023 at 7:44

I often use charindex and just as often have this very debate.

As it turns out, depending on your structure you may actually have a substantial performance boost.

http://cc.davelozinski.com/sql/like-vs-substring-vs-leftright-vs-charindex

Hirundine answered 6/7, 2016 at 18:39 Comment(1)

Thanks charindex seems like a short term nice improvement! But, once I'll have millions of barcodes it might not be enough... – Interpolate 8/7, 2016 at 8:34

The good option here for your case - creating your FTS index. Here is how it could be implemented:

1) Create table Terms:

CREATE TABLE Terms
(
    Id int IDENTITY NOT NULL,
    Term varchar(21) NOT NULL,
    CONSTRAINT PK_TERMS PRIMARY KEY (Term),
    CONSTRAINT UK_TERMS_ID UNIQUE (Id)
)

Note: index declaration in the table definition is a feature of 2014. If you have a lower version, just bring it out of CREATE TABLE statement and create separately.

2) Cut barcodes to grams, and save each of them to a table Terms. For example: barcode = '123456', your table should have 6 rows for it: '123456', '23456', '3456', '456', '56', '6'.

3) Create table BarcodeIndex:

CREATE TABLE BarcodesIndex
(
    TermId int NOT NULL,
    BarcodeId int NOT NULL,
    CONSTRAINT PK_BARCODESINDEX PRIMARY KEY (TermId, BarcodeId),
    CONSTRAINT FK_BARCODESINDEX_TERMID FOREIGN KEY (TermId) REFERENCES Terms (Id),
    CONSTRAINT FK_BARCODESINDEX_BARCODEID FOREIGN KEY (BarcodeId) REFERENCES Barcodes (Id)
)

4) Save a pair (TermId, BarcodeId) for the barcode into the table BarcodeIndex. TermId was generated on the second step or exists in the Terms table. BarcodeId - is an identifier of the barcode, stored in Barcodes (or whatever name you use for it) table. For each of the barcodes, there should be 6 rows in the BarcodeIndex table.

5) Select barcodes by their parts using the following query:

SELECT b.* FROM Terms t
INNER JOIN BarcodesIndex bi
    ON t.Id = bi.TermId
INNER JOIN Barcodes b
    ON bi.BarcodeId = b.Id
WHERE t.Term LIKE 'SomeBarcodePart%'

This solution force all similar parts of barcodes to be stored nearby, so SQL Server will use Index Range Scan strategy to fetch data from the Terms table. Terms in the Terms table should be unique to make this table as small as possible. This could be done in the application logic: check existence -> insert new if a term doesn't exist. Or by setting option IGNORE_DUP_KEY for clustered index of the Terms table. BarcodesIndex table is used to reference Terms and Barcodes.

Please note that foreign keys and constraints in this solution are the points of consideration. Personally, I prefer to have foreign keys, until they hurt me.

Rottweiler answered 7/7, 2016 at 6:46 Comment(16)

I thought that breaking to grams should be an option in the full text index, like any other search engine – Interpolate 7/7, 2016 at 6:48

No. Default word breakers in SQL Server break text only on whitespaces and special characters. Implementing n-grams or another algorithm should be done manually. – Rottweiler 7/7, 2016 at 6:57

By the way, Guy, as far as I understand barcodes will be numeric, so you have nothing to deal with the case and national characters. You can set Binary collation for this column and search it with like. This will improve scan speed significantly. And yes, creating an index on this field (if you don't have it) will also make scan faster. Probably this is not what are you looking for, but this modification will be quick in comparison with the solution I suggested in the answer. – Rottweiler 7/7, 2016 at 7:2

Barcodes are not numbers since they might have leading zeros and in some cases might even have letters – Interpolate 8/7, 2016 at 8:32

That does not matter. Since they are not letters, you can use this trick. – Rottweiler 8/7, 2016 at 14:18

@DenisReznik this creative solutions in my opinion will be much more expensive than your first idea of straight charindex() + collate binary. when working with HUGE catalogs, adding 5-6 columns and their indexes especially if the table is often updated could not be the best solution both for disk space and performances. in this case barcode seems to be only 6 chars long, but for a barcode it is very easy to reach widths of 8-13-21 chars this solution couldn't be easily scalable – Dehydrate 13/7, 2016 at 8:8

@Dehydrate yeah, I supposed that barcode has only 6 chars. Choosing this or that solution should depend on overall data size and fetched data size. Using this solution in a worst case you do 6 index seeks. Depends on search term length, this could be optimized to search only on columns with equal or bigger length. For example: if a search term is "12345", it could be found only in col1 and col2, so we do 2 index seeks. The second option with collations and charindex requires full index scan for each search. This is the point of consideration, which is better for your data and queries. – Rottweiler 13/7, 2016 at 8:26

@Dehydrate thanks for pointing me on this. I will correct the answer with more appropriate solution. – Rottweiler 13/7, 2016 at 8:28

@DenisReznik I have made a little test and maybe your solutions can be an option, but instead of creating phisical columns I have created Calculated Columns (to save disk space), and their indexes. As long as you select ONLY the primary key the query is hell fast! if you add other columns optimizer will go for a very slow complete index scan, so the trick is to use a CTE or a subquery to extract only the keys and then rejoin (or EXISTS or IN ) master table to extract other columns. good job :) – Dehydrate 13/7, 2016 at 9:15

@Dehydrate please consider the updated solution. I wonder why I didn't write it from the beginning, as this one is much better for this case than creating several columns for grams. And it is scalable enough. – Rottweiler 13/7, 2016 at 9:25

@DenisReznik it is a clever idea, theoretically I like it very much. I would recommend to avoid to put into table Terms the full barcode and also the single char, probably there is already an index on barcode table and they will search for a "partial" code only after a search for the full code, and the single char will match so many records that it will be not useful. I'm a bit concerned about populating Terms and BarcodesIndex tables, it will require extra application logic, and a LOT of space – Dehydrate 13/7, 2016 at 12:35

@DenisReznik think about cardinality: with a string of 6 chars containing only numbers you can have up to 1 million codes, excluding search string of 6 chars and of 1 char, Terms table will be up to 99999+9999+999+99 = 111096 and the BarcodesIndex can be up to (99999*2+9999*3+999*4+99*5)*1000000 which is 234486 millions (you'll need BIGINT). I think it will be too much to handle with.. – Dehydrate 13/7, 2016 at 12:36

@Dehydrate include or not 1 character is completely related to the requirements. Yes, I agree that it will be more performant to not index it, and this is a point of consideration. Doing 2 index range scan for full and for part of the barcode, here I will not agree with you. I'd prefer to store data twice but do one range scan. – Rottweiler 13/7, 2016 at 13:18

@Dehydrate "Terms table will be up to 99999+9999+999+99 = 111096", I don't see a big issue here. Each part of a barcode will be stored in Terms Table only once, so for 1000000 of barcodes, in a worst case, we will have 1000000+100000+10000+1000+100+10 = 1111110. How much space will be used? 1111110 * 6 Bytes / 1024 / 1024 = 6.3 MB. Not too much, and this is the worst case which we could have for 6-digits barcodes. – Rottweiler 13/7, 2016 at 13:43

@Dehydrate calculations for a BarcodesIndex table looks a bit incorrect. Supposing that it should have 6 references for each of barcodes (each barcode is broken to 6 parts), it will have 6 mln. rows for 1 mln. of 6-digits barcodes. Yes, this solution may require bigint, but not for 1 mln. barcodes. – Rottweiler 13/7, 2016 at 13:50

Let us continue this discussion in chat. – Dehydrate 13/7, 2016 at 14:55

After further testing and reading and talking with @DenisReznik I think the best option could be to add virtual columns to barcode table to split barcode.

We only need columns for start positions from 2nd to 4th because for the 1st we will use original barcode column and the last I think it is not useful at all (what kind of partial match is 1 char on 6 when 60% of records will match?):

CREATE TABLE [like_test](
    [N] [int] NOT NULL PRIMARY KEY,
    [barcode] [varchar](6) NOT NULL,
    [BC2]  AS (substring([BARCODE],(2),(5))),
    [BC3]  AS (substring([BARCODE],(3),(4))),
    [BC4]  AS (substring([BARCODE],(4),(3))),
    [BC5]  AS (substring([BARCODE],(5),(2)))
)

and then to add indexes on this virtual columns:

CREATE NONCLUSTERED INDEX [IX_BC2] ON [like_test2] ([BC2]);
CREATE NONCLUSTERED INDEX [IX_BC3] ON [like_test2] ([BC3]);
CREATE NONCLUSTERED INDEX [IX_BC4] ON [like_test2] ([BC4]);
CREATE NONCLUSTERED INDEX [IX_BC5] ON [like_test2] ([BC5]);
CREATE NONCLUSTERED INDEX [IX_BC6] ON [like_test2] ([barcode]);

now we can simply find partial matches with this query

declare @s varchar(40) 
declare @l int

set @s = '654'
set @l = LEN(@s)

select N from like_test 
where 1=0
OR ((barcode = @s) and (@l=6)) -- to match full code (rem if not needed)
OR ((barcode like @s+'%') and (@l<6)) -- to match strings up to 5 chars from beginning
or ((BC2 like @s+'%') and (@l<6)) -- to match strings up to 5 chars from 2nd position
or ((BC3 like @s+'%') and (@l<5)) -- to match strings up to 4 chars from 3rd position
or ((BC4 like @s+'%') and (@l<4)) -- to match strings up to 3 chars from 4th position
or ((BC5 like @s+'%') and (@l<3)) -- to match strings up to 2 chars from 5th position

this is HELL fast!

for search strings of 6 chars 15-20 milliseconds (full code)
for search strings of 5 chars 25 milliseconds (20-80)
for search strings of 4 chars 50 milliseconds (40-130)
for search strings of 3 chars 65 milliseconds (50-150)
for search strings of 2 chars 200 milliseconds (190-260)

There will be no additional space used for table, but each index will take up to 200Mb (for 1 million barcodes)

PAY ATTENTION
Tested on a Microsoft SQL Server Express (64-bit) and Microsoft SQL Server Enterprise (64-bit) the optimizer of the latter is slight better but the main difference is that:

on express edition you have to extract ONLY the primary key when searching your string, if you add other columns in the SELECT, the optimizer will not use indexes anymore but it will go for full clustered index scan so you will need something like

;with
k as (-- extract only primary key
    select N from like_test
    where 1=0
    OR ((barcode = @s) and (@l=6))
    OR ((barcode like @s+'%') and (@l<6))
    or ((BC2 like @s+'%') and (@l<6))
    or ((BC3 like @s+'%') and (@l<5))
    or ((BC4 like @s+'%') and (@l<4))
    or ((BC5 like @s+'%') and (@l<3))
)
select N 
from like_test t
where exists (select 1 from k where k.n = t.n)

on standard (enterprise) edition you HAVE to go for

    select * from like_test -- take a look at the star
    where 1=0
    OR ((barcode = @s) and (@l=6))
    OR ((barcode like @s+'%') and (@l<6))
    or ((BC2 like @s+'%') and (@l<6))
    or ((BC3 like @s+'%') and (@l<5))
    or ((BC4 like @s+'%') and (@l<4))
    or ((BC5 like @s+'%') and (@l<3))

Dehydrate answered 13/7, 2016 at 14:31 Comment(0)

UPDATED:

We know from that FULL-TEXT searches can be used for the following:

Full-Text Search - MSDN

One or more specific words or phrases (simple term)

A word or a phrase where the words begin with specified text (prefix term)

Inflectional forms of a specific word (generation term)

A word or phrase close to another word or phrase (proximity term)

Synonymous forms of a specific word (thesaurus)

Words or phrases using weighted values (weighted term)

Are any of these fulfilled by your query requirements? If you are having to search for patterns as you described, without an consistent pattern (such as '1%'), then there may not be a way for SQL to use a SARG.

You could use Boolean statements

Coming from a C++ perspective, B-Trees are accessed from Pre-Order, In-Order, and Post-Order traversals and utilize Boolean statements to search the B-Tree. Processed much faster than string comparisons, booleans offer at the least an improved performance.

We can see this in the following two options:

PATINDEX

Only if your column is not numeric, as PATINDEX is designed for strings.
Returns an integer (like CHARINDEX) which is easier to process than strings.

CHARINDEX is a solution

CHARINDEX has no problem searching INTs and again, returns a number.
May require some extra cases built in (i.e. first number is always ignored), but you can add them like so: CHARINDEX('200', barcode) > 1.

Proof of what I am saying, let us go back to the old [AdventureWorks2012].[Production].[TransactionHistory]. We have TransactionID which contains the number of the items we want, and lets for fun assume you want every transactionID that has 200 at the end.

-- WITH LIKE
SELECT TOP 1000 [TransactionID]
      ,[ProductID]
      ,[ReferenceOrderID]
      ,[ReferenceOrderLineID]
      ,[TransactionDate]
      ,[TransactionType]
      ,[Quantity]
      ,[ActualCost]
      ,[ModifiedDate]
  FROM [AdventureWorks2012].[Production].[TransactionHistory]
  WHERE TransactionID LIKE '%200'

-- WITH CHARINDEX(<delimiter>, <column>) > 3
SELECT TOP 1000 [TransactionID]
      ,[ProductID]
      ,[ReferenceOrderID]
      ,[ReferenceOrderLineID]
      ,[TransactionDate]
      ,[TransactionType]
      ,[Quantity]
      ,[ActualCost]
      ,[ModifiedDate]
  FROM [AdventureWorks2012].[Production].[TransactionHistory]
  WHERE CHARINDEX('200', TransactionID) > 3

Note CHARINDEX removes the value 200200 in the search, so you may need to adjust your code appropriately. But look at the results:

Clearly, booleans and numbers are faster comparisons.
LIKE uses string comparisons, which again is much slower to process.

I was a bit surprised at the size of the difference, but the fundamentals are the same. Integers and Boolean statements are always faster to process than string comparisons.

Lynea answered 8/7, 2016 at 8:4 Comment(9)

Barcodes are strings since they might have leading zeroes – Interpolate 8/7, 2016 at 8:30

@GuyKorland Not they do not have to be, though that is not of concern. Not the update to prove my point. :) – Lynea 8/7, 2016 at 8:34

@they are, but like DATEs can be stored in VARCHAR, strings that store numbers can be easily altered by SQL. – Lynea 8/7, 2016 at 8:49

Type of LIKE comparison for strings in SQL Server depends on the collation. The fastest comparison - binary comparison. If column uses Binary collation, LIKE will be as fast as a hell. – Rottweiler 8/7, 2016 at 14:26

@Denis.Reznik testing this out on a table with 100 million rows (index, but it is still unique), is that like using SOUNDEX? Or DIFFERENCE? SQL SERVER does not seem to have a hint option for LIKE BINARY and SQL Server by default is case-insensitive. Plus, those numbers are not changing which makes the comparison sort of moot. – Lynea 8/7, 2016 at 19:23

@Lynea No. Looks like you are doing something wrong. I'm talking about COLLATION of the column which you are searching with LIKE predicate. If you change it to some kind of Binary collation, LIKE works faster. – Rottweiler 8/7, 2016 at 20:52

@Denis.Reznik lets not say things that are unproven. I would be interested in seeing an example of what you say and how much the cost of conversion is to the query. – Lynea 8/7, 2016 at 21:24

@Lynea this is not unproven, this is a quite well-known approach. Please check this article, if my explanations were not enough for you: aboutsqlserver.com/2015/01/20/…. – Rottweiler 10/7, 2016 at 21:49

@Denis.Reznik so you know, the COLLATION strategy you mentioned intrigued me and sounds incredible if there was a consistent design strategy around it. Any strategy that uses smaller and smarter datatypes to process is a WIN/WIN to me. However, that does not mean I am doing anything wrong. Maybe not THE best solution, but the foundation is solid. – Lynea 11/7, 2016 at 3:9

You do not include many constraints, which means you want to search for string in a string -- and if there was a way to optimized an index to search a string in a string, it would be just built in!

Other things that make it hard to give a specific answer:

It's not clear what "huge" and "too long" mean.
It's not clear as to how your application works. Are you searching in batch as you add a 1,000 new products? Are you allowing a user to enter a partial barcode in a search box?

I can make some suggestions that may or may not be helpful in your case.

Speed up some of the queries

I have a database with lots of licence plates; sometimes an officer wants to search by the last 3-characters of the plate. To support this I store the license plate in reverse, then use LIKE ('ZYX%') to match ABCXYZ. When doing the search, they have the option of a 'contains' search (like you have) which is slow, or an option of doing 'Begins/Ends with' which is super because of the index. This would solve your problem some of the time (which may be good enough), especially if this is a common need.

Parallel Queries

An index works because it organizes data, an index cannot help with a string within a string because there is no organization. Speed seems to be your focus of optimization, so you could store/query your data in a way that searches in parallel. Example: if it takes 10-seconds to sequentially search 10-million rows, then having 10-parallel processes (so process searches 1-million) will take you from 10-seconds to 1-second (kind'a-sort'a). Think of it as scaling out. There are various options for this, within your single SQL Instance (try data partitioning) or across multiple SQL Servers (if that's an option).

BONUS: If you're not on a RAID setup, that can help with reads since it's a effectively of reading in parallel.

Reduce a bottleneck

One reason searching "huge" datasets take "too long" is because all that data needs to be read from the disk, which is always slow. You can skip-the-disk, and use InMemory Tables. Since "huge" isn't defined, this may not work.

Godmother answered 8/7, 2016 at 10:5 Comment(0)

I'm late to the game but here's another way to get a full-text like index in the spirit of @MtwStark's second answer.

This is a solution using a search table join

drop table if exists #numbers
select top 10000 row_number() over(order by t1.number) as n 
into #numbers
from master..spt_values t1 
    cross join master..spt_values t2


drop table if exists [like_test]
create TABLE [like_test](
    [N] INT IDENTITY(1,1) not null,
    [barcode] [varchar](40) not null,
    constraint pk_liketest primary key ([N])
) 

insert into dbo.like_test (barcode)
select top (1000000) replace(convert(varchar(40), NEWID()), '-', '') barcode 
from #numbers t,#numbers t2


drop table if exists barcodesearch
select distinct ps.n, trim(substring(ps.barcode,ty.n,100)) as searchstring
    into barcodesearch
from like_test ps
inner join #numbers ty on ty.n < 40
where len(ps.barcode) > ty.n

create clustered index idx_barcode_search_index on barcodesearch (searchstring)

The final search should look like this:

declare @s varchar(20) = 'D34F15'
select distinct lt.* from dbo.like_test lt
inner join barcodesearch bs on bs.N = lt.N
where bs.searchstring like @s+'%'

If you have the option of full-text searching, you can speed this up even further by adding the full-text search column directly to the barcode table

drop table if exists #liketestupdates
select n, string_agg(searchstring, ' ') 
    within group (order by reverse(searchstring)) as searchstring
into #liketestupdates
from barcodesearch
group by n

alter table dbo.like_test add search_column varchar(559)

update lt
    set search_column = searchstring
from like_test lt
inner join #liketestupdates lu on lu.n = lt.n


CREATE FULLTEXT CATALOG ftcatalog as default;  

create fulltext index on dbo.like_test  ( search_column )  
key index pk_liketest

The final full-text search would look like this:

declare @s varchar(20) = 'D34F15'
set @s = '"*' + @s + '*"'
select n,barcode from dbo.like_test where contains(search_column, @s)

I understand that Estimated Costs aren't the best measure of expected performance but the number's aren't wildly off here.

With the search table join, the Estimated Subtree Cost is 2.13

With the full-text search, the Estimated Subtree Cost is 0.008

Dear answered 22/8, 2020 at 10:38 Comment(1)

Smart approach, but you should put a trigger on main catalog to keep search table updated – Dehydrate 14/4, 2023 at 7:40

-1

Full-text is aimed for bigger texts, let's say texts with more than about 100 chars. You can use LIKE '%string%'. (However it depends how the barcode column is defined.) Do you have an index for barcode? If not, then create one and it will improve your query.

Afeard answered 6/7, 2016 at 17:28 Comment(2)

If you have wildcards on both ends of the search string then an index won't help. #1388559 – Hogan 6/7, 2016 at 21:8

The index on the barcode field will help as much as this index scan will be faster than clustered index scan. But probably this is not a solution which are you looking for. – Rottweiler 7/7, 2016 at 6:49

-1

First make the index on column on which you have to put as where clause .

Secondly for the datatype of the column which are used in where clause make them as Char in place of Varchar which will save you some space,in the table and in the indexes that will include that column. varchar(1) column needs one more byte over char(1)

Do pull only the number of columns you need try to avoid * , be specific to number of columns you wish to select. Don't write as

select * from products

In place of it write as

 Select Col1, Col2 from products with (Nolock)

Como answered 13/7, 2016 at 14:23 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags