SQL Server Performance of Parameterized Queries with leading wildcards
Asked Answered
L

4

6

I have a SQL 2008 R2 Database with about 2 million rows in one of the tables and am struggling with the performance of a specific query when using parameterized SQL.

In the table, there's a field containing a name in it:

[PatientsName] nvarchar NULL,

There's also a simple index on the field:


CREATE NONCLUSTERED INDEX [IX_Study_PatientsName] ON [dbo].[Study] 
(
    [PatientsName] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [INDEXES]
GO

When I do this query in the management studio, it takes around 4 seconds to execute:


declare @StudyPatientsName nvarchar(64)
set @StudyPatientsName= '%Jones%'

SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName

But, when I execute this query:


SELECT COUNT(*) FROM Study WHERE Study.PatientsName like '%Jones%'

it takes a bit more than a half second to execute.

Looking at the execution plans, the query without parameterization does an Index Scan using the above mentioned index, which obviously is efficient. The parameterized query uses the index, but does a range seek on the index.

Part of the issue is having the leading wildcard. When I remove the leading wildcard, both queries return in a fraction of a second. Unfortunately, I do need to support leading wildcards.

We have a home grown ORM that does parameterized queries where the problem originated. These queries are done based on input from a user, so parameterized queries make sense to avoid things like a SQL injection attack. I'm wondering if there's a way to make the parameterized query function as well as the non-parameterized query?

I've done some research looking at different ways to give hints to the query optimizer, trying to force the optimizer to redo the query plan on each query, but haven't found anything yet to improve the performance. I tried this query:


SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName
OPTION ( OPTIMIZE FOR (@StudyPatientsName = '%Jones%'))

which was mentioned as a solution in this question, but it didn't make a difference.

Any help would be appreciated.

Lavellelaven answered 16/8, 2010 at 16:50 Comment(2)
Frankly if you are doing this kind of serch it seems to me your data model is bad. You should not have a patient name field that has both the first and last name. Fix your structural problem first.Macaroon
I would concur that splitting the name between first and last name would be a viable method to solve the issue in this particular case. Unfortunately, we also have queries on several other non-patient name fields that have the same issue, which have to be resolved also.Lavellelaven
I
4

It seems like you want to force a scan. There is a FORCESEEK hint but I couldn't see any analogous FORCESCAN hint. This should do it though.

SELECT COUNT(*) 
FROM Study 
WHERE Study.PatientsName + '' like @StudyPatientsName

Although maybe you could try the following on your data and see how it works out .

SELECT COUNT(*) 
FROM Study 
WHERE Study.PatientsName  like @StudyPatientsName
option (recompile)
Insincere answered 16/8, 2010 at 19:28 Comment(4)
He could probably force the scan with an index hint as well. SELECT COUNT(*) FROM Study with (index(IX_Study_PatientsName)) WHERE Study.PatientsName like @StudyPatientsNameAhimsa
@Joe - It is using the index but doing a range seek on it. I think the plan he is getting is like this declare @StudyPatientsName nvarchar(64) set @StudyPatientsName= '%a%' SELECT COUNT(*) FROM master.dbo.spt_values WHERE type like @StudyPatientsName img826.imageshack.us/img826/3977/executionplan.jpgInsincere
Adding the empty string to the field make (e.g., +'') does work, although it definitely feels like a hack. Adding "option (recompile)" does not improve the performance. I really thought when I initially looked into this that using "option (recompile)" would resolve the issue. there must be something going on with SQL that its not forcing a scan of the index in this case. In any case, I'll give it another day to see if there's other suggested answers, but it looks like this should help solve the issue for me.Lavellelaven
@Steve - I got fairly strange results with recompile as well. It did seem to cause the plan to change for different inputs but I think it must use the statistics in some way and I couldn't really work out whether it was for good or bad.Insincere
A
3

I think your best chance of improving performance here is to look into using a full text index.

Ahimsa answered 16/8, 2010 at 16:55 Comment(1)
I would concur that this is a viable solution to the problem. In my particular instance, its not practical to use a full text index, but this would be another way to get around the quirk in SQL Server it seems I've discovered.Lavellelaven
H
0

I'm having trouble finding the documentation to verify this, but IIRC, COUNT(*) does a full table scan in MS SQL (as opposed to using a cached value). If you run it against a column which cannot be null and/or has an index defined, I believe (again, still can't find docs to confirm, so I could be off base here) that will be faster.

What happens when you modify the query to something like:

SELECT COUNT(id) FROM Study WHERE Study.PatientsName Like @StudyPatientsName

or

SELECT COUNT(PatientsName) FROM Study 
WHERE Study.PatientsName 
LIKE @StudyPatientsName
Hispania answered 16/8, 2010 at 17:5 Comment(1)
I tried a query select count(Guid) and select count(PatientsName) and in both cases, the queries were slow. I also tries just a select * from Study with the same where clause, and it also took roughly the same amount of time to execute.Lavellelaven
T
0

If all else fails you could try

SELECT COUNT(*) FROM Study WITH(INDEX(0)) WHERE Study.PatientsName like @StudyPatientsName

Perhaps you could wrap it in an IF

IF substring(@StudyPatientsName, 1, 1) = '%'
    SELECT COUNT(*) FROM Study WITH(INDEX(0)) WHERE Study.PatientsName like @StudyPatientsName
ELSE
    SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName

Edit: As martin pointed out, for this specific query this is probably not the best way to do it since an index scan of the existing index is likely faster. It might be applicable in similar situations, though.

Tyndall answered 17/8, 2010 at 13:33 Comment(4)
I agree that making a special case for leading wildcards would be a good idea. I don't think the index hint would make any difference though. The issue is that it uses the correct index but in an inefficient way that seems to add to the amount of I/O. To reproduce try set statistics io on declare @StudyPatientsName nvarchar(64) set @StudyPatientsName= '%%' SELECT COUNT(*) FROM master.dbo.spt_values WHERE type like @StudyPatientsName then compare plan and I/0 with WHERE type+'' like @StudyPatientsNameInsincere
According to the docs, INDEX(0) will force a table/clustered index scan (not seek). I doubt you can do better than this. The problem is that there is no way to use an index to do a LIKE with a leading wildcard.Tyndall
My answer does better than this! The nonclustered index IX_Study_PatientsName should completely cover this query so that's the one that needs to be scanned. This will presumably be a lot narrower as it only contains the name column and a scan of it will involve a lot less I/O.Insincere
@Martin: You are right. I'll let the answer stand with this remark, though.Tyndall

© 2022 - 2024 — McMap. All rights reserved.