Slow query with cfqueryparam searching on indexed column containing hashes
Asked Answered
A

3

4

I have the following query that runs in 16ms - 30ms.

<cfquery name="local.test1" datasource="imagecdn">
    SELECT hash FROM jobs WHERE hash in(
        'EBDA95630915EB80709C69089315399B',
        '3617B8E6CF0C62ECBD3C48DDF8585466',
        'D519A38F09FDA868A2FEF1C55C9FEE76',
        '135F94C3774F7719CFF8FF3A275D2D05',
        'D58FAE69C559273D8427673A08193789',
        '2BD7276F209768F2FCA6635659D7922A',
        'B1E3CFBFCCFF6F5B48A849A050E6D424',
        '2288F5B8A797F5302E8CA24323617236',
        '8951883E36B5D38A4643DFAA0396BF13',
        '839210BD564E30BE1355D1A6D4EF7081',
        'ED4A2CB0C28B608C29576819CF7BE19B',
        'CB26925A4874945B810707D5FF0B91F2',
        '33B2FC229F0CC797A02AD163CDBA0875',
        '624986E7547DBAC0F47B3005CFDE0A16',
        '6F692C289BD805CEE41EF59F83F16F4D',
        '8551F0033C617BD9EADAAD6CEC4B3E9E',
        '94C3C0A74C2DE085FF9F1BBF928821A4',
        '28DC1A9D2A69C2EDF5E6C0E6368A0B3C'
    )
</cfquery>

If I execute the same query but use cfqueryparam it runs in 500ms - 2000ms.

<cfset local.hashes = "[list of the same ids as above]">
<cfquery name="local.test2" datasource="imagecdn">
    SELECT hash FROM jobs WHERE hash in(
        <cfqueryparam cfsqltype="cf_sql_varchar" value="#local.hashes#" list="yes">
    )
</cfquery>

The table has roughly 60,000 rows. The "hash" column is varchar(50) and has a unique non-clustered index, but is not the primary key. DB server is MSSQL 2008. The web server is running the latest version of CF9.

Any idea why the cfqueryparam causes the performance to bomb out? It behaves this way every single time, no matter how many times I refresh the page. If I pair the list down to only 2 or 3 hashes, it still performs poorly at like 150-200ms. When I eliminate the cfqueryparam the performance is as expected. In this situation there is the possibility for SQL injection and thus using cfqueryparam would certainly be preferable, but it shouldn't take 100ms to find 2 records from an indexed column.

Edits:

  1. We are using hashes generated by hash() not UUIDS or GUIDS. The hash is generated by a hash(SerializeJSON({ struct })) which contains the plan for a set of operations to execute on an image. The purpose for this is that it allows us to know before insert and before query the exact unique id for that structure. These hashes act as an "index" of what structures have already been stored in the DB. In addition with hashes the same structure will hash to the same result, which is not true for UUIDS and GUIDS.

  2. The query is being executed on 5 different CF9 servers and all of them exhibit the same behavior. To me this rules out the idea that CF9 is caching something. All servers are connecting to the exact same DB so if caching was occurring it would have to be the DB level.

Asaasabi answered 11/5, 2012 at 0:6 Comment(3)
Have you tried using just cf_sql_char instead of varchar? It might force MSSQL to take a closer look at the array and give you a better execution plan. Clearly the plan it's coming up with from the cache is not as efficient as the one it compiles in real time. Also try adding an index hint. Run the trace analyzer on it if you can isolate it in dev - might give you some clues in the execution plan. that's all I have :)Dispensable
It might be because, it cannot reuse the prepare statement for lists, so it is recompiling everytime.... If you are confident to use your own logic to prevent sql injection, maybe it's suitable to skip cfqueryparam for this.Selfregard
Here are some ways to improve performance of SELECT IN florianreischl.blogspot.ca/2012/03/…Selfregard
S
8

Your issue may be related to VARCHAR vs NVARCHAR. These 2 links may help Querying MS SQL Server G/UUIDs from ColdFusion and nvarchar vs. varchar in SQL Server, BEWARE

What might be happening is there is a setting in ColdFusion administrator if cfqueryparam sends varchars as unicode or not. If that setting does not match the column setting (in your case, if that setting is enabled) then MS SQL will not use that index.

Slowdown answered 11/5, 2012 at 16:28 Comment(6)
Do you know where I would alter this setting or check if it's the case? On this specific datasource I do not have -- Enable High ASCII characters and Unicode for data sources configured for non-Latin characters checked. Should I? The column is varchar.Asaasabi
Looks like it should be unchecked. Can you try changing the setting to see if it makes a difference?Slowdown
BOOM! That's it Yisroel, thank you so much. On my local dev box I had it unchecked. On our dev/live servers it was checked. As soon as I unchecked it, the performance immediately improved. If in the future I ran into this same problem but needed that checked (because unicode was neccessary), would I want my columns to be nvarchar this way the index is in unicode as well?Asaasabi
the decision which type of column to use depends on the data youre storing. just make sure they match the setting in CF Administrator :)Slowdown
I'm going to blog this one. nice solution Yisroel - love it!Dispensable
Oh finally! CF10 now supports CF_SQL_NVARCHAR. So we can now use that as a per-query setting instead of per-server setting.Selfregard
P
0

As Mark points out it is is probably got a bad execution plan in the cache. One of the advantages of cfqueryparam is that when you pass in different values it can reuse the cached plan it has for that statement. This is why when you try it with a smaller list you see no improvement. When you do not use cfqueryparam SQL Server has to work out the Execution Plan each time. This normally a bad thing unless it has a sub optimal plan in the cache. Try clearing the cache as explained here http://www.devx.com/tips/Tip/14401 this hopefully will mean that the next time you run your statement with cfqueryparam in it'll cache the better plan.

Make sense?

Pianoforte answered 11/5, 2012 at 5:4 Comment(3)
I suspect - although am not 100% sure - there'll be a different plan for different-sized lists though. CF doesn't pass the SQL as SELECT * FROM table WHERE col IN (:listParam) (ie: there's a single bind param), it passes it as SELECT * FROM table WHERE col IN (:each, :element, :separately). So the server will see a query for a three-element list as different from one with a ten-element list, and I strongly suspect each will be compiled separately.Humdrum
I totally agree with you that the IN statement is not bound as one list. But I do not thing that will identify it as a new prepared statement and make it calculate a new query plan. If you think about it the number of items in an IN statement really should have no bearing on whether to use a table scan, or an index etc.Pianoforte
I'm not sure if it's relevant but the same query is being run on 5 different CF9 boxes. Each one exhibits the same behavior which I think rules out any form of Coldfusion query caching. All boxes are connecting to the same DB, so if the caching is at the SQL level that could be a possibility.Asaasabi
B
0

I don't think cfqueryparam causing issue. As you have mention big hike in execution it may be index not going to use for your query when trying with cfqueryparam. I have created same scenario on my development computer but I got same execution time with and without cfqueryparam. There may be some overhead using list as in first query you are passing it directly as test and in second coldfusion need to create from query parameter from provided list but again this should not that much. I will suggest to start "SQL Server Profiler" and monitor query executed on server, this will give you better who costing another 500 ms.

Burnside answered 11/5, 2012 at 5:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.