OPTION (RECOMPILE) is Always Faster; Why?

I encountered an odd situation where appending OPTION (RECOMPILE) to my query causes it to run in half a second, while omitting it causes the query to take well over five minutes.

This is the case whether the query is executed from Query Analyzer or from my C# program via SqlCommand.ExecuteReader(). Calling (or not calling) DBCC FREEPROCCACHE or DBCC DROPCLEANBUFFERS makes no difference; query results are always returned instantaneously with OPTION (RECOMPILE) and take well over five minutes without it. The query is always called with the same parameters [for the sake of this test].

I'm using SQL Server 2008.

I'm fairly comfortable with writing SQL, but I have never used an OPTION clause in a query before and was unfamiliar with the whole concept of plan caches until scanning the posts on this forum. My understanding from the posts is that OPTION (RECOMPILE) is an expensive operation. It apparently creates a new lookup strategy for the query. So why is it, then, that subsequent queries that omit OPTION (RECOMPILE) are so slow? Shouldn't the subsequent queries be making use of the lookup strategy that was computed on the previous call, which included the recompilation hint?

Is it highly unusual to have a query that requires a recompilation hint on every single call?

Sorry for the entry-level question but I can't really make heads or tails of this.

UPDATE: I've been asked to post the query...

select acctNo,min(date) earliestDate 
from( 
    select acctNo,tradeDate as date 
    from datafeed_trans 
    where feedid=@feedID and feedDate=@feedDate 

    union 

    select acctNo,feedDate as date 
    from datafeed_money 
    where feedid=@feedID and feedDate=@feedDate 

    union 

    select acctNo,feedDate as date 
    from datafeed_jnl 
    where feedid=@feedID and feedDate=@feedDate 
)t1 
group by t1.acctNo
OPTION(RECOMPILE)

When running the test from Query Analyzer, I prepend the following lines:

declare @feedID int
select @feedID=20

declare @feedDate datetime
select @feedDate='1/2/2009'

When calling it from my C# program, the parameters are passed in via the SqlCommand.Parameters property.

For the purposes of this discussion, you can assume that the parameters never change, so we can rule out sub-optimal parameter sniffing as the cause.

Pentheam answered 1/1, 2014 at 1:59 Comment(6)
What are the parameters to the query? Check this article out. blogs.msdn.com/b/turgays/archive/2013/09/10/… Basically, SQL attempts to generate the query plan based on parameters when the proc is first compiled. It may generate a plan that is not optimal when you start passing different, possibly more realistic parametersMccants
Is the query concise enough to list here? I think Sparky is correct and it's probably related to parameter sniffing, I had a similar problem which confused the heck out of me until reading this excellent article: sommarskog.se/query-plan-mysteries.htmlMerrileemerrili
But in this case (for the sake of this test) I'm always passing in the same parameters. No other apps were able to sneak in and call the query using other params. Thanks for the articles. Will review.Pentheam
This can happen either because it sniffs the values of the parameters and variables or because it does greater simplifications. Examples of the greater simplifications would be collapsing X = @X OR @X IS NULL to X=@X and performing a seek (see here), or pushing predicates down further against a view with window functions.Leffert
Following your edit, the Query Analyzer example uses variables, not parameters. The value of those is never sniffed except with RECOMPILE. In any event, capture the execution plans and look at the differences.Leffert
I have a question...Will OPTION(RECOMPILE) rearrange the field names? I have a set of field names and another person added the Recompile and now the field names are in a different order.Jeter

There are times when using OPTION(RECOMPILE) makes sense. In my experience, the only time it is a viable option is when you are using dynamic SQL. Before you explore whether it makes sense in your situation, I would recommend rebuilding your statistics. This can be done by running the following:

EXEC sp_updatestats

And then recreating your execution plan. This ensures that when your execution plan is created, it is using the latest information.
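
If you would rather not update every table in the database, a narrower alternative is to update just the tables involved in the query (table names taken from the question):

UPDATE STATISTICS datafeed_trans WITH FULLSCAN
UPDATE STATISTICS datafeed_money WITH FULLSCAN
UPDATE STATISTICS datafeed_jnl WITH FULLSCAN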

Adding OPTION(RECOMPILE) rebuilds the execution plan every time your query executes. I have never heard that described as creating a new lookup strategy, but maybe we are just using different terms for the same thing.

When a stored procedure is created (I suspect you are calling ad-hoc SQL from .NET, but if you are using a parameterized query then this ends up being a stored-proc-style call), SQL Server attempts to determine the most effective execution plan for the query based on the data in your database and the parameters passed in (parameter sniffing), and then caches this plan. This means that if you create the query when there are 10 records in your database and then execute it when there are 100,000,000 records, the cached execution plan may no longer be the most effective.

In summary - I don't see any reason that OPTION(RECOMPILE) would be a benefit here. I suspect you just need to update your statistics and your execution plan. Rebuilding statistics can be an essential part of DBA work depending on your situation. If you are still having problems after updating your stats, I would suggest posting both execution plans.

And to answer your question - yes, I would say it is highly unusual for your best option to be recompiling the execution plan every time you execute the query.

Predispose answered 1/1, 2014 at 3:41 Comment(7)
Yes, sp_updatestats did the trick. You hit the nail on the head when you mentioned a query initially run on a table with 10 records, and now the table has millions of records. That was my case exactly. I didn't mention it in the post because I didn't think it mattered. Fascinating stuff. Thanks again.Pentheam
It's the only way I found to work with table variables, because SQL always thinks there is one single row in them. When they contain several thousand rows it becomes a problem.Sheilahshekel
One interesting detail: updating statistics implicitly invalidates all cached plans which use these statistics, but only if statistics actually changed after the update action. So for highly skewed readonly tables, it seems an explicit OPTION (RECOMPILE) might be the only solution.Illuminating
Would this mean we have to 'update statistics' regularly?Weld
Thanks for the answer. I have a follow-up question: why doesn't the server just update statistics automatically? Why do I need to schedule a job to do it?Indecision
You shouldn't schedule it. SQL Server will do it when it thinks it needs to. It doesn't always run as soon as you load a ton of data in, so your stats will lag behind the data for a while, but it will catch up.Pomade
I'll note that @AlexZhukovskiy's comment predates SQL 2019, which fixes this issue.Devotional

When there is a drastic difference from run to run of a query, I find it is usually one of five issues.

  1. STATISTICS - Statistics are out of date. A database stores statistics on the range and distribution of the values in various columns of tables and indexes. This helps the query engine develop a "plan" of attack for how it will do the query, for example the method it will use to match keys between tables, such as using a hash or looking through the entire set. You can call Update Statistics on the entire database or just on certain tables or indexes. This causes the query to slow down from one run to another because, when statistics are out of date, the query plan is likely no longer optimal for the newly inserted or changed data (explained more below). It may not be proper to Update Statistics immediately on a production database, as there will be some overhead, slowdown, and lag depending on the amount of data to sample. You can also choose to use a Full Scan or Sampling to update statistics. If you look at the query plan, you can then also view the statistics on the indexes in use with the command DBCC SHOW_STATISTICS (tablename, indexname); a short sketch appears after this list. This will show you the distribution and ranges of the keys that the query plan uses to base its approach on.

  2. PARAMETER SNIFFING - The query plan that is cached is not optimal for the particular parameters you are passing in, even though the query itself has not changed. For example, if you pass in a parameter which only retrieves 10 out of 1,000,000 rows, the query plan created may use a Hash Join; however, if the parameter you pass in will use 750,000 of the 1,000,000 rows, the plan created may be an index scan or table scan. In such a situation you can tell the SQL statement to use OPTION (RECOMPILE), or an SP to use WITH RECOMPILE, to tell the engine this is a "single use plan" and not to use a cached plan which likely does not apply. There is no rule on how to make this decision; it depends on knowing the way the query will be used by users.

  3. INDEXES - It's possible that the query hasn't changed, but a change elsewhere, such as the removal of a very useful index, has slowed down the query.

  4. ROWS CHANGED - The rows you are querying drastically change from call to call. Usually statistics are automatically updated in these cases. However, if you are building dynamic SQL or calling SQL within a tight loop, there is a possibility you are using an outdated query plan based on a drastically wrong number of rows or stale statistics. Again, in this case OPTION (RECOMPILE) is useful.

  5. THE LOGIC - It's the logic: your query is no longer efficient; it was fine for a small number of rows, but it no longer scales. This usually involves more in-depth analysis of the query plan. For example, you can no longer do things in bulk but have to chunk things and do smaller commits; or your cross product was fine for a smaller set but now takes up CPU and memory as it scales larger (this may also be true for using DISTINCT); you are calling a function for every row; your key matches don't use an index because of CAST type conversions, NULLs, or functions... Too many possibilities here.
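
As a rough illustration of points 1 and 2, using a table from the question (the index and procedure names here are made up for the example; run each statement as its own batch):

-- Inspect the histogram and density behind an index (hypothetical index name)
DBCC SHOW_STATISTICS ('datafeed_trans', 'IX_datafeed_trans_feed')

-- Compile a fresh, single-use plan on every call of a procedure
CREATE PROCEDURE dbo.GetEarliestFeedDates
    @feedID int,
    @feedDate datetime
WITH RECOMPILE
AS
    select acctNo, min(tradeDate) as earliestDate
    from datafeed_trans
    where feedid = @feedID and feedDate = @feedDate
    group by acctNo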

In general, when you write a query you should have some mental picture of roughly how certain data is distributed within your table. A column, for example, can have an evenly distributed number of different values, or it can be skewed, with 80% of the rows having a specific set of values, and the distribution may vary frequently over time or be fairly static. This will give you a better idea of how to build an efficient query, and when debugging query performance it gives you a basis for building a hypothesis as to why the query is slow or inefficient.
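
For example, a quick way to get that picture for one of the question's tables (a sketch; group by whichever column you filter on) is:

select feedid, count(*) as rowsPerFeed
from datafeed_trans
group by feedid
order by count(*) desc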

Lucindalucine answered 29/5, 2014 at 20:34 Comment(2)
thanks, friend. This is excellent information. I wouldn't have been able to understand your answer when I originally posted my question but now it makes perfect sense to me.Pentheam
PARAMETER SNIFFING is by far the biggest bane to my existence. I didn't even know about this command until a failed interview question. My solution to parameter sniffing has always been to hash the parameter values and append "AND {hash} = {hash}" so that the sql was always different for different values. A hack, but it worked.Kiele

UPDATE: if you are on SQL Server 2019 or higher (this answer is almost 10 years old, and I think I was on a SQL Server 2012 instance when I wrote it), then please look into deferred table variable compilation.

To add to the excellent list (given by @CodeCowboyOrg) of situations where OPTION(RECOMPILE) can be very helpful,

  1. Table Variables. When you are using table variables, there will not be any pre-built statistics for the table variable, often leading to large differences between estimated and actual rows in the query plan. Using OPTION(RECOMPILE) on queries with table variables allows generation of a query plan that has a much better estimate of the row numbers involved. I had a particularly critical use of a table variable that was unusable, and which I was going to abandon, until I added OPTION(RECOMPILE). The run time went from hours to just a few minutes. That is probably unusual, but in any case, if you are using table variables and working on optimizing, it's well worth seeing whether OPTION(RECOMPILE) makes a difference.
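
A minimal sketch of the pattern (the table variable and its contents here are made up for illustration; only the placement of the hint matters):

declare @feedID int = 20
declare @accts table (acctNo int primary key)

insert into @accts (acctNo)
select distinct acctNo
from datafeed_trans
where feedid = @feedID          -- may be thousands of rows at run time

-- Without a hint the optimizer guesses about one row for @accts when the plan is built;
-- OPTION(RECOMPILE) lets the plan be compiled with the actual row count
select t.acctNo, min(t.tradeDate) as earliestDate
from datafeed_trans t
join @accts a on a.acctNo = t.acctNo
group by t.acctNo
OPTION (RECOMPILE)
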
Splenetic answered 4/10, 2015 at 1:52 Comment(8)
I have a query with 5 table variables. On my machine it executes for more than half an hour. On my co-worker's machine it executes in <1 second. The machines have similar hardware and the same SQL Server version. If we both add OPTION(RECOMPILE) then it executes in 2 seconds on both machines. In all cases the execution test is carried out in SSMS. What could be causing this difference?Apothegm
Can you compare the execution plan for it on your machine and on your colleague's machine without Option(recompile)? That might show the source of the difference.Splenetic
For temporary tables, is it the same situation?Minyan
@muflix: good question. I don't believe the effect is the same for temporary tables, since they have statistics and the engine should be making automatic recompile choices like for other tables, I believe (but am not certain). Maybe someone else knows with greater certainty.Splenetic
The statistics in temporary tables are not automatically updated or recompiled so the programmer needs to do that.Risa
If you are doing this to fix performance with your Table Variables, then you have a Code Smell and something else needs fixing. I use Table Variables all the time and have yet to add this Option. I've seen developers create Table Variables instead of including the query as a Derived Table sub-query (also good is using a Function, View, or CTE for reuse). If I'm doing massive ETL, then I go with Temp Tables (or a Staging Database). Keep it small (<1000 records) and use PK's and Indexes in your Variable definition to join on. sqlservercentral.com/blogs/should-i-use-optionrecompileMure
SQL 2019 mostly fixes this issue with Table Variable Deferred Compilation .Devotional
My answer was for SQL Server 2012--not surprised this has changed/gotten betterSplenetic

The very first action before tuning queries is to defragment/rebuild the indexes and update the statistics; otherwise you're wasting your time.

You must check the execution plan to see if it's stable (i.e., the same when you change the parameters). If not, you might have to create a covering index (in this case, one for each table); knowing the system, you can create one that is useful for other queries too.

As an example: create index idx01_datafeed_trans on datafeed_trans (feedid, feedDate) INCLUDE (acctNo, tradeDate)

If the plan is stable, or you can stabilize it, you can execute the statement with sp_executesql('sql statement') to save and reuse a fixed execution plan.

If the plan is unstable, you have to use an ad-hoc statement or EXEC('sql statement') to evaluate and create an execution plan each time (or a stored procedure WITH RECOMPILE). A sketch of both approaches is below.
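
A rough sketch of the two approaches, using a simplified form of the question's query (the literal values are the ones from the question):

-- Parameterized: one plan is cached and reused for every set of values
exec sp_executesql
    N'select acctNo, min(tradeDate) as earliestDate
      from datafeed_trans
      where feedid = @feedID and feedDate = @feedDate
      group by acctNo',
    N'@feedID int, @feedDate datetime',
    @feedID = 20, @feedDate = '20090102'

-- Ad-hoc: the literals are part of the text, so each distinct set of values
-- compiles (and caches) its own plan
exec ('select acctNo, min(tradeDate) as earliestDate
       from datafeed_trans
       where feedid = 20 and feedDate = ''20090102''
       group by acctNo')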

Hope it helps.

Hooded answered 23/10, 2015 at 12:49 Comment(0)

Necroing this question but there's an explanation that no-one seems to have considered.

STATISTICS - Statistics are not available or misleading

If all of the following are true:

  1. The columns feedid and feedDate are likely to be highly correlated (e.g. a feed id is more specific than a feed date and the date parameter is redundant information).
  2. There is no index with both columns as sequential columns.
  3. There are no manually created statistics covering both these columns.

Then SQL Server may incorrectly assume that the columns are uncorrelated, leading to lower-than-expected cardinality estimates when both restrictions are applied, and a poor execution plan being selected. The fix in this case would be to create a statistics object linking the two columns, which is not an expensive operation.
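
For example (a sketch using one of the question's tables; the statistics name is arbitrary):

CREATE STATISTICS stat_feedid_feedDate
ON datafeed_trans (feedid, feedDate)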

Loosestrife answered 7/3, 2019 at 8:59 Comment(0)

OPTION (RECOMPILE) is used in real-world production scenarios. I have used it to eliminate parameter sniffing problems and to optimize big queries. It may be the answer to your problem, but the signs are that optimizing for the unknown (as happens with local variables) may also solve it.

I certainly do not avoid the hint just because there was at one time a bug that was fixed several years ago. The main risk of OPTION (RECOMPILE) is when it is used improperly, such as on high-frequency queries.
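
For instance, instead of recompiling, the optimizer can be asked up front to plan for unknown values, which is the same behaviour local variables get (a sketch; whether it helps depends on your data distribution):

select acctNo, min(tradeDate) as earliestDate
from datafeed_trans
where feedid = @feedID and feedDate = @feedDate
group by acctNo
OPTION (OPTIMIZE FOR UNKNOWN)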

Designer answered 23/8, 2021 at 9:46 Comment(0)
