Performance impact of chained CTE vs Temp table
I have the following Chained CTE query (simplified):

;WITH CTE1
AS(    
    SELECT * FROM TableA
),
CTE2
AS(
    SELECT * FROM TableB b INNER JOIN CTE1 c ON b.id = c.id
)

SELECT * FROM CTE2

If I break CTE chain and store data of CTE1 into a temp table then the performance of the overall query improves (from 1 minute 20 seconds to 8 seconds).

;WITH CTE1
AS(    
    SELECT * FROM TableA
)

SELECT * INTO #Temp FROM CTE1

;WITH CTE2
AS(
    SELECT * FROM TableB b INNER JOIN #Temp c ON b.id = c.id
)

SELECT * FROM CTE2
DROP TABLE #Temp

There are complex queries in CTE1 and CTE2. I have just created a simplified version to explain here.

Should breaking the CTE chain improve performance?

SQL Server version: 2008 R2

Soto answered 2/11, 2018 at 17:7 Comment(3)
You may want to ask this question on dba.stackexchange.com. I'm not saying it's not relevant here, but DBAs typically know this information more readily than the average developer.Disastrous
This is not uncommon. The optimizer treats the CTE as a normal subquery, so it may choose to produce an execution plan that doesn't involve materializing any data first, and explicitly materializing it can produce better results. (However, if you find a query as simple as this improves from it, it's likely you've got a problem with missing indexes or out-of-date statistics, so your example is likely over simplified.) Sometimes splitting out the CTE(s) into temp tables improves things, sometimes not doing that is the better choice.Steradian
For any query of any real complexity, the optimizer has lots of different options when it comes to creating the execution plan. The more complex the query, the more possible execution options there are... to the point where it's impossible to evaluate them all and it simply chooses a "good enough" plan. By breaking the query into separate commands, you simplified things to the point where the optimizer was able to choose a better plan.Christopherchristopherso

Obviously, it can, as you yourself have shown.

Why? The most obvious reason is that the optimizer knows the actual size of a temporary table, which gives it more information for optimizing the query. With a CTE, it only has an estimate. So the improvement you are seeing comes from a better query plan.

Another reason would be if the CTE is referenced multiple times in the query. SQL Server does not materialize CTEs, so the definition code would be run multiple times.
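To illustrate (table and column names here are hypothetical, not from the original question): a CTE joined to itself can have its body evaluated once per reference, while a temp table is populated exactly once.

```sql
-- The CTE body may execute once per reference:
;WITH Totals AS (
    SELECT CustomerId, SUM(Amount) AS Total
    FROM Orders
    GROUP BY CustomerId
)
SELECT a.CustomerId
FROM Totals a
INNER JOIN Totals b ON b.Total > a.Total;   -- Totals can run twice here

-- With a temp table, the expensive aggregation runs exactly once:
SELECT CustomerId, SUM(Amount) AS Total
INTO #Totals
FROM Orders
GROUP BY CustomerId;

SELECT a.CustomerId
FROM #Totals a
INNER JOIN #Totals b ON b.Total > a.Total;

DROP TABLE #Totals;
```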

Sometimes, you purposely materialize CTEs as temporary tables so you can add indexes to them. That can also improve performance.
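A minimal sketch of that pattern, using the simplified tables from the question (column list is assumed for illustration):

```sql
-- Materialize the CTE's result set once:
SELECT id, col1, col2
INTO #TempA
FROM TableA;

-- An index on the join column can help the subsequent join:
CREATE CLUSTERED INDEX IX_TempA_id ON #TempA (id);

SELECT b.*
FROM TableB b
INNER JOIN #TempA t ON b.id = t.id;

DROP TABLE #TempA;
```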

All that said, I prefer to avoid temporary tables. The optimizer is usually pretty good.

Kampong answered 2/11, 2018 at 17:13 Comment(5)
I also prefer to delegate the dirty work to the optimizer, but for big queries it's not enough.Schiller
True. SQL Server often calculates poor estimates for complex queries, and a temporary table holding the result of some specific part of the query drastically improves performance.Labefaction
It's pretty good until you get an optimizer timeout and a sub-par plan. Then you have to decompose your query.Solnit
@PittsburghDBA Yes, and what can start as an OK query with CTEs can turn into a query that performs horribly as the database grows. My preference is the opposite of Gordon's: I prefer temporary tables to CTEs, except for really simple queries (i.e. a trivial CTE and just one CTE in a query) and for recursive queries where performance is still OK for your data set. This is for SQL Server; no idea about the implementation of CTEs in other RDBMSs.Willaims
Thanks for clarifying.Soto

Suppose CTE1 is expensive:

;WITH CTE1
AS(    
    SELECT * FROM TableA
)

SELECT * INTO #Temp FROM CTE1

The version above guarantees CTE1 runs only once.

In the chained form, CTE1 can be evaluated multiple times.

And even with #Temp, you should consider adding an index or primary key and sorting the insert.
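A hedged sketch of that last point, reusing the question's simplified tables (the column list is assumed): declare the temp table with a primary key up front and insert in key order.

```sql
-- A PK gives the optimizer uniqueness and ordering information:
CREATE TABLE #Temp
(
    id   INT NOT NULL PRIMARY KEY,
    name VARCHAR(100) NULL
);

-- Inserting in PK order means the rows arrive pre-sorted
-- for the clustered index:
INSERT INTO #Temp (id, name)
SELECT id, name
FROM TableA
ORDER BY id;

SELECT b.*
FROM TableB b
INNER JOIN #Temp t ON b.id = t.id;

DROP TABLE #Temp;
```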

Showplace answered 2/11, 2018 at 19:14 Comment(0)

This depends upon many factors. Always try to write the single statement, if you can. Premature optimization is the root of a lot of evil.

If you do experience a performance problem, these are some of the advantages to decomposing your single statement:

  • It can increase maintainability, which is one of many non-functional requirements, by reducing complexity.
  • It can yield a better plan, so long as the cost of the intermediate materialization is less than the time it saves.
  • The intermediate tables can be indexed.
  • Indexes, primary keys, and unique constraints are very helpful to the optimizer, not only for choosing join types, but also for estimating cardinality, which has a large effect on memory grants.
  • You can choose to apply optimizer hints, such as MAXDOP, to individual statements rather than to one gigantic statement. This is especially helpful when you need to manipulate memory grants.
  • You can tune individual statements to eliminate spill to tempdb.
  • Depending upon the complexity and total execution time of your process, you can potentially release resource locks earlier, depending also upon which isolation level your statements run under.
  • If your query plan is poor, due to an optimizer time-out, using less complex individual statements may yield better overall results.
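For instance, once the query is decomposed, a hint like MAXDOP can be scoped to a single step (an illustrative sketch using the question's simplified tables; the specific values are arbitrary):

```sql
-- Limit parallelism for the materialization step only:
SELECT *
INTO #Temp
FROM TableA
OPTION (MAXDOP 1);

-- Allow parallelism where it pays off:
SELECT b.*
FROM TableB b
INNER JOIN #Temp t ON b.id = t.id
OPTION (MAXDOP 4);

DROP TABLE #Temp;
```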
Solnit answered 3/11, 2018 at 2:32 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.