Why is performance increased when moving from a derived table to a temp table solution?

I'm reading "Dissecting SQL Server Execution Plans" by Grant Fritchey and it's helping me a lot to see why certain queries are slow.

However, I am stumped by this case, where a simple rewrite performs quite a lot faster.

This is my first attempt and it takes 21 secs. It uses a derived table:

-- 21 secs
SELECT *
  FROM Table1 AS o JOIN (
    SELECT col1
    FROM Table1
    GROUP BY col1
    HAVING COUNT( * ) > 1
) AS i ON i.col1 = o.col1

My second attempt simply moves the derived table out into a temp table, and it is 3 times faster:

-- 7 secs
SELECT col1
INTO    #doubles
FROM    Table1
GROUP BY    col1
HAVING  COUNT( * ) > 1

SELECT *
FROM Table1 AS o JOIN #doubles AS i ON i.col1 = o.col1

My main interest is in why moving from a derived table to a temp table improves performance so much, not in how to make it even faster.

I would be grateful if someone could show me how I can diagnose this issue using the (graphical) execution plan.

XML execution plan: https://www.sugarsync.com/pf/D6486369_1701716_16980

Edit 1

When I created statistics on the 2 columns that were specified in the GROUP BY, the optimizer started doing "the right thing" (after clearing the procedure cache; don't forget that if you are a beginner!). I simplified the query in the question, which in retrospect was not a good simplification: the attached sqlplan shows the 2 columns, but this was not obvious.

The estimates are now a lot more accurate, as is the performance, which is on par with the temp table solution. As you may know, the optimizer creates statistics on single columns automatically (if not disabled), but 2-column statistics have to be created by the DBA.
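For reference, a rough sketch of what that looked like, assuming the second GROUP BY column is named col2 (the simplified query above only shows col1, so treat the names as placeholders):

-- hypothetical statistic name and second column (col2); adjust to the real schema
CREATE STATISTICS stat_Table1_col1_col2
    ON dbo.Table1 (col1, col2);

-- clear cached plans so the optimizer recompiles using the new statistics
-- (on a shared server, sp_recompile on the table is a gentler alternative)
DBCC FREEPROCCACHE;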

A (nonclustered) index on these 2 columns made the query perform the same, but in this case a statistic is just as good and it doesn't suffer the downside of index maintenance. I'm going forward with the 2-column statistic and will see how it performs. @Grant Do you know if the stats on an index are more reliable than those of a column statistic?
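The index alternative that was compared looks roughly like this (again with col2 as a placeholder name; it carries the same multi-column statistics but adds maintenance cost on every write):

-- a nonclustered index on the same 2 columns also gives the optimizer a multi-column histogram,
-- but has to be maintained on every INSERT/UPDATE/DELETE
CREATE NONCLUSTERED INDEX IX_Table1_col1_col2
    ON dbo.Table1 (col1, col2);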

Edit 2

Once a problem is solved, I always follow up with how a similar problem can be diagnosed faster in the future.

The problem here was that the estimated row counts were way off. The graphical execution plan shows these when you hover over an operator, but that's about it.

Some tools that can help:

  1. SET STATISTICS PROFILE ON

I've heard this one will become obsolete and be replaced by its XML variant, but I still like the output, which is in grid format. Here the big difference between the "Rows" and "EstimateRows" columns would have shown the problem (see the sketch after this list).

  2. External Tool: SQL Sentry Plan Explorer http://www.sqlsentry.net/

This is a nice tool, especially if you are a beginner. It highlights problems in the plan.


  3. External Tool: SSMS Tools Pack http://www.ssmstoolspack.com/

A more general-purpose tool, but again it directs the user to potential problems.

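As a rough illustration of option 1, this is how the estimate-versus-actual gap in the original query could have been surfaced (a sketch; the interesting columns in the extra result set are "Rows" and "EstimateRows"):

SET STATISTICS PROFILE ON;

-- run the slow query; each operator in the grid output now shows the actual row count
-- next to the estimate, so a gap like 800 estimated vs 1.2 million actual stands out
SELECT *
  FROM Table1 AS o JOIN (
    SELECT col1
    FROM Table1
    GROUP BY col1
    HAVING COUNT( * ) > 1
) AS i ON i.col1 = o.col1;

SET STATISTICS PROFILE OFF;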

Kind Regards, Tom

Couplet answered 28/2, 2012 at 14:46 Comment(12)
Without the query plans for both queries we cannot tell why. Potentially it is related to having accurate statistics on the temp table, which are incorrectly estimated for the derived table.Arella
What is the purpose of selecting the results of the subquery into a temp table in the first example? Why not just do SELECT col1 FROM Table1 GROUP BY col1 HAVING COUNT( * ) > 1 in the subselect?Zavala
Your first attempt is invalid syntax. You can't join onto a SELECT ... INTO. I presume the INTO #doubles is a mistake there? In any event probably different join strategies as it does not estimate the number of rows matching the HAVING accurately but when inserted into the #temp table it knows exactly the number of rows that are involved. Please post the plans. You could also evaluate ;WITH CTE AS (SELECT *, COUNT(*) OVER (PARTITION BY col1) AS C FROM Table1) SELECT * FROM CTE WHERE C > 1Biller
Damn you guys are fast :) Indeed, the SELECT INTO in the first query was not meant to be. I added the graphical and XML query plans as links.Couplet
From the image you posted, it looks like the query cost is almost the same for the first query (47% of the batch) and the next two (which add up to 53% of the batch). Strange that the first query takes 3 times longer...Zavala
Hello @rsbarro. Indeed but these numbers are indications as I understand it.Couplet
I can’t see the plan, just the graphic. Without the plan itself, I’d just be guessing as to what’s going on. Things to look at, what's the reason for early termination on the first plan? What are the stats like on the table that you're using to load to the temp table? If the stats are out of date, loading to the temp table could be giving you a cleaner set of stats. Again, just looking at the graphics so these are guesses.Deist
Are you sure it's 3 times faster? It looks like the first way costs 47% of the batch and the second way costs 35 + 18 = 53%, so it's only faster by 6%Cankerworm
Those values are estimated costs to the execution plans, not a measure of speed.Deist
+1 Grant. On the first query, the plan shows a hash join, but the pipe width from the 2 clustered scans look nearly identical. With hash joins, I expect one pipe to be tiny and the other to be huge. To me, this implies dated or missing stats, which result in suboptimal plans (poor join algorithm choice, insufficient memory preallocated for sorts and joins, etc). Like Grant pointed out, the graphic doesn't fully explain the plan (estimated row count, actual row count...).Finis
I made a link available to the XML plan. Good comment about the hash join, as it indeed makes sense that one pipe is best kept small for good performance. It seems out-of-date stats are directing the optimizer to do a hash join while it should be considering another strategy.Couplet
@Couplet No differences in stats between just stats and stats in an index. They're all still stats.Deist

Looking at the values in the first execution plan, it looks like it's statistics. You have an estimated number of rows of 800 and an actual of 1.2 million. I think you'll find that updating the statistics will change the way the first query's plan is generated.
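A minimal sketch of that, assuming the table from the question is dbo.Table1 (the follow-up comments discuss whether a full scan is needed):

-- refresh all statistics on the table; WITH FULLSCAN reads every row instead of sampling,
-- which gives exact histograms at the cost of a longer update
UPDATE STATISTICS dbo.Table1 WITH FULLSCAN;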

Deist answered 28/2, 2012 at 18:54 Comment(3)
We updated all the statistics but the estimates are still off. I'm still thinking it has to do with bad statistics though. Do you have a clue what we could look at next?Couplet
did you update the stats with a FULL SCAN? If you did a sampled update it might still show a disparity. If not, then something else is going on that I'm not seeing from the plan there.Deist
Creating stats on the table with a full scan did not improve the execution plan. You were right that SQL Server did not have the right estimates to come up with a good execution plan though. I updated the question with the solution.Couplet

Bumping a very old post, but it's still relevant as we ran into this as well.

Our query took way too long (6.5 hours) and used a derived table on some bigger tables. The estimate was that the derived table produced 1 row, and so did everything on top of it.

As this was clearly not true, we also updated all statistics with a full scan. No result. When we got the actual plan, it was massively underestimating the number of rows and doing a nested loop on 20 million by 20 million rows, which took up most of the time, with huge spills to disk as well. The expected parallel query was not there; everything was executed single-threaded. When I took an estimated plan of the derived table (the isolated query), it immediately showed a parallel plan.

Putting the derived table into a temp table and using that in the main query made everything run in parallel, and it finished in 30 minutes, with an index added on the temp table in the same batch.
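A sketch of that pattern, with placeholder names (BigTable, key_col) since the actual query isn't shown:

-- materialize the derived table first; the temp table gets real row counts and statistics
SELECT key_col
INTO   #derived
FROM   dbo.BigTable
GROUP BY key_col;

-- an index on the temp table supports the join in the main query
CREATE INDEX IX_derived_key ON #derived (key_col);

-- the main query now joins against a table with accurate cardinality estimates,
-- which here allowed a parallel plan instead of a single-threaded nested loop
SELECT o.*
FROM   dbo.BigTable AS o
JOIN   #derived AS d ON d.key_col = o.key_col;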

I am thinking that derived tables are treated like a black box and therefore get the guessed 1-row return. This will affect your query more and more as the actual number of rows deviates further.

EDIT: a simple setup with a few of the large tables in a derived table did go parallel, so it is not a black box that the optimizer can't get a grip on. A comparison with an identical query using just plain joins was faster: the derived-table query had 58% cost relative to the batch, compared to 42% for the plain joins, so the plain joins look faster. Still puzzled about the original query going bad, though...

Downing answered 1/3, 2023 at 13:47 Comment(0)
