I'm reading "Dissecting SQL Server Execution Plans" by Grant Fritchey, and it's helping me a lot in understanding why certain queries are slow.
However, I am stumped by this case where a simple rewrite performs quite a lot faster.
This is my first attempt and it takes 21 secs. It uses a derived table:
-- 21 secs
SELECT *
FROM Table1 AS o
JOIN (
    SELECT col1
    FROM Table1
    GROUP BY col1
    HAVING COUNT(*) > 1
) AS i ON i.col1 = o.col1
My second attempt simply moves the derived table out into a temp table, and it's 3 times faster:
-- 7 secs
SELECT col1
INTO #doubles
FROM Table1
GROUP BY col1
HAVING COUNT(*) > 1

SELECT *
FROM Table1 AS o
JOIN #doubles AS i ON i.col1 = o.col1
My main interest is in why moving from a derived table to a temp table improves performance so much, not in how to make it even faster.
I would be grateful if someone could show me how I can diagnose this issue using the (graphical) execution plan.
Xml Execution plan: https://www.sugarsync.com/pf/D6486369_1701716_16980
Edit 1
When I created statistics on the 2 columns that were specified in the GROUP BY, the optimizer started doing "the right thing" (after clearing the procedure cache — don't forget that step if you are a beginner!). I simplified the query in the question, which in retrospect was not a good simplification. The attached sqlplan shows the 2 columns, but this was not obvious.
The estimates are now a lot more accurate, as is the performance, which is up to par with the temp table solution. As you may know, the optimizer creates stats on single columns automatically (if not disabled), but 2-column statistics have to be created by the DBA.
A (non-clustered) index on these 2 columns made the query perform the same, but in this case a stat is just as good, and it doesn't suffer the downside of index maintenance. I'm going forward with the 2-column stat and will see how it performs. @Grant Do you know if the stats on an index are more reliable than those of a column stat?
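For reference, a minimal sketch of the fix described above. The statistic name and the second column name (col2) are placeholders for the real GROUP BY columns, and note that DBCC FREEPROCCACHE clears the entire plan cache, so don't run it blindly on a production server:

```sql
-- Create a multi-column statistic on the two GROUP BY columns;
-- the optimizer only auto-creates single-column statistics.
CREATE STATISTICS stat_col1_col2 ON Table1 (col1, col2);

-- Clear cached plans so the query recompiles against the new statistic.
-- (On production, prefer targeting just the one plan instead.)
DBCC FREEPROCCACHE;
```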
Edit 2
Once a problem is solved, I always follow up on how a similar problem can be diagnosed faster in the future.
The problem here was that the estimated row counts were way off. The graphical execution plan shows these when you hover over an operator, but that's about it.
Some tools that can help:
- SET STATISTICS PROFILE ON
I heard this one will become obsolete and be replaced by its XML variant, but I still like the output, which is in grid format. Here the big difference between the "Rows" and "EstimateRows" columns would have shown the problem.
- External Tool: SQL Sentry Plan Explorer http://www.sqlsentry.net/
This is a nice tool, especially if you are a beginner. It highlights potential problems.
- External Tool: SSMS Tools Pack http://www.ssmstoolspack.com/
A more general-purpose tool, but again it directs the user to potential problems.
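As an illustration of the first tool in the list, a sketch of how the slow query from the question could be wrapped (same Table1/col1 names as above). Comparing actual versus estimated rows in the grid output is what would have exposed this problem:

```sql
SET STATISTICS PROFILE ON;

SELECT *
FROM Table1 AS o
JOIN (
    SELECT col1
    FROM Table1
    GROUP BY col1
    HAVING COUNT(*) > 1
) AS i ON i.col1 = o.col1;

SET STATISTICS PROFILE OFF;
-- In the extra result grid, compare the "Rows" (actual) and
-- "EstimateRows" (estimated) columns per operator; a large gap
-- points at the misestimated step in the plan.
```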
Kind Regards, Tom
SELECT col1 FROM Table1 GROUP BY col1 HAVING COUNT( * ) > 1 in the subselect? – Zavala

SELECT ... INTO. I presume the INTO #doubles is a mistake there? In any event, probably different join strategies, as it does not estimate the number of rows matching the HAVING accurately, but when inserted into the #temp table it knows exactly the number of rows that are involved. Please post the plans. You could also evaluate ;WITH CTE AS (SELECT *, COUNT(*) OVER (PARTITION BY col1) AS C FROM Table1) SELECT * FROM CTE WHERE C > 1 – Biller