SQL Server Query Optimisation - Unexpected slowness in a simple query
Asked Answered
S

2

9

Possible explanation is here in the comment

In SQL Server 2014 Enterprise Edition (64-bit) - I am trying to read from a View. A standard query contains just an ORDER BYand OFFSET-FETCH clause like this.

Approach 1

SELECT

    *

FROM Metadata
ORDER BY
    AgeInHours      ASC, 
    RankingPoint    DESC, 
    PublishDate     DESC
OFFSET 150000 ROWS
FETCH NEXT 40 ROWS ONLY

However, this fairly simple query performs almost 9 times slower (noticable when skipping large number of rows like 150k) than the following query which returns the same result.

In this case I am reading the primary key first and then using that as a parameter for WHERE...IN function

Approach 2

SELECT 
    * 
FROM Metadata
WHERE NewsId IN (
    SELECT

        NewsId

    FROM Metadata
    ORDER BY
        AgeInHours      ASC, 
        RankingPoint    DESC, 
        PublishDate     DESC
    OFFSET 150000 ROWS
    FETCH NEXT 40 ROWS ONLY
)
ORDER BY
    AgeInHours      ASC, 
    RankingPoint    DESC, 
    PublishDate     DESC

Bench-marking these two shows this difference

(40 row(s) affected)

 SQL Server Execution Times:
   CPU time = 14748 ms,  elapsed time = 3329 ms.

(40 row(s) affected)

 SQL Server Execution Times:
   CPU time = 3828 ms,  elapsed time = 469 ms.

I have indexes on the primary key, PubilshDate and their fragmentation is very low. I have also tried to run similar queries against the database table, but in every cases the second approach yields great performance gains. I have also tested this on SQL Server 2012.

Can someone explain what is going on?

Schema

Schema

Approach 1: Execution Plan

Approach 1: Execution Plan

Approach 2: Execution Plan (Left part)

Approach 2: Execution Plan (Left part)

Approach 2: Execution Plan (Right part)

Approach 2: Execution Plan (Right part)

Splint answered 24/8, 2016 at 10:59 Comment(17)
If you run first approach and select only ID column, what happens? I suspect the * asterisk slows down the query because it forces it to scan the entire columns .Military
Much faster (almost 9 times, lets say), as I reckon it only seeks the indexed column (Id). The puzzling thing is, in the second query it is still reading the same amount of data when I used WHERE-IN clause - but isn't taking time. What is happening behind the scene?Splint
Can you include the query plans and table schema?Presumption
@destination-data will post in a moment, thanks.Splint
Please add Schema where you added "*" how many columns should be included ?Accredit
@destination-data both the execution plans and schema has been added in the post, please note the primary key name is NewsId, i referred this to Id for simplification (updated the post).Splint
@Accredit I need all the columns, it's just a view. I think this happens in general to all quries in (other tables as well) when you seek large number of rows if certain condition is met. Just do not know what it is. I have tried this in another table with large data. So the schema might not be the problem here (IMHO).Splint
@Accredit as a matter of fact, if I just add just a second column with the primary key in the query (where I used *), the query slows down.Splint
@Military Even if i specify the columns i want, instead of *, the query slows down. BTW, I am doing the same in the second query.Splint
please post execution plans as xml in paste bin and share the linkCounterirritant
The schema picture doesn't show the gold key next to your primary key. Is it possible SQL Server is viewing your table as a heap? This could affect the execution plan.Presumption
Looks like this slow down is because of the volumes of data sorted. First approach leads to sorting wider rows with all view columns from [NewsItems] included while the second sorts only 4 columns wide datasetJohanajohanan
@Johanajohanan This seems to be the most plausible explanation, as I tried the first (slow) query with only one ORDER BY column, the execution time went down to 289ms. Does this mean - in such situation (large datasets, many ORDER BY columns) - we should always read the primary key and pass it in the WHERE...IN clause, instead of running a simple SELECT * FROM ... ORDER BY query?. Is there any article you'd recommend for further reading?Splint
@destination-data Sorry, I am running the query against a view. Please see the above comment, I guess I've got a sensible explanation.Splint
@IOptimise, I don't think so. It totally depends on query optimizer internals which we sure do not know. Some queries may show no perfomance gain at all.Johanajohanan
Have you tried covering index ......... Please try to create covering index that may resolve your problem.Accredit
@Accredit I Use SSMS's 'Display Estimated Execution Plan' feature and have created all the suggested covering index by this. Am I missing anything more?Splint
Q
1

For differently structured queries with even same result set you get different query plans with different approach and query cost. That is common for variety of SQL RDBMS implementations.

Basically in sample above when selecting small part of data from large table is good approach first to reduce and minimize number of rows in result and then select full rows with all columns just like your 2. query.

Another approach is to build exact proper index for reducing result set in first step. In query above probably columns from ORDER BY clause in just same column and sort order could be a solution.

(You didn't sent structure of indexes mentioned in query plans I can just imagine what is hidden behind their names.)

You can also use SQL index hinting to direct SQL optimizer to specific index which you consider as best for task in case SQL optimizer doesn't do the job.

Quintain answered 29/11, 2016 at 9:56 Comment(0)
F
0

When you execute a query the engine look up for an index that could be used in order to get the best performance. Your approach 1 is using an index which doesn't include all columns in the SELECT statement, this cause the Key Lookup in the query plan, in my experience this always get a lower performance that use only indexed columns in your SELECT statement.

You can see the difference if you create an index for AgeInHours, RankingPoint, PublishDate and INCLUDE all the columns (recommended only for testing purposes).

For your second approach you can even get a better performance if you use a CTE and then make a JOIN instead of WHERE with IN or a temp table with index if you have a millions of rows.

Feldman answered 25/8, 2016 at 23:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.