Avoid sort operator in index plan
Asked Answered
C

3

9

I have two tables [LogTable] and [LogTable_Cross].

Below is the schema and script to populate them:

 --Main Table

 CREATE TABLE [dbo].[LogTable]
    (
      [LogID] [int] NOT NULL
                    IDENTITY(1, 1) ,
      [DateSent] [datetime] NULL,
    )
 ON [PRIMARY]
GO
 ALTER TABLE [dbo].[LogTable] ADD CONSTRAINT [PK_LogTable] PRIMARY KEY CLUSTERED  ([LogID]) ON [PRIMARY]
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent] ON [dbo].[LogTable] ([DateSent] DESC) ON [PRIMARY]
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent_LogID] ON [dbo].[LogTable] ([DateSent] DESC) INCLUDE ([LogID]) ON [PRIMARY]
GO


--Cross table

 CREATE TABLE [dbo].[LogTable_Cross]
    (
      [LogID] [int] NOT NULL ,
      [UserID] [int] NOT NULL
    )
 ON [PRIMARY]
GO
 ALTER TABLE [dbo].[LogTable_Cross] WITH NOCHECK ADD CONSTRAINT [FK_LogTable_Cross_LogTable] FOREIGN KEY ([LogID]) REFERENCES [dbo].[LogTable] ([LogID])
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_Cross_UserID_LogID]
 ON [dbo].[LogTable_Cross] ([UserID])
 INCLUDE ([LogID])
GO


-- Script to populate them
 INSERT INTO [LogTable]
        SELECT TOP 100000
                DATEADD(day, ( ABS(CHECKSUM(NEWID())) % 65530 ), 0)
        FROM    sys.sysobjects
                CROSS JOIN sys.all_columns


 INSERT INTO [LogTable_Cross]
        SELECT  [LogID] ,
                1
        FROM    [LogTable]
        ORDER BY NEWID()

 INSERT INTO [LogTable_Cross]
        SELECT  [LogID] ,
                2
        FROM    [LogTable]
        ORDER BY NEWID()

 INSERT INTO [LogTable_Cross]
        SELECT  [LogID] ,
                3
        FROM    [LogTable]
        ORDER BY NEWID()


GO

I want to select all those logs (from LogTable) which has given userid (user id will be checked from cross table LogTable_Cross) with datesent desc.

SELECT  DI.LogID              
FROM    LogTable DI              
        INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID  
        WHERE  DP.UserID = 1  
ORDER BY DateSent DESC

After running this query here is my execution plan: enter image description here

As you can see there is a sort operator coming in role and that should be probably because of following line "ORDER BY DateSent DESC"

My question is that why that Sort operator is coming in the plan even though I have the following index applied on the table

GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent] ON [dbo].[LogTable] ([DateSent] DESC) ON [PRIMARY]
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent_LogID] ON [dbo].[LogTable] ([DateSent] DESC) INCLUDE ([LogID]) ON [PRIMARY]
GO

On the other hand if I remove the join and write the query in this way:

SELECT  DI.LogID              
FROM    LogTable DI              
  --      INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID  
        --WHERE  DP.UserID = 1  
ORDER BY DateSent DESC

the plan changes to

enter image description here

i.e Sort operator is removed and the plan is showing that my query is using my non clustered index.

So is that a way to remove "Sort" operator in the plan for my query even if I am using join.

EDIT:

I went further and limited the "Max Degree of Parallelism" to 1

enter image description here

Ran the following query again:

SELECT  DI.LogID              
FROM    LogTable DI              
        INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID  
        WHERE  DP.UserID = 1  
ORDER BY DateSent DESC

and the plan is still having that Sort operator:

enter image description here

Edit 2

Even if I have the following index as suggested:

 CREATE NONCLUSTERED INDEX [IX_LogTable_Cross_UserID_LogID_2]
 ON [dbo].[LogTable_Cross] ([UserID], [LogID])

the plan is still having the Sort operator: enter image description here

Cocklebur answered 19/4, 2017 at 8:54 Comment(2)
this question is very interesting and very well framedDiscursive
i have reraised this on DBA.SE with my observations as well:dba.stackexchange.com/questions/171476/…Discursive
I
2

The second query of yours does not contain the UserId condition and therefore it is not an equivalent query. The reason why the first query is not covered by your indexes on LogTable is the fact, that UserId is not present in them (and you need to perform the join as well). Therefore, SQL Server has to join the tables (Hash Join, Merge Join or Nested-Loop join). SQL Server correctly selects the Hash Join, since the intermediate results are large and they are not sorted according to the LogID. If you give them the intermediate result sorted according to the LogID (your second edit) then he uses merge join, however, sort according to the DateSend is stil needed. The only solution without sort is to create an indexed materialized view:

CREATE VIEW vLogTable
WITH SCHEMABINDING
AS
   SELECT  DI.LogID, DI.DateSent, DP.UserID           
   FROM dbo.LogTable DI              
   INNER JOIN dbo.LogTable_Cross DP ON DP.LogID = DI.LogID  

CREATE UNIQUE CLUSTERED INDEX CIX_vCustomerOrders 
   ON dbo.vLogTable(UserID, DateSent, LogID);

The view has to be used with noexpand hint, so the optimizer can find the CIX_vCustomerOrders index:

SELECT  LogID              
FROM dbo.vLogTable   WITH(NOEXPAND)
    WHERE  UserID = 1  
ORDER BY DateSent DESC

This query is equivalent query to your first query. You may check the correctness if you insert the following row:

INSERT INTO LogTable VALUES (CURRENT_TIMESTAMP)

then my query still returns the correct result (10000 rows), however, your second query returns 10001 rows. You may try to delete or insert some other rows and the view will still be up-to-date and you recieve correct results from my query.

Iasis answered 19/4, 2017 at 14:3 Comment(2)
Can you please elaborate your answer with an example specific to my data?Cocklebur
I did. Does it answer your question?Calgary
H
1

You have sort operation when you have the join because of the parallelism in the previous steps. When SQL Server processes the records in multiple threads, the order is not determined anymore. Each thread just pushes the results to the next item in the pipeline (Hash match in your case).

Since the order is not determined and you are asking for an order, SQL Server has to sort the result.

You can try to add the MAXDOP = 1 hint to force SQL Server to run the query using only one thread. This might help in this case, but can cause performance degradation too.

The second query can be satisfied using an index scan and the index is ordered and that order is the same as the requested one. The records (keys) in the index are ordered by definition. SQL Server guessed that running the query on one thread and just reading the data using the index is more beneficial than reading the data using multiple threads and sorting them later.

Hannover answered 19/4, 2017 at 9:14 Comment(1)
Sorry, it took me some time to come back, but I think plain_talk and Shnugo are added the extra info you need.Hannover
E
1

I think the reason might be here:

CREATE NONCLUSTERED INDEX [IX_LogTable_Cross_UserID_LogID]
 ON [dbo].[LogTable_Cross] ([UserID])
 INCLUDE ([LogID])

Your table does not have an index on LogID. But this column is used for the JOIN. INCLUDE LogID does not mean, that this index is searchable for LogID. This is just a little bit faster, if you search for UserId and you need the corresponding LogID (no need to look this up)

As you are joining over LogID it should be fastest to pre-sort the list as there is no index available...

If space does not matter (and insert/update performance, you might add an index vice-versa, but I would suggest do use a two column clustered key with LogId in the first position and - if needed - a simple non-clustered index on UserId.

Expressage answered 19/4, 2017 at 9:29 Comment(3)
Please see my EDIT 2Cocklebur
@Cocklebur Your new index - again - has UserId in the first place... Try it with LogId in the first place... Is your real table really consisting of two columns only? a clustered PK (LogId,UserId) stores your data physically sorted. A clustered key is the table itself. It is including all other columns...Expressage
I created the following index " CREATE NONCLUSTERED INDEX [IX_LogTable_Cross_LogID_UserID] ON [dbo].[LogTable_Cross] ([LogID], [UserID])" but the "Sort" operator is still there.Cocklebur

© 2022 - 2024 — McMap. All rights reserved.