Why is there a sort showing up in my execution plan?

I have the SQL query below that is running very slowly. I took a look at the execution plan and it is claiming that a sort on Files.OrderId is the highest cost operation (53%). Why would this be happening if I am not ordering by OrderId anywhere? Is my best bet to create an index on Files.OrderId?

Execution plan if anyone is interested.

with custOrders as
(
    SELECT c.firstName + ' ' + c.lastname as Customer, c.PartnerId, c.CustomerId, o.OrderId, o.CreateDate, c.IsPrimary
    FROM Customers c
    LEFT JOIN CustomerRelationships as cr
        ON c.CustomerId = cr.PrimaryCustomerId
    INNER JOIN Orders as o
       ON c.customerid = o.customerid 
           OR (cr.secondarycustomerid IS NOT NULL AND o.customerid = cr.secondarycustomerid)
    where c.createdate >= @FromDate + ' 00:00' 
       AND c.createdate <= @ToDate + ' 23:59' 
),
 temp as
(
SELECT Row_number() 
         OVER ( 
           ORDER BY c.createdate DESC)                    AS 'row_number', 
       c.customerid as customerId, 
       c.partnerid as partnerId, 
       c.Customer, 
       c.orderid as OrderId, 
       c.createdate as CreateDate, 
       Count(f.orderid)                                   AS FileCount, 
       dbo.Getparentcustomerid(c.isprimary, c.customerid) AS ParentCustomerId, 
       au.firstname + ' ' + au.lastname                   AS Admin, 
       '' as blank, 
       0  as zero
FROM   custOrders c 
       INNER JOIN files f 
               ON c.orderid = f.orderid 
       INNER JOIN admincustomers ac 
               ON c.customerid = ac.customerid 
       INNER JOIN adminusers au 
               ON ac.adminuserid = au.id 
       INNER JOIN filestatuses s 
               ON f.statusid = s.statusid 
WHERE  ac.adminuserid IS NOT NULL 
       AND f.statusid NOT IN ( 5, 6 ) 
GROUP  BY c.customerid, 
          c.partnerid, 
          c.Customer, 
          c.isprimary, 
          c.orderid, 
          c.createdate, 
          au.firstname, 
          au.lastname 
)
Krugersdorp answered 8/1, 2013 at 16:25 Comment(6)
Error while validating source XML. (Moncton)
Check out this article, the 'How to avoid SORT operations' section. (Limbate)
Yeah I saw that. Does it really make a difference? You can still see the XML... (Krugersdorp)
Doesn't OVER imply a sort? (Moncton)
@njk yes, but not by OrderId. (Bromate)
@njk, we are explicitly ordering within the OVER, but not by OrderId. (Krugersdorp)

SQL Server has three algorithms to choose from when it needs to join two tables: the Nested-Loops-Join, the Hash-Join and the Sort-Merge-Join. Which one it selects is based on cost estimates. In this case it figured that, based on the information it had available, a Sort-Merge-Join was the right choice.

In SQL Server execution plans a Sort-Merge is split into two operators, the Sort and the Merge-Join, because the sort operation might not be necessary, for example when the data is already sorted.

For more information about joins, check out my join series here: http://sqlity.net/en/1146/a-join-a-day-introduction/ The article about the Sort-Merge-Join is here: http://sqlity.net/en/1480/a-join-a-day-the-sort-merge-join/


To make your query faster, I would first look at indexes. You have a bunch of clustered index scans in the query; if you can replace a few of them with seeks, you will most likely be better off. Also check whether the estimates that SQL Server produces match the actual row counts in an actual execution plan. If they are far off, SQL Server often makes bad choices, so providing better statistics can help your query performance too.
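
If the estimates are far off, refreshing statistics is a cheap first step. A minimal sketch, using the tables from the question (FULLSCAN is optional but gives the most accurate statistics at the cost of a full read of each table):

-- Rebuild statistics from every row rather than a sample,
-- so the optimizer's row estimates match reality
UPDATE STATISTICS Files WITH FULLSCAN;
UPDATE STATISTICS Orders WITH FULLSCAN;
UPDATE STATISTICS Customers WITH FULLSCAN;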

Bromate answered 8/1, 2013 at 16:41 Comment(0)

SQL Server is performing the sort to enable the merge join between the dataset to the right of that sort operator and the records in the Orders table. A merge join itself is a very efficient way to join all the records in a dataset, but it requires that each input being joined is sorted on the join keys, in the same order.

Since the PK_Orders key is already ordered by OrderID, SQL Server decided to take advantage of that by sorting the other end of the join (the other stuff to the right of the sort) so that the two datasets can be merged together at that point in the plan. The common alternative to a merge join is a hash join, but that wouldn't help you here: you would simply trade the sort and merge for an expensive hash join operator. The query optimizer has determined the sort and merge to be more efficient in this case.

The root cause of the expensive step in the plan is the need to combine all the records from the Orders table into the dataset. Is there a way to limit the records coming from the Files table? An index on files.statusid may be helpful if the records with statuses other than 5 and 6 make up less than 10% of the total table size.
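
As a sketch of that idea, a filtered index along these lines might qualify (the index name is made up, and whether the optimizer picks it depends on how the NOT IN predicate is evaluated; filtered indexes don't accept NOT IN, so the predicate is written as two <> comparisons):

-- Hypothetical filtered index covering only the statuses the query keeps
CREATE NONCLUSTERED INDEX IX_Files_OrderId_Active
    ON Files (OrderId, StatusId)
    WHERE StatusId <> 5 AND StatusId <> 6;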

The query optimizer thinks that most of the records are going to be filtered out at the end. Try to push as many of those filter conditions back to the record sources as you can, so that fewer records have to be handled in the middle of the plan.

EDIT: I forgot to mention, it is very helpful to have an execution plan that we can look at. Is there any way we can get an actual execution plan result to see the real number of records going through those operators? Sometimes the estimated record counts can be a little off.

EDIT: Looking deeper into the second-to-last filter operator's predicate field, summarized:

c.CustomerId=o.CustomerId
OR o.CustomerId=cr.SecondaryCustomerId AND cr.SecondaryCustomerId IS NOT NULL

It looks like SQL Server is producing a cross join between all possible matching records between Orders and Customers up to this point in the query (the part of the plan to the right of the second-to-last filter operator) and then checking each record against that condition to see if it does indeed match. Notice how the line going into the filter is really fat and the line coming out is really thin? That's because the estimated row count goes from 21k to 4 after that operator. Forget what I said earlier; this is probably the main problem in the plan. Even if there are indexes on these columns, SQL Server can't use them because the join condition is too complex. It forces the plan to merge all the records together instead of seeking to just the ones you need, because the full join predicate can't be applied right away.

My first thought is to rephrase the CTE custOrders as a union of two datasets: one using CustomerId and one using SecondaryCustomerId to join, as sketched below. This will duplicate the work of the rest of the CTE, but if it enables proper use of the indexes, it could be a big win.
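
A minimal sketch of that rewrite, reusing the question's tables and columns; UNION rather than UNION ALL, so an order matched by both branches appears only once:

with custOrders as
(
    -- Branch 1: orders placed by the customer directly
    SELECT c.firstName + ' ' + c.lastname as Customer,
           c.PartnerId, c.CustomerId, o.OrderId, o.CreateDate, c.IsPrimary
    FROM Customers c
    INNER JOIN Orders as o
        ON o.customerid = c.customerid
    where c.createdate >= @FromDate + ' 00:00'
       AND c.createdate <= @ToDate + ' 23:59'

    UNION  -- de-duplicates orders matched by both branches

    -- Branch 2: orders placed by a related secondary customer
    SELECT c.firstName + ' ' + c.lastname,
           c.PartnerId, c.CustomerId, o.OrderId, o.CreateDate, c.IsPrimary
    FROM Customers c
    INNER JOIN CustomerRelationships as cr
        ON cr.PrimaryCustomerId = c.customerid
    INNER JOIN Orders as o
        ON o.customerid = cr.secondarycustomerid
    where c.createdate >= @FromDate + ' 00:00'
       AND c.createdate <= @ToDate + ' 23:59'
)
-- the temp CTE and the rest of the original query would follow unchanged

Each branch now joins on a single equality predicate, which is exactly the shape an index seek can serve.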

Phosphatize answered 8/1, 2013 at 16:49 Comment(1)
Still looks like the original estimated plan. Did you edit the question or change the document on XML Playground? (Phosphatize)

I know this question is quite old; however, I had the same issue and realised there was a completely different reason my tables had suddenly slowed. The symptoms were the same: views that were previously lightning fast became slow to update, with a Sort operator claiming 40% of the cost. This solution may prove useful to someone, and it is simple: when joining tables, ensure you are joining on a like-for-like basis. I was joining two tables on ID, but in one table the ID was defined as an int and in the other as an nvarchar. I corrected this so both columns were defined as the same type, and the view is back to lightning speed.
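
A hypothetical illustration (table and column names made up): the mismatched join compiles, but SQL Server has to convert one side on every row, which blocks index seeks on that column:

-- t1.ID is int, t2.ID is nvarchar(50): the join works, but type precedence
-- makes SQL Server wrap t2.ID in CONVERT_IMPLICIT, so no index seek on t2.ID
SELECT t1.ID
FROM t1
INNER JOIN t2
    ON t1.ID = t2.ID;

-- Fix: give both join columns the same type
ALTER TABLE t2 ALTER COLUMN ID int NOT NULL;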

Hopefully this will help someone else avoid spending a week trying to figure out what's wrong with SQL when it's really a PEBKAC moment.

(Problem Exists Between Keyboard And Chair)

Oozy answered 30/12, 2016 at 15:10 Comment(0)

I think the sort is occurring for this join:

FROM   custOrders c 
       INNER JOIN files f 
               ON c.orderid = f.orderid 

I would create an index on files that includes the columns orderid and statusid since the query also uses the statusid column.
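
Something along these lines; the index name is illustrative:

CREATE NONCLUSTERED INDEX IX_Files_OrderId_StatusId
    ON files (orderid, statusid);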

You might also want to consider the following changes:

  1. You don't need "ac.adminuserid IS NOT NULL", as this is already covered by the inner join between adminusers and admincustomers.
  2. Change the test "f.statusid NOT IN ( 5, 6 )" to a positive condition (e.g. IN), as negative conditions are more expensive to process; see the sketch below.
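
For example, assuming (hypothetically) that the only other status IDs are 1 through 4, the filter becomes:

AND f.statusid IN (1, 2, 3, 4)  -- positive form of NOT IN (5, 6)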
Jejune answered 8/1, 2013 at 16:40 Comment(2)
Yeah, it's already doing a non-clustered index seek on Files. Is there any way to further reduce the record count coming from Files? (Phosphatize)
A non-clustered index seek is fine from a performance perspective. The other change to reduce the record count is to eliminate the NOT IN check against statusid, as I recommended above. (Jejune)

When I ran into the same issue, two questions came to mind:

  • Why is there a sort-and-merge in the execution plan?
  • Why is it slow, and how can I speed it up?

For the first question, I found the answer here, in @sebastian-meine's answer. I went through his articles, especially:

  • The Nested Loops Join
  • The Sort Merge Join
  • The Hash Join

They're really helpful. I then realized that in my case the query was slow because SQL Server had picked a bad execution plan, with a sort-merge join and a nested loops join rather than a hash join.

There is actually a way to force SQL Server to use a specific join strategy: a join hint.

See More: https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-join?view=sql-server-ver16

I solved my second question simply by adding join hints to the SQL (a full example follows the list):

  • from t1 join t2 -> from t1 inner hash join t2
  • from t1 left join t2 -> from t1 left outer hash join t2
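
A minimal sketch using the Orders and Files tables from the question; note that a join hint also fixes the join order as written in the query, so it should be used sparingly:

-- INNER HASH JOIN forces the hash join algorithm for this join
SELECT o.OrderId, f.StatusId
FROM Orders as o
INNER HASH JOIN Files as f
    ON f.OrderId = o.OrderId;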
Tithing answered 23/5, 2024 at 7:45 Comment(0)
