CTE very slow when Joined
Asked Answered
H

3

15

I have posted something similar before, but I am approaching this from a different direction now so I opened a new question. I hope this is OK.

I have been working with a CTE that creates a sum of charges based on a Parent Charge. The SQL and details can be seen here:

CTE Index recommendations on multiple keyed table

I don't think I am missing anything on the CTE, but I am getting a problem when I use it with a big table of data (3.5 million rows).

The table tblChargeShare contains some other information that I need, such as an InvoiceID, so I placed my CTE in a view vwChargeShareSubCharges and joined it to the table.

The query:

Select t.* from vwChargeShareSubCharges t
inner join 
tblChargeShare  s 
on t.CustomerID = s.CustomerID 
and t.MasterChargeID = s.ChargeID 
Where  s.ChargeID = 1291094

Returns a result in a few ms.

The query:

Select ChargeID from tblChargeShare Where InvoiceID = 1045854

Returns 1 row:

1291094

But the query:

Select t.* from vwChargeShareSubCharges t
inner join 
tblChargeShare  s 
on t.CustomerID = s.CustomerID 
and t.MasterChargeID = s.ChargeID 
Where  InvoiceID = 1045854

Takes 2-3 minutes to run.

I saved the execution plans and loaded them into SQL Sentry. The Tree for the fast query looks like this:

Fast Query

The plan from the slow query is:

Slow Query

I have tried reindexing, running the query through tuning advisor and various combinations of sub queries. Whenever the join contains anything other than the PK, the query is slow.

I had a similar question here:

SQL Server Query time out depending on Where Clause

Which used functions to do the summimg of child rows instead of a CTE. This is the rewrite using CTE to try and avoid the same problem I am now experiencing. I have read the responses in that answer but I am none the wiser - I read some information about hints and parameters but I can't make it work. I had thought that rewriting using a CTE would solve my problem. The query is fast when running on a tblCharge with a few thousand rows.

Tested in both SQL 2008 R2 and SQL 2012

Edit:

I have condensed the query into a single statement, but the same issue persists:

WITH RCTE AS
(
SELECT  ParentChargeId, s.ChargeID, 1 AS Lvl, ISNULL(TotalAmount, 0) as TotalAmount,  ISNULL(s.TaxAmount, 0) as TaxAmount,  
ISNULL(s.DiscountAmount, 0) as DiscountAmount, s.CustomerID, c.ChargeID as MasterChargeID
from tblCharge c inner join tblChargeShare s
on c.ChargeID = s.ChargeID Where s.ChargeShareStatusID < 3 and ParentChargeID is NULL

UNION ALL

SELECT c.ParentChargeID, c.ChargeID, Lvl+1 AS Lvl, ISNULL(s.TotalAmount, 0),  ISNULL(s.TaxAmount, 0),  ISNULL(s.DiscountAmount, 0) , s.CustomerID 
, rc.MasterChargeID 
from tblCharge c inner join tblChargeShare s
on c.ChargeID = s.ChargeID
INNER JOIN RCTE rc ON c.PArentChargeID = rc.ChargeID and s.CustomerID = rc.CustomerID  Where s.ChargeShareStatusID < 3 
)

Select MasterChargeID as ChargeID, rcte.CustomerID, Sum(rcte.TotalAmount) as TotalCharged, Sum(rcte.TaxAmount) as TotalTax, Sum(rcte.DiscountAmount) as TotalDiscount
from RCTE inner join tblChargeShare s on rcte.ChargeID = s.ChargeID and RCTE.CustomerID = s.CustomerID 
Where InvoiceID = 1045854
Group by MasterChargeID, rcte.CustomerID 
GO

Edit: More playing around,I just don't understand this.

This query is instant (2ms):

Select t.* from
vwChargeShareSubCharges t
Where  t.MasterChargeID = 1291094

Whereas this takes 3 minutes:

DECLARE @ChargeID int = 1291094

Select t.* from
vwChargeShareSubCharges t
Where  t.MasterChargeID = @ChargeID

Even if I put heaps of numbers in an "In", the query is still instant:

Where  t.MasterChargeID in (1291090, 1291091, 1291092, 1291093,  1291094, 1291095, 1291096, 1291097, 1291098, 1291099, 129109)

Edit 2:

I can replicate this from scratch using this example data:

I have created some dummy data to replicate the issue. It isn't so significant, as I only added 100,000 rows, but the bad execution plan still happens (run in SQLCMD mode):

CREATE TABLE [tblChargeTest](
[ChargeID] [int] IDENTITY(1,1) NOT NULL,
[ParentChargeID] [int] NULL,
[TotalAmount] [money] NULL,
[TaxAmount] [money] NULL,
[DiscountAmount] [money] NULL,
[InvoiceID] [int] NULL,
CONSTRAINT [PK_tblChargeTest] PRIMARY KEY CLUSTERED 
(
[ChargeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,     ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
END
GO

Insert into tblChargeTest
(discountAmount, TotalAmount, TaxAmount)
Select ABS(CHECKSUM(NewId())) % 10, ABS(CHECKSUM(NewId())) % 100, ABS(CHECKSUM(NewId())) % 10
GO 100000

Update tblChargeTest
Set ParentChargeID = (ABS(CHECKSUM(NewId())) % 60000) + 20000
Where ChargeID = (ABS(CHECKSUM(NewId())) % 20000)
GO 5000

CREATE VIEW [vwChargeShareSubCharges] AS
WITH RCTE AS
(
SELECT  ParentChargeId, ChargeID, 1 AS Lvl, ISNULL(TotalAmount, 0) as TotalAmount,   ISNULL(TaxAmount, 0) as TaxAmount,  
ISNULL(DiscountAmount, 0) as DiscountAmount,  ChargeID as MasterChargeID
FROM tblChargeTest Where ParentChargeID is NULL

UNION ALL

SELECT rh.ParentChargeID, rh.ChargeID, Lvl+1 AS Lvl, ISNULL(rh.TotalAmount, 0),    ISNULL(rh.TaxAmount, 0),  ISNULL(rh.DiscountAmount, 0) 
, rc.MasterChargeID 
FROM tblChargeTest rh
INNER JOIN RCTE rc ON rh.PArentChargeID = rc.ChargeID --and rh.CustomerID =  rc.CustomerID 
)

Select MasterChargeID,  ParentChargeID, ChargeID, TotalAmount, TaxAmount, DiscountAmount , Lvl
FROM  RCTE r 
GO

Then run these two queries:

--Slow Query:
Declare @ChargeID int = 60900

Select *
from [vwChargeShareSubCharges]
Where MasterChargeID = @ChargeID

--Fast Query:
Select *
from [vwChargeShareSubCharges]
Where MasterChargeID = 60900
Hekate answered 2/4, 2014 at 9:33 Comment(2)
Just a quick thought...If you take your query Select t.* from vwChargeShareSubCharges t, replace the view vwChargeShareSubCharges with its actual tsql definition, join to the other tables as appropriate, and run the query, is it any faster?Cachucha
Exactly the same if I paste in the full CTE.Hekate
W
16

The best SQL Server can do for you here is to push the filter on ChargeID down into the anchor part of the recursive CTE inside the view. That allows a seek to find the only row you need to build the hierarchy from. When you provide the parameter as a constant value SQL Server can make that optimization (using a rule called SelOnIterator, for those who are interested in that sort of thing):

Pushed predicate with a constant value

When you use a local variable it can not do this, so the predicate on ChargeID gets stuck outside the view (which builds the full hierarchy starting from all NULL ids):

Stuck Predicate

One way to get the optimal plan when using a variable is to force the optimizer to compile a fresh plan on every execution. The resulting plan is then tailored to the specific value in the variable at execution time. This is achieved by adding an OPTION (RECOMPILE) query hint:

Declare @ChargeID int = 60900;

-- Produces a fast execution plan, at the cost of a compile on every execution
Select *
from [vwChargeShareSubCharges]
Where MasterChargeID = @ChargeID
OPTION (RECOMPILE);

A second option is to change the view into an inline table function. This allows you to specify the position of the filtering predicate explicitly:

CREATE FUNCTION [dbo].[udfChargeShareSubCharges]
(
    @ChargeID int
)
RETURNS TABLE AS RETURN
(
  WITH RCTE AS
  (
  SELECT  ParentChargeID, ChargeID, 1 AS Lvl, ISNULL(TotalAmount, 0) as TotalAmount,   ISNULL(TaxAmount, 0) as TaxAmount,  
  ISNULL(DiscountAmount, 0) as DiscountAmount,  ChargeID as MasterChargeID
  FROM tblChargeTest 
  Where ParentChargeID is NULL 
  AND ChargeID = @ChargeID -- Filter placed here explicitly

  UNION ALL

  SELECT rh.ParentChargeID, rh.ChargeID, Lvl+1 AS Lvl, ISNULL(rh.TotalAmount, 0),    ISNULL(rh.TaxAmount, 0),  ISNULL(rh.DiscountAmount, 0) 
  , rc.MasterChargeID 
  FROM tblChargeTest rh
  INNER JOIN RCTE rc ON rh.ParentChargeID = rc.ChargeID --and rh.CustomerID =  rc.CustomerID 
  )

  Select MasterChargeID,  ParentChargeID, ChargeID, TotalAmount, TaxAmount, DiscountAmount , Lvl
  FROM  RCTE r 
)

Use it like this:

Declare @ChargeID int = 60900

select *
from dbo.udfChargeShareSubCharges(@ChargeID)

The query can also benefit from an index on ParentChargeID.

create index ix_ParentChargeID on tblChargeTest(ParentChargeID)

Here is another answer about a similar optimization rule in a similar scenario. Optimizing Execution Plans for Parameterized T-SQL Queries Containing Window Functions

Wulfenite answered 4/4, 2014 at 6:21 Comment(3)
Thanks. I will play around with these today and see what I can achieve. Thanks for detailed answer.Hekate
OPTION (RECOMPILE) works perfectly for the @ChargeID variable, but still painfully slow if I join to the CTE in any way. I have added the InvoiceID key to the grounding table and created a TABLE function as suggested here and it is nice and fast.Hekate
I had a similar performing issue and used your suggestion of moving the cte to a temp table and the difference is huge. The results came back immediately instead of taking several seconds. It would be great to know what is going on with this, but the temp table sure work way fasterWeigela
M
13

Next to get to a solution, I would recommend to SELECT INTO the CTE into e temp table and join from there. From personal experience joining with CTE my query was returning for 5mins while simply inserting the data generated by CTE into a temp table brought it down to just 4secs. I was actually joining two CTEs together but I guess this would apply to all long running queries when a CTE is joined to a LONG table (especially outer joins).


    --temp tables if needed to work with intermediate values
    If object_id('tempdb..#p') is not null
    drop table #p

    ;WITH cte as ( 
    select * from t1
    )

    select * 
    into #p
    from cte

    --then use the temp table as you would normally use the CTE
    select * from #p
Micheal answered 25/10, 2014 at 12:45 Comment(2)
Thanks very much for this suggestion. I had a query joining two CTEs which I looked at for a while and couldn't see any readily apparent issues. When I made one of them a temp table, the time reduced from 3+ minutes to 2 seconds. All I can say to MS is (HSJ/MoG), maybe it's on the klugey side, but why not implement it internally that way as well if it brings ~2 orders of magnitude performance improvement???Boswell
If you are not allowed to create tables, table variables come to your rescue: red-gate.com/simple-talk/databases/sql-server/…Dextro
T
0

Using a CTE can occasionally cause the compiler to take a wrong turn: taking the same query and moving it from a CTE to a subquery took me from "not a single row after 5 minutes" to "100 rows in well under a second".

Tieshatieup answered 4/4, 2024 at 17:40 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.