Simple Self Join Query Bad Performance
Asked Answered
M

3

5

Could anyone advice on how do I improve the performance of the following query. Note, the problem seems to be caused by where clause.

Data (table contains a huge set of rows - 500K+, the set of parameters it's called with assums the return of 2-5K records per query, which takes 8-10 minutes currently):

USE [SomeDb]
GO

    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[Data](
        [x] [money] NOT NULL,
        [y] [money] NOT NULL,
     CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED 
    (
        [x] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    GO

The Query

select top 10000
s.x as sx,
e.x as ex,
s.y as sy,
e.y as ey,
e.y - s.y as y_delta,
e.x - s.x as x_delta
from Data s 
    inner join Data e
    on e.x > s.x and e.x - s.x between xFrom and xTo
--where e.y - s.y > @yDelta -- when uncommented causes a huge delay

Update 1 - Execution Plan


<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="11.0.2100.60" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementEstRows="100" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" StatementSubTreeCost="0.0263655" StatementText="select top 100&#xD;&#xA;s.x as sx,&#xD;&#xA;e.x as ex,&#xD;&#xA;s.y as sy,&#xD;&#xA;e.y as ey,&#xD;&#xA;e.y - s.y as y_delta,&#xD;&#xA;e.x - s.x as x_delta&#xD;&#xA;from Data s &#xD;&#xA;    inner join Data e&#xD;&#xA; on e.x &gt; s.x and e.x - s.x between 100 and 105&#xD;&#xA;where e.y - s.y &gt; 0.01&#xD;&#xA;" StatementType="SELECT" QueryHash="0xAAAC02AC2D78CB56" QueryPlanHash="0x747994153CB2D637" RetrievedFromCache="true">
          <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
          <QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="NoParallelPlansInDesktopOrExpressEdition" CachedPlanSize="24" CompileTime="13" CompileCPU="13" CompileMemory="424">
            <MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
            <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="52199" EstimatedPagesCached="14561" EstimatedAvailableDegreeOfParallelism="4" />
            <RelOp AvgRowSize="55" EstimateCPU="1E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="100" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.0263655">
              <OutputList>
                <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" />
                <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" />
                <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" />
                <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" />
                <ColumnReference Column="Expr1004" />
                <ColumnReference Column="Expr1005" />
              </OutputList>
              <ComputeScalar>
                <DefinedValues>
                  <DefinedValue>
                    <ColumnReference Column="Expr1004" />
                    <ScalarOperator ScalarString="[SomeDb].[dbo].[Data].[y] as [e].[y]-[SomeDb].[dbo].[Data].[y] as [s].[y]">
                      <Arithmetic Operation="SUB">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" />
                          </Identifier>
                        </ScalarOperator>
                      </Arithmetic>
                    </ScalarOperator>
                  </DefinedValue>
                  <DefinedValue>
                    <ColumnReference Column="Expr1005" />
                    <ScalarOperator ScalarString="[SomeDb].[dbo].[Data].[x] as [e].[x]-[SomeDb].[dbo].[Data].[x] as [s].[x]">
                      <Arithmetic Operation="SUB">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" />
                          </Identifier>
                        </ScalarOperator>
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" />
                          </Identifier>
                        </ScalarOperator>
                      </Arithmetic>
                    </ScalarOperator>
                  </DefinedValue>
                </DefinedValues>
                <RelOp AvgRowSize="39" EstimateCPU="1E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="100" LogicalOp="Top" NodeId="1" Parallel="false" PhysicalOp="Top" EstimatedTotalSubtreeCost="0.0263555">
                  <OutputList>
                    <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" />
                    <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" />
                    <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" />
                    <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" />
                  </OutputList>
                  <RunTimeInformation>
                    <RunTimeCountersPerThread Thread="0" ActualRows="100" ActualEndOfScans="1" ActualExecutions="1" />
                  </RunTimeInformation>
                  <Top RowCount="false" IsPercent="false" WithTies="false">
                    <TopExpression>
                      <ScalarOperator ScalarString="(100)">
                        <Const ConstValue="(100)" />
                      </ScalarOperator>
                    </TopExpression>
                    <RelOp AvgRowSize="39" EstimateCPU="151828" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="100" LogicalOp="Inner Join" NodeId="2" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.0263455">
                      <OutputList>
                        <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" />
                        <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" />
                        <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" />
                        <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" />
                      </OutputList>
                      <RunTimeInformation>
                        <RunTimeCountersPerThread Thread="0" ActualRows="100" ActualEndOfScans="0" ActualExecutions="1" />
                      </RunTimeInformation>
                      <NestedLoops Optimized="false">
                        <OuterReferences>
                          <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" />
                          <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" />
                        </OuterReferences>
                        <RelOp AvgRowSize="23" EstimateCPU="1.80448" EstimateIO="3.76461" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Clustered Index Scan" NodeId="3" Parallel="false" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="1640290">
                          <OutputList>
                            <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" />
                            <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" />
                          </OutputList>
                          <RunTimeInformation>
                            <RunTimeCountersPerThread Thread="0" ActualRows="15225" ActualEndOfScans="0" ActualExecutions="1" />
                          </RunTimeInformation>
                          <IndexScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false">
                            <DefinedValues>
                              <DefinedValue>
                                <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" />
                              </DefinedValue>
                              <DefinedValue>
                                <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" />
                              </DefinedValue>
                            </DefinedValues>
                            <Object Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Index="[PK_Data]" Alias="[e]" IndexKind="Clustered" />
                          </IndexScan>
                        </RelOp>
                        <RelOp AvgRowSize="23" EstimateCPU="0.902317" EstimateIO="1.88387" EstimateRebinds="1" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="100" LogicalOp="Clustered Index Seek" NodeId="4" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.0263655" TableCardinality="1640290">
                          <OutputList>
                            <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" />
                            <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" />
                          </OutputList>
                          <RunTimeInformation>
                            <RunTimeCountersPerThread Thread="0" ActualRows="100" ActualEndOfScans="15224" ActualExecutions="15225" />
                          </RunTimeInformation>
                          <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                            <DefinedValues>
                              <DefinedValue>
                                <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" />
                              </DefinedValue>
                              <DefinedValue>
                                <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" />
                              </DefinedValue>
                            </DefinedValues>
                            <Object Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Index="[PK_Data]" Alias="[s]" IndexKind="Clustered" />
                            <SeekPredicates>
                              <SeekPredicateNew>
                                <SeekKeys>
                                  <EndRange ScanType="LT">
                                    <RangeColumns>
                                      <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" />
                                    </RangeColumns>
                                    <RangeExpressions>
                                      <ScalarOperator ScalarString="[SomeDb].[dbo].[Data].[x] as [e].[x]">
                                        <Identifier>
                                          <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" />
                                        </Identifier>
                                      </ScalarOperator>
                                    </RangeExpressions>
                                  </EndRange>
                                </SeekKeys>
                              </SeekPredicateNew>
                            </SeekPredicates>
                            <Predicate>
                              <ScalarOperator ScalarString="([SomeDb].[dbo].[Data].[x] as [e].[x]-[SomeDb].[dbo].[Data].[x] as [s].[x])&gt;=($100.0000) AND ([SomeDb].[dbo].[Data].[x] as [e].[x]-[SomeDb].[dbo].[Data].[x] as [s].[x])&lt;=($105.0000) AND ([SomeDb].[dbo].[Data].[y] as [e].[y]-[SomeDb].[dbo].[Data].[y] as [s].[y])&gt;(0.01)">
                                <Logical Operation="AND">
                                  <ScalarOperator>
                                    <Compare CompareOp="GE">
                                      <ScalarOperator>
                                        <Arithmetic Operation="SUB">
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" />
                                            </Identifier>
                                          </ScalarOperator>
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" />
                                            </Identifier>
                                          </ScalarOperator>
                                        </Arithmetic>
                                      </ScalarOperator>
                                      <ScalarOperator>
                                        <Const ConstValue="($100.0000)" />
                                      </ScalarOperator>
                                    </Compare>
                                  </ScalarOperator>
                                  <ScalarOperator>
                                    <Compare CompareOp="LE">
                                      <ScalarOperator>
                                        <Arithmetic Operation="SUB">
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" />
                                            </Identifier>
                                          </ScalarOperator>
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" />
                                            </Identifier>
                                          </ScalarOperator>
                                        </Arithmetic>
                                      </ScalarOperator>
                                      <ScalarOperator>
                                        <Const ConstValue="($105.0000)" />
                                      </ScalarOperator>
                                    </Compare>
                                  </ScalarOperator>
                                  <ScalarOperator>
                                    <Compare CompareOp="GT">
                                      <ScalarOperator>
                                        <Arithmetic Operation="SUB">
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" />
                                            </Identifier>
                                          </ScalarOperator>
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" />
                                            </Identifier>
                                          </ScalarOperator>
                                        </Arithmetic>
                                      </ScalarOperator>
                                      <ScalarOperator>
                                        <Const ConstValue="(0.01)" />
                                      </ScalarOperator>
                                    </Compare>
                                  </ScalarOperator>
                                </Logical>
                              </ScalarOperator>
                            </Predicate>
                          </IndexScan>
                        </RelOp>
                      </NestedLoops>
                    </RelOp>
                  </Top>
                </RelOp>
              </ComputeScalar>
            </RelOp>
          </QueryPlan>
        </StmtSimple>
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>
Manouch answered 11/9, 2012 at 22:8 Comment(5)
What does `set showplan on' show as the query plan?Isothermal
@Mark Please find the execution plan attached. The highest contributors - 12% Clustered Index Scan, 100% - Clustered Index Seek, where Cluster Index is Data.sManouch
Your query is rather strange. It is returning 10000 rows, without an order by. Yet, your question says that it returns 2-5k. Can you explain a bit more what it is doing?Hypabyssal
What are xFrom and xTo? THese variables? if so they should have a "@" prefix.Reflexion
Basically xFrom and XTo are the variables forming the X-range, you're right @ is just omitted. I'm measuring certain X-ranges ans check how many times Y exceeded a certain value within these ranges.Manouch
R
5

I have often seen big performance gains by inserting the results of the first query (in your case without the where clause) into a TEMP table or a table variable, and selecting from this afterwards (which basically helps the query optimiser to select an appropriate execution plan).

Also just noticed you don't have an INDEX on column Y, which may speed up a bit.

EDIT Also, try the following (gives me slightly better performance):

SELECT * FROM 
    (SELECT
        s.x as sx,
        e.x as ex,
        s.y as sy,
        e.y as ey,
        e.y - s.y as y_delta,
        e.x - s.x as x_delta
    FROM Data s 
    JOIN Data e
    ON e.x > s.x 
) data
WHERE data.y_delta > @yDelta AND data.x_delta BETWEEN @xFrom AND @xTo
Reflexion answered 11/9, 2012 at 22:12 Comment(9)
Could you post some code as part of your answer? There're only two columns in the table, should be fairly simple for someone who knows the subject.Manouch
Will post an example later today. Please try my suggested INDEX on the Y column too.Reflexion
tried indexing y - made no difference, I too had high hopes for that idea... Should I do anything to fit in the index or it should improve the performance right after it's been created?Manouch
You'd see a difference immediately after it is created, so doesn't look like that will help.Reflexion
I can see that pre-fetching data into the temp table and applying my where clause to the temp table definitely 100s times faster, but I'm concerned about scalability of the solution... any thoughts?Manouch
I've used that method for queries returning millions of rows, and hasn't caused any major problems. General rule I stick to is if your results sets are likely to be under 1000 rows, use a table variable, for larger datasets use a TEMP table.Reflexion
I have tried to replicate your problem, but the query runs much faster with the where clause than without. I get 192,720 results back in 2.8 seconds...Reflexion
it's likely got to do with the data... I've to wait for 4 minutes to get 1000 rows. Actually the execution time grately depends of the value used in where clause - the close to improbable the longer it takes to search.Manouch
thanks for that, the results are the fastest of what I've seen, still slightly beyond the acceptability level. Thinking of caching makes me fill depressive...Manouch
I
1

The main issues are that the where clause gives a cross join (not what I would call a simple join) (giving many rows is that the join compares many rows in e for a row in s) and also the .y comparison has to use a table scan (over at least temporary data if not the whole table).

If the query is common then there is a possible fix in doing the join once and copying the data into a pre calculated table and indexing the differences.

Isothermal answered 11/9, 2012 at 22:37 Comment(1)
could you post a sample sql improving the performance? re the common scenario there is absolutely no pattern in how it's called, indexing all possible options will take forever.Manouch
H
1

I can think of an odd strategy to improve performance. This involves adding an auto-incrementing id into the table, then finding the bounds for Xfrom and Xto in terms of the id, and then looking for anything in the range.

The following query suggests what I mean:

with e1 as (
    select e.*,
           (select min(id) from data s where e.x between s.x + @xfrom and s.x + @xto) as idstart,
           (select max(id) from data s where e.x between s.x + @xfrom and s.x + @xto) as idend
    from data e
)
select <whatever>
from e1 join
     e1 s
     on e1.idstart = s.id
where e1.idstart <= e1.idend union all
select <whatever>
from e1 join
     e1 s
     on e1.idstart+1 = s.id
where e1.idstart+1 <= e1.idend
. . . 

I have had good luck with the correlated subquery on an indexed field returning the next value. After that, the joins are equijoins, which should be very fast. In the end, you would want to modify the query so it can do the comparisons using a table of enumerated values.

Hypabyssal answered 12/9, 2012 at 2:33 Comment(3)
that sounds very interesting. I'm not that strong in tsql could you put a complete sql so I can execute in sql analyzer?Manouch
This requires restructuring the table. Can you better describe what you are doing with the rows you are getting and the type of data you are working with? There might be other approaches as well.Hypabyssal
this is a pattern recognition system used for analyzing stochastic processes.Manouch

© 2022 - 2024 — McMap. All rights reserved.