Very slow DELETE query
Asked Answered
H

7

15

I have problems with SQL performance. For sudden reason the following queries are very slow:

I have two lists which contains Id's of a certain table. I need to delete all records from the first list if the Id's already exists in the second list:

DECLARE @IdList1 TABLE(Id INT)
DECLARE @IdList2 TABLE(Id INT)

-- Approach 1
DELETE list1
FROM @IdList1 list1
INNER JOIN @IdList2 list2 ON list1.Id = list2.Id

-- Approach 2
DELETE FROM @IdList1
WHERE Id IN (SELECT Id FROM @IdList2)

It is possible the two lists contains more than 10.000 records. In that case both queries takes each more than 20 seconds to execute.

The execution plan also showed something I don't understand. Maybe that explains why it is so slow: Queryplan of both queries

I Filled both lists with 10.000 sequential integers so both list contained value 1-10.000 as starting point.

As you can see both queries shows for @IdList2 Actual Number of Rows is 50.005.000!!. @IdList1 is correct (Actual Number of Rows is 10.000)

I know there are other solutions how to solve this. Like filling a third list instaed of removing from first list. But my question is:

Why are these delete queries so slow and why do I see these strange query plans?

Honeysuckle answered 23/5, 2013 at 12:59 Comment(2)
Is this a problem that might be encountered in a real world scenario or, just in this specilaized situation?Vinaigrette
@Vinaigrette - The underlying issues of no statistics based recompiles for table variables (and lack of useful indexes on them) is very common.Spenser
G
17

Add a Primary key to your table variables and watch them scream

DECLARE @IdList1 TABLE(Id INT primary Key not null)
DECLARE @IdList2 TABLE(Id INT primary Key not null)

because there's no index on these table variables, any joins or subqueries must examine on the order of 10,000 times 10,000 = 100,000,000 pairs of values.

Gadgeteer answered 23/5, 2013 at 13:2 Comment(5)
Will it help having an index on @IdList1?Vinaigrette
"Any joins or subqueries must examine on the order of 10,000 times 10,000 = 100,000,000 pairs of values." this is only true for nested loops. A hash or merge join would process each input once (though a merge join would also need a sort)Spenser
@martin, I have not read that stuff for a while, so I've forgotten the rules, but Is it not choosing the nested loops because there's no index? To do the other looping algorithms doesn't it need an index to sort the values? Also, without an index, it still has to examine every pair of values - no matter what looping algorithm it uses to create them. - the exception being, as you note, a merge join, but there it has to presort them.Gadgeteer
@CharlesBretana - No it can use hash or merge join as long as there is an equi join. Merge join will require sorting both inputs (as will creating an index) but once an index is created obviously it is potentially more useful as it will benefit other queries (so +1)Spenser
Your answer and comments together with @MartinSmith was a huge improvement. Thanks!Honeysuckle
S
12

SQL Server compiles the plan when the table variable is empty and does not recompile it when rows are added. Try

DELETE FROM @IdList1
WHERE Id IN (SELECT Id FROM @IdList2)
OPTION (RECOMPILE)

This will take account of the actual number of rows contained in the table variable and get rid of the nested loops plan

Of course creating an index on Id via a constraint may well be beneficial for other queries using the table variable too.

Spenser answered 23/5, 2013 at 13:2 Comment(5)
This is new to me. Can you clarify - The cacheplan initial compile would happen when the Delete statement is encountered, correct? Not when the table variables are declared ? I mean, the plan being compiled is for the Delete, not for the table variable declaration... If so, then at that point wouldn't the table variables be populated ? Also, if you don't mind, could you provide a reference ? I'd like to read up on this.Gadgeteer
@CharlesBretana - There are some links and example code in my answer hereSpenser
again, however, since the cacheplan is created for each statement, not for the entire batch or for a stored proc, does it create cache plans for every statement in a batch or in a procedure before it starts executing?Gadgeteer
@CharlesBretana - It compiles all statements in a batch before executing it except if the statement references a non existent object and is marked for deferred compile. So in this case the DELETE statement is compiled when the table variables are empty. Then (due to OPTION (RECOMPILE)) it gets recompiled at the point of the DELETE and can take account of the actual number of rows after the table variables are populated.Spenser
Your answer and comments together with @CharlesBretana was a huge improvement. I desiced to accept Charels' answer because I cannot accept two answers ;). Thanks!Honeysuckle
B
2

The tables in table variables can have primary keys, so if your data supports uniqueness for these Ids, you may be able to improve performance by going for

DECLARE @IdList1 TABLE(Id INT PRIMARY KEY)
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY)
Borreri answered 23/5, 2013 at 13:3 Comment(0)
M
2

Possible solutions:

1) Try to create indices thus

1.1) If List{1|2}.Id column has unique values then you could define a unique clustered index using a PK constraint like this:

DECLARE @IdList1 TABLE(Id INT PRIMARY KEY);
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY);

1.2) If List{1|2}.Id column may have duplicate values then you could define a unique clustered index using a PK constraint using a dummy IDENTITY column like this:

DECLARE @IdList1 TABLE(Id INT, DummyID INT IDENTITY, PRIMARY KEY (ID, DummyID) );
DECLARE @IdList2 TABLE(Id INT, DummyID INT IDENTITY, PRIMARY KEY (ID, DummyID) );

2) Try to add HASH JOIN query hint like this:

DELETE list1
FROM @IdList1 list1
INNER JOIN @IdList2 list2 ON list1.Id = list2.Id
OPTION (HASH JOIN);
Masorete answered 23/5, 2013 at 13:10 Comment(0)
D
1

You are using Table Variables, either add a primary key to the table or change them to Temporary Tables and add an INDEX. This will result in much more performance. As a rule of thumb, if the table is only small, use TABLE Variables, however if the table is expanding and contains a lot of data then either use a temp table.

Defrayal answered 23/5, 2013 at 13:3 Comment(0)
V
0

I'd be tempted to try

DECLARE @IdList3 TABLE(Id INT);

INSERT @IdList3
SELECT Id FROM @IDList1 ORDER BY Id
EXCEPT
SELECT Id FROM @IDList2 ORDER BY Id

No deleting required.

Vinaigrette answered 23/5, 2013 at 13:11 Comment(2)
But what if OP needs to delete, like he/she said: I need to delete all records from the first list if the Id's already exists in the second listJubbah
@Jubbah true, the OP indicates its a contrived example concerned with those two table variables and specifically deletetion. However, this may still be useful for another reader.Vinaigrette
R
-1

Try this alternate syntax:

DELETE deleteAlias
FROM @IdList1 deleteAlias
WHERE EXISTS (
        SELECT NULL
        FROM @IdList2 innerList2Alias
        WHERE innerList2Alias.id=deleteAlias.id
    )

EDIT.....................

Try using #temp tables with indexes instead.

Here is a generic example where "DepartmentKey" is the PK and the FK.

IF OBJECT_ID('tempdb..#Department') IS NOT NULL
begin
        drop table #Department
end


CREATE TABLE #Department 
( 
    DepartmentKey int , 
    DepartmentName  varchar(12)
)



CREATE INDEX IX_TEMPTABLE_Department_DepartmentKey ON #Department (DepartmentKey)




IF OBJECT_ID('tempdb..#Employee') IS NOT NULL
begin
        drop table #Employee
end


CREATE TABLE #Employee 
( 
    EmployeeKey int , 
    DepartmentKey int ,
    SSN  varchar(11)
)



CREATE INDEX IX_TEMPTABLE_Employee_DepartmentKey ON #Employee (DepartmentKey)


Delete deleteAlias 
from #Department deleteAlias
where exists ( select null from #Employee innerE where innerE.DepartmentKey = deleteAlias.DepartmentKey )





IF OBJECT_ID('tempdb..#Employee') IS NOT NULL
begin
        drop table #Employee
end

IF OBJECT_ID('tempdb..#Department') IS NOT NULL
begin
        drop table #Department
end
Repeat answered 23/5, 2013 at 13:7 Comment(3)
Unfortunately this is slow too. Same result and exact same query plan.Honeysuckle
Are you forced to use @variable-tables, or can you try #temp tables?Repeat
If you can use #temp tables, try the example in my response.Repeat

© 2022 - 2024 — McMap. All rights reserved.