I'm currently working my way through the Microsoft SQL Server 2008 - Database Development (MCTS Exam 70-433) certification. In one of the earlier chapters on Combining Datasets, I came across the EXCEPT
(and INTERSECT
) commands. One example shows how to use EXCEPT
to get all values from one table that doesn't have a related value in a second table like this:
SELECT EmployeeKey FROM DimEmployee
EXCEPT
SELECT EmployeeKey FROM FactResellerSales
The EXCEPT
command was new to me, but with what I knew before today I would still easily solve the problem using a LEFT JOIN
and check for IS NULL
on the join constraint the following way:
SELECT DISTINCT DimEmployee.EmployeeKey FROM DimEmployee
LEFT JOIN FactResellerSales ON FactResellerSales.EmployeeKey = DimEmployee.EmployeeKey
WHERE FactResellerSales.EmployeeKey IS NULL
Now I started to wonder which one of these have the best performance. I tried to study the query execution plans, but I'm not too good at reading those, so it didn't make me any wiser. For the query using EXCEPT
, the plan looks like this:
|--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey], [Expr1006]) WITH UNORDERED PREFETCH)
|--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[IX_DimEmployee_SalesTerritoryKey]))
|--Top(TOP EXPRESSION:((1)))
|--Index Seek(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]), SEEK:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]) ORDERED FORWARD)
And for the one using the LEFT JOIN
it looks like this:
|--Stream Aggregate(GROUP BY:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
|--Filter(WHERE:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey] IS NULL))
|--Merge Join(Left Outer Join, MERGE:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey])=([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]), RESIDUAL:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
|--Clustered Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[PK_DimEmployee_EmployeeKey]), ORDERED FORWARD)
|--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]), ORDERED FORWARD)
The tables used in the query are from the AdventureWorksDW2008 example database, so below I also include the create scripts for the two tables in case that is needed to answer the question correctly:
USE [AdventureWorksDW2008]
GO
/****** Object: Table [dbo].[DimEmployee] Script Date: 11/22/2010 20:30:20 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[DimEmployee](
[EmployeeKey] [int] IDENTITY(1,1) NOT NULL,
[ParentEmployeeKey] [int] NULL,
[EmployeeNationalIDAlternateKey] [nvarchar](15) NULL,
[ParentEmployeeNationalIDAlternateKey] [nvarchar](15) NULL,
[SalesTerritoryKey] [int] NULL,
[FirstName] [nvarchar](50) NOT NULL,
[LastName] [nvarchar](50) NOT NULL,
[MiddleName] [nvarchar](50) NULL,
[NameStyle] [bit] NOT NULL,
[Title] [nvarchar](50) NULL,
[HireDate] [date] NULL,
[BirthDate] [date] NULL,
[LoginID] [nvarchar](256) NULL,
[EmailAddress] [nvarchar](50) NULL,
[Phone] [nvarchar](25) NULL,
[MaritalStatus] [nchar](1) NULL,
[EmergencyContactName] [nvarchar](50) NULL,
[EmergencyContactPhone] [nvarchar](25) NULL,
[SalariedFlag] [bit] NULL,
[Gender] [nchar](1) NULL,
[PayFrequency] [tinyint] NULL,
[BaseRate] [money] NULL,
[VacationHours] [smallint] NULL,
[SickLeaveHours] [smallint] NULL,
[CurrentFlag] [bit] NOT NULL,
[SalesPersonFlag] [bit] NOT NULL,
[DepartmentName] [nvarchar](50) NULL,
[StartDate] [date] NULL,
[EndDate] [date] NULL,
[Status] [nvarchar](50) NULL,
CONSTRAINT [PK_DimEmployee_EmployeeKey] PRIMARY KEY CLUSTERED
(
[EmployeeKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DimEmployee] WITH CHECK ADD CONSTRAINT [FK_DimEmployee_DimEmployee] FOREIGN KEY([ParentEmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO
ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimEmployee]
GO
ALTER TABLE [dbo].[DimEmployee] WITH CHECK ADD CONSTRAINT [FK_DimEmployee_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])
GO
ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimSalesTerritory]
GO
And the second table:
USE [AdventureWorksDW2008]
GO
/****** Object: Table [dbo].[FactResellerSales] Script Date: 11/22/2010 20:30:47 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[FactResellerSales](
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[DueDateKey] [int] NOT NULL,
[ShipDateKey] [int] NOT NULL,
[ResellerKey] [int] NOT NULL,
[EmployeeKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[SalesTerritoryKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [tinyint] NOT NULL,
[RevisionNumber] [tinyint] NULL,
[OrderQuantity] [smallint] NULL,
[UnitPrice] [money] NULL,
[ExtendedAmount] [money] NULL,
[UnitPriceDiscountPct] [float] NULL,
[DiscountAmount] [float] NULL,
[ProductStandardCost] [money] NULL,
[TotalProductCost] [money] NULL,
[SalesAmount] [money] NULL,
[TaxAmt] [money] NULL,
[Freight] [money] NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[CustomerPONumber] [nvarchar](25) NULL,
CONSTRAINT [PK_FactResellerSales_SalesOrderNumber_SalesOrderLineNumber] PRIMARY KEY CLUSTERED
(
[SalesOrderNumber] ASC,
[SalesOrderLineNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimCurrency] FOREIGN KEY([CurrencyKey])
REFERENCES [dbo].[DimCurrency] ([CurrencyKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimCurrency]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimDate] FOREIGN KEY([OrderDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimDate1] FOREIGN KEY([DueDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate1]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimDate2] FOREIGN KEY([ShipDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate2]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimEmployee] FOREIGN KEY([EmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimEmployee]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimProduct] FOREIGN KEY([ProductKey])
REFERENCES [dbo].[DimProduct] ([ProductKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimProduct]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimPromotion] FOREIGN KEY([PromotionKey])
REFERENCES [dbo].[DimPromotion] ([PromotionKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimPromotion]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimReseller] FOREIGN KEY([ResellerKey])
REFERENCES [dbo].[DimReseller] ([ResellerKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimReseller]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimSalesTerritory]
GO
This question is using a very specific example, but I'm also interested in general information about when/if it would be appropriate to use EXCEPT
instead of a LEFT JOIN
and check on IS NULL
.
I also noticed, that if I tried running the 1st query with INTERCEPT
, the equivalent in the 2nd query would be to use a standard JOIN
and SELECT DISTINCT DimEmployee.EmployeeKey
(and no WHERE
clause at all). In this case however, the execution plan was exactly the same in the two cases.
Update
Minor update to the 2nd query (see revision history), which resulted in a bit more complex query plan for that one. I would guess that a bigger query plan indicates a less optimal query, but I'd still like this question to be answered.