Intersect in SQL Server

Asked 18/9, 2014 at 20:31 Answered 19/9, 2014 at 7:33

Is there a way to use intersect without selecting distinct values only? Something like INTERSECT ALL.

For example, consider table A and B

A --> 1, 1, 1, 2, 3, 4

B --> 1, 1, 2

Would result in

Result --> 1, 1, 2

EDIT

I think this link explains well what I want. This other link is also intersting to understand the question. Or this other link explains event better.

EDIT 2

Suppose the tables:

Table A

╔════════╦════╦═══╦════╦════╗
║   A    ║ B  ║ C ║ D  ║ E  ║
╠════════╬════╬═══╬════╬════╣
║ Car    ║ 10 ║ 1 ║ OK ║ -1 ║
║ Car    ║ 10 ║ 1 ║ OK ║ -1 ║
║ Car    ║ 10 ║ 1 ║ OK ║ -1 ║
║ House  ║ 10 ║ 1 ║ NO ║ -5 ║
║ Monkey ║ 15 ║ 1 ║ OK ║ -1 ║
║ Dog    ║  3 ║ 1 ║ OK ║ -1 ║
╚════════╩════╩═══╩════╩════╝

Table B

╔═════╦════╦═══╦════╦════╗
║  A  ║ B  ║ C ║ D  ║ E  ║
╠═════╬════╬═══╬════╬════╣
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Car ║ 15 ║ 1 ║ OK ║ -1 ║
║ Dog ║  3 ║ 1 ║ OK ║ -1 ║
╚═════╩════╩═══╩════╩════╝

The answer for intersect (select * from A INTERSECT select * from B) would be:

╔═════╦════╦═══╦════╦════╗
║  A  ║ B  ║ C ║ D  ║ E  ║
╠═════╬════╬═══╬════╬════╣
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Dog ║  3 ║ 1 ║ OK ║ -1 ║
╚═════╩════╩═══╩════╩════╝

Because it takes only distinct values. What I want is taking common rows, just like:

╔═════╦════╦═══╦════╦════╗
║  A  ║ B  ║ C ║ D  ║ E  ║
╠═════╬════╬═══╬════╬════╣
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Dog ║  3 ║ 1 ║ OK ║ -1 ║
╚═════╩════╩═══╩════╩════╝

Observe I don't need to know what I have to link (the connection is positional, just like INTERSECT). The ID would be something constructed using all columns (the link between table are all columns, based on their position).

Blus answered 18/9, 2014 at 20:31 Comment(5)

HUH? Why does the result have two values for 1? That is not what INTERSECT means at all. It is designed to return distinct values that are present in both. You would need something other than intersect for this to work. – Fullrigged 18/9, 2014 at 20:34

It is an example. The connection of tables would be positional. Just like intersection. – Blus 18/9, 2014 at 20:58

@SeanLange - Ansi SQL has an INTERSECT ALL and an EXCEPT ALL. – Outpost 19/9, 2014 at 8:3

Regardless the logic is not clear in your example. Why are there two cars not three? Also be aware there is no 'position' in a dataset without an order by. – Bugg 19/9, 2014 at 8:39

@MartinSmith that is true but we are using sql server and it doesn't support INTERSECT ALL. At least not in all versions but I don't think even in 2014 it is supported. – Fullrigged 19/9, 2014 at 14:42

In SQL Server, INTERSECT works on distinct rows only. If you want it to distinguish between duplicate rows, you will need to make the rows distinct. The only way to do so I can think of is to add another column and populate it with unique values per duplicate, but in such a way that the resulting rows would be matchable across different tables.

The problem, however, is that so far there is no universal syntax for that. For instance, you could use ROW_NUMBER() to enumerate every duplicate, but you would have to write out its PARTITION BY clause for every case individually: there is no PARTITION BY *, not in SQL Server at least.

Anyway, for the purpose of illustration, here is how the ROW_NUMBER method would look like:

SELECT
  A, B, C, D, E,
  ROW_NUMBER() OVER (PARTITION BY A, B, C, D, E ORDER BY (SELECT 1))
FROM
  dbo.A

INTERSECT

SELECT
  A, B, C, D, E,
  ROW_NUMBER() OVER (PARTITION BY A, B, C, D, E ORDER BY (SELECT 1))
FROM
  dbo.B
;

As written above, the query would also return an extra column, the row number column, in the output. If you wanted to suppress it, you would need to make the query more complex:

SELECT
  A, B, C, D, E
FROM
  (
    SELECT
      A, B, C, D, E,
      rn = ROW_NUMBER() OVER (PARTITION BY A, B, C, D, E ORDER BY (SELECT 1))
    FROM
      dbo.A

    INTERSECT

    SELECT
      A, B, C, D, E,
      rn = ROW_NUMBER() OVER (PARTITION BY A, B, C, D, E ORDER BY (SELECT 1))
    FROM
      dbo.B
  ) AS s
;

And just to clarify, when I said above there was no universal syntax, I meant you could not do it without resorting to dynamic SQL. With dynamic SQL, a great many things are possible but such a solution would be much more complex and, in my opinion, much less maintainable.

Again, to illustrate the point, this is an example of how you could solve it with dynamic SQL:

DECLARE
  @table1 sysname,
  @table2 sysname,
  @columns nvarchar(max),
  @sql nvarchar(max)
;

SET @table1 = 'dbo.A';
SET @table2 = 'dbo.B';

-- collecting the columns from one table only,
-- assuming the structures of both tables are identical
-- if the structures differ, declare and populate
-- @columns1 and @columns2 separately
SET @columns = STUFF(
  (
    SELECT
      N', ' + QUOTENAME(name)
    FROM
      sys.columns
    WHERE
      object_id = OBJECT_ID(@table1)
    FOR XML
      PATH (''), TYPE
  ).value('text()[1]', 'nvarchar(max)'),
  1,
  2,
  ''
);

SET @sql =
N'SELECT ' + @columns + N'
FROM
  (
    SELECT
      ' + @columns + N',
      ROW_NUMBER() OVER (PARTITION BY ' + @columns + N' ORDER BY (SELECT 1))
    FROM
      ' + @table1 + N'

    INTERSECT

    SELECT
      ' + @columns + N',
      ROW_NUMBER() OVER (PARTITION BY ' + @columns + N' ORDER BY (SELECT 1))
    FROM
      ' + @table2 + N'
  ) AS s
';

EXECUTE sp_executesql @sql;

You can probably see now what I meant by "much more complex" at least.

Apochromatic answered 19/9, 2014 at 7:33 Comment(2)

Why did you have to go to dynamic sql? The intersect example with ROW_NUMBER should work perfectly fine. Maybe I am missing something but dynamic sql doesn't seem to be needed here. – Fullrigged 19/9, 2014 at 14:43

@SeanLange: Yes, the ROW_NUMBER method works, it's that with that method you'd need to specify all the columns explicitly every time (in PARTITION BY at least, if you are fine with having an extra column in the output), and it's different columns in different comparisons. Dynamic SQL allows you to specify just table names, and my point was to show at which cost (i.e. the solution itself becomes rather complicated). – Apochromatic 19/9, 2014 at 15:2

SELECT
  COLUMN1
FROM B
WHERE B.COLUMN1 IN
  (SELECT COLUMN1
   FROM A)

Combs answered 18/9, 2014 at 20:33 Comment(0)

SELECT * FROM TableB 
WHERE EXISTS (SELECT 1
              FROM TableA 
              WHERE ColumnName = TableB.ColumnName)

Injured answered 18/9, 2014 at 20:46 Comment(3)

I like this answer a lot, but I would have to know the columns to connect the tables and they can be many. One solution near yours would be select A.* from A where exists (select A.* intersect select * from B ). What do you think? – Blus 18/9, 2014 at 20:48

@Blus if you don't know how many columns there is gonna be then chances are that the way you are doing things is wrong – Combs 18/9, 2014 at 20:51

The intersect take tables with distinct values in rows (considering all the columns). The connection is positional in this case. – Blus 18/9, 2014 at 20:57

Recommended topics

Hot tags