Full Outer Join Using Each Row Once
Asked Answered
T

3

5

I'm wondering if anyone's come across a neat solution to this problem. I'm trying to select data from a couple of tables, having the records match up row by row. I'm basically after a full outer join, but there's one crucial difference. If I have four rows with a particular value in the column I'm joining on in one table, and three rows with this value in another, I only want the first three results to be joined, and the fourth to act as if there had been no match.

The reason for this is to create a reconciliation report which ensures transactions are not counted multiple times when comparing results. I can get around this issue by using a bit of grouping and some aggregate functions, but this hides some of the detail which I'd like to keep.

Below is an example to show the sort of thing I'm after, with the invalid/pseudo code in the comments illustrating how I'm thinking of this as working:

declare @t1 table (id bigint identity(1,1) primary key clustered, foreignKeyId bigint, otherData nvarchar(10))
declare @t2 table (id bigint identity(1,1) primary key clustered, foreignKeyId bigint, moreData nvarchar(10))

insert @t1 select 1, '1.1.1'
 union all select 1, '1.1.2'
 union all select 1, '1.1.3'
 union all select 3, '1.3.1'
 union all select 3, '1.3.2'
 union all select 3, '1.3.3'
 union all select 4, '1.4.3'

insert @t2 select 1, '2.1.1'
 union all select 1, '2.1.2'
 union all select 1, '2.1.3'
 union all select 2, '2.2.1'
 union all select 3, '2.3.1'
 union all select 3, '2.3.2'
 union all select 5, '2.5.1'
 union all select 5, '2.5.2'

--demo of the functionality i'm hoping to acheive 
--
 /*
 select t1.id           id1
 , t2.id                id2
 , t1.foreignKeyId  fk1
 , t2.foreignKeyId  fk2
 , t1.otherData     otherData
 , t2.moreData      moreData
 from @t1 t1
 full funky join @t2 t2
 on t1.foreignKeyId = t2.foreignKeyId
 order by t1.id, t2.id --we'd need an order by to ensure the match could be applied in a predictable manner
 */
--
declare @funkyjoin table (id1 bigint, id2 bigint, fk1 bigint, fk2 bigint, otherData nvarchar(10), moreData nvarchar(10))
declare @id1 bigint, @id2 bigint
insert @funkyjoin (id1, fk1, otherData) 
select id, foreignKeyId, otherData from @t1

while exists(select 1 from @t2)
begin
    select top 1 @id2 = id from @t2 order by id 

    set @id1 = null

    select top 1 @id1 = id1 
    from @funkyjoin
    where fk2 is null
    and fk1 in (select foreignKeyId from @t2 where id = @id2)

    if @id1 is null
    begin
        insert @funkyjoin (id2, fk2, moreData)
        select id, foreignKeyId, moreData
        from @t2
        where id = @id2
    end
    else
    begin
        update @funkyjoin
        set id2 = @id2
        , fk2 = fk1 --since we're joining on this we can just match it
        , moreData = (select moreData from @t2 where id = @id2)
        where id1 = @id1 
    end

    delete from @t2 where id = @id2 --since this is only an example let's not worry about keeping our source data
end

select * 
from @funkyjoin
order by coalesce(id1, id2)

I've written a similar solution for when this scenario occurs on spreadsheets previously: http://officemacros.codeplex.com/#WorksheetMergeMacro

Tomasatomasina answered 6/6, 2012 at 15:3 Comment(2)
FYI: A colleague just pointed me in the direction of the Quirky Update as a potential solution to this issue: simple-talk.com/sql/learn-sql-server/…. I'm reading up on this now, but first sight this looks promising. . .Tomasatomasina
Quirky update is unsupported, undocumented and may not work in the future. I suggest avoiding it.Formate
K
6

If I understand correctly, this may be what you're after:

select *
from (
  select *,
    row_number() over (partition by foreignKeyId order by id) as n
  from @t1
) t1
full outer join (
  select *,
    row_number() over (partition by foreignKeyId order by id) as n
  from @t2
) t2 on t1.foreignKeyId = t2.foreignKeyId and t1.n = t2.n
Kiosk answered 6/6, 2012 at 15:24 Comment(1)
Ahh of course - I had a feeling my old friend partition by would be involved, but didn't quite get there. Thanks Chad.Tomasatomasina
R
1

The best way to use up the rows is to add a pseudo-row number (using ROW_NUMBER) and include that in the join.

Root answered 6/6, 2012 at 15:21 Comment(0)
N
0

The given answers are all that's needed. Just mentioning an anti-answer if it helps by giving context. For a syntactically more concise way to achieve the behavior, but which is NOT supported at the moment, would be:

full outer apply (select top(1) * ...)
Navigator answered 26/4, 2024 at 8:8 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.