How do I join the first row of a subquery?

7

27

I have a table of invoices and a child table of related data, linked by key. For each invoice, I'm interested in only the first related row from the child table. Given that I want exactly one related row for every invoice key, how do I accomplish this?

Select i.[Invoice Number],
       c.[Carrier Name]
From Invoice i
    Left Join Carriers c on i.[InvoiceKey] = c.[InvoiceKey]
Where -- what?

I guess, semantically speaking, what I'm looking for is something akin to the concept of Top 1 c.CarrierName Group by InvoiceKey (or whatever the equivalent would be, if that were possible in T-SQL).

I've thought about doing a left join on a subquery, but that doesn't seem very efficient. Does anyone have any T-SQL tricks to achieve this efficiently?

Edit: Sorry guys, I forgot to mention this is SQL Server 2000, so while I'm going to give upvotes for the current SQL Server 2005/2008 responses that will work, I can't accept them I'm afraid.

Kendall answered 14/1, 2011 at 15:5 Comment(2)
Does the second table have any attribute that says which row is first, second, etc.? - Bibliographer
@Cybernate No, other than the sequence of the index. - Kendall
37

Provided that Carriers has a PRIMARY KEY called id:

SELECT  i.[Invoice Number],
        c.[Carrier Name]
FROM    Invoice i
JOIN    Carriers c
ON      c.id = 
        (
        SELECT  TOP 1 ID
        FROM    Carriers ci
        WHERE   ci.InvoiceKey = i.InvoiceKey
        ORDER BY
                id -- or whatever
        )
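
The query above uses an inner join, so invoices with no Carriers rows drop out of the result. Below is a minimal sketch of the same idea with a LEFT JOIN, which should also run on SQL Server 2000. The temp tables and sample data are made up for illustration, and id is assumed to be the Carriers primary key, as stated above:

```sql
-- Hypothetical sample schema and data; column names follow the question.
CREATE TABLE #Invoice  (InvoiceKey int PRIMARY KEY, [Invoice Number] varchar(20));
CREATE TABLE #Carriers (id int PRIMARY KEY, InvoiceKey int, [Carrier Name] varchar(50));

INSERT INTO #Invoice  VALUES (1, 'INV-001');
INSERT INTO #Invoice  VALUES (2, 'INV-002');            -- deliberately has no carriers
INSERT INTO #Carriers VALUES (10, 1, 'Acme Freight');
INSERT INTO #Carriers VALUES (11, 1, 'Beta Lines');

SELECT  i.[Invoice Number],
        c.[Carrier Name]
FROM    #Invoice i
LEFT JOIN
        #Carriers c
ON      c.id =
        (
        -- Lowest id among the carriers of the current invoice
        SELECT  TOP 1 ci.id
        FROM    #Carriers ci
        WHERE   ci.InvoiceKey = i.InvoiceKey
        ORDER BY
                ci.id
        );

DROP TABLE #Carriers;
DROP TABLE #Invoice;
```

With this sample data the result has one row per invoice: INV-001 paired with Acme Freight, and INV-002 with a NULL carrier name.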
Aubry answered 14/1, 2011 at 16:54 Comment(3)
The performance of this is likely to be bad compared to a group by and having clause. You are executing a correlated subquery against the Carriers table for every row in Invoice. - Numbles
@Chris: compared to what? You didn't provide a working example. - Aubry
@Aubry wonderful, beautiful, amazing, perfect, saved my life. - Odericus
3

Alternatively, you could use OUTER APPLY (available from SQL Server 2005 onward). Note that the angle brackets mark placeholders for unknown field names and the ordering clause:

Select i.[Invoice Number], x.[Carrier Name], x.<Carrier_field1>
From Invoice i
OUTER APPLY
(
    -- Columns of the applied row are exposed through the alias x, not c
    SELECT TOP 1 *
    FROM Carriers c
    WHERE c.[InvoiceKey] = i.[InvoiceKey]
    ORDER BY <order_clause>
) x
Retard answered 22/10, 2019 at 8:43 Comment(0)
2

This works for me:

select ir.[Invoice Number], c.[Carrier Name]
from 
    (select ROW_NUMBER() over (order by i.[Invoice Number] asc) AS RowNumber, i.[Invoice Number], i.InvoiceKey
    from Invoice i) AS ir
left join Carriers c
on ir.InvoiceKey = c.InvoiceKey
where RowNumber = 1
union all
select ir.[Invoice Number], NULL as [Carrier Name]
from 
    (select ROW_NUMBER() over (order by i.[Invoice Number] asc) AS RowNumber, i.[Invoice Number]
    from Invoice i) AS ir
where RowNumber > 1

or

select TOP 1 i.[Invoice Number], c.[Carrier Name]
from Invoice i
left join Carriers c
on i.InvoiceKey = c.InvoiceKey
union all
select ir.[Invoice Number], NULL as [Carrier Name]
from 
    (select ROW_NUMBER() over (order by i.[Invoice Number] asc) AS RowNumber, i.[Invoice Number]
    from Invoice i) AS ir
where RowNumber > 1
Spracklen answered 14/1, 2011 at 15:7 Comment(1)
+1 Would do the job if I were on SQL Server 2005+, but I forgot to mention I need this to run on a SQL Server 2000 box. - Kendall
2
;with cteRowNumber as (
    select c.InvoiceKey, c.[Carrier Name], ROW_NUMBER() over (partition by c.InvoiceKey order by c.[Carrier Name]) as RowNum
        from Carriers c
)
select i.[Invoice Number],
       rn.[Carrier Name]
    from Invoice i
        left join cteRowNumber rn
            on i.InvoiceKey = rn.InvoiceKey
                and rn.RowNum = 1
Lifeline answered 14/1, 2011 at 15:15 Comment(2)
@abatishchev: I don't see it. If I'm including rn.RowNum = 1 as part of my join condition, that should join only the "first" (as defined by the ordering of the window function). - Lifeline
+1 Technically correct, but I forgot to mention that I was looking for SQL Server 2000, so CTE isn't an option. - Kendall
2

This is how I would do it, using a slightly different syntax than yours (MySQL style), but I guess you could apply it to your solution as well:

SELECT i.invoiceNumber, c.carrierName
FROM Invoice as i
LEFT JOIN Carriers as c ON (c.id = (SELECT id FROM Carriers WHERE invoiceKey = i.invoiceKey ORDER BY id LIMIT 1))

This will take all records from Invoice and join each one with one (or zero) records from Carriers: specifically, the record with the same invoiceKey that has the lowest id.

As long as you have an index on Carriers.invoiceKey the performance of this query should be acceptable.
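
The index mentioned above could be created roughly like this. The index name is hypothetical, and adding id as a second column is an assumption, intended to let the ORDER BY id LIMIT 1 lookup be answered from the index alone:

```sql
-- Hypothetical index name; columns follow the query above.
CREATE INDEX idx_carriers_invoicekey_id ON Carriers (invoiceKey, id);
```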

Sebastian

Dominique answered 14/1, 2011 at 16:22 Comment(0)
1

In such cases I often employ a device which I here apply to your example and describe below:

SELECT
  i.[Invoice Number],
  c.[Carrier Name]
FROM Invoice i
  INNER JOIN Carriers c ON i.InvoiceKey = c.InvoiceKey
  INNER JOIN (
    SELECT MIN(ID) AS ID
    FROM Carriers
    GROUP BY InvoiceKey
  ) c_top ON c.ID = c_top.ID

I think this is roughly what Quassnoi has posted, except that I try to avoid using SELECT TOP like that.

Invoice is joined with Carriers based on their linking expression (InvoiceKey in this case). Now, Carriers can have multiple rows for the same InvoiceKey, so we need to limit the output. That is done using a derived table.

The derived table groups rows from Carriers based on the same expression that is used for linking the two tables (InvoiceKey) and keeps only the lowest ID in each group. Joining Carriers to it on ID therefore leaves at most one Carriers row per InvoiceKey.

And there's another way: instead of joining the derived table you could use IN (subquery) with the same effect. That is, the complete query would then look like this:

SELECT
  i.[Invoice Number],
  c.[Carrier Name]
FROM Invoice i
  INNER JOIN Carriers c ON i.InvoiceKey = c.InvoiceKey
    AND c.ID IN (SELECT MIN(ID) FROM Carriers GROUP BY InvoiceKey)
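
Both queries above use INNER JOIN, so an invoice without any Carriers rows is omitted from the result. Below is a sketch of a variant that keeps such invoices, assuming (as above) that Carriers has a unique ID column. It relies only on a derived table and outer joins, so it should not need anything newer than SQL Server 2000:

```sql
SELECT
  i.[Invoice Number],
  c.[Carrier Name]
FROM Invoice i
  -- One row per InvoiceKey, carrying the lowest Carriers.ID for that key
  LEFT JOIN (
    SELECT InvoiceKey, MIN(ID) AS ID
    FROM Carriers
    GROUP BY InvoiceKey
  ) c_top ON c_top.InvoiceKey = i.InvoiceKey
  -- Fetch the full Carriers row for that ID; NULLs when the invoice has none
  LEFT JOIN Carriers c ON c.ID = c_top.ID
```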
Privity answered 15/1, 2011 at 11:15 Comment(0)
0
group by carriername having max(invoicenumber)

to get the first carrier for each invoice:

group by invoicenumber having max(carriername)
-- substitute the column you want to order by for carrier name to change which is 'first'
Numbles answered 14/1, 2011 at 15:8 Comment(6)
This won't work - this will only give me the carrier name of the carrier with the highest invoice number. What I need is the first carrier for each invoice. - Kendall
No, this should give you the highest invoice number for each carrier. By reversing the group by/having, you can get the first carrier for each invoice. - Numbles
Remember: having clauses are applied AFTER group by clauses :-) - Numbles
HAVING MAX(invoicenumber) won't even parse. MAX(invoicenumber) is not a predicate. - Aubry
Having invoicenumber = max(invoicenumber), sorry. - Numbles
@Chris: this won't parse either. You cannot use unaggregated expressions in a HAVING clause. - Aubry
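
For reference, a form of the GROUP BY idea that does parse puts the aggregate in the SELECT list instead of in HAVING. This is only a sketch: it defines "first" as the alphabetically lowest carrier name, which is an assumption, and it returns just that one aggregated value instead of a whole Carriers row:

```sql
SELECT i.[Invoice Number],
       MIN(c.[Carrier Name]) AS [Carrier Name]  -- NULL when the invoice has no carriers
FROM Invoice i
    LEFT JOIN Carriers c ON i.InvoiceKey = c.InvoiceKey
GROUP BY i.InvoiceKey, i.[Invoice Number]
```

If more than one Carriers column is needed, MIN on each column separately could mix values from different rows, so one of the ID-based approaches above is the safer choice in that case.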
