Postgres: left join with order by and limit 1
Asked Answered
K

3

47

I have the situation:

Table1 has a list of companies.
Table2 has a list of addresses.
Table3 is a N relationship of Table1 and Table2, with fields 'begin' and 'end'.

Because companies may move over time, a LEFT JOIN among them results in multiple records for each company.

begin and end fields are never NULL. The solution to find the latest address is use a ORDER BY being DESC, and to remove older addresses is a LIMIT 1.

That works fine if the query can bring only 1 company. But I need a query that brings all Table1 records, joined with their current Table2 addresses. Therefore, the removal of outdated data must be done (AFAIK) in LEFT JOIN's ON clause.

Any idea how I can build the clause to not create duplicated Table1 companies and bring latest address?

Keeping answered 17/2, 2014 at 19:23 Comment(0)
K
14

I managed to solve it using Windows Function:

WITH ranked_relationship AS(
    SELECT
        *
        ,row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) as dt_last_addr
    FROM relationship
)

SELECT
    company.*
    address.*,
    dt_last_addr as dt_relationship
FROM
    company
    LEFT JOIN ranked_relationship as relationship
            ON relationship.fk_company = company.pk_company AND dt_last_addr = 1
    LEFT JOIN address ON address.pk_address = relationship.fk_address

row_number() creates an int counter for each record, inside each window based to fk_company. For each window, the record with latest date comes first with rank 1, then dt_last_addr = 1 makes sure the JOIN happens only once for each fk_company, with the record with latest address.

Window Functions are very powerful and few ppl use them, they avoid many complex joins and subqueries!

Keeping answered 19/2, 2014 at 13:13 Comment(0)
B
39

Use a dependent subquery with max() function in a join condition.
Something like in this example:

SELECT *
FROM companies c
LEFT JOIN relationship r
ON c.company_id = r.company_id
   AND r."begin" = (
        SELECT max("begin")
        FROM relationship r1
        WHERE c.company_id = r1.company_id
     )
INNER JOIN addresses a
ON a.address_id = r.address_id 

demo: http://sqlfiddle.com/#!15/f80c6/2

Bowles answered 17/2, 2014 at 21:40 Comment(3)
Nice on sqlfiddle - I've only seen jsfiddle until now, but sqlfiddle seems superhelpful!Bette
It seems it won't work in case when r.begin can be equal among some recordsJoyann
FYI - You don't have to used max, you can just used whatever ordering you want in the subquery. This is a great solution though, thanksSeductive
C
25

Since PostgreSQL 9.3 there is JOIN LATERAL (https://www.postgresql.org/docs/9.4/queries-table-expressions.html) that allows to make a sub-query to join, so it solves your issue in an elegant way:

SELECT * FROM companies c
JOIN LATERAL (
    SELECT * FROM relationship r
    WHERE c.company_id = r.company_id
    ORDER BY r."begin" DESC LIMIT 1
) r ON TRUE
JOIN addresses a ON a.address_id = r.address_id

The disadvantage of this approach is the indexes of the tables inside LATERAL do not work outside.

Carilyn answered 13/4, 2021 at 8:54 Comment(2)
Could you elaborate on your comment "The disadvantage of this approach is the indexes of the tables inside LATERAL do not work outside." In your example, if there was an index on r.company_id, does this mean that the index would be used in the subquery, but not used if it were referenced in a WHERE clause in the outer query?Whorton
@EricDauenhauer Yes, it does.Carilyn
K
14

I managed to solve it using Windows Function:

WITH ranked_relationship AS(
    SELECT
        *
        ,row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) as dt_last_addr
    FROM relationship
)

SELECT
    company.*
    address.*,
    dt_last_addr as dt_relationship
FROM
    company
    LEFT JOIN ranked_relationship as relationship
            ON relationship.fk_company = company.pk_company AND dt_last_addr = 1
    LEFT JOIN address ON address.pk_address = relationship.fk_address

row_number() creates an int counter for each record, inside each window based to fk_company. For each window, the record with latest date comes first with rank 1, then dt_last_addr = 1 makes sure the JOIN happens only once for each fk_company, with the record with latest address.

Window Functions are very powerful and few ppl use them, they avoid many complex joins and subqueries!

Keeping answered 19/2, 2014 at 13:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.