How to work around this case of lateral join with Spark SQL?
I have a lateral join defined in this way:

select A.id, B.value
from A
left join lateral (
    select value
    from B
    where B.id = A.id
    limit 1
) as X on true;

The key feature is the LIMIT 1 inside the subquery (in more complicated cases I could have extra conditions in the join to narrow down the values and/or an ORDER BY). I know that Spark SQL has nothing like this at the moment, so how can I work around it?

Superfuse answered 29/6, 2020 at 7:52 Comment(2)
Can you provide sample inputs and the result you're getting when running your lateral join? — Paschal
In most(?) cases lateral can be replaced with a join + group by: Link. It depends on the actual task to say what the best solution in Spark would be. — Olivann
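The join + GROUP BY rewrite suggested in the comment above can be sketched as follows. This is a runnable illustration using Python's built-in sqlite3 rather than Spark (the SQL shape is the same); the table names A/B, the ordering column some_column, and the sample rows are assumptions, and the MIN trick presumes some_column is unique per id:

```python
import sqlite3

# In-memory database standing in for the Spark tables A and B.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER, value TEXT, some_column INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1, 'first', 10), (1, 'second', 20), (2, 'only', 5);
""")

# Step 1: aggregate B down to one ordering key per id (the GROUP BY).
# Step 2: join back to B to recover the matching row's value.
rows = cur.execute("""
    SELECT A.id, B.value
    FROM A
    LEFT JOIN (SELECT id, MIN(some_column) AS min_sc
               FROM B GROUP BY id) M
        ON A.id = M.id
    LEFT JOIN B ON B.id = M.id AND B.some_column = M.min_sc
    ORDER BY A.id
""").fetchall()
print(rows)  # [(1, 'first'), (2, 'only'), (3, None)]
```

Like the lateral join with LIMIT 1, ids in A with no match in B still appear (with a NULL value) thanks to the LEFT joins.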
WITH B_Ranked AS (
    SELECT *,
           -- some_column is a placeholder: pick the column that defines "first"
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY some_column) AS rn
    FROM B
)
SELECT A.id, B_Ranked.value
FROM A
LEFT JOIN B_Ranked ON A.id = B_Ranked.id AND B_Ranked.rn = 1;

This gives you the first record from B for each id in A (as defined by the ORDER BY), matching your original lateral join with LIMIT 1.
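As a self-contained illustration of this answer's ROW_NUMBER() pattern, here is the same query run against Python's built-in sqlite3 standing in for Spark (the window-function SQL is identical in shape; some_column, the table names, and the sample data are assumptions, and SQLite >= 3.25 is required for window functions):

```python
import sqlite3

# In-memory tables matching the question's A and B.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER, value TEXT, some_column INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1, 'first', 10), (1, 'second', 20), (2, 'only', 5);
""")

# Rank B's rows per id, then keep only rank 1 in the join condition,
# emulating "LIMIT 1 per A.id" from the lateral join.
rows = cur.execute("""
    WITH B_Ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY some_column) AS rn
        FROM B
    )
    SELECT A.id, B_Ranked.value
    FROM A
    LEFT JOIN B_Ranked ON A.id = B_Ranked.id AND B_Ranked.rn = 1
    ORDER BY A.id
""").fetchall()
print(rows)  # [(1, 'first'), (2, 'only'), (3, None)]
```

Note that id 3, which has no match in B, still appears with a NULL value, exactly as the original LEFT JOIN LATERAL would produce.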

Strapper answered 7/12, 2023 at 19:45 Comment(1)
Thanks, the bounty was about to expire and I was going to have to do it myself! Note that you can perform a more complicated filter inside the ON or WHERE, e.g. B_Ranked.rn < 25. — Gaddi

© 2022 - 2024 — McMap. All rights reserved.