Postgres analogue to CROSS APPLY in SQL Server

I need to migrate SQL queries written for MS SQL Server 2005 to Postgres 9.1.
What is the best way to substitute for CROSS APPLY in this query?

SELECT *
FROM   V_CitizenVersions
CROSS  APPLY dbo.GetCitizenRecModified(Citizen, LastName, FirstName, MiddleName,
                                       BirthYear, BirthMonth, BirthDay, ..... ) -- lots of params

GetCitizenRecModified() is a table-valued function. I can't post its code because it is enormous; it performs some complicated computations and I can't abandon it.

Meryl answered 13/7, 2012 at 14:44 Comment(3)
You don't need CROSS APPLY in Postgres. You can use a table function just like a table. Simply join them. - Postremogeniture
@a_horse_with_no_name: CROSS APPLY re-executes the TVF with correlated parameters rather than executing it once and then joining the result. - Sims
I realise this is ancient, but @MartinSmith, that is not necessarily the case on MSSQL if the function is of the inline table-valued variety. See Paul White's write-up on how the MSSQL query planner can sometimes optimize APPLY into a join: sqlservercentral.com/articles/APPLY/69954. Since we don't see the original code here, I am speculating that this is what happened, based on the comment about performance on Erwin's answer. - Necropolis

In Postgres 9.3 or later use a LATERAL join:

SELECT v.col_a, v.col_b, f.*  -- no parentheses, f is a table alias
FROM   v_citizenversions v
LEFT   JOIN LATERAL f_citizen_rec_modified(v.col1, v.col2) f ON true
WHERE  f.col_c = _col_c;

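Why LEFT JOIN LATERAL ... ON true? It keeps rows of v_citizenversions for which the function returns no rows (like OUTER APPLY in SQL Server), whereas CROSS JOIN LATERAL drops such rows (like CROSS APPLY). Note that the WHERE condition on f.col_c in the query above filters those extra rows out again, so with that condition both forms return the same result.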
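
To illustrate the difference between the two join forms, here is a minimal sketch. It assumes PostgreSQL 9.3 or later; the demo_people data is hypothetical, and the built-in generate_series() merely stands in for a table function such as f_citizen_rec_modified().

```sql
-- Hypothetical demo data: id 1 produces two rows, id 2 produces none.

-- CROSS JOIN LATERAL (like CROSS APPLY): id 2 has no function rows and is dropped.
WITH demo_people(id, n) AS (VALUES (1, 2), (2, 0))
SELECT p.id, f.seq
FROM   demo_people p
CROSS  JOIN LATERAL generate_series(1, p.n) AS f(seq)
ORDER  BY p.id, f.seq;

-- LEFT JOIN LATERAL ... ON true (like OUTER APPLY): id 2 is kept with seq = NULL.
WITH demo_people(id, n) AS (VALUES (1, 2), (2, 0))
SELECT p.id, f.seq
FROM   demo_people p
LEFT   JOIN LATERAL generate_series(1, p.n) AS f(seq) ON true
ORDER  BY p.id, f.seq;
```

The first query returns only rows for id 1; the second additionally returns one row for id 2 with a NULL in seq.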


For older versions, there is a very simple way to accomplish what I think you are trying to do with a set-returning function (RETURNS TABLE, RETURNS SETOF record or RETURNS record):

SELECT *, (f_citizen_rec_modified(col1, col2)).*
FROM   v_citizenversions v

The function computes values once for every row of the outer query. If the function returns multiple rows, resulting rows are multiplied accordingly. All parentheses are syntactically required to decompose a row type. The table function could look something like this:

CREATE OR REPLACE FUNCTION f_citizen_rec_modified(_col1 int, _col2 text)
  RETURNS TABLE(col_c integer, col_d text)
  LANGUAGE sql AS
$func$
SELECT s.col_c, s.col_d
FROM   some_tbl s
WHERE  s.col_a = $1
AND    s.col_b = $2
$func$;

You need to wrap this in a subquery or CTE if you want to apply a WHERE clause because the columns are not visible on the same level. (And it's better for performance anyway, because you prevent repeated evaluation for every output column of the function):

SELECT col_a, col_b, (f_row).*
FROM  (
   SELECT col_a, col_b, f_citizen_rec_modified(col1, col2) AS f_row
   FROM   v_citizenversions v
   ) x
WHERE (f_row).col_c = _col_c;

There are several other ways to do this or something similar. It all depends on what you want exactly.

Womanhood answered 13/7, 2012 at 15:22 Comment(4)
I used the query you proposed. Now I'm shocked: the query takes more than a minute to execute. In MS SQL it takes less than a second O_O. - Meryl
@user1178399: It's practically impossible to comment on that without knowing the many factors in play. I would speculate that the performance can be improved. - Womanhood
I would suggest that the reason for the performance difference is that the original MSSQL query is probably not executing the function for every row. The function is likely an inline table-valued function (ITVF), and the query optimizer has executed it as a join rather than as a correlated query for every row. In that case, using LATERAL is an unfair comparison. In any RDBMS, executing a user-defined (in SQL) function for every row is a terrible idea. There's a good example of how the MSSQL query planner can optimize an ITVF here: sqlservercentral.com/articles/APPLY/69954 - Necropolis
We can use CROSS JOIN with a table-returning function, which avoids "ON true": SELECT v.col_a, v.col_b, f.* FROM v_citizenversions v CROSS JOIN f_citizen_rec_modified(v.col1, v.col2) f - Stupefacient

Necromancing:
New in PostgreSQL 9.3:

The LATERAL keyword

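INNER | LEFT JOIN LATERAL (RIGHT JOIN LATERAL is accepted syntactically, but the LATERAL item may then not reference the left-hand table)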

INNER JOIN LATERAL is the same as CROSS APPLY
and LEFT JOIN LATERAL is the same as OUTER APPLY

Example usage:

SELECT * FROM T_Contacts 

--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1 
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989


LEFT JOIN LATERAL 
(
    SELECT 
         --MAP_CTCOU_UID    
         MAP_CTCOU_CT_UID   
        ,MAP_CTCOU_COU_UID  
        ,MAP_CTCOU_DateFrom 
        ,MAP_CTCOU_DateTo   
   FROM T_MAP_Contacts_Ref_OrganisationalUnit 
   WHERE MAP_CTCOU_SoftDeleteStatus = 1 
   AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID 

    /*  
    AND 
    ( 
        (__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo) 
        AND 
        (__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom) 
    ) 
    */
   ORDER BY MAP_CTCOU_DateFrom 
   LIMIT 1 
) AS FirstOE ON true 
Lintwhite answered 8/3, 2016 at 16:49 Comment(0)

This link appears to show how to do it in Postgres 9.0+:

PostgreSQL: parameterizing a recursive CTE

It's further down the page in the section titled "Emulating CROSS APPLY with set-returning functions". Please be sure to note the list of limitations after the example.

Vellum answered 13/7, 2012 at 15:19 Comment(1)
I'm surprised the link-only police haven't jumped on this. - Airflow

I like Erwin Brandstetter's answer; however, I've discovered a performance problem when running:

SELECT *, (f_citizen_rec_modified(col1, col2)).*
FROM   v_citizenversions v

The f_citizen_rec_modified function will be run once for every column it returns (multiplied by every row in v_citizenversions). I did not find documentation for this effect, but was able to deduce it by debugging. Now the question becomes: how can we get the same result (prior to 9.3, where lateral joins become available) without this performance-robbing side effect?

Update: I seem to have found an answer. Rewrite the query as follows:

SELECT x.col1, x.col2, x.col3, (x.func).*
FROM  (
   SELECT v.col1, v.col2, v.col3, f_citizen_rec_modified(v.col1, v.col2) AS func
   FROM   v_citizenversions v
   ) x;

The key difference is that the raw function results are fetched first (inner subquery) and then wrapped in another SELECT that expands those results into columns. This was tested on PG 9.2.
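
One way to observe the repeated evaluation yourself is a function that logs each call. The following is only a sketch: demo_pair is a hypothetical function invented for this illustration, not part of the original query. It is VOLATILE by default, which, as far as I know, also keeps the planner from flattening the subquery in the second statement.

```sql
-- Hypothetical demo function (not from the thread); emits a NOTICE on every call.
CREATE OR REPLACE FUNCTION demo_pair(_x int, OUT a int, OUT b int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RAISE NOTICE 'demo_pair(%) called', _x;
   a := _x;
   b := _x * 2;
END;
$func$;

-- Direct decomposition: (f(...)).* is expanded into one function call per output column.
SELECT (demo_pair(1)).*;

-- Call the function once in a subquery, then decompose the stored row outside.
SELECT (x.f).*
FROM  (SELECT demo_pair(1) AS f) x;
```

Comparing the number of NOTICE messages from the two statements should show whether the subquery form avoids the extra calls on your version.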

Brinkman answered 21/11, 2013 at 16:55 Comment(0)
