Test for null in function with varying parameters
Asked Answered
E

4

10

I have a Postgres function:

create function myfunction(integer, text, text, text, text, text, text) RETURNS 
table(id int, match text, score int, nr int, nr_extra character varying, info character varying, postcode character varying,
street character varying, place character varying, country character varying, the_geom geometry)
AS $$
BEGIN

return query (select a.id, 'address' as match, 1 as score, a.ad_nr, a.ad_nr_extra,a.ad_info,a.ad_postcode, s.name as street, p.name place , c.name country, a.wkb_geometry as wkb_geometry from "Addresses" a 
    left join "Streets" s on a.street_id = s.id 
        left join "Places" p on s.place_id = p.id 
            left join "Countries" c on p.country_id = c.id 
            where c.name = $7 
                and p.name = $6
                    and s.name = $5
                    and a.ad_nr = $1 
                    and a.ad_nr_extra = $2
                    and a.ad_info = $3
                    and ad_postcode = $4);
END;
$$
LANGUAGE plpgsql;

This function fails to give the right result when one or more of the variables entered are NULL because ad_postcode = NULL will fail.

What can I do to test for NULL inside the query?

Elene answered 27/6, 2013 at 21:22 Comment(3)
You can use function arguments names instead of $1..$9. Something like myfunction(v_ad_nr integer, v_ad_nr_extra text, ... and ... a.ad_nr = v_ad_nr .... It makes code much more readable and easier to modify.Cowbird
And read some articles on SQL query formatting. Currently it is horrible. Or, at least, use an automatic query beautifier.Cowbird
Thank you for hinting me to use argument names, appreciated.Elene
M
29

I disagree with some of the advice in other answers. This can be done with PL/pgSQL and I think it is mostly far superior to assembling queries in a client application. It is faster and cleaner and the app only sends the bare minimum across the wire in requests. SQL statements are saved inside the database, which makes it easier to maintain - unless you want to collect all business logic in the client application, this depends on the general architecture.

PL/pgSQL function with dynamic SQL

CREATE OR REPLACE FUNCTION func(
      _ad_nr       int  = NULL
    , _ad_nr_extra text = NULL
    , _ad_info     text = NULL
    , _ad_postcode text = NULL
    , _sname       text = NULL
    , _pname       text = NULL
    , _cname       text = NULL)
  RETURNS TABLE(id int, match text, score int, nr int, nr_extra text
              , info text, postcode text, street text, place text
              , country text, the_geom geometry)
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- RAISE NOTICE '%', -- for debugging
   RETURN QUERY EXECUTE concat(
   $$SELECT a.id, 'address'::text, 1 AS score, a.ad_nr, a.ad_nr_extra
        , a.ad_info, a.ad_postcode$$

   , CASE WHEN (_sname, _pname, _cname) IS NULL THEN ', NULL::text' ELSE ', s.name' END  -- street
   , CASE WHEN (_pname, _cname) IS NULL         THEN ', NULL::text' ELSE ', p.name' END  -- place
   , CASE WHEN _cname IS NULL                   THEN ', NULL::text' ELSE ', c.name' END  -- country
   , ', a.wkb_geometry'

   , concat_ws('
   JOIN   '
   , '
   FROM   "Addresses" a'
   , CASE WHEN NOT (_sname, _pname, _cname) IS NULL THEN '"Streets"   s ON s.id = a.street_id' END
   , CASE WHEN NOT (_pname, _cname) IS NULL         THEN '"Places"    p ON p.id = s.place_id' END
   , CASE WHEN _cname IS NOT NULL                   THEN '"Countries" c ON c.id = p.country_id' END
   )

   , concat_ws('
   AND    '
      , '
   WHERE  TRUE'
      , CASE WHEN $1 IS NOT NULL THEN 'a.ad_nr = $1' END
      , CASE WHEN $2 IS NOT NULL THEN 'a.ad_nr_extra = $2' END
      , CASE WHEN $3 IS NOT NULL THEN 'a.ad_info = $3' END
      , CASE WHEN $4 IS NOT NULL THEN 'a.ad_postcode = $4' END
      , CASE WHEN $5 IS NOT NULL THEN 's.name = $5' END
      , CASE WHEN $6 IS NOT NULL THEN 'p.name = $6' END
      , CASE WHEN $7 IS NOT NULL THEN 'c.name = $7' END
   )
   )
   USING $1, $2, $3, $4, $5, $6, $7;
END
$func$;

Call:

SELECT * FROM func(1, '_ad_nr_extra', '_ad_info', '_ad_postcode', '_sname');

SELECT * FROM func(1, _pname := 'foo');

Since all function parameters have default values, you can use positional notation, named notation or mixed notation at your choosing in the function call. See:

More explanation for basics of dynamic SQL:

The concat() function is instrumental for building the string. It was introduced with Postgres 9.1.

The ELSE branch of a CASE statement defaults to NULL when not present. Simplifies the code.

The USING clause for EXECUTE makes SQL injection impossible as values are passed as values and allows to use parameter values directly, exactly like in prepared statements.

NULL values are used to ignore parameters here. They are not actually used to search.

You don't need parentheses around the SELECT with RETURN QUERY.

Simple SQL function

You could do it with a plain SQL function and avoid dynamic SQL. For some cases this may be faster. But I wouldn't expect it in this case. Planning the query without unnecessary joins and predicates typically produces best results. Planning cost for a simple query like this is almost negligible.

CREATE OR REPLACE FUNCTION func_sql(
     _ad_nr       int  = NULL
   , _ad_nr_extra text = NULL
   , _ad_info     text = NULL
   , _ad_postcode text = NULL
   , _sname       text = NULL
   , _pname       text = NULL
   , _cname       text = NULL)
  RETURNS TABLE(id int, match text, score int, nr int, nr_extra text
              , info text, postcode text, street text, place text
              , country text, the_geom geometry)
  LANGUAGE sql AS 
$func$
SELECT a.id, 'address' AS match, 1 AS score, a.ad_nr, a.ad_nr_extra
     , a.ad_info, a.ad_postcode
     , s.name AS street, p.name AS place
     , c.name AS country, a.wkb_geometry
FROM   "Addresses"      a
LEFT   JOIN "Streets"   s ON s.id = a.street_id
LEFT   JOIN "Places"    p ON p.id = s.place_id
LEFT   JOIN "Countries" c ON c.id = p.country_id
WHERE ($1 IS NULL OR a.ad_nr = $1)
AND   ($2 IS NULL OR a.ad_nr_extra = $2)
AND   ($3 IS NULL OR a.ad_info = $3)
AND   ($4 IS NULL OR a.ad_postcode = $4)
AND   ($5 IS NULL OR s.name = $5)
AND   ($6 IS NULL OR p.name = $6)
AND   ($7 IS NULL OR c.name = $7)
$func$;

Identical call.

To effectively ignore parameters with NULL values:

($1 IS NULL OR a.ad_nr = $1)

To actually use NULL values as parameters, use this construct instead:

($1 IS NULL AND a.ad_nr IS NULL OR a.ad_nr = $1)  -- AND binds before OR

This also allows for indexes to be used.
For the case at hand, replace all instances of LEFT JOIN with JOIN.

db<>fiddle here - with simple demo for all variants.
Old sqlfiddle

Asides

  • Don't use name and id as column names. They are not descriptive and when you join a bunch of tables (like you do to a lot in a relational database), you end up with several columns all named name or id, and have to attach aliases to sort the mess.

  • Please format your SQL properly, at least when asking public questions. But do it privately as well, for your own good.

Malang answered 28/6, 2013 at 1:12 Comment(6)
"$3 is null" is giving me "Query failed: ERROR: 42P08: could not determine data type of parameter $3". $3 is a string with 4 characters.Prevalent
The type of the parameters are inferred from context, so ($1 IS NULL AND a.ad_nr IS NULL OR a.ad_nr = $1) don't work with me (Postgresql don't know the type of $1 beforehand), but (a.ad_nr = $1 OR $1 IS NULL AND a.ad_nr IS NULL) works.Prevalent
@Rodrigo: Which function did you test? I can't reproduce what you commented. The data type of all parameters is predefined. What's your version on Postgres? What's your exact call? I tested this before I posted with Postgres 9.1. Just tested all variants again with pg 9.3 and 9.4 to find them working as advertised. I added a fiddle demo to my answer: sqlfiddle.com/#!15/58e6c/2Malang
"The data type of all parameters is predefined", actually they don't have to be (in 9.4 or 9.3). Looks like if we begin comparing $1 with NULL, SQL have no context to learn from. But if we begin comparing a known column with $1, then it "learns" the type of $1. The advantage of not specifying the types is only a small gain in speed of implementation. So far it seems to work.Prevalent
@ErwinBrandstetter Sorry to raise this very old topic, but, reading Denis' comment and gist below - on version >= 9.2 (10.4 in my case) would it still be preferred to go with the dynamic approach, or the simple SQL (multiple $1 IS NULL or col=$1) would perform the same way in the sense of a plan per parameter combination? what about parameters only affecting the where - not the joins? p.s started dealing with postgres and your answers on SO are invaluable! Kudos!Cherubini
@Pyrocks: Basically still valid in Postgres 10 (or 11). But there have been various changes / improvements since. The answer is mostly "depends...". Way too many questions for a comment . Start a question with specifics for your case (and narrow it down to one question). You can always link here for context and / or add a comment here to link to your new (related) question.Malang
C
4

You can use

c.name IS NOT DISTINCT FROM $7

It will return true if c.name and $7 are equal or both are null.

Or you can use

(c.name = $7 or $7 is null )

It will return true if c.name and $7 are equal or $7 is null.

Cowbird answered 27/6, 2013 at 21:36 Comment(0)
G
4

If you can modify the query, you could do something like

and (ad_postcode = $4 OR $4 IS NULL)
Gca answered 27/6, 2013 at 21:37 Comment(1)
Thank you, allthough this was also stated in a previous answer I do appreciate the effort!Elene
B
2

Several things...

First, as side note: the semantics of your query might need a revisit. Some of the stuff in your where clauses might actually belong in your join clauses, like:

from ...
left join ... on ... and ...
left join ... on ... and ...

When they don't, you most should probably be using an inner join, rather than a left join.

Second, there is a is not distinct from operator, which can occasionally be handy in place of =. a is not distinct from b is basically equivalent to a = b or a is null and b is null.

Note, however, that is not distinct from does NOT use an index, whereas = and is null actually do. You could use (field = $i or $i is null) instead in your particular case, and it will yield the optimal plan if you're using the latest version of Postgres:

https://gist.github.com/ddebernardy/5884267

Bozen answered 27/6, 2013 at 21:43 Comment(4)
I have to disagree on the "very best option" and provided an answer to back that up.Malang
Wow, Postgres made some serious improvements on this front since I last needed it. I added a gist with an example of the same prepared statement with two different parameters. 9.2 seems to prepare several plans based on whichever predicate apply. Cool stuff.Bozen
This was a major update in 9.2. Some details in the release notes. BTW, I am not notified of your answer, if you don't @-reply, since we are commenting your answer. Saw this by chance.Malang
Thank you for the answers and comments. The hints on better organizing my code are appreciated.Elene

© 2022 - 2024 — McMap. All rights reserved.