How to detect NULL rows in PostgreSQL sum()
Asked Answered
L

5

12

I want to aggregate the sum of a column, while tracking the presence of NULL values which signal an error condition. E.g., take the table numbers:

# select * from numbers;
  n   |  l  
------+-----
    1 | foo
    2 | foo
 NULL | bar
    4 | bar

Given a label l, I want to compute the sum of numbers n with that label, provided there are no NULL values. Ideally, for a label without any rows, the sum would be 0. So I'm looking for some query q such that q('foo') = 3, q('baz') = 0 and q('bar') somehow signals an error, e.g. by returning NULL.

I started with the sum() aggregate function, but that converts NULL rows to 0. One solution would be a variant that returns NULL provided there are any NULL values.

sum() gives

# select sum(n) from numbers where l = 'bar';
 sum 
-----
   4

but I'd rather have sumnull() with

# select sumnull(n) from numbers where l = 'bar';
 sumnull 
---------
  NULL

The best solution I've found so far is to also count non-NULL rows and compare to the total count:

# select sum(n), count(*), count(n) as notnull from numbers;
 sum | count | notnull 
-----+-------+---------
   7 |     4 |       3

Then if count is unequal to notnull, I know the result is not valid.

Laynelayney answered 26/9, 2016 at 13:0 Comment(2)
I am not aware of any standard SQL function to do that. But you can easily create such an aggregate function.Lykins
Additionally to the answers below. Sometimes it could be more efficiency to split the task into the two steps: 1) check is the null value exists (using an index on the column) and if not 2) calculate the sum.Hiatus
B
14

Is an empty set good enough?

create table numbers (n int);
insert into numbers values (1),(2),(null),(4);

select sum(n)
from numbers
having bool_and(n is not null);
 sum 
-----
(0 rows)

If you really need a null value it is a bit more complex:

with sum_null as (
    select sum(n) as sum_n
    from numbers
    having bool_and(n is not null)
)
select case
    when not exists (select 1 from sum_null) then null
    else (select sum_n from sum_null) end
;
 sum_n 
-------

(1 row)

Replacing the having line for:

having not bool_or(n is null)

is less readable but potentially much faster since it can stop searching at the first null found.

https://www.postgresql.org/docs/current/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE

Buccaneer answered 26/9, 2016 at 13:31 Comment(1)
You're right, an empty result set is actually a more useful result for signalling an error condition. One nitpick is that I would regard an empty table as fine and expect the sum 0, though clearly I didn't put that into the question. This can be solved by changing to having coalesce(bool_and(n is not null), true).Laynelayney
E
8

You can create a custom aggregate, e.g.:

create or replace function int_sum_null(int, int)
returns int language sql as $$
    select $1 + $2
$$;

create aggregate sumnull(integer) (
    sfunc = int_sum_null,
    stype = int
);

select sum(n), sumnull(n)
from numbers;

 sum | sumnull 
-----+---------
   7 | <null>      
(1 row) 

Update #1

Solutions without a custom aggregate:

select case 
    when bool_or(n is null) then null 
    else sum(n) end
from numbers;

select coalesce((
    select sum(n)
    from numbers
    having not bool_or(n is null)), null);

These variants are based on the Clodoaldo Neto idea. If you like them please upvote his answer too.


Update #2

Modify the custom aggregate sumnull and add the initial condition:

drop aggregate sumnull(integer);
create aggregate sumnull(integer) (
    sfunc = int_sum_null,
    stype = int,
    initcond = 0
);

to get the results you have described in the updated question:

create table numbers (n int, l text);
insert into numbers values 
(1, 'foo'), (2, 'foo'), (null, 'bar'), (4, 'bar');

select 
    sumnull(n) filter (where l = 'foo') foo, 
    sumnull(n) filter (where l = 'bar') bar, 
    sumnull(n) filter (where l = 'baz') baz
from numbers;

 foo | bar | baz 
-----+-----+-----
   3 |     |   0
(1 row)
Evangelical answered 26/9, 2016 at 13:11 Comment(4)
You could replace int with anyelement.Phila
@LaurenzAlbe IMO not anyelement but numericHiatus
@LaurenzAlbe - yes, this is possible, however system aggregates are defined for individual types. There is a difference in error reporting when an aggregate is used improperly.Evangelical
Worth noting is that the custom aggregate (sumnull) is incredibly slower than the built-in sum (in my case, about 10x). In case you're looking for performance, the case statement is a much better solution.Crites
S
3

1) Use COALESCE

SELECT SUM(COALESCE(n,'NaN'::numeric)) FROM numbers;

If any row is NULL then result will be as 'NaN'

2) You will get NULL as result if any row has NULL value else result as number

SELECT 
    CASE WHEN (SELECT COUNT(*) FROM numbers WHERE n IS NULL) > 0 THEN NULL
        ELSE (SELECT SUM(COALESCE(n, 0)) FROM numbers)
END
Scrutator answered 5/12, 2018 at 10:6 Comment(0)
S
2

You can have it return NaN for those cases by using coalesce to replace NULL with NaN;

create table t(
  val int
);

insert into t (val) values (1), (2), (NULL);
select sum(coalesce(val, double precision 'NaN')) from t;

Result: NaN.

insert into t (val) values (1), (2), (3);
select sum(coalesce(val, double precision 'NaN')) from t;

Result: 6.

See this SQLFiddle

Sabatier answered 11/10, 2018 at 16:26 Comment(0)
L
0

I have the following... just in case it is useful to somebody:

SELECT SUM(n) * CASE WHEN count(n) = count(*) THEN 1 ELSE NULL END FROM numbers

The trick is to detect whether there are NULL rows by comparing counts, then forcing a math operation with either 1 or NULL which yields a NULL when there are NULL rows.

Longobard answered 15/4 at 13:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.