How can I get a cumulative product with Snowflake?
Asked Answered
S

2

5

I want to calculate the cumulative product across rows in Snowflake.

Basically I have monthly rates that multiplied accumulate across time.

(Some databases have the product() SQL function for that).

Sayre answered 29/3, 2022 at 0:25 Comment(1)
It is worth mentioning that non-cumulative version is possible with User-Defined Aggregate Functions (UDAFs) - Product_aggTreponema
S
7

A trick suggested by Sterling Paramore: Add logs, and then exponentiate it:

with data as (select $1 x from values (1),(2),(3),(4),(5))

select x
    , sum(x) over(order by x) sum
    , exp(sum(ln(x)) over(order by x)) mult
from data

enter image description here

Sayre answered 29/3, 2022 at 0:25 Comment(2)
this is really nice.Lusterware
Beautiful! For those curious: due to the property ln(a*b) = ln(a) + ln(b) the following is true exp(ln(a) + ln(b)) = a*b, since exp(ln(x)) = x for any x>=0.Spatula
H
1

If a built-in function doesn't exist, it's usually possible to roll something custom using a User-Defined Table Function.

In this case:

CREATE OR REPLACE FUNCTION CUMULATIVE_PRODUCT(VALUE double)
    RETURNS TABLE (PRODUCT double)
    LANGUAGE JAVASCRIPT
    AS '{
        initialize: function(argumentInfo, context) {
           this.cumulativeProduct = 1;
        },
        processRow: function f(row, rowWriter, context){
           this.cumulativeProduct = this.cumulativeProduct*row.VALUE;
           rowWriter.writeRow({PRODUCT: this.cumulativeProduct});
        }
    }';

Example table:

create temp table sample_numbers as (
select 1 as index, 5.1::double as current_value
    union all
select 2 as index, 4.3::double as current_value
    union all
select 3 as index, 3.7::double as current_value
    union all
select 4 as index, 3.9::double as current_value
)

invoking the UDTF:

select index,current_value,PRODUCT as cumulative_product
from sample_numbers,table(CUMULATIVE_PRODUCT(current_value) over ())

enter image description here

Note the empty over() clause which forces Snowflake to do a single sequential run over the data instead of splitting it into parallel chunks

Homologate answered 29/3, 2022 at 0:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.