BigQuery User Defined Aggregation Function?
Asked Answered
R

1

12

I know I can define a User Defined Function in order to perform some custom calculation. I also know I can use the 'out-of-the-box' aggregation functions to reduce a collection of values to a single value when using a GROUP BY clause.

Is it possible to define a custom user-defined, Aggregation Function to use with a GROUP BY clause?

Recipe answered 15/5, 2017 at 8:15 Comment(0)
R
17

Turns out that this IS possible (as long as the groups we seek to aggregate are of a reasonable size in memory) with a little bit of 'glue' - namely the ARRAY_AGG function

The steps are as follows:

  1. Create a UDF with an input parameter of type ARRAY<T> where T is the type of value you want to aggregate.
  2. Use the ARRAY_AGG function in the query with the GROUP BY clause to generate an array of T and pass into your UDF.

As a concrete example:

CREATE TEMP FUNCTION aggregate_fruits(fruits ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
return "my fruit bag contains these items: " + fruits.join(",");
""";

WITH fruits AS
(SELECT "apple" AS fruit
UNION ALL SELECT "pear" AS fruit
UNION ALL SELECT "banana" AS fruit)

SELECT aggregate_fruits(ARRAY_AGG(fruit))
FROM fruits
Recipe answered 15/5, 2017 at 12:27 Comment(1)
For others, here is the associated feature request: issuetracker.google.com/issues/62898282Mccourt

© 2022 - 2024 — McMap. All rights reserved.