Get first N elements from an array in BigQuery table

I have an array column and would like to get its first N elements, keeping the array data type. Is there a nice way to do this? Ideally without unnesting, ranking, and then using ARRAY_AGG to rebuild the array.

I could also do this (to get the first 2 elements):

WITH data AS
(
  SELECT 1001 as id, ['a', 'b', 'c'] as array_1
  UNION ALL
  SELECT 1002 as id, ['d', 'e', 'f', 'g'] as array_1
  UNION ALL
  SELECT 1003 as id, ['h', 'i'] as array_1
)
select *,
       [array_1[SAFE_OFFSET(0)], array_1[SAFE_OFFSET(1)]] as my_result
from data

But obviously this is not a nice solution: it fails whenever an array has only 1 element, because SAFE_OFFSET(1) then returns NULL and BigQuery does not allow NULL elements in a result array.

Begot answered 5/11, 2019 at 14:4 Comment(1)
You can use ARRAY_LENGTH(array_1), which gives you the length of the array. - Asare

Here's a general solution with a UDF that you can call for any array type:

CREATE TEMP FUNCTION TopN(arr ANY TYPE, n INT64) AS (
  ARRAY(SELECT x FROM UNNEST(arr) AS x WITH OFFSET off WHERE off < n ORDER BY off)
);

WITH data AS
(
  SELECT 1001 as id, ['a', 'b', 'c'] as array_1
  UNION ALL
  SELECT 1002 as id, ['d', 'e', 'f', 'g'] as array_1
  UNION ALL
  SELECT 1003 as id, ['h', 'i'] as array_1
)
select *, TopN(array_1, 2) AS my_result
from data

It uses UNNEST and the ARRAY function, which it sounds like you wanted to avoid, but it has the advantage of being general enough to accept an array of any type.

Earthworm answered 5/11, 2019 at 14:15 Comment(1)
I have to say I hoped this could be done without a UDF, but this certainly works. - Begot

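Another option for BigQuery Standard SQL, using a JavaScript UDF:

#standardSQL
CREATE TEMP FUNCTION FirstN(arr ARRAY<STRING>, N FLOAT64)
RETURNS ARRAY<STRING> LANGUAGE js AS """
  return arr.slice(0, N);
""";
-- `data` is the sample CTE from the question.
WITH data AS (
  SELECT 1001 AS id, ['a', 'b', 'c'] AS array_1
  UNION ALL SELECT 1002, ['d', 'e', 'f', 'g']
  UNION ALL SELECT 1003, ['h', 'i']
)
SELECT *,
  FirstN(array_1, 3) AS my_result
FROM data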
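
To see why this is safe for arrays shorter than N, here is a minimal sketch of the same UDF body in plain JavaScript, runnable in Node rather than in BigQuery. The helper name firstN is hypothetical and used only for illustration. Note that N is declared as FLOAT64 in the UDF because BigQuery JavaScript UDFs do not accept INT64 parameters; slice treats a whole-number float the same as an integer.

```javascript
// Plain-JavaScript stand-in for the UDF body; `firstN` is a hypothetical
// helper name, not part of BigQuery.
function firstN(arr, n) {
  // slice(0, n) copies the elements at indexes 0 .. n-1 and clamps the
  // end index to arr.length, so short arrays are returned unchanged.
  return arr.slice(0, n);
}

// Sample arrays taken from the question.
console.log(firstN(["a", "b", "c"], 2));      // ["a", "b"]
console.log(firstN(["d", "e", "f", "g"], 3)); // ["d", "e", "f"]
console.log(firstN(["h", "i"], 3));           // ["h", "i"] (fewer than N elements)
```

Because slice never pads the result, no NULL elements are introduced when an array has fewer than N elements.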
Drawing answered 5/11, 2019 at 20:1 Comment(1)
Works too, thanks for the tip! I accepted the other answer because it does not need JavaScript and stays completely in SQL. - Begot

You can use a CASE WHEN condition on the size of the array:

CASE
  -- ARRAY_SLICE offsets are zero-based and inclusive, so 0 and 1 select the first two elements.
  WHEN ARRAY_LENGTH(array_1) >= 2 THEN ARRAY_SLICE(array_1, 0, 1)
  ELSE array_1
END AS top_2_elements
Boutonniere answered 13/3 at 11:59 Comment(0)
