Splitting a string column in BigQuery
Asked Answered
F

4

25

Let's say I have a table in BigQuery containing 2 columns. The first column represents a name, and the second is a delimited list of values, of arbitrary length. Example:

Name | Scores
-----+-------
Bob  |10;20;20
Sue  |14;12;19;90
Joe  |30;15

I want to transform into columns where the first is the name, and the second is a single score value, like so:

Name,Score
Bob,10
Bob,20
Bob,20
Sue,14
Sue,12
Sue,19
Sue,90
Joe,30
Joe,15

Can this be done in BigQuery alone?

Fancher answered 16/10, 2013 at 21:6 Comment(0)
A
19

Good news everyone! BigQuery can now SPLIT()!


Look at "find all two word phrases that appear in more than one row in a dataset".

There is no current way to split() a value in BigQuery to generate multiple rows from a string, but you could use a regular expression to look for the commas and find the first value. Then run a similar query to find the 2nd value, and so on. They can all be merged into only one query, using the pattern presented in the above example (UNION through commas).

Aweather answered 16/10, 2013 at 23:48 Comment(3)
Nice - This is awesome! Are there any parameters to specify a range of elements to pull from the resulting array?Fancher
I'm not sure I understand. Post new question? See https://mcmap.net/q/456341/-how-to-find-the-most-common-bi-grams-with-bigqueryAweather
Is it possible to just split the scores column into multiple columns namely scores1, scores2 ,scores3 etc? instead of creating a row for each value?Beckner
D
19

Trying to rewrite Elad Ben Akoune's answer in Standart SQL, the query becomes like this;

WITH name_score AS (
SELECT Name, split(Scores,';') AS Score
FROM (
      (SELECT * FROM (SELECT 'Bob' AS Name ,'10;20;20' AS Scores)) 
      UNION ALL 
      (SELECT * FROM (SELECT 'Sue' AS Name ,'14;12;19;90' AS Scores))
      UNION ALL
      (SELECT * FROM (SELECT 'Joe' AS Name ,'30;15' AS Scores))
)) 
SELECT name, score
FROM name_score
CROSS JOIN UNNEST(name_score.score) AS score;

And this outputs;

+------+-------+
| name | score |
+------+-------+
| Bob  | 10    |
| Bob  | 20    |
| Bob  | 20    |
| Sue  | 14    |
| Sue  | 12    |
| Sue  | 19    |
| Sue  | 90    |
| Joe  | 30    |
| Joe  | 15    |
+------+-------+
Deadpan answered 22/11, 2018 at 8:25 Comment(0)
P
13

If someone is still looking for an answer

select Name,split(Scores,';') as Score
from (
      # replace the inner custome select with your source table
      select *
      from 
      (select 'Bob' as Name ,'10;20;20' as Scores),
      (select 'Sue' as Name ,'14;12;19;90' as Scores),
      (select 'Joe' as Name ,'30;15' as Scores)
);
Parboil answered 21/5, 2015 at 8:46 Comment(1)
Note: This query only works in Legacy SQL Mode. In Standart SQL, it will give "Column name Name is ambiguous" error.Deadpan
H
0

Let's call your example table "FACT_SCORES".

Try this:

WITH SCORE_LIST AS (
SELECT
  FS.NAME
  ,FS.SCORES
  ,SPLIT(FS.SCORES,';') AS SCORES_ARRAY
FROM FACT_SCORES FS
)

SELECT
  SL.NAME
  ,SL.SCORES
  ,SCORE
FROM SCORE_LIST SL
CROSS JOIN UNNEST(SL.SCORES_ARRAY) AS SCORE

What you we're trying to do is called flattening arrays.

I first transformed your list into an array, and then flattened it.

Google has some very good documentation on working with arrays: https://cloud.google.com/bigquery/docs/arrays

And a specific part about converting elements in an array to rows in a table: https://cloud.google.com/bigquery/docs/arrays#flattening_arrays

Himelman answered 19/6, 2024 at 14:44 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.