How to get distinct values on GROUP_CONCAT using Google Big Query

Asked 20/2, 2015 at 21:12 Answered 16/12, 2021 at 13:45

Solved distinct google-bigquery group-concat

I'm trying to get distinct values when using GROUP_CONCAT in BigQuery.

I'll recreate the situation using a simpler, static example:

EDIT: I've modified the example to represent better my real situation: 2 columns with group_concat which needs to be distinct:

SELECT 
  category, 
  GROUP_CONCAT(id) as ids, 
  GROUP_CONCAT(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "2" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY 
  category

This example returns:

Row category    ids products
1   a   1,2,3,1 car,car,car,truck
2   b   4,5,6   car,car,bike

I'd like to strip the duplicated values found, to return like:

Row category    ids products 
1   a   1,2,3   car,truck
2   b   4,5,6   car,bike

In MySQL, GROUP_CONCAT has a DISTINCT OPTION, but in BigQuery there isn't.

Any ideas?

Peti answered 20/2, 2015 at 21:12 Comment(2)

possible duplicate of Syntax to run a distinct GROUP_CONCAT in Google Bigquery – Homeric 20/2, 2015 at 21:50

I think it's similar but not exactly the same, but thank you for pointing @Homeric – Peti 23/2, 2015 at 12:56

Here is solution which uses UNIQUE scope aggregation function to remove duplicates. Note, that in order to use it, first we need to build a REPEATED using NEST aggregation:

SELECT 
  GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD,
  GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD 
FROM (
SELECT 
  category, 
  NEST(id) as ids, 
  NEST(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "2" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY 
  category
)

Cranial answered 23/2, 2015 at 19:16 Comment(3)

Perfect Mosha! I've never heard about UNIQUE function. It worked flawlessly! Thanks! – Peti 24/2, 2015 at 20:50

I don't think that you eve need to do the NEST subselect – Doldrums 30/3, 2016 at 9:22

I think this solution is in Legacy SQL syntax, would be good to mention this. This solution does not work with Standard SQL for BigQuery. Does anyone has a solution in Standard SQL? – Lothario 16/12, 2021 at 13:37

Removing duplicates before applying group_concat will achieve the result you want:

    SELECT 
      category, 
      GROUP_CONCAT(id) as ids
    FROM (  
    SELECT category, id
    FROM 
     (SELECT "a" as category, "1" as id),
     (SELECT "a" as category, "2" as id),
     (SELECT "a" as category, "3" as id),
     (SELECT "b" as category, "4" as id),
     (SELECT "b" as category, "5" as id),
     (SELECT "b" as category, "6" as id),
     (SELECT "a" as category, "1" as id),
    GROUP BY 
      category, id
    )
    GROUP BY 
      category

Capitulary answered 21/2, 2015 at 0:41 Comment(1)

Thanks Ahmed, it works for a single column, but in my real situation I need 2 different distinct columns. I've edited the question to show the problem. – Peti 23/2, 2015 at 14:35

Here is solution which uses UNIQUE scope aggregation function to remove duplicates. Note, that in order to use it, first we need to build a REPEATED using NEST aggregation:

SELECT 
  GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD,
  GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD 
FROM (
SELECT 
  category, 
  NEST(id) as ids, 
  NEST(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "2" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY 
  category
)

Cranial answered 23/2, 2015 at 19:16 Comment(3)

Perfect Mosha! I've never heard about UNIQUE function. It worked flawlessly! Thanks! – Peti 24/2, 2015 at 20:50

I don't think that you eve need to do the NEST subselect – Doldrums 30/3, 2016 at 9:22

In Standard SQL (the preferred BigQuery dialect) the solution would be:

SELECT 
    string_agg(distinct(q.product), ', ') as products_distinct

FROM 
    (
        (SELECT "a" as category, "1" as id, "car" as product)
        union all
        (SELECT "a" as category, "2" as id, "car" as product)
        union all
        (SELECT "a" as category, "3" as id, "car" as product)
        union all
        (SELECT "b" as category, "4" as id, "car" as product)
        union all
        (SELECT "b" as category, "5" as id, "car" as product)
        union all
        (SELECT "b" as category, "2" as id, "bike" as product)
        union all
        (SELECT "a" as category, "1" as id, "truck" as product)
    ) as q

Lothario answered 16/12, 2021 at 13:45 Comment(0)

Recommended topics

Hot tags