MySQL - GROUP_CONCAT returns duplicate data, can't use DISTINCT
Asked Answered
P

3

9

I have a normalized database and I'm trying to return data from multiple tables using JOINs and GROUP_CONCAT.

Problem: Rows are being duplicated with GROUP_CONCAT. I can't use DISTINCT because some of the data (ingredient mfr) does need to be duplicated.

Here is my current query and db structure (SQL Fiddle):

SELECT recipe.*, 
GROUP_CONCAT(recipe_detail.ingredient_id) AS iid,  
GROUP_CONCAT(ingredient.name) AS iname, 
GROUP_CONCAT(ingredient_mfr.abbr) AS mabbr, 
GROUP_CONCAT(recipe_tag.name) AS tag
FROM  recipe
LEFT JOIN recipe_detail
    ON recipe.id = recipe_detail.recipe_id
LEFT JOIN ingredient
    ON recipe_detail.ingredient_id = ingredient.id
LEFT JOIN ingredient_mfr
    ON ingredient.mfr_id = ingredient_mfr.id
LEFT JOIN recipe_tagmap
    ON recipe.id = recipe_tagmap.recipe_id
LEFT JOIN recipe_tag
    ON recipe_tagmap.tag_id = recipe_tag.id
WHERE recipe.user_id = 1
GROUP BY recipe.id

recipe
+------------+------------+-----------+
|    id      |    name    |  user_id  |
+============+============+===========+
|     1      |  Test123   |     1     |
+------------+------------+-----------+
|     2      |  Test456   |     1     |
+------------+------------+-----------+
|     3      |  Test789   |     1     |
+------------+------------+-----------+

recipe_detail
+------------+---------------+
| recipe_id  | ingredient_id |
+============+===============+
|     1      |      193      |
+------------+---------------+
|     1      |      194      |
+------------+---------------+
|     2      |       16      |
+------------+---------------+
|     3      |      277      |
+------------+---------------+

ingredient
+------------+---------------+---------+
|     id     |      name     |  mfr_id |
+============+===============+=========+
|     16     |       Gin     |    4    |
+------------+---------------+---------+
|     193    |       Fig     |    3    |
+------------+---------------+---------+
|     194    |       Tea     |    3    |
+------------+---------------+---------+
|     277    |       Nut     |    2    |
+------------+---------------+---------+

ingredient_mfr
+------------+------------+
|    id      |    abbr    |
+============+============+
|     2      |    TFA     |
+------------+------------+
|     3      |    FA      |
+------------+------------+
|     4      |    LOR     |
+------------+------------+

recipe_tag
+------------+------------+
|    id      |    name    |
+============+============+
|     1      |    one     |
+------------+------------+
|     2      |    two     |
+------------+------------+
|     3      |    three   |
+------------+------------+
|     4      |    four    |
+------------+------------+
|     5      |    five    |
+------------+------------+
|     6      |    six     |
+------------+------------+
|     7      |    seven   |
+------------+------------+
|     8      |    eight   |
+------------+------------+
|     9      |    nine    |
+------------+------------+

recipe_tagmap
+------------+---------------+---------+
|     id     |   recipe_id   |  tag_id |
+============+===============+=========+
|     1      |       1       |    1    |
+------------+---------------+---------+
|     2      |       1       |    2    |
+------------+---------------+---------+
|     3      |       1       |    3    |
+------------+---------------+---------+
|     4      |       2       |    4    |
+------------+---------------+---------+
|     5      |       2       |    5    |
+------------+---------------+---------+
|     6      |       2       |    6    |
+------------+---------------+---------+
|     7      |       3       |    7    |
+------------+---------------+---------+
|     8      |       3       |    8    |
+------------+---------------+---------+
|     9      |       3       |    9    |
+------------+---------------+---------+

With my current query, my results look like this:

+------+---------+--------------+----------- ----+---------------+------------------+
|  id  |  name   |      iid     |     iname      |    mabbr      |       tag        |
+======+=========+==============+================+===============+==================+
|   1  | Test123 | 193,193,193, | Fig, Fig, Fig, | FA, FA, FA,   | one, two, three, |
|      |         | 194,194,194  | Tea, Tea, Tea  | FA, FA, FA    | one, two, three  |
+------+---------+--------------+----------------+---------------+------------------+
|   2  | Test456 | 16,16,16     | Gin, Gin, Gin  | LOR, LOR, LOR | four, five six   |
+------+---------+--------------+----------------+---------------+------------------+
|   3  | Test789 | 277,277,277  | Nut, Nut, Nut  | TFA, TFA, TFA | seven,eight,nine |
+------+---------+--------------+----------------+---------------+------------------+

What I would like my results to look like:

+------+---------+--------------+----------- ----+---------------+------------------+
|  id  |  name   |      iid     |     iname      |    mabbr      |       tag        |
+======+=========+==============+================+===============+==================+
|   1  | Test123 |   193, 194   |    Fig, Tea    |    FA, FA     | one, two, three, |
+------+---------+--------------+----------------+---------------+------------------+
|   2  | Test456 |      16      |      Gin       |     LOR       | four, five six   |
+------+---------+--------------+----------------+---------------+------------------+
|   3  | Test789 |     277      |      Nut       |     TFA       | seven,eight,nine |
+------+---------+--------------+----------------+---------------+------------------+

As you can see, the presence of multiple tags causes the ingredient data to duplicate. The presence of multiple ingredients causes the tags to duplicate. I have tried to use DISTINCT, but sometimes I will have multiple ingredients and each one of those will return it's own "mabbr", which may be the same it's other ingredient (see first row of expected results). Using DISTINCT, it will only return one instance of that "mabbr".

Is there a change I can make to my query to achieve what I'd like to do?

SQL Fiddle

Pilferage answered 3/5, 2014 at 23:46 Comment(0)
F
7

You can resolve this by extracting the tag grouping to its own subquery:

SELECT
    recipe.*,
    GROUP_CONCAT(recipe_detail.ingredient_id) AS iid,
    GROUP_CONCAT(ingredient.name) AS iname,
    GROUP_CONCAT(ingredient_mfr.abbr) AS mabbr,
    (
      SELECT GROUP_CONCAT(recipe_tag.name)
        FROM recipe_tag
          INNER JOIN recipe_tagmap
            ON recipe_tagmap.tag_id = recipe_tag.id
        WHERE recipe_tagmap.recipe_id = recipe.id
     ) AS tag

  FROM recipe
    LEFT JOIN recipe_detail
      ON recipe.id = recipe_detail.recipe_id
    LEFT JOIN ingredient
      ON recipe_detail.ingredient_id = ingredient.id
    LEFT JOIN ingredient_mfr
      ON ingredient.mfr_id = ingredient_mfr.id

  WHERE recipe.user_id = 1
  GROUP BY recipe.id

(example fiddle)

Fernyak answered 4/5, 2014 at 3:40 Comment(3)
In your example, mabbr is still being duplicated more times than it should.Pilferage
+1 for getting a working answer. I've pulled it in from the link for you, because otherwise it should have been removed, especially as it did NOT match what your answer said. I've updated the JOIN used in the subquery to match what the optimizer is actually giving you. Note that for large result sets it may be more efficient to do (a modified version of) the tag2 subquery as a table-reference, instead of being in the SELECT list.Diocletian
Sub request is the solution but it is very slow if you have several line of recipe. GROUP_CONCAT(distinct stays faster.Orthodoxy
O
0

Subrequest with GROUP_CONCAT are slow when you have several line, I recommend you another solution (in case where you need to do on lot of data) :

For GROUP_CONCAT(...), if you want to keep distinct duplicate and keep fast SQL request, you can concatenate the id. And you will get something like:

GROUP_CONCAT(DISTINCT(ingredient.id || '\t' || ingredient_mfr.name))

But you need to parse the data after the request. You will have to split the string on \t.

Orthodoxy answered 25/4 at 8:14 Comment(0)
L
-2

Adding distinct while doing GROUP_CONCAT will give you unique values.

SELECT recipe.*, 
GROUP_CONCAT(distinct recipe_detail.ingredient_id) AS iid,  
GROUP_CONCAT(distinct ingredient.name) AS iname, 
GROUP_CONCAT(distinct ingredient_mfr.abbr) AS mabbr, 
GROUP_CONCAT(distinct recipe_tag.name) AS tag
FROM  recipe
LEFT JOIN recipe_detail
    ON recipe.id = recipe_detail.recipe_id
LEFT JOIN ingredient
    ON recipe_detail.ingredient_id = ingredient.id
LEFT JOIN ingredient_mfr
    ON ingredient.mfr_id = ingredient_mfr.id
LEFT JOIN recipe_tagmap
    ON recipe.id = recipe_tagmap.recipe_id
LEFT JOIN recipe_tag
    ON recipe_tagmap.tag_id = recipe_tag.id
WHERE recipe.user_id = 1
GROUP BY recipe.id

SQL Fiddle

Louls answered 17/4, 2019 at 11:18 Comment(1)
as OP said he can't use DISTINCT because some values have to be duplicatedDisposable

© 2022 - 2024 — McMap. All rights reserved.