How to join two tables using a comma-separated-list in the join field
Asked Answered
P

5

13

I have two tables, categories and movies.

In movies table I have a column categories. That column consists of the categories that movie fits in. The categories are IDs separated by a comma.

Here's an example:

Table categories {
  -id-       -name-
  1          Action
  2          Comedy
  4          Drama
  5          Dance
}

Table movies {
  -id-       -categories-  (and some more columns ofc)
  1          2,4
  2          1,4
  4          3,5
}

Now to the actual question: Is it possible to perform a query that excludes the categories column from the movies table, and instead selects the matching categories from the categories table and returns them in an array? Like a join, but the problem is there are multiple categories separated by comma, is it possible to do some kind of regex?

Petrolic answered 28/7, 2011 at 15:7 Comment(8)
Why aren't you normalizing to three tables; Movies, Categories, MoviesCategories?Amalee
What do you mean by saying return them in an array?Computation
Can you improve on the question title? A list of technologies does not describe the problem. That's what tags are.Arabic
@Brad, that is a really good idea. I didn't think of that when creating the database. It would have made it a lot easier for me. How about the speed? Would it be a lot more processor heavy if i seperate them into a third table?Petrolic
@Johan: I was hoping the OP could gain the experience of doing it herself!Arabic
@Katie: It will be vastly more efficient (in general), because that is how relational databases are designed to work.Arabic
@Tomale: Great! I've separated the tables so now i have "movies", "categories" and "movies_categories". I'm having some trouble writing the query though. It almost works as i want it now, the only problem is that i want to get all categories, not just the first one. Is it possible to concatenate the categories by a comma? Here's my query: SELECT movies.*, categories.name FROM movies LEFT JOIN movies_categories ON (movies.id = movies_categories.movie_id) LEFT JOIN categories ON (movies_categories.category_id = categories.id)Petrolic
And also, for some reason i'm not able to add a WHERE clause which i definately need. Any ideas why?Petrolic
S
13

Using comma separated lists in a database field is an anti-pattern and should be avoided at all costs.
Because it is a PITA to extract those comma separated values out agian in SQL.

Instead you should add a separate link table to represent the relationship between categories and movies, like so:

Table categories
  id integer auto_increment primary key
  name varchar(255)

Table movies
  id integer auto_increment primary key
  name varchar(255)

Table movie_cat
  movie_id integer foreign key references movies.id
  cat_id integer foreign key references categories.id
  primary key (movie_id, cat_id)

Now you can do

SELECT m.name as movie_title, GROUP_CONCAT(c.name) AS categories FROM movies m
INNER JOIN movie_cat mc ON (mc.movie_id = m.id)
INNER JOIN categories c ON (c.id = mc.cat_id)
GROUP BY m.id

Back to your question
Alternativly using your data you can do

SELECT m.name as movie_title
  , CONCAT(c1.name, if(c2.name IS NULL,'',', '), ifnull(c2.name,'')) as categories 
FROM movies m
LEFT JOIN categories c2 ON 
 (replace(substring(substring_index(m.categories, ',', 2),
  length(substring_index(m.categories, ',', 2 - 1)) + 1), ',', '') = c2.id)
INNER JOIN categories c1 ON 
 (replace(substring(substring_index(m.categories, ',', 1), 
  length(substring_index(m.categories, ',', 1 - 1)) + 1), ',', '') = c1.id)

Note that the last query only works if there are 2 or fewer categories per movie.

Spot answered 28/7, 2011 at 15:30 Comment(4)
@Spot I have this code working great but i would like to do more than 2. What would I have to change?Surrogate
What's movie_cat.id for?Arabic
@LightnessRacesinOrbit, every table must have a primary key. (or suffer all kinds of handicaps).Spot
@Johan: That doesn't mean you have to create a meaningless and arbitrary auto-incrementing integer field for it. Put the primary key on (movie_id,cat_id) for crying out loud! This movie_cat.id is 100% pointless.Arabic
C
21
select
    m.id,
    group_concat(c.name)
from
    movies m
    join categories c on find_in_set(c.id, m.categories)
group by
    m.id

The output should be something like this:

Table movies {
  -id-       -categories-
  1          Comedy,Drama
  2          Action,Drama
  4          Other,Dance
}
Computation answered 28/7, 2011 at 15:19 Comment(2)
There is at least one syntax error here. I think if you add the comma after movies m and change the ON to a WHERE (you're not using written-out JOIN syntax) then you'll get that output. But I can't see where this "Other" has come from?Arabic
@Tomalak It was a typo. I forgot to add join. I don't like to write join conditions in WHERE clause. Other is just an example, it depends on categories table.Computation
S
13

Using comma separated lists in a database field is an anti-pattern and should be avoided at all costs.
Because it is a PITA to extract those comma separated values out agian in SQL.

Instead you should add a separate link table to represent the relationship between categories and movies, like so:

Table categories
  id integer auto_increment primary key
  name varchar(255)

Table movies
  id integer auto_increment primary key
  name varchar(255)

Table movie_cat
  movie_id integer foreign key references movies.id
  cat_id integer foreign key references categories.id
  primary key (movie_id, cat_id)

Now you can do

SELECT m.name as movie_title, GROUP_CONCAT(c.name) AS categories FROM movies m
INNER JOIN movie_cat mc ON (mc.movie_id = m.id)
INNER JOIN categories c ON (c.id = mc.cat_id)
GROUP BY m.id

Back to your question
Alternativly using your data you can do

SELECT m.name as movie_title
  , CONCAT(c1.name, if(c2.name IS NULL,'',', '), ifnull(c2.name,'')) as categories 
FROM movies m
LEFT JOIN categories c2 ON 
 (replace(substring(substring_index(m.categories, ',', 2),
  length(substring_index(m.categories, ',', 2 - 1)) + 1), ',', '') = c2.id)
INNER JOIN categories c1 ON 
 (replace(substring(substring_index(m.categories, ',', 1), 
  length(substring_index(m.categories, ',', 1 - 1)) + 1), ',', '') = c1.id)

Note that the last query only works if there are 2 or fewer categories per movie.

Spot answered 28/7, 2011 at 15:30 Comment(4)
@Spot I have this code working great but i would like to do more than 2. What would I have to change?Surrogate
What's movie_cat.id for?Arabic
@LightnessRacesinOrbit, every table must have a primary key. (or suffer all kinds of handicaps).Spot
@Johan: That doesn't mean you have to create a meaningless and arbitrary auto-incrementing integer field for it. Put the primary key on (movie_id,cat_id) for crying out loud! This movie_cat.id is 100% pointless.Arabic
A
4

Brad is right; normalisation is the solution. Normalisation exists to solve this problem. It should be covered pretty well in your MySQL book if it's worth its salt.


If you really insist, though, you can fake the direct join by cross-matching with FIND_IN_SET (which conveniently expects a comma-delimited string of items).

Now, MySQL can't return "an array" — that's what sets of results are for — but it can give you the category names separated by, say, a pipe (|):

SELECT
       `m`.`id`,
       `m`.`name`,
       GROUP_CONCAT(`c`.`name` SEPARATOR "|") AS `cats`
  FROM
       `movies`     AS `m`,
       `categories` AS `c`
 WHERE
       FIND_IN_SET(`c`.`id`, `m`.`categories`) != 0
 GROUP BY
       `m`.`id`;

Result:

id  "name"     "cats"
---------------------------------------------------
1   "Movie 1"  "Comedy|Drama"
2   "Movie 2"  "Action|Drama"
4   "Movie 4"  "Dance"
Arabic answered 28/7, 2011 at 15:15 Comment(2)
How we can join more than one table in this scenario ?Shrink
@GaneshAher: That's going to be tough, which is another reason that this is the wrong tool for the job! Use a join.Arabic
H
0

Try This

SELECT m.*, c.* FROM movies m 
RIGHT JOIN categories c on find_in_set(c.id, m.categories) 
GROUP BY m.id
Hannahhannan answered 12/1, 2022 at 6:7 Comment(0)
L
-1

This isn't directly answering your question but what you have in the movies table is really bad.

Instead of combining categories using comma, what you should be doing is to have each category on separate rows, eg:

Table movies {
  -id-       -categories-
  1          2
  1          4
  2          1
  2          4
  4          3
  4          5
}
Loutish answered 28/7, 2011 at 15:12 Comment(1)
No, this is bad too. The movie-category relation should be in a separate table; you've just duplicated all of the "other columns" in movies.Arabic

© 2022 - 2024 — McMap. All rights reserved.