SQL: Get Products from a category but also must be in another set of categories
Asked Answered
N

1

5

I am currently stuck in a situation. The scenario is this. I have products who may be associated with multiple categories. The data structure is shown below:

Products Table:
product_id  name 
1           Lemon
2           Kiwis
3           Cheese

Product to Categories Table

product_id   category_id
1            1
1            2
1            3
2            1
2            3
3            2
3            4

Category Table (not required in query however adding it here to help visualize what is happening)

category_id  name
1            Fruit
2            Yellow
3            Round
4            Dairy

What I'm struggling with here is that originally I want to get all products that are in the fruit category (category id 1) but I also want to check if a fruit is yellow. Keep in mind that yellow will not be the only filter, sometimes I will want to return yellow and orange fruit, however since cheese is yellow I can't return it since it isn't a fruit. However to make things a bit easier I always know that I am going to look in the fruit category as a base.

The database structure can not change as its an opencart database structure.

Here are my attempts:

SELECT GROUP_CONCAT(DISTINCT p2c2.category_id SEPARATOR ',') as categories 
FROM oc_product_to_category p2c 
LEFT JOIN oc_product p ON (p.product_id = p2c.product_id) 
LEFT JOIN oc_product_to_category p2c2 ON (p.product_id = p2c2.product_id) 
WHERE p2c.category_id IN ('1','2')

This kind of works except for that fact that it will return Cheese.

Notes: I use Group Concat because at the end of all of this my goal is to return not so much the products that match these categories but based on the filters I want to return another list of categories from the products that match this criteria. So:

Scenario:

Get Products that match category criteria Return categories of those products.

Any assistance will be greatly appreciated.

Nothing answered 15/7, 2013 at 19:49 Comment(0)
M
7

This type of problem is called relational division.

There are two common solutions:

  1. First solution strings together the matching categories and compares to a fixed string:

    SELECT p2c.product_id
    FROM oc_product_to_category p2c
    GROUP BY p2c.product_id
    HAVING GROUP_CONCAT(p2c.category_id SEPARATOR ',' ORDER BY p2c.category_id) = '1,2'
    
  2. Second solution does a JOIN for each required value:

    SELECT p.product_id 
    FROM oc_product p 
    INNER JOIN oc_product_to_category p2c1 
      ON (p.product_id = p2c1.product_id AND p2c1.category_id = 1) 
    INNER JOIN oc_product_to_category p2c2 
      ON (p.product_id = p2c2.product_id AND p2c2.category_id = 2) 
    

I cover these solutions in my presentation SQL Query Patterns, Optimized. I found in my tests that the join solution is much better for performance.


@Tom's suggestion is right, here's what that would look like in a complete query:

    SELECT p.product_id, GROUP_CONCAT(p2c3.category_id SEPARATOR ',') AS categories
    FROM oc_product p 
    INNER JOIN oc_product_to_category p2c1 
      ON (p.product_id = p2c1.product_id AND p2c1.category_id = 1) 
    INNER JOIN oc_product_to_category p2c2 
      ON (p.product_id = p2c2.product_id AND p2c2.category_id = 2) 
    INNER JOIN oc_product_to_category p2c3
      ON (p.product_id = p2c3.product_id)
    GROUP BY p.product_id;

The DISTINCT that @Tom suggests shouldn't be necessary, because your p2c table should have a UNIQUE constraint over (product_id, category_id).

Monovalent answered 15/7, 2013 at 20:1 Comment(5)
Hi Bill, firstly thanks for your time, I'd assume these just return products that have these category id's. The second part of the question asks based on the products id returned will I be able to get all the categories of that product?Nothing
^ In the same query (sry) The reason for this as I have many products and doing this in 2 queries will put lots of strain on the server i believe.Nothing
If I were you, I would take Bill's 2nd solution, add a separate join to the oc_product_to_category table and then change the select to what you have in your query: SELECT GROUP_CONCAT(DISTINCT p2c.category_id SEPARATOR ',') as categories.Drear
Well, that first solution requires the product to have no other categories then the ones searched (in this case: it must have 'Fruit' and 'Yellow', but not also have 'Sweet' for instance). I have no trouble believing the second one is faster, however, omitting a WHERE category_id IN(...) will make the first one even slower, and incorrect.Calcic
Fwiw: if I rewrite the first solution from GROUP BY p2c.product_id HAVING GROUP_CONCAT(p2c.category_id SEPARATOR ',' ORDER BY p2c.category_id) = '1,2' to WHERE p2c.category_id IN (1,2) GROUP BY p2c.product_id HAVING COUNT(p2c.category_id)=2, it quite consistently scores in relation to the second option (1) better, but random, at 1 category (for which this should not be used of course ;) ), (2) about 9% better at 2 categories (3) about 10% worse at 3 categories (4) about 50% worse at 9 categories, and it trails of there. So yes, the 2nd one is better with multiple categories.Calcic

© 2022 - 2024 — McMap. All rights reserved.