SQL selecting rows where one column's value is common across another criteria column
F

4

15

I have a cross reference table that looks like this:

id  document_id  subject_id
1   8            21
2   5            17
3   5            76
4   7            88
5   9            17
6   9            76
7   2            76

It matches documents to subjects. Documents can be members of more than one subject. I want to return rows from this table where a given document matches all the subjects in a given set. For example, given the set of subjects:

(17,76)

I want to return only rows for documents which match all the subjects in that set (at least) somewhere in the cross reference table. The desired output set given the above set would be:

id  document_id  subject_id
2   5            17
3   5            76
5   9            17
6   9            76

Notice that the last row of the table is not returned because that document only matches one of the required subjects.

What is the simplest and most efficient way to query for this in SQL?

Forgiven answered 10/9, 2009 at 22:54 Comment(2)
It'd be great to know how you are providing the parameters to the query. One answer, while perfectly fine, only works for exactly 2 values in the parameter set. If you can cap the number of parameters at, say, 10, then that's one conversation. If you need the app to be flexible, then the suggestions will be different.Majestic
Thanks, the input is basically "pick any number of subjects" so the set of subject ids can grow as big as the number of subjects (in theory).Forgiven
C
30

I assume that the natural key of this table is document_id + subject_id, and that id is a surrogate; in other words, each (document_id, subject_id) pair is unique. As such, I'm just going to pretend id doesn't exist and that a unique constraint is on the natural key.

Let's start with the obvious.

SELECT document_id, subject_id
  FROM document_subjects
 WHERE subject_id IN (17,76)

That gets you everything you want plus stuff you don't want. So all we need to do is filter out the other stuff. The "other stuff" is groups of rows having a count that is not equal to the count of the desired subjects.

SELECT document_id
  FROM document_subjects
 WHERE subject_id IN (17,76)
 GROUP BY document_id
HAVING COUNT(*) = 2

Note that subject_id is dropped from the SELECT list because it isn't part of the GROUP BY. Taking this one step further, I'm going to add an imaginary table called subjects_i_want that contains N rows of subjects you want.

SELECT document_id
  FROM document_subjects
 WHERE subject_id IN (SELECT subject_id FROM subjects_i_want)
 GROUP BY document_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM subjects_i_want)

Obviously subjects_i_want could be swapped out for another subquery, temporary table, or whatever. But, once you have this list of document_id, you can use it within a subselect of a bigger query.

SELECT document_id, subject_id, ...
  FROM document_subjects
 WHERE document_id IN (
        SELECT document_id
          FROM document_subjects
         WHERE subject_id IN (SELECT subject_id FROM subjects_i_want)
         GROUP BY document_id
        HAVING COUNT(*) = (SELECT COUNT(*) FROM subjects_i_want))

Or whatever.
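
For anyone who wants to try this end to end, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite, loads the question's sample rows, and fills subjects_i_want as a temporary table from an arbitrary list of subject ids. The function name rows_matching_all and the temp-table approach are illustrative choices, not part of the answer above.

```python
import sqlite3

# Sample rows from the question: (id, document_id, subject_id)
ROWS = [(1, 8, 21), (2, 5, 17), (3, 5, 76), (4, 7, 88),
        (5, 9, 17), (6, 9, 76), (7, 2, 76)]

def rows_matching_all(conn, subject_ids):
    """Return all cross-reference rows of documents that carry every wanted subject."""
    wanted = set(subject_ids)  # de-duplicate so the COUNT(*) comparison is meaningful
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS subjects_i_want "
                 "(subject_id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM subjects_i_want")
    conn.executemany("INSERT INTO subjects_i_want VALUES (?)",
                     [(s,) for s in wanted])
    return conn.execute("""
        SELECT id, document_id, subject_id
          FROM document_subjects
         WHERE document_id IN (
                SELECT document_id
                  FROM document_subjects
                 WHERE subject_id IN (SELECT subject_id FROM subjects_i_want)
                 GROUP BY document_id
                HAVING COUNT(*) = (SELECT COUNT(*) FROM subjects_i_want))
         ORDER BY id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE document_subjects (
                    id INTEGER PRIMARY KEY,
                    document_id INTEGER,
                    subject_id INTEGER,
                    UNIQUE (document_id, subject_id))""")
conn.executemany("INSERT INTO document_subjects VALUES (?, ?, ?)", ROWS)

result = rows_matching_all(conn, [17, 76])
for row in result:
    print(row)
```

Because the wanted subjects live in a table, the same query handles any number of ids without being rewritten.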

Crape answered 10/9, 2009 at 23:16 Comment(2)
+1 very nice, Alex. I've noticed a few variations of this question lately, and this is the most clearly presented general solution I've seen so far.Masorete
+1, very nice and helped me. It would be better if the COUNT() in the HAVING clause were performed on distinct entries, as that would remove the possibility of duplicate data also being counted; preferably COUNT(DISTINCT subject_id) instead of COUNT(*).Deerdre
T
2

This works in Oracle (or any database that supports the WITH clause). It lets you define the subject_id values exactly once.

with t as (select distinct subject_id from table1 where subject_id in (17,76) )
select document_id from table1 where subject_id in (select subject_id from t)
group by document_id
having count(*) = (select count(*) from t);
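
A minimal sketch of the same WITH-clause idea, runnable with Python's built-in sqlite3 module. This is a variation under stated assumptions rather than the exact query above: it assumes SQLite, and the CTE (named wanted here, a hypothetical name) lists the subject ids directly with VALUES instead of selecting them from table1. That way a requested subject that appears in no row still counts toward the required total. The helper docs_with_all is illustrative and expects at least one id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 "
             "(id INTEGER PRIMARY KEY, document_id INTEGER, subject_id INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(1, 8, 21), (2, 5, 17), (3, 5, 76), (4, 7, 88),
                  (5, 9, 17), (6, 9, 76), (7, 2, 76)])

def docs_with_all(conn, subject_ids):
    """Return document ids that have every subject in subject_ids (non-empty)."""
    ids = sorted(set(subject_ids))
    # One "(?)" row per wanted id; the values themselves are bound as parameters.
    values = ", ".join("(?)" for _ in ids)
    sql = f"""
        WITH wanted(subject_id) AS (VALUES {values})
        SELECT document_id
          FROM table1
         WHERE subject_id IN (SELECT subject_id FROM wanted)
         GROUP BY document_id
        HAVING COUNT(*) = (SELECT COUNT(*) FROM wanted)
         ORDER BY document_id
    """
    return [row[0] for row in conn.execute(sql, ids)]

matching = docs_with_all(conn, [17, 76])
print(matching)
```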
Tomika answered 9/6, 2015 at 20:4 Comment(1)
I found this answer to be the most helpful as it applies to PostgreSQL too.Saintmihiel
L
1

That's a very interesting question.

I'm assuming you would like a more generalized query, but this is what I would do in the case where you always have the same number of subjects (say two):

 SELECT T.id, T.document_id, T.subject_id
   FROM document_subjects T
        INNER JOIN document_subjects T1 ON T.document_id = T1.document_id AND T1.subject_id = 17
        INNER JOIN document_subjects T2 ON T.document_id = T2.document_id AND T2.subject_id = 76

Of course, you could add another INNER JOIN for each additional subject ID, but I admit it's not a very good general solution.
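
If the join approach is still attractive, the joins can be generated from the subject list. The following is only a sketch: it assumes SQLite via Python's sqlite3 module, a table named document_subjects (a hypothetical name standing in for the placeholder in the query above), and that each (document_id, subject_id) pair is unique so the joins do not multiply rows. Only the alias numbers are formatted into the SQL text; the subject ids are bound as parameters.

```python
import sqlite3

def rows_via_self_joins(conn, subject_ids):
    """Build one INNER JOIN per wanted subject and return the matching rows."""
    ids = list(subject_ids)
    joins = "\n".join(
        f"  INNER JOIN document_subjects T{i} "
        f"ON T.document_id = T{i}.document_id AND T{i}.subject_id = ?"
        for i in range(1, len(ids) + 1)
    )
    sql = (
        "SELECT T.id, T.document_id, T.subject_id\n"
        "  FROM document_subjects T\n"
        f"{joins}\n"
        " ORDER BY T.id"
    )
    return conn.execute(sql, ids).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE document_subjects (
                    id INTEGER PRIMARY KEY,
                    document_id INTEGER,
                    subject_id INTEGER,
                    UNIQUE (document_id, subject_id))""")
conn.executemany("INSERT INTO document_subjects VALUES (?, ?, ?)",
                 [(1, 8, 21), (2, 5, 17), (3, 5, 76), (4, 7, 88),
                  (5, 9, 17), (6, 9, 76), (7, 2, 76)])

joined = rows_via_self_joins(conn, [17, 76])
print(joined)
```

The query text grows with every subject, which is the main reason the GROUP BY / HAVING form tends to be easier to maintain for large sets.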

Locomotion answered 10/9, 2009 at 23:4 Comment(1)
D'oh, I'm indeed looking for a solution that could match an arbitrary number of subjects.Forgiven
M
0
select document_id from table1
 where subject_id in (17, 76)
 group by document_id
having count(distinct subject_id) = 2
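
A small sketch of why COUNT(DISTINCT subject_id) can matter, assuming SQLite via Python's sqlite3 module and a table without a unique constraint on (document_id, subject_id). The duplicate row with id 8 is made up for illustration and is not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 "
             "(id INTEGER PRIMARY KEY, document_id INTEGER, subject_id INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(1, 8, 21), (2, 5, 17), (3, 5, 76), (4, 7, 88),
                  (5, 9, 17), (6, 9, 76), (7, 2, 76),
                  (8, 2, 76)])  # hypothetical duplicate of (document 2, subject 76)

def matching_documents(count_expr):
    """Run the grouped query with the given counting expression in HAVING."""
    sql = f"""
        SELECT document_id
          FROM table1
         WHERE subject_id IN (17, 76)
         GROUP BY document_id
        HAVING {count_expr} = 2
         ORDER BY document_id
    """
    return [row[0] for row in conn.execute(sql)]

# COUNT(*) counts the duplicate row, so document 2 slips through.
with_star = matching_documents("COUNT(*)")
# COUNT(DISTINCT subject_id) counts each subject once per document.
with_distinct = matching_documents("COUNT(DISTINCT subject_id)")
print(with_star, with_distinct)
```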
Modestomodesty answered 10/9, 2009 at 23:15 Comment(0)
