Return distinct pairs of names which have the same exact items in column
C

4

7

I want to find the distinct pairs of names in the table which have the same exact items in the items column. For instance:

CREATE TABLE t
(
    name    VARCHAR(255),
    item    VARCHAR(255)
);

INSERT INTO t VALUES('Alice', 'Orange');
INSERT INTO t VALUES('Alice', 'Pear');
INSERT INTO t VALUES('Alice', 'Lemon');
INSERT INTO t VALUES('Bob', 'Orange');
INSERT INTO t VALUES('Bob', 'Pear');
INSERT INTO t VALUES('Bob', 'Lemon');
INSERT INTO t VALUES('Charlie', 'Pear');
INSERT INTO t VALUES('Charlie', 'Lemon');

The answer here would be Alice,Bob because they took the exact same items.
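
To pin down what "the same exact items" means before turning to SQL, here is a minimal Python sketch that treats each name's items as a set and compares the sets pairwise. The `rows` list and the `matching_pairs` helper are illustrative names, not part of the original question.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows copied from the question's INSERT statements.
rows = [
    ("Alice", "Orange"), ("Alice", "Pear"), ("Alice", "Lemon"),
    ("Bob", "Orange"), ("Bob", "Pear"), ("Bob", "Lemon"),
    ("Charlie", "Pear"), ("Charlie", "Lemon"),
]

def matching_pairs(rows):
    """Return (name1, name2) pairs, name1 < name2, whose item sets are equal."""
    items = defaultdict(set)
    for name, item in rows:
        items[name].add(item)
    # combinations() over the sorted names yields each unordered pair once.
    return [(a, b) for a, b in combinations(sorted(items), 2)
            if items[a] == items[b]]

pairs = matching_pairs(rows)
print(pairs)  # [('Alice', 'Bob')]
```

Charlie is excluded because his items are only a subset of Alice's and Bob's, which is why a one-directional containment check is not enough.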

I want to do it with double negation only (using NOT EXISTS/NOT IN), which I think is better suited to this question, but I couldn't come up with anything remotely close to functional.

This is somewhat similar to this question, but I'm using SQLite so I cannot use GROUP_CONCAT(). I was wondering how it would be done with relational division using NOT EXISTS/NOT IN.

Crossindex answered 4/11, 2015 at 4:49 Comment(8)
I made you an SQLFiddle to play around with here ~ sqlfiddle.com/#!5/b70cd - Spicebush
How many different items can you have in your table? - Hewes
@TimBiegeleisen As many as you want. I don't suppose it affects anything, as long as it can still return pairs that contain the exact same set of items. - Crossindex
SQLite does have group_concat... sqlite.org/lang_aggfunc.html - Chrysolite
@Chrysolite Okay, my bad, but I would really like to get some hints on solving it without using group_concat... - Crossindex
I was working on a solution which involves creating a pivot of your table. But it won't be suitable if you can really have an arbitrarily large number of items. If I were you, I'd go with the GROUP_CONCAT option. - Hewes
@TimBiegeleisen This is part of an assignment, and for the purposes of this question there is no need for an arbitrarily large number of items. - Crossindex
If you are interested in a specific common group of items, you can use this query: select T1.name FROM (select name, count(*) as count from t where t.item in ('Orange', 'Pear', 'Lemon') group by name) T1 inner join (select name, count(*) as count from t where t.item in ('Orange', 'Pear', 'Lemon') group by name) T2 on T1.name != T2.name and T1.count = T2.count - Bucci
H
1

With compound queries:

SELECT t1.name, t2.name
FROM t AS t1, t AS t2
GROUP BY t1.name, t2.name
HAVING t1.name < t2.name
   AND NOT EXISTS (SELECT item FROM t WHERE name = t1.name
                   EXCEPT
                   SELECT item FROM t WHERE name = t2.name)
   AND NOT EXISTS (SELECT item FROM t WHERE name = t2.name
                   EXCEPT
                   SELECT item FROM t WHERE name = t1.name);
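
As a rough way to try the EXCEPT query outside a fiddle, the following sketch loads the question's sample rows into an in-memory SQLite database through Python's standard `sqlite3` module and runs the query unchanged. The variable names (`conn`, `EXCEPT_QUERY`, `result`) are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name VARCHAR(255), item VARCHAR(255))")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("Alice", "Orange"), ("Alice", "Pear"), ("Alice", "Lemon"),
        ("Bob", "Orange"), ("Bob", "Pear"), ("Bob", "Lemon"),
        ("Charlie", "Pear"), ("Charlie", "Lemon"),
    ],
)

# Each EXCEPT computes a set difference; a pair qualifies only when
# both differences (t1 minus t2, and t2 minus t1) are empty.
EXCEPT_QUERY = """
SELECT t1.name, t2.name
FROM t AS t1, t AS t2
GROUP BY t1.name, t2.name
HAVING t1.name < t2.name
   AND NOT EXISTS (SELECT item FROM t WHERE name = t1.name
                   EXCEPT
                   SELECT item FROM t WHERE name = t2.name)
   AND NOT EXISTS (SELECT item FROM t WHERE name = t2.name
                   EXCEPT
                   SELECT item FROM t WHERE name = t1.name)
"""

result = conn.execute(EXCEPT_QUERY).fetchall()
print(result)  # [('Alice', 'Bob')]
```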

Using NOT IN is possible, but expresses exactly the same mechanism with more complexity:

SELECT t1.name, t2.name
FROM t AS t1, t AS t2
GROUP BY t1.name, t2.name
HAVING t1.name < t2.name
   AND NOT EXISTS (SELECT item
                   FROM t
                   WHERE name = t1.name
                     AND item NOT IN (SELECT item
                                      FROM t
                                      WHERE name = t2.name))
   AND NOT EXISTS (SELECT item
                   FROM t
                   WHERE name = t2.name
                     AND item NOT IN (SELECT item
                                      FROM t
                                      WHERE name = t1.name));
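
To see why the check has to run in both directions, this sketch isolates one of the inner NOT IN subqueries and runs it for a few pairs from the question's sample data, using Python's standard `sqlite3` module. The `missing_items` helper and the `:a`/`:b` parameter names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name VARCHAR(255), item VARCHAR(255))")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("Alice", "Orange"), ("Alice", "Pear"), ("Alice", "Lemon"),
        ("Bob", "Orange"), ("Bob", "Pear"), ("Bob", "Lemon"),
        ("Charlie", "Pear"), ("Charlie", "Lemon"),
    ],
)

# One direction of the double negation: items of :a that :b does not have.
MISSING = """
SELECT item
FROM t
WHERE name = :a
  AND item NOT IN (SELECT item FROM t WHERE name = :b)
"""

def missing_items(a, b):
    """Return the items that `a` has and `b` lacks."""
    return [row[0] for row in conn.execute(MISSING, {"a": a, "b": b})]

alice_not_charlie = missing_items("Alice", "Charlie")
charlie_not_alice = missing_items("Charlie", "Alice")
alice_not_bob = missing_items("Alice", "Bob")
bob_not_alice = missing_items("Bob", "Alice")

print(alice_not_charlie)  # ['Orange']
print(charlie_not_alice)  # []
print(alice_not_bob, bob_not_alice)  # [] []
```

Charlie lacks nothing relative to Alice in one direction, yet Alice has an item Charlie does not, so only the pair where both directions come back empty (Alice, Bob) counts as an exact match.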
Henleyonthames answered 4/11, 2015 at 12:39 Comment(0)
O
2

To get the number of common items between all pairs of names you can use the following query:

SELECT t1.name AS name1, t2.name AS name2, COUNT(*) AS cnt
FROM t AS t1
INNER JOIN t AS t2 ON t1.item = t2.item AND t1.name < t2.name
GROUP BY t1.name, t2.name

Output:

name1   name2       cnt
------------------------
Alice   Bob         3
Alice   Charlie     2
Bob     Charlie     2

Now all you want is to filter out (name1, name2) pairs having a count that is not equal to the number of items of name1 and name2. You can do this using a HAVING clause with correlated subqueries:

SELECT t1.name AS name1, t2.name AS name2
FROM t AS t1
INNER JOIN t AS t2 ON t1.item = t2.item AND t1.name < t2.name
GROUP BY t1.name, t2.name
HAVING COUNT(*) = (SELECT COUNT(*) FROM t WHERE name = t1.name) AND 
       COUNT(*) = (SELECT COUNT(*) FROM t WHERE name = t2.name)
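
The counting query can be tried locally with a small sketch like the one below, which uses Python's standard `sqlite3` module and the question's sample rows. Note one assumption the comparison appears to rely on: each (name, item) row occurs only once, since duplicate rows would inflate the join count and the per-name counts differently. The names `conn`, `COUNT_QUERY`, and `result` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name VARCHAR(255), item VARCHAR(255))")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("Alice", "Orange"), ("Alice", "Pear"), ("Alice", "Lemon"),
        ("Bob", "Orange"), ("Bob", "Pear"), ("Bob", "Lemon"),
        ("Charlie", "Pear"), ("Charlie", "Lemon"),
    ],
)

# COUNT(*) per group is the number of shared items; the pair matches
# when that equals the total item count of each name.
# Assumption: no duplicate (name, item) rows exist in t.
COUNT_QUERY = """
SELECT t1.name AS name1, t2.name AS name2
FROM t AS t1
INNER JOIN t AS t2 ON t1.item = t2.item AND t1.name < t2.name
GROUP BY t1.name, t2.name
HAVING COUNT(*) = (SELECT COUNT(*) FROM t WHERE name = t1.name) AND
       COUNT(*) = (SELECT COUNT(*) FROM t WHERE name = t2.name)
"""

result = conn.execute(COUNT_QUERY).fetchall()
print(result)  # [('Alice', 'Bob')]
```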

Demo here

Obtrude answered 4/11, 2015 at 6:33 Comment(0)
A
0

I might have found a solution to your issue. Mine was tested using MySQL, but it's not using GROUP_CONCAT(). It might work for your SQLite database. My query is used to find people who have bought the same exact items.

Try using this statement: SELECT DISTINCT e1.name, e2.name from t e1, t e2 WHERE e1.item=e2.item AND e1.name != e2.name GROUP BY e1.item HAVING count(*) >1;

https://gyazo.com/5e5e9d0ddfb33cb47439a674297108ed

Accusal answered 4/11, 2015 at 5:44 Comment(0)
M
0

This seems to work with SQLite:

    select t1.name
    from t t1
        join t t2 on t1.name <> t2.name and t1.item = t2.item 
        join (select name, count(*) as cnt from t group by name) t3 on t3.name = t1.name
        join (select name, count(*) as cnt from t group by name) t4 on t4.name = t2.name
    group by t1.name, t3.cnt, t4.cnt
    having count(*) = max(t3.cnt, t4.cnt)
Makepeace answered 4/11, 2015 at 5:56 Comment(0)
