How do I replace NOT EXISTS with JOIN?
Asked Answered
U

3

7

I've got the following query:

select distinct a.id, a.name
from Employee a
join Dependencies b on a.id = b.eid
where not exists 
    ( 
select * 
    from Dependencies d 
    where b.id = d.id 
    and d.name  = 'Apple'
    )
and exists 
    (
    select * 
    from Dependencies c 
    where b.id = c.id 
    and c.name  = 'Orange'
    );

I have two tables, relatively simple. The first Employee has an id column and a name column The second table Dependencies has 3 column, an id, an eid (employee id to link) and names (apple, orange etc).

the data looks like this Employee table looks like this

id  | name
-----------
1   | Pat
2   | Tom
3   | Rob
4   | Sam

Dependencies

id  | eid | Name
--------------------
1   | 1   |  Orange
2   | 1   |  Apple
3   | 2   |  Strawberry
4   | 2   |  Apple
5   | 3   |  Orange
6   | 3   |  Banana

As you can see Pat has both Orange and Apple and he needs to be excluded and it has to be via joins and i can't seem to get it to work. Ultimately the data should only return Rob

Underclassman answered 10/6, 2014 at 15:15 Comment(1)
Write your requirements in plain English. I had to read your code in order to try to guess what are your requirements.Preceptor
D
14

Inner join with the name you want, left join on the name you dont, then use where to ensure the left join fails to match, like so (SQL Fiddle):

select distinct a.id, a.name
from Employee a
  inner join Dependencies b on a.id = b.eid
    and b.name = 'Orange'
  left join Dependencies c on ( a.id = c.eid
    and c.name = 'Apple')
where c.id is null;
Dialyze answered 10/6, 2014 at 15:21 Comment(5)
Is it okay to have multiple joins on Dependencies with various aliases? and i'm not sure what c.id is null means?Underclassman
Yes, you'll end up with a table (or likely index) spool in the query plan (a redundancy operation when a table is used twice, but you were using it 3 times before), but the operation is perfectly legal. As to the id is null - the left join fills the columns of C with null when the condition fails to match, rather than eliminate rows. Thus, c.id is null when the join condition fails. Go look up the difference between different types of joins. Hope this helps.Dialyze
@Underclassman So write it this way... "SELECT DISTINCT a.id a_id,a.name,c.id c_id from... where 1=1"Mcneely
As a side note: select a.x from a left join b on a.x = b.x where b.x is null is the idiom sometimes used in MySQL to simulate the minus operator, which is not implemented in MySQL.Wartburg
@stawberry please don't encourage that where 1=1 crap. There is no reason to ever add that to a query.Barrelchested
C
1

Two joins to Dependencies will be needed, as there are 2 tests. Neglecting performance for a moment, you could try to improve the understandability of the joins by naming the aliases, e.g.:

SELECT DISTINCT e.ID, e.Name
   FROM Employee e
   LEFT OUTER JOIN Dependencies withApple
      ON withApple.eid = e.id
      AND withApple.Name = 'Apple'
   LEFT OUTER JOIN Dependencies withOrange
      ON withOrange.eid = e.id
      AND withOrange.Name = 'Orange'
   WHERE
      withApple.id IS NULL -- Don't want this
      AND
      withOrange.id IS NOT NULL -- Do want this.

SqlFiddle

Capsule answered 10/6, 2014 at 15:36 Comment(3)
Though it works, I think it is actually worse. Using left outer join only makes sense if you have to deal with missing rows or you want to calculate the difference. Since id is the column you join on, the condition id is not null is pointless. And besides, the comments "do want this" and "don't want this" do not help either. Shouldn't they be the other way round? As the case may be, it just proves the point.Wartburg
You are right - the naming of the aliases preempted the filtering and conveyed a confusing message. The idea behind LOJ as opposed to IJ was to try and delay the filtering to the WHERE clause, and to confer the same symmetry to the unwanted apple as to the wanted Orange. But you are right, this isn't good Sql, the intention was just an interim stepping stone which hopefully would click something in the OP's understanding.Capsule
Yeah, ironically, in my opinion the original query with exists subqueries (except the useless join) conveyed the intention better than any proposed solution with joins, so I wouldn't even try to change that and leave it as it is. :)Wartburg
E
0

Another version, similar to Jaaz Cole's one is:

select distinct a.id, a.name
   from Employee a
inner join Dependencies b on a.id = b.eid
   and b.name = 'Orange'
left join Dependencies c on a.id = c.eid
where (c.id is null or c.name != 'Apple');

In substance this comes from logics: The NEGATION(A & B) = NEGATION(A) OR NEGATION(B)

Epididymis answered 15/12, 2022 at 14:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.