What's the difference between "where" clause and "on" clause when table left join?
Asked Answered
P

6

29

SQL1:

select t1.f1,t2.f2 
from t1 
   left join t2 on t1.f1 = t2.f2 and t1.f2=1 and t1.f3=0 

SQL2:

select t1.f1,t2.f2 
from t1 
  left join t2 on t1.f1 = t2.f2 
where t1.f2=1 and t1.f3=0

The difference is where and on clause, is it returning same result? and what's the difference? does DBMS run them in same way? Thanks.

Phosphorite answered 29/11, 2011 at 12:58 Comment(3)
Have you read all of them: stackoverflow.com/search?q=join+where?Libertarian
ok ,I will read it ,I don't know how to express the subject in normal English.Phosphorite
I explained in brief below in my postsWeigel
R
26

The where clause applies to the whole resultset; the on clause only applies to the join in question.

In the example supplied, all of the additional conditions related to fields on the inner side of the join - so in this example, the two queries are effectively identical.

However, if you had included a condition on a value in the table in the outer side of the join, it would have made a significant difference.

You can get more from this link: http://ask.sqlservercentral.com/questions/80067/sql-data-filter-condition-in-join-vs-where-clause

For example:

select t1.f1,t2.f2 from t1 left join t2 on t1.f1 = t2.f2 and t2.f4=1

select t1.f1,t2.f2 from t1 left join t2 on t1.f1 = t2.f2 where t2.f4=1

- do different things - the former will left join to t2 records where f4 is 1, while the latter has effectively been turned back into an inner join to t2.

Rosenfeld answered 29/11, 2011 at 13:2 Comment(3)
To complete the picture: if the where clause would reference columns from t2, then it would turn the outer join into an inner joinSummand
@a_horse_with_no_name: GMTA. :)Rosenfeld
That link is now dead, sadlyBolivia
M
8

The first query is quicker than the second one as the join condition is more specific than the second one: it does not makes sense to return records that you will filter with the where clause (it would be better do not return them at all- query1)

Anyway it really depends by the query optimizer.

have a look at the below:

Is a JOIN faster than a WHERE?

Midget answered 29/11, 2011 at 13:2 Comment(5)
Not necessarily - it depends on the database's optimiser. Many of them will optimise these to identical queries.Rosenfeld
A decent optimizer will most probably create the same execution plan for both statements, so I doubt there will be a performance difference.Summand
you guys are both right but it really depend by the Optimizer #1130423Midget
@MassimilianoPeluso: I suggest adding that link to your answer - I think it would make it more useful.Rosenfeld
It is much more important to focus on the potential difference in semantics. Suggesting that one approach is faster than the other, when in fact it might just be plain wrong is bad advice.Botvinnik
B
5

It is important to understand the logical order of SQL operations when thinking about SQL syntax. JOIN is an operator (and ON belongs to the relevant JOIN) in the FROM clause. The FROM clause is the first operation to be executed logically (optimisers can still choose to reorder things).

In your example, there isn't really a difference, but it is easy to construct one, as I've shown in this blog post about the difference between ON and WHERE in OUTER JOIN (the example from the blog post uses the Sakila database):

First query

SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id)
FROM actor a
LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
WHERE fa.film_id < 10
GROUP BY a.actor_id, a.first_name, a.last_name
ORDER BY count(fa.film_id) ASC;

Yields:

ACTOR_ID  FIRST_NAME  LAST_NAME  COUNT
--------------------------------------
194       MERYL       ALLEN      1
198       MARY        KEITEL     1
30        SANDRA      PECK       1
85        MINNIE      ZELLWEGER  1
123       JULIANNE    DENCH      1

Because we filtered the outer joined table in the WHERE clause, the LEFT JOIN was effectively turned into an INNER JOIN. Why? Because if we had an actor that didn't play in a film, that actor's only row would have fa.film_id IS NULL, and the fa.film_id < 10 predicate would thus yield NULL. Such actors are excluded from the result, just as with an INNER JOIN.

Second query

SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id)
FROM actor a
LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
  AND fa.film_id < 10
GROUP BY a.actor_id, a.first_name, a.last_name
ORDER BY count(fa.film_id) ASC;

Yields:

ACTOR_ID  FIRST_NAME  LAST_NAME     COUNT
-----------------------------------------
3         ED          CHASE         0
4         JENNIFER    DAVIS         0
5         JOHNNY      LOLLOBRIGIDA  0
6         BETTE       NICHOLSON     0
...
1         PENELOPE    GUINESS       1
200       THORA       TEMPLE        1
2         NICK        WAHLBERG      1
198       MARY        KEITEL        1

Now, the actors without films are included in the result, because the fa.film_id < 10 predicate is part of the LEFT JOIN's ON predicate

Conclusion

Always place predicates where they make most sense logically.

  • Are they part of your JOIN operation? Place them in ON
  • Are they filters on your entire JOIN product? Place them in WHERE
Botvinnik answered 5/3, 2020 at 16:34 Comment(0)
B
3

The two queries are NOT identical.

Mark Bannister was right in pointing out that the where clause is applied to the whole result set but the on clause applies to the join.

In your case, for SQL 1 LEFT JOIN conditions filter joins on the right but the left side is always returned before any WHERE filtering. Since there are no WHERE conditions all of t1 is always returned.

In SQL 2, the LEFT JOIN conditions filter some results showing up on the right but again all t1 is returned. But this time the WHERE conditions may filter some records of t1 away.

INSERT INTO `t1` (`f1`,`f2`,`f3`) VALUES (1,1,1); INSERT INTO `t2` (`f3`) VALUES (1);

Since they point to different logic the query must be written based on that and it gives us great power and flexibility.

An INNER JOIN however returns the same result so yes check the optimiser.

Bobbobb answered 19/4, 2018 at 7:1 Comment(0)
W
2

The relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearrranged by the optimizer so that they may already be excluded during the JOIN process.

I recommend you write the queries in the most readble way possible.

Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.

You can get more from this link: http://ask.sqlservercentral.com/questions/80067/sql-data-filter-condition-in-join-vs-where-clause

For example, instead of:

SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
    AND c.State = 'NY'
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
    AND a.Status = 1

Write:

SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
    AND a.Status = 1

But it depends, of course.

Weigel answered 29/11, 2011 at 13:29 Comment(2)
Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference. I prefer to keep join conditions separate from query restriction conditions. If you're using OUTER JOIN sometimes it's necessary to put conditions in the join clause.Weigel
While it is true that predicates can be moved between INNER JOIN .. ON and WHERE, the OP's question is using LEFT JOIN, where the two predicate locations are definitely not the same thing.Botvinnik
W
1

1)

SQL1: select t1.f1,t2.f2 from t1 left join t2 on t1.f1 = t2.f2 **and** t1.f2=1 and t1.f3=0 

In this, parser will check each row of t1 with each row of t2 with these 3 conditions. Getting faster result.

2) SQL2: select t1.f1,t2.f2 from t1 left join t2 on t1.f1 = t2.f2 **where** t1.f2=1 and t1.f3=0

In this, join only take 1st condition and then the result got from join is filtered with those 2 conditions. And will take more time than 1st query.

You can get more from this link: http://ask.sqlservercentral.com/questions/80067/sql-data-filter-condition-in-join-vs-where-clause

Weigel answered 29/11, 2011 at 13:3 Comment(5)
Any decent optimizer will treat both statements exactly the same way (Oracle and PostgreSQL do)Summand
Essentially the same answer as Massimiliano, and therefore not necessarily true for the same reason.Rosenfeld
See Mark's comment above; performance is dependent on the database provider. However, the output should be the same for this example for all modern RDBMS.Vinita
Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference. I prefer to keep join conditions separate from query restriction conditions. If you're using OUTER JOIN sometimes it's necessary to put conditions in the join clause.Weigel
The main difference in the two queries is semantic. Any potential performance difference (difficult to claim generically) is irrelevant if the two queries don't do the same thing.Botvinnik

© 2022 - 2024 — McMap. All rights reserved.