MySQL JOIN ON vs USING?
Asked Answered
R

7

360

In a MySQL JOIN, what is the difference between ON and USING()? As far as I can tell, USING() is just more convenient syntax, whereas ON allows a little more flexibility when the column names are not identical. However, that difference is so minor, you'd think they'd just do away with USING().

Is there more to this than meets the eye? If yes, which should I use in a given situation?

Recoil answered 6/7, 2012 at 16:16 Comment(3)
There's also NATURAL JOIN: #8696883Dyna
Note that using has another usage besides joins. See https://mcmap.net/q/66477/-using-keyword-vs-on-clause-mysql-duplicateFlacon
Does this answer your question? What's the difference between "using" and "on" in table joins in MySQL?Atonement
U
575

It is mostly syntactic sugar, but a couple differences are noteworthy:

ON is the more general of the two. One can join tables ON a column, a set of columns and even a condition. For example:

SELECT * FROM world.City JOIN world.Country ON (City.CountryCode = Country.Code) WHERE ...

USING is useful when both tables share a column of the exact same name on which they join. In this case, one may say:

SELECT ... FROM film JOIN film_actor USING (film_id) WHERE ...

An additional nice treat is that one does not need to fully qualify the joining columns:

SELECT film.title, film_id -- film_id is not prefixed
FROM film
JOIN film_actor USING (film_id)
WHERE ...

To illustrate, to do the above with ON, we would have to write:

SELECT film.title, film.film_id -- film.film_id is required here
FROM film
JOIN film_actor ON (film.film_id = film_actor.film_id)
WHERE ...

Notice the film.film_id qualification in the SELECT clause. It would be invalid to just say film_id since that would make for an ambiguity:

ERROR 1052 (23000): Column 'film_id' in field list is ambiguous

As for select *, the joining column appears in the result set twice with ON while it appears only once with USING:

mysql> create table t(i int);insert t select 1;create table t2 select*from t;
Query OK, 0 rows affected (0.11 sec)

Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

Query OK, 1 row affected (0.19 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select*from t join t2 on t.i=t2.i;
+------+------+
| i    | i    |
+------+------+
|    1 |    1 |
+------+------+
1 row in set (0.00 sec)

mysql> select*from t join t2 using(i);
+------+
| i    |
+------+
|    1 |
+------+
1 row in set (0.00 sec)

mysql>
Undercoating answered 6/7, 2012 at 17:32 Comment(5)
+1 Nice answer on the syntactical difference. I'm curious about performance differences, if any. I imagine USING interprets to ON.Touristy
Actually, both interpret to plain old Theta-style. You can see that by invoking EXPLAIN EXTENDED on your query, followed by SHOW WARNINGS.Undercoating
you can also do USING(category,field_id) which is useful when joining by composite primary keys, also I heard that the optimizer does use USING to improve performance in some casesFinney
Is USING a MySQL definition or it is standard?Sech
@Sech it is in the ANSI SQL 92 standardUndercoating
E
27

Thought I would chip in here with when I have found ON to be more useful than USING. It is when OUTER joins are introduced into queries.

ON benefits from allowing the results set of the table that a query is OUTER joining onto to be restricted while maintaining the OUTER join. Attempting to restrict the results set through specifying a WHERE clause will, effectively, change the OUTER join into an INNER join.

Granted this may be a relative corner case. Worth putting out there though.....

For example:

CREATE TABLE country (
   countryId int(10) unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
   country varchar(50) not null,
  UNIQUE KEY countryUIdx1 (country)
) ENGINE=InnoDB;

insert into country(country) values ("France");
insert into country(country) values ("China");
insert into country(country) values ("USA");
insert into country(country) values ("Italy");
insert into country(country) values ("UK");
insert into country(country) values ("Monaco");


CREATE TABLE city (
  cityId int(10) unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
  countryId int(10) unsigned not null,
  city varchar(50) not null,
  hasAirport boolean not null default true,
  UNIQUE KEY cityUIdx1 (countryId,city),
  CONSTRAINT city_country_fk1 FOREIGN KEY (countryId) REFERENCES country (countryId)
) ENGINE=InnoDB;


insert into city (countryId,city,hasAirport) values (1,"Paris",true);
insert into city (countryId,city,hasAirport) values (2,"Bejing",true);
insert into city (countryId,city,hasAirport) values (3,"New York",true);
insert into city (countryId,city,hasAirport) values (4,"Napoli",true);
insert into city (countryId,city,hasAirport) values (5,"Manchester",true);
insert into city (countryId,city,hasAirport) values (5,"Birmingham",false);
insert into city (countryId,city,hasAirport) values (3,"Cincinatti",false);
insert into city (countryId,city,hasAirport) values (6,"Monaco",false);

-- Gah. Left outer join is now effectively an inner join 
-- because of the where predicate
select *
from country left join city using (countryId)
where hasAirport
; 

-- Hooray! I can see Monaco again thanks to 
-- moving my predicate into the ON
select *
from country co left join city ci on (co.countryId=ci.countryId and ci.hasAirport)
; 
Ethicize answered 12/9, 2014 at 11:15 Comment(4)
Extremely good point. Of all the advantages using provides, it can't be combined with other predicates: select*from t join t2 using(i) and on 1 wouldnt work.Flacon
where hasAirport ; - what does this mean ? no value is there to compare with .Airhead
Also note that you can do way more comparisions with ON than just =. Like SELECT * FROM country LEFT JOIN city ON country.countryId=city.countryId AND city.city BETWEEN 'C' AND 'E' will list all countries but just the cities starting with C or D (if any). (Plus towns called 'E')Myke
I once even did a JOIN with a subquery in the ON !!! It is all possible and sometimes highly effective.Myke
M
26

Wikipedia has the following information about USING:

The USING construct is more than mere syntactic sugar, however, since the result set differs from the result set of the version with the explicit predicate. Specifically, any columns mentioned in the USING list will appear only once, with an unqualified name, rather than once for each table in the join. In the case above, there will be a single DepartmentID column and no employee.DepartmentID or department.DepartmentID.

Tables that it was talking about:

enter image description here

The Postgres documentation also defines them pretty well:

The ON clause is the most general kind of join condition: it takes a Boolean value expression of the same kind as is used in a WHERE clause. A pair of rows from T1 and T2 match if the ON expression evaluates to true.

The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.

Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.

Minutiae answered 21/7, 2015 at 0:25 Comment(0)
N
13

Database tables

To demonstrate how the USING and ON clauses work, let's assume we have the following post and post_comment database tables, which form a one-to-many table relationship via the post_id Foreign Key column in the post_comment table referencing the post_id Primary Key column in the post table:

SQL USING vs ON clauses table relationship

The parent post table has 3 rows:

| post_id | title     |
|---------|-----------|
| 1       | Java      |
| 2       | Hibernate |
| 3       | JPA       |

and the post_comment child table has the 3 records:

| post_comment_id | review    | post_id |
|-----------------|-----------|---------|
| 1               | Good      | 1       |
| 2               | Excellent | 1       |
| 3               | Awesome   | 2       |

The JOIN ON clause using a custom projection

Traditionally, when writing an INNER JOIN or LEFT JOIN query, we happen to use the ON clause to define the join condition.

For example, to get the comments along with their associated post title and identifier, we can use the following SQL projection query:

SELECT
   post.post_id,
   title,
   review
FROM post
INNER JOIN post_comment ON post.post_id = post_comment.post_id
ORDER BY post.post_id, post_comment_id

And, we get back the following result set:

| post_id | title     | review    |
|---------|-----------|-----------|
| 1       | Java      | Good      |
| 1       | Java      | Excellent |
| 2       | Hibernate | Awesome   |

The JOIN USING clause using a custom projection

When the Foreign Key column and the column it references have the same name, we can use the USING clause, like in the following example:

SELECT
  post_id,
  title,
  review
FROM post
INNER JOIN post_comment USING(post_id)
ORDER BY post_id, post_comment_id

And, the result set for this particular query is identical to the previous SQL query that used the ON clause:

| post_id | title     | review    |
|---------|-----------|-----------|
| 1       | Java      | Good      |
| 1       | Java      | Excellent |
| 2       | Hibernate | Awesome   |

The USING clause works for Oracle, PostgreSQL, MySQL, and MariaDB. SQL Server doesn't support the USING clause, so you need to use the ON clause instead.

The USING clause can be used with INNER, LEFT, RIGHT, and FULL JOIN statements.

SQL JOIN ON clause with SELECT *

Now, if we change the previous ON clause query to select all columns using SELECT *:

SELECT *
FROM post
INNER JOIN post_comment ON post.post_id = post_comment.post_id
ORDER BY post.post_id, post_comment_id

We are going to get the following result set:

| post_id | title     | post_comment_id | review    | post_id |
|---------|-----------|-----------------|-----------|---------|
| 1       | Java      | 1               | Good      | 1       |
| 1       | Java      | 2               | Excellent | 1       |
| 2       | Hibernate | 3               | Awesome   | 2       |

As you can see, the post_id is duplicated because both the post and post_comment tables contain a post_id column.

SQL JOIN USING clause with SELECT *

On the other hand, if we run a SELECT * query that features the USING clause for the JOIN condition:

SELECT *
FROM post
INNER JOIN post_comment USING(post_id)
ORDER BY post_id, post_comment_id

We will get the following result set:

| post_id | title     | post_comment_id | review    |
|---------|-----------|-----------------|-----------|
| 1       | Java      | 1               | Good      |
| 1       | Java      | 2               | Excellent |
| 2       | Hibernate | 3               | Awesome   |

You can see that this time, the post_id column is deduplicated, so there is a single post_id column being included in the result set.

Conclusion

If the database schema is designed so that Foreign Key column names match the columns they reference, and the JOIN conditions only check if the Foreign Key column value is equal to the value of its mirroring column in the other table, then you can employ the USING clause.

Otherwise, if the Foreign Key column name differs from the referencing column or you want to include a more complex join condition, then you should use the ON clause instead.

Natalia answered 28/7, 2020 at 9:47 Comment(3)
The queries are not identical. There are subtle syntactic differences. See the accepted answer, for example.Playground
@LukasEder "the result set is going to be identical" for the queries. (The JOINs of ON & USING calculate different tables in standard SQL. MySQL allows mentioning the USING columns via dotting, which is not allowed in standard SQL. But SELECT * returns only one copy of each USING column. So again the JOINs of ON & USING calculate different tables.)Atonement
Thanks for mentioning that USING is not available in SQL Server 😢.Undirected
M
3

For those experimenting with this in phpMyAdmin, just a word:

phpMyAdmin appears to have a few problems with USING. For the record this is phpMyAdmin run on Linux Mint, version: "4.5.4.1deb2ubuntu2", Database server: "10.2.14-MariaDB-10.2.14+maria~xenial - mariadb.org binary distribution".

I have run SELECT commands using JOIN and USING in both phpMyAdmin and in Terminal (command line), and the ones in phpMyAdmin produce some baffling responses:

1) a LIMIT clause at the end appears to be ignored.
2) the supposed number of rows as reported at the top of the page with the results is sometimes wrong: for example 4 are returned, but at the top it says "Showing rows 0 - 24 (2503 total, Query took 0.0018 seconds.)"

Logging on to mysql normally and running the same queries does not produce these errors. Nor do these errors occur when running the same query in phpMyAdmin using JOIN ... ON .... Presumably a phpMyAdmin bug.

Madonnamadora answered 7/6, 2018 at 21:10 Comment(0)
S
1

Short answer:

  • USING: when clause is ambiguous
  • ON: when clause has different comparison params
Speaking answered 20/10, 2022 at 11:33 Comment(0)
D
0

I want to add a practical reason why one would want to use 'using' over 'on'.

I was doing 'Combine two tables' problem on leetcode and found out that 'using' is more than twice faster than 'on'. Of course, as the selected answer explained, 'using' is only when one column criteria is enough for the task. So 'on' is still necessary in other situations.

So my assumption is that 'using' method was created in simpler way than 'on' method. While you can't do much stuff with 'using', it provides simpler path, thus has better performance.

Duo answered 17/1 at 4:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.