Does the MySQL IN clause execute the subquery multiple times?
Asked Answered
B

2

5

Given this SQL query in MySQL:

SELECT * FROM tableA WHERE tableA.id IN (SELECT id FROM tableB);

Does MySQL execute the subquery SELECT id FROM tableB multiple times for each row in tableA?

Is there a way to make sql go faster without using variables or store procedures?

Why is this often slower than using LEFT JOIN?

Brynnbrynna answered 13/9, 2013 at 16:12 Comment(0)
J
11

Your assumption is false; the subquery will be executed only once. The reason why it's slower than a join is because IN can't take advantage of indexes; it has to scan its arguments once for each time the WHERE clause is evaluated, that is, once per row in tableA. You can optimize the query, without using variables or stored procedures, simply by replacing the IN with a join, thus:

SELECT tableA.field1, tableA.field2, [...]
FROM tableA 
  INNER JOIN tableB ON tableA.id = tableB.id

Unless you don't mind getting back every field from both tables, you do need to enumerate the fields you want in the SELECT clause; tableA.*, for example, will elicit a syntax error.

Jael answered 13/9, 2013 at 16:16 Comment(9)
+1. Also a LEFT JOIN isn't the same thing. INNER JOIN is the answer.Migratory
@Migratory Well, if I wanted to be picky, I could argue that LEFT JOIN tableB on tableA.id = tableB.id AND tableB.id IS NOT NULL would have the same effect if more verbosely, but that'd be silly.Jael
Are you sure that in MySQL the subquery will be executed only once? Where does the assurance come from?Univalent
no indexes! That explains a lot, and enumerating the values for IN for each row is also going to slow it down. Thanks for the clarification.Brynnbrynna
@Univalent From experience; feel free to confirm it yourself with any MySQL 5.0 instance that's configured to use a general query log.Jael
@MathewFoscarini Glad to be of help!Jael
How does the general log tell you whether the subquery is executed once or for each row? One should play with counters to be sure. But similar kind of subquries is usually not optimized by MySQL (it is from my experience). But this trivial one certaily might be optimized.Univalent
Moreover, if you'd select something other than the primary key (and not stored as a first field in any index), you would see a significant slowdown. I've just tried.Univalent
With JOIN, if tableB.id is not unique it will return the same row from tableA multiple times. With IN, matched rows are returned only once even if tableB.id has repeated values.Fouquet
A
5

First, this depends on the version of MySQL. I believe that version 5.6 optimizes such queries correctly. MySQL documentation is inconsistent on this. For instance, here it says one thing:

Consider the following subquery comparison:

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

MySQL evaluates queries “from outside to inside.” That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.

This "from outside to inside" means that the subquery is evaluated for each row. This is consistent with my experience with MySQL.

The documentation suggests otherwise here:

Some optimizations that MySQL itself makes are:

  • MySQL executes uncorrelated subqueries only once. Use EXPLAIN to make sure that a given subquery really is uncorrelated.
  • MySQL rewrites IN, ALL, ANY, and SOME subqueries in an attempt to take advantage of the possibility that the select-list columns in the subquery are indexed.

I believe the statement does not refer to in clauses. Perhaps what happens is that the subquery is rewritten as a correlated subquery to check for indexes, and then it is run multiple times (regardless of the presence of a index).

Agriculture answered 13/9, 2013 at 16:35 Comment(4)
Sure, anyone is not sure about this, please have a look at this question: "MySQL “IN” queries terribly slow with subquery but fast with explicit values" - #5018784Aleedis
@TheAnhNguyen . . . That question is 10 years old. The referenced version of MySQL in this answer was 8 years ago.Agriculture
Yesterday, I have same problem with MariaDB 5.5.60 (released in 2018). So I also want to know that what RBDMS optimizes such queries correctly and vice versa.Aleedis
Oh, by MySQL 8.0 Reference link (dev.mysql.com/doc/refman/8.0/en/subquery-materialization.html). I will make a test with it later. It said that: The optimizer uses materialization to enable more efficient subquery processing. Materialization speeds up query execution by generating a subquery result as a temporary table, normally in memory. The first time MySQL needs the subquery result, it materializes that result into a temporary table. Any subsequent time the result is needed, MySQL refers again to the temporary table.Aleedis

© 2022 - 2024 — McMap. All rights reserved.