Mysql: inner join on primary key for 2 IDs gives "Range checked for each record"
Asked Answered
E

2

12

I get a "Range checked for each record (index map: 0x1)" in EXPLAIN SELECT when doing an INNER JOIN on a PRIMARY key with 2 values (using either IN or OR constructs)

Here is the query:

SELECT *
FROM message AS m
INNER JOIN user AS u
ON u.id = m.sender_id OR u.id = m.receiver_id

When doing an explain, it gives me:

+----+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                         |
+----+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------+
|  1 | SIMPLE      | u     | ALL  | PRIMARY       | null | null    | null | 75000 | Range checked for each record (index map: 0x1)|
+----+-------------+-------+------+---------------+------+---------+------+-------+-----------------------------------------------+

It can't be...

If I try this I get the same result:

SELECT *
FROM message AS m
INNER JOIN user AS u
ON u.id IN(m.sender_id, m.receiver_id)

But if I do this, it works fine and I get only 1 row parsed:

SELECT *
FROM message AS m
INNER JOIN user AS u
ON u.id = m.sender_id

How is this possible? I'm joining on a primary key with same-type values. (the actual query is "a bit" more complicated but nothing fancy, 2 inner joins and lastly one left join)

It should be 2 rows, period.

Thanks for any input on this (did some research but did not find anything valuable except for "please add an index", which is obviously not applicable here)

EDIT: yes, I tried the USE INDEX statement, but still no luck

EDIT: Here is a very simple schema to reproduce this weird behavior of MySQL:

CREATE TABLE test_user (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(30),
    PRIMARY KEY (id)
);

CREATE TABLE test_message (
    id INT NOT NULL AUTO_INCREMENT, 
    sender_id INT NOT NULL,
    receiver_id INT NOT NULL,
    PRIMARY KEY (id),
    INDEX idx_sender (sender_id),
    INDEX idx_receiver (receiver_id)
);

EXPLAIN SELECT *
FROM test_message AS m
INNER JOIN test_user AS u
    ON u.id = m.sender_id OR u.id = m.receiver_id;
Entozoon answered 3/11, 2014 at 17:59 Comment(8)
Is inner join the correct join method? You're not supposed to use FULL as well?Margravine
Hmmmm, I see you have only primary indexes set up id numbers. Try adding indexes to sender_id and receiver_id, that may help the pairing.Software
Okay I did, it does not help with the IN construct, but it works with the OR construct! I thought that indexes were to be set only on the columns that were being searched (user.id), not the (already selected) ones where the values to test come from (task.id_user). Can you please post an actual answer (that I will be allowed to accept) and explain to me (and others) why the index should also be set on the reference column, not only the searched one? Thank you David!Entozoon
Wait, I spoke too fast. It still does not work. It only fixes the one and single use case where I do: u.id = t.id_user OR u.id = t.id_user (which is pointless but was still causing the issue and made the query even simpler). I'm sorry that it does not work. Any other idea? I'm sure I'm not the only one trying to compare a column to 2 columns using OR. I'm surprised I did not come accross that earlier.Entozoon
@hansmei, MySQL does not support FULL OUTER JOIN. And there is no such thing in SQL as FULL INNER JOIN.Sneaker
Please don't clarify in comments; edit your question. Also please do not keep adding "EDIT"s. Just edit your question to be the best self-contained up-to-date version. Past versions are accessible via the "edited" link.Redraft
Hi. What do you mean, "it works fine"? Or "it does not work". Also you don't actually say what you expect. Even though you might think that's obvious. Please read & act on minimal reproducible example. Here, you have no example input, output & desired output. And what does "I get only 1 row parsed" mean? And give a complete example (code & data) that exhibits your problem. And in giving your specification, what are any other declarable unique & FKs column sets & non-null columns? PS What do mean by the "reference" & "searched" columns?Redraft
@LidelnKyoku - Something is wrong with the first query and its EXPLAIN -- The query mentions two tables; the EXPLAIN mentions only one. Please fix.Tillis
S
12

In general, MySQL can use only one index per table reference in a query (there's an index-merge algorithm, but this doesn't work as often as you might think).

Your join condition has an OR between two comparisons to indexed columns, and the optimizer can't choose which is the better one to use before the data in the table is examined row-by-row.

A common workaround is to do a UNION between simpler queries, instead of the OR condition.

mysql> EXPLAIN 
    SELECT * FROM test_message AS m 
    INNER JOIN test_user AS u ON u.id = m.sender_id 
  UNION
    SELECT * FROM test_message AS m 
    INNER JOIN test_user AS u ON u.id = m.receiver_id;

+----+--------------+------------+--------+---------------+---------+---------+--------------------+------+-----------------+
| id | select_type  | table      | type   | possible_keys | key     | key_len | ref                | rows | Extra           |
+----+--------------+------------+--------+---------------+---------+---------+--------------------+------+-----------------+
|  1 | PRIMARY      | m          | ALL    | idx_sender    | NULL    | NULL    | NULL               |    1 | NULL            |
|  1 | PRIMARY      | u          | eq_ref | PRIMARY       | PRIMARY | 4       | test.m.sender_id   |    1 | NULL            |
|  2 | UNION        | m          | ALL    | idx_receiver  | NULL    | NULL    | NULL               |    1 | NULL            |
|  2 | UNION        | u          | eq_ref | PRIMARY       | PRIMARY | 4       | test.m.receiver_id |    1 | NULL            |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL          | NULL    | NULL    | NULL               | NULL | Using temporary |
+----+--------------+------------+--------+---------------+---------+---------+--------------------+------+-----------------+

This does use proper index lookups in both sub-queries, but it has to use a temporary table to finish the UNION afterwards. Ultimately, it might be a wash for performance. Depends on how many rows of data need to be examined, and how many rows are produced as the result.

Sneaker answered 2/11, 2017 at 17:31 Comment(3)
I cannot check but believe that using "UNION ALL" will help to avoid temporary table(potentially it may cause duplicates but I suppose it's quite rare case to send message to yourselves)Gilburt
@skyboyer: MySQL 5.7.3 and later can eliminate the temporary table when you use UNION ALL. See specific notes at dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-3.html. Earlier versions of MySQL don't have this optimization.Sneaker
Unbelievable... changing from a two-condition JOIN to two single-condition JOINs with a UNION changed runtime from 1m44s to 0.04s! Interesting the optimizer can't make that simplification itself.Linchpin
W
1

The problem is well known also in other (I think all) RDBMS, the optimizer will use only one rule for each join.

If the join condition is complex or if it can't recognize a known pattern to solve it, no optimization will be applied and it will go for full table scan.

In your case the OR condition in main join, seems simple but it is not, because you are asking to check each user id against two different columns (not constant values) at a time.

To solve it you must split your join condition in more sub-queries so optimizer can use the better rule for each one.

@Bill Karwin has proposed the common solution and it helps well to understand the problem.

A (slightly) better way to solve this problem is to move the union up one level and join on the derived table:

EXPLAIN 
SELECT *
FROM test_user AS u
INNER JOIN (
    select id, sender_id as msg_id
    from test_message 
    union all 
    select id, receiver_id 
    from test_message 
    ) AS m 
ON u.id = m.msg_id;

It will not use TEMPORARY tables and does only one full table scan on test_users instead of two

id  select_type table           partitions  type    possible_keys   key             key_len ref         rows    filtered    Extra
1   PRIMARY     u               NULL        ALL     PRIMARY         NULL            NULL    NULL        1       100.00      NULL
1   PRIMARY     <derived2>      NULL        ref     <auto_key0>     <auto_key0>     4       test.u.id   2       100.00      NULL
2   DERIVED     test_message    NULL        index   NULL            idx_sender      4       NULL        1       100.00      "Using index"
3   UNION       test_message    NULL        index   NULL            idx_receiver    4       NULL        1       100.00      "Using index" 
Wang answered 3/11, 2017 at 10:31 Comment(6)
Try that EXPLAIN again, after you have added a few dozen rows to the tables. You might find that it does not start with a full table scan of u. But, there is still a tmp table implied in <derived2>. Furthermore there is a full table scan of it (ok, only 2 rows) to build an index (<auto_key0>).Tillis
@RickJames I did my tests with 6 users and 50 messages and the execution plan does not change. This query is similar to @BillKarwin one, but it does only 1 full table scan of u and only 1 nested loop (JOIN) instead of 2 and 2. From MySQL 5.7 building the merged index <auto_key0> for derived table should have good performances. I mean, to join a union shoud be better than the union of two joins.Wang
Whoa! The Question wanted all columns from both test_message and test_user; the query here delivers only the columns from test_user. So, I suggest it is not a valid Answer.Tillis
the question was "How is this possible to get 'Range checked for each record' in this scenario?" and my answer give a description of the problem and an alternative solution to avoid it. If you do not like it, don't worry.. I will survive..Wang
'Range checked for each record' is so cryptic that I don't know what it means.Tillis
?!?!? I thought you were the MySQL Expert.. and also that you read the question (at least the title)Wang

© 2022 - 2024 — McMap. All rights reserved.