MySQL version: 5.5.28-0ubuntu0.12.04.2-log
I was also under the impression that JOIN is always better than a sub-query in MySQL, but EXPLAIN is a better way to make a judgment. Here is an example where sub queries work better than JOINs.
Here is my query with 3 sub-queries:
EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date
FROM `vote-ranked-listory` vrl
INNER JOIN lists l ON l.list_id = vrl.list_id
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION'
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000
AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=43) IS NULL
AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=55) IS NULL
AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL
ORDER BY vrl.moved_date DESC LIMIT 200;
EXPLAIN shows:
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
| 1 | PRIMARY | vrl | index | PRIMARY | moved_date | 8 | NULL | 200 | Using where |
| 1 | PRIMARY | l | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY | 4 | ranker.vrl.list_id | 1 | Using where |
| 1 | PRIMARY | vrlih | eq_ref | PRIMARY | PRIMARY | 9 | ranker.vrl.list_id,ranker.vrl.ontology_id,const | 1 | Using where |
| 1 | PRIMARY | lbs | eq_ref | PRIMARY,idx_list_burial_state,burial_score | PRIMARY | 4 | ranker.vrl.list_id | 1 | Using where |
| 4 | DEPENDENT SUBQUERY | list_tag | ref | list_tag_key,list_id,tag_id | list_tag_key | 9 | ranker.l.list_id,const | 1 | Using where; Using index |
| 3 | DEPENDENT SUBQUERY | list_tag | ref | list_tag_key,list_id,tag_id | list_tag_key | 9 | ranker.l.list_id,const | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | list_tag | ref | list_tag_key,list_id,tag_id | list_tag_key | 9 | ranker.l.list_id,const | 1 | Using where; Using index |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
The same query with JOINs is:
EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date
FROM `vote-ranked-listory` vrl
INNER JOIN lists l ON l.list_id = vrl.list_id
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION'
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5
LEFT JOIN list_tag lt1 ON lt1.list_id = vrl.list_id AND lt1.tag_id = 43
LEFT JOIN list_tag lt2 ON lt2.list_id = vrl.list_id AND lt2.tag_id = 55
INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000
AND lt1.list_id IS NULL AND lt2.tag_id IS NULL
ORDER BY vrl.moved_date DESC LIMIT 200;
and the output is:
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | lt3 | ref | list_tag_key,list_id,tag_id | tag_id | 5 | const | 2386 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | l | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY | 4 | ranker.lt3.list_id | 1 | Using where |
| 1 | SIMPLE | vrlih | ref | PRIMARY | PRIMARY | 4 | ranker.lt3.list_id | 103 | Using where |
| 1 | SIMPLE | vrl | ref | PRIMARY | PRIMARY | 8 | ranker.lt3.list_id,ranker.vrlih.ontology_id | 65 | Using where |
| 1 | SIMPLE | lt1 | ref | list_tag_key,list_id,tag_id | list_tag_key | 9 | ranker.lt3.list_id,const | 1 | Using where; Using index; Not exists |
| 1 | SIMPLE | lbs | eq_ref | PRIMARY,idx_list_burial_state,burial_score | PRIMARY | 4 | ranker.vrl.list_id | 1 | Using where |
| 1 | SIMPLE | lt2 | ref | list_tag_key,list_id,tag_id | list_tag_key | 9 | ranker.lt3.list_id,const | 1 | Using where; Using index |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
A comparison of the rows
column tells the difference and the query with JOINs is using Using temporary; Using filesort
.
Of course when I run both the queries, the first one is done in 0.02 secs, the second one does not complete even after 1 min, so EXPLAIN explained these queries properly.
If I do not have the INNER JOIN on the list_tag
table i.e. if I remove
AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL
from the first query and correspondingly:
INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403
from the second query, then EXPLAIN returns the same number of rows for both queries and both these queries run equally fast.