MySQL Slow JOIN query when using ORDERBY
Asked Answered
P

5

5

I have a problem with this query:

SELECT a.*
FROM smartressort AS s
JOIN smartressort_to_ressort AS str
    ON s.id = str.smartressort_id
JOIN article_to_ressort AS atr
    ON str.ressort_id = atr.ressort_id
JOIN article AS a FORCE INDEX (source_created)
    ON atr.article_id = a.id    
WHERE
    s.id = 1
ORDER BY
    a.created_at DESC
LIMIT 25;

This one is realy slow, it some times takes 14 sec.

EXPLAIN show this:

1   SIMPLE  s   const   PRIMARY PRIMARY 4   const   1   Using index; Using temporary; Using filesort
1   SIMPLE  str ref PRIMARY,ressort_id  PRIMARY 4   const   1   Using index
1   SIMPLE  atr ref PRIMARY,article_id  PRIMARY 4   com.nps.lvz-prod.str.ressort_id 1262    Using index
1   SIMPLE  a   ALL NULL    NULL    NULL    NULL    146677  Using where; Using join buffer (flat, BNL join)

so the last "all" type is realy bad. But i already tried to force using the index, with no luck.

The Article Table looks like this:

CREATE TABLE `article` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`node_id` varchar(255) NOT NULL DEFAULT '',
`object_id` varchar(255) DEFAULT NULL,
`headline_1` varchar(255) NOT NULL DEFAULT '',
`created_at` datetime(3) NOT NULL,
`updated_at` datetime(3) NOT NULL,
`teaser_text` longtext NOT NULL,
`content_text` longtext NOT NULL,
PRIMARY KEY (`id`),
KEY `article_nodeid` (`node_id`),
KEY `article_objectid` (`object_id`),
KEY `source_created` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=161116 DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC;

When i remove the FORCE INDEX, the Explain gets better, but the query is still slow.

Explain Without force index:

1   SIMPLE  s   const   PRIMARY PRIMARY 4   const   1   Using index; Using temporary; Using filesort
1   SIMPLE  str ref PRIMARY,ressort_id  PRIMARY 4   const   1   Using index
1   SIMPLE  atr ref PRIMARY,article_id  PRIMARY 4   com.nps.lvz-prod.str.ressort_id 1262    Using index
1   SIMPLE  a   eq_ref  PRIMARY PRIMARY 4   com.nps.lvz-prod.atr.article_id 1   

And for another smartressort id(3) it looks like this:

1   SIMPLE  s   const   PRIMARY PRIMARY 4   const   1   Using index; Using temporary; Using filesort
1   SIMPLE  str ref PRIMARY,ressort_id  PRIMARY 4   const   13  Using index
1   SIMPLE  atr ref PRIMARY,article_id  PRIMARY 4   com.nps.lvz-prod.str.ressort_id 1262    Using index
1   SIMPLE  a   eq_ref  PRIMARY PRIMARY 4   com.nps.lvz-prod.atr.article_id 1   

Here we have 13 Ressorts for one Smartressort. Rows: 1x1x13x1262x1 = 16.406

1) What can i do to make this request faster?

2) What's wrong with the source_created index?

Phoebephoebus answered 12/3, 2019 at 10:32 Comment(10)
You left out perhaps the most critical part, which is the definition for the source_created index. Please include that.Breviary
@Cid I don't think that would typically make much difference.Breviary
Forcing source_created index is useless, you are not using the indexed column in the join, only after.Pipsissewa
@Cid The WHERE clause is applied before the join happens, not afterwards :-)Breviary
@TimBiegeleisen haha then I was wrong, thanks for pointing that, tonight, I'll go sleeping a bit less dumb :)Fairground
"The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table." In my opinion, in the query but can't use only the named index (source_created) then it does a full scan. Can you try with the primary key and show us the result ?Kolivas
@TimBiegeleisen the source_created is here: KEY source_created (created_at) so its just an index for created_atPhoebephoebus
@DanielE. forcing the primary key as the index is also very slow. The main problem is the ordering. The query without the order by only takes ~35ms.Phoebephoebus
Have you solved the issue? what was the solution?Pipsissewa
In addition to the Answers given, follow the tips on many:many tables here.Windswept
B
4

The SELECT * you have in your query is ugly, and this can often be an index killer. It can preclude the use of an index, because most indices you would define would not cover every column demanded by the SELECT *. The approach of this answer is to index all other tables in your query, which would therefore incentivize MySQL to just do a single scan over the article table.

CREATE INDEX idx1 ON article_to_ressort (article_id, ressort_id);
CREATE INDEX idx2 ON smartressort_to_ressort (ressort_id, smartressort_id);

These two indices should speed up the joining process. Note that I did not define an index for the smartressort table, assuming that its id column is already a primary key. I would probably write your query starting with the article table, and joining outwards, but it should not really matter.

Also, forcing an index is mostly either a bad idea or not necessary. The optimizer can usually figure out when it is best to use an index.

Breviary answered 12/3, 2019 at 10:47 Comment(4)
Note that SELECT a.* is not that bad in this particular case, OP needs all the data , so the row should be fetched anyways - in general you are right, there are cases when all the data could be served from an index - similarly to your new indexes.Pipsissewa
@Pipsissewa Well MySQL has to scan something. If it can find an efficient way to scan the article table, handling the joins, then it should do that.Breviary
agree. Perhaps an index on article, containing id and created_at would help the ORDER BY, preventing the need to access all the rows for sorting?Pipsissewa
The thing is, the index you describe would not cover all the columns needed by the SELECT a.*, so it is not clear whether MySQL would choose to use it.Breviary
C
2

SELECT many columns FROM tables ORDER BY something LIMIT few is a notorious performance antipattern; it has to retrieve and order a whole mess of rows and columns, just to discard all but a few rows of the result set.

The trick is to figure out which values of article.id you need in your result set, then retrieve just those values. It's called a deferred join.

This should get you that set of id values. There's probably no need to join the smartressort table because smartressort_to_ressort contains the id values you need.

                 SELECT a.id
                   FROM article a
                   JOIN article_to_ressort atr ON a.id = atr.article_id
                   JOIN smartressort_to_ressort str ON atr.ressort_id = str.ressort_id
                  WHERE str.smartressort_id = 1
                  ORDER BY a.created_at DESC
                  LIMIT 25

Then you can use this as a subquery to get the rows you need.

SELECT a.*
  FROM article a
 WHERE a.id IN (
                 SELECT a.id
                   FROM article a
                   JOIN article_to_ressort atr ON a.id = atr.article_id
                   JOIN smartressort_to_ressort str ON atr.ressort_id = str.ressort_id
                  WHERE str.smartressort_id = 1
                  ORDER BY a.created_at DESC
                  LIMIT 25
               )
 ORDER BY a.created_at DESC

The second ORDER BY makes sure the rows from article are in a predictable order. Your index optimization work, then, need only apply to the subquery.

Coltson answered 12/3, 2019 at 11:5 Comment(3)
@Phoebephoebus ideally the optimizer should access the a.created_at table first because you order by on that on it so basically O.Jones is forcing the MySQL optimizer into a better plan here.. MySQL 8.0 would automaticlly optimize your query and would avoid "Using temporary" and would run just fineMarcin
i cant use the Limit in the subquery. But this works and is much faster: SELECT ar.* FROM article ar WHERE ar.id IN ( SELECT atr.article_id from article_to_ressort atr JOIN smartressort_to_ressort str ON atr.ressort_id = str.ressort_id WHERE str.smartressort_id = 2 ) ORDER BY ar.created_at DESC LIMIT 25;Phoebephoebus
i also need to add: CREATE INDEX idx3 ON article ( id , created_at )Phoebephoebus
P
0

In addition to @TimBiegelsen 's great answer, I would recommend to modify your source_created index:

...
KEY `source_created` (`id`, `created_at`)

The gain would be that MySQL could use it for sorting, and wouldn't need to fetch all 16406 rows. It may or may not help, but worth to try (perhaps with explicite declaration to use it)

Pipsissewa answered 12/3, 2019 at 11:14 Comment(0)
P
0

To start with: You can remove the smartressort table from your query, as it doesn't add anything to it.

The following is your query rewritten. We want all ressorts for smart ressort #1 and then all articles for these ressorts. Of these we show the newest 25.

SELECT *
FROM article
WHERE id IN
(
  SELECT article_id
  FROM article_to_ressort 
  WHERE ressort_id IN
  (
    SELECT ressort_id
    FROM smartressort_to_ressort
    WHERE smartressort_id = 1
  )
)
ORDER BY created_at DESC
LIMIT 25;

Now which indexes would be needed to help the DBMS with this? Start with the inner table (smartressort_to_ressort). We access all records with a given smartressort_id and we want to get the associated ressort_id. So the index should contain these two columns in this order. Same for article_to_ressort and its ressort_id and article_id. At last we want to select the articles by the found article IDs and order by their created_at.

CREATE INDEX idx1 ON smartressort_to_ressort (smartressort_id, ressort_id);
CREATE INDEX idx2 ON article_to_ressort (ressort_id, article_id);
CREATE INDEX idx3 ON article (id, created_at);

Anyway, these indexes are just an offer to the DBMS. It may decide against them. This is especially true for the index on the article table. How many rows does the DBMS expect to access for one smartressort_id, i.e. how many rows may be in the IN clause? If the DBMS thinks that this might well be about 10% of all article IDs, it may already decide to rather read the table sequentially than muddle it's way through the index for so many rows.

Pedestrianize answered 12/3, 2019 at 11:18 Comment(1)
This is much faster (~35ms). the explain query shows that only the idx3 is used. rows shows 1262 x 1 x 1 x 1262.Phoebephoebus
P
0

So for me the solution was this:

SELECT a.*
FROM article as a  USE INDEX (source_created)
where a.id in (
             SELECT atr.article_id
               from smartressort_to_ressort str 
               JOIN article_to_ressort atr  ON atr.ressort_id = str.ressort_id
              WHERE str.smartressort_id = 1
) 
ORDER BY a.created_at DESC
LIMIT 25;

This only needs ~35ms. Explain looks like this:

1   PRIMARY a   index   NULL    source_created  7   NULL    1   
1   PRIMARY <subquery2> eq_ref  distinct_key    distinct_key    4   func    1
2   MATERIALIZED    str ref PRIMARY,ressort_id,idx1 PRIMARY 4   const   1   Using index
2   MATERIALIZED    atr ref PRIMARY,article_id,idx2 PRIMARY 4   com.nps.lvz-prod.str.ressort_id 1262    Using index

Even so, this query Explain looks better for me, but i don't know why exactly:

explain SELECT a.*, NOW()
FROM article as a  USE INDEX (source_created)
where a.id in (SELECT atr.article_id
    FROM smartressort AS s
    JOIN smartressort_to_ressort AS str
    ON s.id = str.smartressort_id
    JOIN article_to_ressort AS atr
    ON str.ressort_id = atr.ressort_id
    WHERE s.id = 1
) 
ORDER BY a.created_at DESC
LIMIT 25;

Output:

1   PRIMARY s   const   PRIMARY PRIMARY 4   const   1   Using index
1   PRIMARY a   index   NULL    source_created  7   NULL    25  
1   PRIMARY str ref PRIMARY,ressort_id,idx1 PRIMARY 4   const   1   Using index
1   PRIMARY atr eq_ref  PRIMARY,article_id,idx2 PRIMARY 8   com.nps.lvz-prod.str.ressort_id,com.nps.lvz-prod.a.id   1   Using index; FirstMatch(a)
Phoebephoebus answered 12/3, 2019 at 12:39 Comment(1)
Since this says MATERIALIZED, you may be testing with a different version.Windswept

© 2022 - 2024 — McMap. All rights reserved.