MySQL & nested set: slow JOIN (not using index)
Asked Answered
A

3

7

I have two tables:

localities:

CREATE TABLE `localities` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) NOT NULL,
  `type` varchar(30) NOT NULL,
  `parent_id` int(11) DEFAULT NULL,
  `lft` int(11) DEFAULT NULL,
  `rgt` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_localities_on_parent_id_and_type` (`parent_id`,`type`),
  KEY `index_localities_on_name` (`name`),
  KEY `index_localities_on_lft_and_rgt` (`lft`,`rgt`)
) ENGINE=InnoDB;

locatings:

CREATE TABLE `locatings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `localizable_id` int(11) DEFAULT NULL,
  `localizable_type` varchar(255) DEFAULT NULL,
  `locality_id` int(11) NOT NULL,
  `category` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_locatings_on_locality_id` (`locality_id`),
  KEY `localizable_and_category_index` (`localizable_type`,`localizable_id`,`category`),
  KEY `index_locatings_on_category` (`category`)
) ENGINE=InnoDB;

localities table is implemented as a nested set.

Now, when user belongs to some locality (through some locating) he also belongs to all its ancestors (higher level localities). I need a query that will select all the localities that all the users belong to into a view.

Here is my try:

select distinct lca.*, lt.localizable_type, lt.localizable_id 
from locatings lt
join localities lc on lc.id = lt.locality_id
left join localities lca on (lca.lft <= lc.lft and lca.rgt >= lc.rgt)

The problem here is that it takes way too much time to execute.

I consulted EXPLAIN:

+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
| id | select_type | table | type   | possible_keys                   | key     | key_len | ref                              | rows  | filtered | Extra           |
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
|  1 | SIMPLE      | lt    | ALL    | index_locatings_on_locality_id  | NULL    | NULL    | NULL                             |  4926 |   100.00 | Using temporary |
|  1 | SIMPLE      | lc    | eq_ref | PRIMARY                         | PRIMARY | 4       | bzzik_development.lt.locality_id |     1 |   100.00 |                 |
|  1 | SIMPLE      | lca   | ALL    | index_localities_on_lft_and_rgt | NULL    | NULL    | NULL                             | 11439 |   100.00 |                 |
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
3 rows in set, 1 warning (0.00 sec)

The last join obviously doesn’t use lft, rgt index as I expect it to. I’m desperate.

UPDATE: After adding a condition as @cairnz suggested, the query takes still too much time to process.

UPDATE 2: Column names instead of the asterisk

Updated query:

SELECT DISTINCT lca.id, lt.`localizable_id`, lt.`localizable_type` 
FROM locatings lt FORCE INDEX(index_locatings_on_category)
JOIN localities lc
    ON lc.id = lt.locality_id
INNER JOIN localities lca
    ON lca.lft <= lc.lft AND lca.rgt >= lc.rgt
WHERE lt.`category` != "Unknown";

Updated EXAPLAIN:

+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
| id | select_type | table | type   | possible_keys                           | key                         | key_len | ref                             | rows  | filtered | Extra                                           |
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
|  1 | SIMPLE      | lt    | range  | index_locatings_on_category             | index_locatings_on_category | 153     | NULL                            |  2545 |   100.00 | Using where; Using temporary                    |
|  1 | SIMPLE      | lc    | eq_ref | PRIMARY,index_localities_on_lft_and_rgt | PRIMARY                     | 4       | bzzik_production.lt.locality_id |     1 |   100.00 |                                                 |
|  1 | SIMPLE      | lca   | ALL    | index_localities_on_lft_and_rgt         | NULL                        | NULL    | NULL                            | 11570 |   100.00 | Range checked for each record (index map: 0x10) |
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+

Any help appreciated.

Auscultate answered 3/1, 2012 at 18:15 Comment(2)
Have you tried not having lft and rft in the same index? (one for lft, one for rft)Affusion
Answer updated per your update.Affusion
A
2

Ah, it just occurred to me.

Since you are asking for everything in the table, mysql decides to use a full table scan instead, as it deems it more efficient.

In order to get some key usage, add in some filters to restrict looking for every row in all the tables anyways.

Updating Answer:

Your second query does not make sense. You are left joining to lca yet you have a filter in it, this negates the left join by itself. Also you're looking for data in the last step of the query, meaning you will have to look through all of lt, lc and lca in order to find your data. Also you have no index with left-most column 'type' on locations, so you still need a full table scan to find your data.

If you had some sample data and example of what you are trying to achieve it would perhaps be easier to help.

Affusion answered 3/1, 2012 at 18:48 Comment(6)
Thanks, query is much faster, but still takes too much. I updated my question with new query and explain.Auscultate
Sorry, this is maybe a silly question, but what did you mean by adding filters then?Auscultate
your query has to process the lt table, joining up on lc, joining up on lca. the filter you have is in lca, the last "step" of the query. it can then scan the lca table for rows that match type != "Unknown" but in order to get to that point it already has to read lt and lc, if that made sense. also you have a left join to that table, which means you can have NULL records there, yet you are filtering it in a WHERE clause, removing all NULL records (equal to an inner join). Maybe you meant your filter to be on lc, or on lt. If you filtered on lt table, it has less rows to scan in lc and lca.Affusion
Do you mean something like this ... WHERE lt.category != "Unknown"; We have about 12 000 records in localities table and about 4 000 records in locatings and the query takes half a minute to process. That's not OK, is it?Auscultate
That sounds more like it. If you now have an index on lt.category it should speed searching those up. it would then be more likely to use the following indexes on lc and lca tables. what does EXPLAIN say when you add such filter and index. Remember you cannot use KEY localizable_and_category_index (localizable_type,localizable_id,category) for only searching for category since it's not the leftmost column in the index.Affusion
I updated the question with actual query and explain. Looks like index is used, but we are getting the same processing time.Auscultate
H
2

try to experiment with forcing index - http://dev.mysql.com/doc/refman/5.1/en/index-hints.html, maybe it's just optimizer issue.

Halfwitted answered 3/1, 2012 at 19:26 Comment(2)
Also replace DISTINCT with a GROUP BYWessels
@FrancisAvila replacing DISTINCT with a GROUP BY doesn't make any difference.Auscultate
C
0

It looks like you're wanting the parents of the single result.

According to the person credited with defining Nested Sets in SQL, Joe Celko at http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html "This model is a natural way to show a parts explosion, because a final assembly is made of physically nested assemblies that break down into separate parts."

In other words, Nested Sets are used to filter children efficiently to an arbitrary number of independent levels within a single collection. You have two tables, but I don't see where the properties of the set "locatings" can't be de-normalized into "localities"?

If the localities table had a geometry column, could I not find the one locality from a "locating" and then select on the one table using a single filter: parent.lft <= row.left AND parent.rgt >= row.rgt ?

UPDATED

In this answer https://mcmap.net/q/1189651/-mysql-optimizing-finding-super-node-in-nested-set-tree, there is an example from http://explainextended.com/2009/09/29/adjacency-list-vs-nested-sets-mysql/ where the following example gets all the ancestors to an arbitrary depth of 100000:

SELECT  hp.id, hp.parent, hp.lft, hp.rgt, hp.data
FROM    (
    SELECT  @r AS _id,
            @level := @level + 1 AS level,
            (
            SELECT  @r := NULLIF(parent, 0)
            FROM    t_hierarchy hn
            WHERE   id = _id
            )
    FROM    (
            SELECT  @r := 1000000,
                    @level := 0
            ) vars,
            t_hierarchy hc
    WHERE   @r IS NOT NULL
    ) hc
JOIN    t_hierarchy hp
ON      hp.id = hc._id
ORDER BY
    level DESC
Convince answered 30/4, 2014 at 22:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.