Which Hierarchical model should I use? Adjacency, Nested, or Enumerated?

Asked 28/1, 2011 at 17:22 Answered 14/2, 2011 at 21:8

Solved mysql enumeration nested-sets adjacency-list

I have a table which contains a location of all geographical locations in the world and their relationships.

Here is a example that shows the hierarchy. You will see that the data is actually stored as all three

Enumerated Path
Adjacency list
Nested Set

The data obviously never changes either. Below is an example of direct ancestors of the location Brighton in England which has a woeid of 13911.

Table: geoplanet_places (Has 5.6million rows) Large Image: http://chrisacky.com/ancestors.jpg

I then have another table called entities. This table stores my items which I would like to map to a geographical location. I store some basic information but most important I store the woeid which is a foreign key from geoplanet_places. enter image description here

Eventually the entities table will contain several thousand entities. And I would like a way to be able to return a full tree of all of the nodes which contain entities.

I plan on creating something to facilitate the filtering and searching of entities based on their geographical location and be able to discover how many entities can be found on that particular node.

So if I only have one entity in my entities table, I might have something like this

`Earth (1)

United Kingdom (1)

England (1)

East Sussex (1)

Brighton and Hove City (1)

Brighton (1)`

Lets then say that I have another entity which is located in Devon, then it would show something like:

Earth (2)

United Kingom (2)

England (2)

Devon (1)

East Sussex (1) ... etc

The (Counts) which will say how many entities are "inside" of each geographical location do not need to be live. I can live with generating my object every hour and caching it.

The aim, is to be able to create an interface which might start out showing only the Countries which have entities..

So like

Argentina (1021), Chile (291), ..., United States (32,103), United Kingdom (12,338)

Then the user will click on a location, such as United Kindom, and will then be given all of the immediate child nodes which are descendants of United Kingdom AND have an entity in them.

If there are 32 Counties in United Kindgdom, but only 23 of them eventually when you drill down have entities stored in them, then I don't want to display the other 9. It is only locations.

This site aptly demonstrates the functionality that I wish to achieve: http://www.homeaway.com/vacation-rentals/europe/r5 enter image description here

How do you recommend that I manage such a data structure?

Things I am using.

PHP
MySQL
Solr

I plan on having the Drill downs be as rapid as possible. I want to create an AJAX interface that will be seemless for searching.

I would also be interested to know which columns you would recommend indexing on.

Bigamist answered 28/1, 2011 at 17:22 Comment(0)

Typically, there are three kinds of queries in the hierarchies which cause troubles:

Return all ancestors
Return all descendants
Return all children (immediate descendants).

Here's a little table which shows the performance of different methods in MySQL:

                        Ancestors  Descendants  Children        Maintainability InnoDB
Adjacency list          Good       Decent       Excellent       Easy            Yes
Nested sets (classic)   Poor       Excellent    Poor/Excellent  Very hard       Yes
Nested sets (spatial)   Excellent  Very good    Poor/Excellent  Very hard       No
Materialized path       Excellent  Very good    Poor/Excellent  Hard            Yes

In children, poor/excellent means that the answer depends on whether you are mixing the method with adjacency list, i. e. storing the parentID in each record.

For your task, you need all three queries:

All ancestors to show the Earth / UK / Devon thing
All children to show "Destinations in Europe" (the items)
All descendants to show "Destinations in Europe" (the counts)

I would go for materialized paths, since this kind of hierarchy rarely changes (only in case of war, revolt etc).

Create a varchar column called path, index it and fill it with the value like this:

1:234:6345:45454:

where the numbers are primary keys of the appropriate parents, in correct order (1 for Europe, 234 for UK etc.)

You will also need a table called levels to keep numbers from 1 to 20 (or whatever maximum nesting level you want).

To select all ancestors:

SELECT   pa.*
FROM     places p
JOIN     levels l
ON       SUBSTRING_INDEX(p.path, ':', l.level) <> p.path
JOIN     places pa
ON       pa.path = CONCAT(SUBSTRING_INDEX(p.path, ':', l.level), ':') 
WHERE    p.id = @id_of_place_in_devon

To select all children and counts of places within them:

SELECT  pc.*, COUNT(pp.id)
FROM    places p
JOIN    places pc
ON      pc.parentId = p.id
JOIN    places pp
ON      pp.path BETWEEN pc.path AND CONCAT(pc.path, ':')
        AND pp.id NOT IN
        (
        SELECT  parentId
        FROM    places
        )
WHERE   p.id = @id_of_europe
GROUP BY
        pc.id

Pentheus answered 28/1, 2011 at 17:42 Comment(7)

How might you tackle such a question. As you can see I do have the parentID and the lft rgt values. I'm not sure if I am looking at the problem from completely the wrong perspective. Maybe I need to take a step back. For example, I will only want to return the immediate children of any one node and the (Count). But to get this Count value, I would still have to create a difficult query. The problem being that the Count value is calculated in the query and wouldn't be persisted. If I save the Count value then I could potentially use it in my query also. I'm just confused lots. :) – Bigamist 28/1, 2011 at 17:49

What should the pp.id/pp.path be in the second query? And must all of the paths end with : also? – Bigamist 28/1, 2011 at 18:56

Laykes: sorry, forgot to add a GROUP BY. pp is a table which selects all descendants for each of the Europe's children which are not the categories themselves. It is just an alias for the same places table. – Pentheus 28/1, 2011 at 19:2

Two points, (1) On your first query, to select all ancestors, it selects ancestors, but also siblings who share the common parent. Was this expected result? (2) On the second query, I get ambigious column for parentId. – Bigamist 28/1, 2011 at 19:42

@Laykes: could you please post some sample data? – Pentheus 28/1, 2011 at 19:43

Ah, also Quassnoi, I just realised that on the second query, the Count isn't actually the amount of children nodes that I want, but instead, the Count of the entities which I have added to it. The places table is purely locations, and then in a entities table, I have my properties which are locational specific. So that is where I have the foreign key to the location node. – Bigamist 28/1, 2011 at 19:45

Yes. I will post some sample data now. (Would a SQL schema and sample file be sufificient?). I could dump my current schema and data but with 5.6 million rows it is a little large? – Bigamist 28/1, 2011 at 19:46

This is the query that I came up. It is an adaption of what you suggestion Quassnoi.

SELECT   pa.*,  level, SUBSTRING_INDEX(p.ancestry, '/', l.level),  p.*
FROM     geoplanet_places p
JOIN     levels l
ON       SUBSTRING_INDEX(p.ancestry, '/', l.level) <> p.ancestry 
JOIN     geoplanet_places  pa
ON       pa.woeid =  SUBSTRING_INDEX( SUBSTRING_INDEX(p.ancestry, '/', l.level),'/',-1)
WHERE    p.woeid = "13911"

This returns all of the parents of Brighton.

The problem with your query was that it wasn't return the path to parents, but instead any node which shared the same path.

SELECT     pa.*, GROUP_CONCAT(pa.name ORDER BY pa.lft asc),group_concat( pa.lft  ), pa.ancestry
                                            FROM     geo_places p
                                            JOIN     levels l
                                            ON       SUBSTRING_INDEX(CONCAT(p.ancestry, p.woeid,'/'), '/', l.level) <> p.ancestry 
                                            JOIN     geo_places  pa
                                            ON       pa.woeid =  SUBSTRING_INDEX( SUBSTRING_INDEX(CONCAT(p.ancestry, p.woeid,'/'), '/', l.level),'/',-1)
                                            WHERE    p.woeid IN ("12767488","12832668","12844837","131390","131391","12846428","24534461")
                                            GROUP BY p.woeid

Bigamist answered 14/2, 2011 at 21:8 Comment(0)

Recommended topics

Hot tags