Pre-order transversal with on-the-fly path enumeration on adjacency representation
Nested sets from:
is the only efficient way I've seen of traversing, at the cost of slower updates. That's likely what most people will want for pre-order.
Closure table from https://mcmap.net/q/47253/-what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree is interesting, but I don't see how to enforce pre-order there: MySQL Closure Table hierarchical database - How to pull information out in the correct order
Mostly for fun, here's a method that recursively calculates the 1.3.2.5. prefixes on the fly and sorts by them at the end, based only on the parent ID/child index representation.
Upsides:
- updates only need to update the indexes of each sibling
Downsides:
- n^2 memory usage worst case for a super deep tree. This could be quite serious, which is why I say this method is likely mostly for fun only. But maybe there is some ultra high update case where someone would want to use it? Who knows
- recursive queries, so reads are going to be less efficient than nested sets
Create and populate table:
CREATE TABLE "ParentIndexTree" (
"id" INTEGER PRIMARY KEY,
"parentId" INTEGER,
"childIndex" INTEGER NOT NULL,
"value" INTEGER NOT NULL,
"name" TEXT NOT NULL,
FOREIGN KEY ("parentId") REFERENCES "ParentIndexTree"(id)
)
;
INSERT INTO "ParentIndexTree" VALUES
(0, NULL, 0, 1, 'one' ),
(1, 0, 0, 2, 'two' ),
(2, 0, 1, 3, 'three'),
(3, 1, 0, 4, 'four' ),
(4, 1, 1, 5, 'five' )
;
Represented tree:
1
/ \
2 3
/ \
4 5
Then for a DBMS with arrays like PostgreSQL](https://www.postgresql.org/docs/14/arrays.html):
WITH RECURSIVE "TreeSearch" (
"id",
"parentId",
"childIndex",
"value",
"name",
"prefix"
) AS (
SELECT
"id",
"parentId",
"childIndex",
"value",
"name",
array[0]
FROM "ParentIndexTree"
WHERE "parentId" IS NULL
UNION ALL
SELECT
"child"."id",
"child"."parentId",
"child"."childIndex",
"child"."value",
"child"."name",
array_append("parent"."prefix", "child"."childIndex")
FROM "ParentIndexTree" AS "child"
JOIN "TreeSearch" AS "parent"
ON "child"."parentId" = "parent"."id"
)
SELECT * FROM "TreeSearch"
ORDER BY "prefix"
;
This creates on the fly prefixes of form:
1 -> 0
2 -> 0, 0
3 -> 0, 1
4 -> 0, 0, 0
5 -> 0, 0, 1
and then PostgreSQL then sorts by the arrays alphabetically as:
1 -> 0
2 -> 0, 0
4 -> 0, 0, 0
5 -> 0, 0, 1
3 -> 0, 1
which is the pre-order result that we want.
For a DBMS without arrays like SQLite, you can hack by encoding the prefix with a string of fixed width integers. Binary would be ideal, but I couldn't find out how, so hex would work. This of course means you will have to select a maximum depth that will fit in the number of bytes selected, e.g. below I choose 6 allowing for a maximum of 16^6 children per node.
WITH RECURSIVE "TreeSearch" (
"id",
"parentId",
"childIndex",
"value",
"name",
"prefix"
) AS (
SELECT
"id",
"parentId",
"childIndex",
"value",
"name",
'000000'
FROM "ParentIndexTree"
WHERE "parentId" IS NULL
UNION ALL
SELECT
"child"."id",
"child"."parentId",
"child"."childIndex",
"child"."value",
"child"."name",
"parent"."prefix" || printf('%06x', "child"."childIndex")
FROM "ParentIndexTree" AS "child"
JOIN "TreeSearch" AS "parent"
ON "child"."parentId" = "parent"."id"
)
SELECT * FROM "TreeSearch"
ORDER BY "prefix"
;
Some nested set notes
Here are a few points which confused me a bit after looking at the other nested set answers.
Jonny Buchanan shows his nested set setup as:
__________________________________________________________________________
| Root 1 |
| ________________________________ ________________________________ |
| | Child 1.1 | | Child 1.2 | |
| | ___________ ___________ | | ___________ ___________ | |
| | | C 1.1.1 | | C 1.1.2 | | | | C 1.2.1 | | C 1.2.2 | | |
1 2 3___________4 5___________6 7 8 9___________10 11__________12 13 14
| |________________________________| |________________________________| |
|__________________________________________________________________________|
which made me wonder why not just use the simpler looking:
__________________________________________________________________________
| Root 1 |
| ________________________________ _______________________________ |
| | Child 1.1 | | Child 1.2 | |
| | ___________ ___________ | | ___________ ___________ | |
| | | C 1.1.1 | | C 1.1.2 | | | | C 1.2.1 | | C 1.2.2 | | |
1 2 3___________| 4___________| | 5 6___________| 7___________| | |
| |________________________________| |_______________________________| |
|_________________________________________________________________________|
which does not have an extra number for each endpoint.
But then when I actually tried to implement it, I noticed that it was hard/impossible to implement the update queries like that, unless I had parent information as used by Konchog. The problem is that it was hard/impossible to distinguish between a sibling and a parent in one case while the tree was being moved around, and I needed that to decide if I was going to reduce the right hand side or not while closing a gap.
Left/size vs left/right: you could store it either way in the database, but I think left/right can be more efficient as you can index the DB with a multicolumn index (left, right) which can then be used to speed up ancestor queries, which are of type:
left < curLeft AND right > curLeft
Tested on Ubuntu 22.04, PostgreSQL 14.5, SQLite 3.34.0.