How do I sort a linked list in sql?
Asked Answered
K

4

21

I have implemented a linked list as a self-referencing database table:

CREATE TABLE LinkedList(
    Id bigint NOT NULL,
    ParentId bigint NULL,
    SomeData nvarchar(50) NOT NULL) 

where Id is the primary key, and ParentId is the Id of the previous node on the list. The first node has ParentId = NULL.

I now want to SELECT from the table, sorting the rows in the same order they should appear, as nodes on the list.

Eg.: if the table contains the rows

Id      ParentId  SomeData
24971   NULL      0
38324   24971     1
60088   60089     3
60089   38324     2
61039   61497     5
61497   60088     4
109397  109831    7
109831  61039     6

Then sorting it, using the criteria, should result in:

Id      ParentId  SomeData
24971   NULL      0
38324   24971     1
60089   38324     2
60088   60089     3
61497   60088     4
61039   61497     5
109831  61039     6
109397  109831    7

You're supposed to use the SomeData colum as a control, so please don't cheat doing ORDER by SomeData :-)

Kellam answered 5/2, 2009 at 12:40 Comment(2)
Your don't cheat comment: would have be better if you had choosen sample data which wouldn't get the correct result when sort on its own. That way "cheating" wouldn't be an option.Exarchate
Hmm... I don't think you understood my intention when adding that column. I just wanted to make life easier for people who might come up with an answer. Call it a "testing tool".Kellam
W
11

In Oracle:

SELECT Id, ParentId, SomeData
FROM (
  SELECT ll.*, level AS lvl
  FROM LinkedList ll
  START WITH
    ParentID IS NULL
  CONNECT BY
    ParentId = PRIOR Id
)
ORDER BY
  lvl

P. S. It's a bad practice to use NULL as ParentID, as it is not searchable by indices. Insert a surrogate root with id of 0 or -1 instead, and use START WITH ParentID = 0.

Weissmann answered 5/2, 2009 at 12:43 Comment(6)
That's a very good answer. How do I do the same in SQLServer 2005?Kellam
+1, but not for the anti-NULL comment: using 0 or -1 precludes having a foreign key to enforce integrity.Sceptre
You may always keep a surrogate root and keep it in the database. Full scan for a first entry will be too slow if the table contains many records.Weissmann
@Quassnoi: You don't need to create a stored procedure for SQL Server versions 2005 and up. They support Common Table Expressions (CTEs) that let you do the same thing that CONNECT BY does.Brame
Yes, CTEs are the way to do it in SQL Server. See my solution below.Kellam
To avoid NULL, the start of the list could use its own ID as its parent, so 'START WITH ParentID = Id' becomes the condition.Burrill
K
13

I found a solution for SQLServer, but looks big and much less elegant than Quassnoi's

WITH SortedList (Id, ParentId, SomeData, Level)
AS
(
  SELECT Id, ParentId, SomeData, 0 as Level
    FROM LinkedList
   WHERE ParentId IS NULL
  UNION ALL
  SELECT ll.Id, ll.ParentId, ll.SomeData, Level+1 as Level
    FROM LinkedList ll
   INNER JOIN SortedList as s
      ON ll.ParentId = s.Id
)

SELECT Id, ParentId, SomeData
  FROM SortedList
 ORDER BY Level
Kellam answered 5/2, 2009 at 13:5 Comment(4)
Actually, the name SortedList is a bit confusing - that isn't actually sorted - simply recursive... you need to ORDER BY the final SELECT (see my reply)Harkey
Fair enough, I have added the column Level and a ORDER BY in the final SELECT.Kellam
Can a similar approach be used with SQLite?Anamorphoscope
There is one problem with using the CTE, which is the recursion limit. By default this limit is 100, but you can increase it up to 32767 by adding 'option (maxrecursion 32767)' at the end of the queryMiscreance
W
11

In Oracle:

SELECT Id, ParentId, SomeData
FROM (
  SELECT ll.*, level AS lvl
  FROM LinkedList ll
  START WITH
    ParentID IS NULL
  CONNECT BY
    ParentId = PRIOR Id
)
ORDER BY
  lvl

P. S. It's a bad practice to use NULL as ParentID, as it is not searchable by indices. Insert a surrogate root with id of 0 or -1 instead, and use START WITH ParentID = 0.

Weissmann answered 5/2, 2009 at 12:43 Comment(6)
That's a very good answer. How do I do the same in SQLServer 2005?Kellam
+1, but not for the anti-NULL comment: using 0 or -1 precludes having a foreign key to enforce integrity.Sceptre
You may always keep a surrogate root and keep it in the database. Full scan for a first entry will be too slow if the table contains many records.Weissmann
@Quassnoi: You don't need to create a stored procedure for SQL Server versions 2005 and up. They support Common Table Expressions (CTEs) that let you do the same thing that CONNECT BY does.Brame
Yes, CTEs are the way to do it in SQL Server. See my solution below.Kellam
To avoid NULL, the start of the list could use its own ID as its parent, so 'START WITH ParentID = Id' becomes the condition.Burrill
H
6

(edit: d'oh! While I was debugging you found it too!)

In SQL Server:

;WITH cte (Id, ParentId, SomeData, [Level]) AS (
    SELECT Id, ParentId, SomeData, 0
    FROM LinkedList
    WHERE ParentId IS NULL
    UNION ALL
    SELECT ll.Id, ll.ParentId, ll.SomeData, cte.[Level] + 1
    FROM LinkedList ll
    INNER JOIN cte ON ll.ParentID = cte.ID
)
SELECT * FROM cte
ORDER BY [Level]
Harkey answered 5/2, 2009 at 13:10 Comment(3)
ah - misread the question; I thought you wanted to order it by SomeData; if you want it in linked-order, then [Level] is the way to goHarkey
Why the semi colon at the start?Marxism
@Marxism because CTEs are picky; you almost always end up needing the leading ; so I might as well add it. Add anything above the example and it goes boomHarkey
P
5

PostgreSQL version.

Create table, indexes and data:

DROP TABLE IF EXISTS LinkedList;

CREATE TABLE LinkedList (
    Id BIGINT NOT NULL,
    ParentId BIGINT NULL,
    SomeData VARCHAR(50)
);

CREATE INDEX LinkedList_Id_idx on LinkedList (Id);
CREATE index LinkedList_ParentId_idx on LinkedList (ParentId);

INSERT INTO LinkedList
    (Id, ParentId, SomeData)
VALUES 
    (24971,   NULL,      0),
    (38324,   24971,     1),
    (60088,   60089,     3),
    (60089,   38324,     2),
    (61039,   61497,     5),
    (61497,   60088,     4),
    (109397,  109831,    7),
    (109831,  61039,     6);

Actual query:

WITH RECURSIVE SortedList AS (
    SELECT
        *,
        0 AS SortKey
    FROM LinkedList
    WHERE ParentId IS NULL
    UNION ALL (
        SELECT
            LinkedList.*,
            SortedList.SortKey + 1 AS SortKey
        FROM LinkedList
        INNER JOIN SortedList
            ON (LinkedList.ParentId = SortedList.Id)
    )
)
SELECT
    *
FROM SortedList
ORDER BY SortKey;

Results:

   id   | parentid | somedata | sortkey
--------+----------+----------+---------
  24971 |          | 0        |       0
  38324 |    24971 | 1        |       1
  60089 |    38324 | 2        |       2
  60088 |    60089 | 3        |       3
  61497 |    60088 | 4        |       4
  61039 |    61497 | 5        |       5
 109831 |    61039 | 6        |       6
 109397 |   109831 | 7        |       7

Also did some benchmarks:

\set N 10000

DELETE FROM LinkedList;
INSERT INTO LinkedList VALUES (1, NULL, 1);
INSERT INTO LinkedList (
    SELECT
        generate_series AS Id,
        (generate_series - 1) AS ParentId,
        generate_series AS SomeData
    FROM GENERATE_SERIES(2, :N)
);

EXPLAIN ANALYZE
WITH RECURSIVE SortedList AS (
    SELECT
        *,
        0 AS SortKey
    FROM LinkedList
    WHERE ParentId IS NULL
    UNION ALL (
        SELECT
            LinkedList.*,
            SortedList.SortKey + 1 AS SortKey
        FROM LinkedList
        INNER JOIN SortedList
            ON (LinkedList.ParentId = SortedList.Id)
    )
)
SELECT
    *
FROM SortedList
ORDER BY SortKey;

Results:

Sort  (cost=6236.12..6300.16 rows=25616 width=138) (actual time=17857.640..17858.207 rows=10000 loops=1)
  Sort Key: sortedlist.sortkey
  Sort Method: quicksort  Memory: 1166kB
  CTE sortedlist
    ->  Recursive Union  (cost=4.40..2007.10 rows=25616 width=138) (actual time=0.032..17844.139 rows=10000 loops=1)
          ->  Bitmap Heap Scan on linkedlist  (cost=4.40..42.78 rows=16 width=138) (actual time=0.031..0.032 rows=1 loops=1)
                Recheck Cond: (parentid IS NULL)
                Heap Blocks: exact=1
                ->  Bitmap Index Scan on linkedlist_parentid_idx  (cost=0.00..4.40 rows=16 width=0) (actual time=0.006..0.006 rows=2 loops=1)
                      Index Cond: (parentid IS NULL)
          ->  Hash Join  (cost=5.20..145.20 rows=2560 width=138) (actual time=0.896..1.780 rows=1 loops=10000)
                Hash Cond: (linkedlist_1.parentid = sortedlist_1.id)
                ->  Seq Scan on linkedlist linkedlist_1  (cost=0.00..96.00 rows=3200 width=134) (actual time=0.002..0.784 rows=10000 loops=10000)
                ->  Hash  (cost=3.20..3.20 rows=160 width=12) (actual time=0.001..0.001 rows=1 loops=10000)
                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                      ->  WorkTable Scan on sortedlist sortedlist_1  (cost=0.00..3.20 rows=160 width=12) (actual time=0.000..0.001 rows=1 loops=10000)
  ->  CTE Scan on sortedlist  (cost=0.00..512.32 rows=25616 width=138) (actual time=0.034..17851.344 rows=10000 loops=1)
Planning Time: 0.163 ms
Execution Time: 17858.957 ms

So this query is pretty slow.

Poussette answered 25/9, 2019 at 12:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.