Linked List in SQL

T

14

74

What's the best way to store a linked list in a MySQL database so that inserts are simple (i.e. you don't have to re-index a bunch of stuff every time) and such that the list can easily be pulled out in order?

Tula answered 15/9, 2008 at 18:2 Comment(1)

See also How do I sort a linked list in sql? It provides an example of sorting self-linked table with recursive CTE. – Opuntia 18/9, 2014 at 9:5

T

15

Store an integer column in your table called 'position'. Record a 0 for the first item in your list, a 1 for the second item, etc. Index that column in your database, and when you want to pull your values out, sort by that column.

 alter table linked_list add column position integer not null default 0;
 alter table linked_list add index position_index (position);
 select * from linked_list order by position;

To insert a value at index 3, modify the positions of rows 3 and above, and then insert:

 update linked_list set position = position + 1 where position >= 3;
 insert into linked_list (my_value, position) values ("new value", 3);

Tiller answered 15/9, 2008 at 18:6 Comment(6)

This doesn't meet the requirement of not having to reindex a bunch of stuff every time you to an insert. If you have a large table and insert at the beginning, you are going to have to touch every record in order to modify it's position field. – Swivet 10/5, 2012 at 16:44

Could we use a float value instead of an integer for this? Then inserts can take decimal values, without needing to increment other records.... no? – Subtenant 31/5, 2016 at 11:29

@Subtenant The finite precision of floating point limits the number of times you can insert between two points: (1-2) becomes (1-1.5) becomes (1-1.25) becomes (1-1.125) becomes (1-1.0625) ... until you run out of mantissa and can no longer distinguish the divisions. – Keble 12/8, 2016 at 13:41

Is there no way to use (arbitrary-length) strings to totally prevent loss of precision? – Leisha 24/8, 2016 at 3:27

@AhmedFasih ingenius. I am sure you will find a scheme that works. However, it might be a challenge to optimize the length of the strings, especially when you start deleting... – Generous 1/4, 2019 at 10:34

@Generous I wound up making a library doing what I described: github.com/fasiha/mudderjs#readme though you're right, the scheme doesn't optimize post hoc deletion. – Leisha 1/4, 2019 at 14:41

P

22

Using Adrian's solution, but instead of incrementing by 1, increment by 10 or even 100. Then insertions can be calculated at half of the difference of what you're inserting between without having to update everything below the insertion. Pick a number large enough to handle your average number of insertions - if its too small then you'll have to fall back to updating all rows with a higher position during an insertion.

Phase answered 15/9, 2008 at 18:11 Comment(8)

Reminds me of my days programming BASIC when line numbers were multiples of 10. – Coster 9/2, 2011 at 16:48

I've seen this done many times with older programs. I think its a good solution. – Antipus 28/8, 2012 at 16:50

I initially disliked this solution because it seems inelegant. But whatever, its effective and efficient, and frankly pretty clever. It handles insertions and deletions very quickly for the most part, and handles retrieval of the entire list very quickly, too. – Os 31/7, 2013 at 12:17

How do you handle collisions while inserting data? – Garwin 15/4, 2020 at 0:14

@Garwin if element with that order N exsits add 100 to order for all elements with order greater or equal N – Depreciate 19/1, 2023 at 16:46

In the organization I work for, we have a software using a similar principle. Problem is, moving elements is fairly frequent, and that's when we entered a world of pain. So it can work for simple storage, but a real linked list is the way to make sure the software will not fail because you can't insert new elements between some others anymore. – Barium 3/2, 2023 at 8:8

it will be nice if you could put an example.. – Casilde 9/5, 2023 at 16:49

However, I see that even though steep is 100, in the case of dividing consecutive decimal numbers in half, that number and its decimal part still very quickly do not meet the limited conditions of a decimal number. – Jamboree 3/1 at 2:57

C

19

create a table with two self referencing columns PreviousID and NextID. If the item is the first thing in the list PreviousID will be null, if it is the last, NextID will be null. The SQL will look something like this:

create table tblDummy
{
     PKColumn     int     not null, 
     PreviousID     int     null, 
     DataColumn1     varchar(50)     not null, 
     DataColumn2     varchar(50)     not null,  
     DataColumn3     varchar(50)     not null, 
     DataColumn4     varchar(50)     not null, 
     DataColumn5     varchar(50)     not null, 
     DataColumn6     varchar(50)     not null, 
     DataColumn7     varchar(50)     not null, 
     NextID     int     null
}

Competitive answered 15/9, 2008 at 18:12 Comment(9)

This was my first crack at a solution, however, it fails for the requirement that it be 'easily pulled out in order'. Our solution is the most like a linked list, but much harder to retrieve in order. – Exordium 15/9, 2008 at 18:29

I think there is no magical solution that both doesn't require updating all the indexes and is easily ordered. It really comes down to how your database is going to be manipulated most often. Will most changes be mid linked-list or will most data be appended to the end. – Draghound 26/5, 2013 at 1:38

Why have both PreviousID and NextID? Having only one of them, you can infer the other: foo.previous_id = bar_id => bar.next_id = foo_id. – Recitativo 19/3, 2015 at 9:18

Isn't PreviousID redundant? – Collodion 8/1, 2016 at 2:59

PreviousID isn't redundant, it can be used to do a quick revsere traversal in the list without having to execute more SQL to obtain it. Yes you could do it without but personally I prefer this solution. At the end of the day, it comes down to how your going to be accessing your data. – Alchemize 22/4, 2016 at 14:36

reverse (typo) :) – Alchemize 22/4, 2016 at 14:46

What's the best way to sort such a linked list in SQL? – Voronezh 24/4, 2017 at 13:53

Linked lists are made for easy inserts and removals (without moving a lot of memory). Not for easy reading. As databases are used more for archiving purposes, records don't change constantly and are read way more often than written. Hence, a linked list is not a good way to store data, and it isn't baked in to the SQL language to handle this case. Although reading it in order is pretty easy with a programming language (read first record, get id of next, read next record, ...). It's not easy to do in SQL – Slop 24/8, 2020 at 15:0

You can solve "easily pulled out in order" problem with a hierarchical query: #20216244 – Budge 17/2 at 13:51

T

15

Store an integer column in your table called 'position'. Record a 0 for the first item in your list, a 1 for the second item, etc. Index that column in your database, and when you want to pull your values out, sort by that column.

 alter table linked_list add column position integer not null default 0;
 alter table linked_list add index position_index (position);
 select * from linked_list order by position;

To insert a value at index 3, modify the positions of rows 3 and above, and then insert:

 update linked_list set position = position + 1 where position >= 3;
 insert into linked_list (my_value, position) values ("new value", 3);

Tiller answered 15/9, 2008 at 18:6 Comment(6)

This doesn't meet the requirement of not having to reindex a bunch of stuff every time you to an insert. If you have a large table and insert at the beginning, you are going to have to touch every record in order to modify it's position field. – Swivet 10/5, 2012 at 16:44

Could we use a float value instead of an integer for this? Then inserts can take decimal values, without needing to increment other records.... no? – Subtenant 31/5, 2016 at 11:29

@Subtenant The finite precision of floating point limits the number of times you can insert between two points: (1-2) becomes (1-1.5) becomes (1-1.25) becomes (1-1.125) becomes (1-1.0625) ... until you run out of mantissa and can no longer distinguish the divisions. – Keble 12/8, 2016 at 13:41

Is there no way to use (arbitrary-length) strings to totally prevent loss of precision? – Leisha 24/8, 2016 at 3:27

@AhmedFasih ingenius. I am sure you will find a scheme that works. However, it might be a challenge to optimize the length of the strings, especially when you start deleting... – Generous 1/4, 2019 at 10:34

@Generous I wound up making a library doing what I described: github.com/fasiha/mudderjs#readme though you're right, the scheme doesn't optimize post hoc deletion. – Leisha 1/4, 2019 at 14:41

S

4

A linked list can be stored using recursive pointers in the table. This is very much the same hierarchies are stored in Sql and this is using the recursive association pattern.

You can learn more about it here (Wayback Machine link).

I hope this helps.

Spivey answered 15/9, 2008 at 18:11 Comment(3)

I was thinking of doing it myself, and happy to see that I am not the first person to try this, thanks! – Goldshell 12/1, 2017 at 21:58

The explanation in this link helps me understand part of the problem, but how can we use this self-referencing table and write a query to pull a member of the list and its children in order? (In the linked example, this would return a list of managers and subordinates, in order from upper-management down) – Shoop 19/7, 2017 at 18:49

The link in this answer is now dead unfortunately. I managed to track it down on the wayback machine. web.archive.org/web/20180729174436/http://www.tomjewett.com/… – Sounder 16/5, 2019 at 15:39

O

3

This is something I've been trying to figure out for a while myself. The best way I've found so far is to create a single table for the linked list using the following format (this is pseudo code):

LinkedList(

key1,
information,
key2

)

key1 is the starting point. Key2 is a foreign key linking to itself in the next column. So your columns will link something link something like this

col1

key1 = 0,
information= 'hello'
key2 = 1

Key1 is primary key of col1. key2 is a foreign key leading to the key1 of col2

col2

key1 = 1,
information= 'wassup'
key2 = null

key2 from col2 is set to null because it doesn't point to anything

When you first enter a column in for the table, you'll need to make sure key2 is set to null or you'll get an error. After you enter the second column, you can go back and set key2 of the first column to the primary key of the second column.

This makes the best method to enter many entries at a time, then go back and set the foreign keys accordingly (or build a GUI that just does that for you)

Here's some actual code I've prepared (all actual code worked on MSSQL. You may want to do some research for the version of SQL you are using!):

createtable.sql

create table linkedlist00 (

key1 int primary key not null identity(1,1),

info varchar(10),

key2 int

)

register_foreign_key.sql

alter table dbo.linkedlist00

add foreign key (key2) references dbo.linkedlist00(key1)

*I put them into two seperate files, because it has to be done in two steps. MSSQL won't let you do it in one step, because the table doesn't exist yet for the foreign key to reference.

Linked List is especially powerful in one-to-many relationships. So if you've ever wanted to make an array of foreign keys? Well this is one way to do it! You can make a primary table that points to the first column in the linked-list table, and then instead of the "information" field, you can use a foreign key to the desired information table.

Example:

Let's say you have a Bureaucracy that keeps forms.

Let's say they have a table called file cabinet

FileCabinet(

Cabinet ID (pk)
Files ID (fk) )

each column contains a primary key for the cabinet and a foreign key for the files. These files could be tax forms, health insurance papers, field trip permissions slips etc

Files(

Files ID (pk)
File ID (fk)
Next File ID (fk)

)

this serves as a container for the Files

File(

File ID (pk)
Information on the file

)

this is the specific file

There may be better ways to do this and there are, depending on your specific needs. The example just illustrates possible usage.

Olympian answered 15/9, 2008 at 18:2 Comment(2)

In general the idea of having a linked list for storing general purpose data in a database is a terrible idea. While your design for the linked list will work it should only be used in specialised circumstances where you really need a linked list instead of an ordered set. – Sudderth 13/5, 2015 at 13:30

You are completely correct. I have database now, and this would be completely terrible and nearly impossible to maintain. I just did it to show the proof of concept. It'd be nice if they'd just add this as a standard function rather than requiring a hackjob like mine. – Olympian 13/6, 2016 at 19:32

S

3

The simplest option would be creating a table with a row per list item, a column for the item position, and columns for other data in the item. Then you can use ORDER BY on the position column to retrieve in the desired order.

create table linked_list
(   list_id   integer not null
,   position  integer not null 
,   data      varchar(100) not null
);
alter table linked_list add primary key ( list_id, position );

To manipulate the list just update the position and then insert/delete records as needed. So to insert an item into list 1 at index 3:

begin transaction;

update linked_list set position = position + 1 where position >= 3 and list_id = 1;

insert into linked_list (list_id, position, data)
values (1, 3, "some data");

commit;

Since operations on the list can require multiple commands (eg an insert will require an INSERT and an UPDATE), ensure you always perform the commands within a transaction.

A variation of this simple option is to have position incrementing by some factor for each item, say 100, so that when you perform an INSERT you don't always need to renumber the position of the following elements. However, this requires a little more effort to work out when to increment the following elements, so you lose simplicity but gain performance if you will have many inserts.

Depending on your requirements other options might appeal, such as:

If you want to perform lots of manipulations on the list and not many retrievals you may prefer to have an ID column pointing to the next item in the list, instead of using a position column. Then you need to iterative logic in the retrieval of the list in order to get the items in order. This can be relatively easily implemented in a stored proc.
If you have many lists, a quick way to serialise and deserialise your list to text/binary, and you only ever want to store and retrieve the entire list, then store the entire list as a single value in a single column. Probably not what you're asking for here though.

Sudderth answered 15/9, 2008 at 18:16 Comment(2)

Not a true linked list if you're needing to reindex the table on an insert. – Peonir 13/5, 2015 at 0:54

@andrewb, yes, as noted you might want to have a referencing ID instead. – Sudderth 13/5, 2015 at 13:32

O

2

https://dba.stackexchange.com/questions/46238/linked-list-in-sql-and-trees suggests a trick of using floating-point position column for fast inserts and ordering.

It also mentions specialized SQL Server 2014 hierarchyid feature.

Opuntia answered 15/9, 2008 at 18:2 Comment(1)

This is susceptible to running out of space. There are similar models that use gaps between numbers or fractions. But there are always walks through the space that eat up the numbers quickly. At some point you have to resort. Its the beast. If you do it at store, you at least aren't doing it for every read like truly storing a linked list. – Gentilism 6/6, 2019 at 3:4

A

2

This post is old but still going to give my .02$. Updating every record in a table or record set sounds crazy to solve ordering. the amount of indexing also crazy, but it sounds like most have accepted it.

Crazy solution i came up with to reduce updates and indexing is to create two tables (and in most use cases you don's sort all records in just one table anyway). Table A to hold the records of the list being sorted and table B to group and hold a record of the order as a string. the order string represents an array that can be used to order the selected records either on the web server or browser layer of a webpage application.

Create Table A{
Id int primary key identity(1,1),
Data varchar(10) not null
B_Id int
}

Create Table B{
Id int primary key Identity(1,1),
GroupName varchat(10) not null,
Order varchar(max) null
}

The format of the order sting should be id, position and some separator to split() your string by. in the case of jQuery UI the .sortable('serialize') function outputs an order string for you that is POST friendly that includes the id and position of each record in the list.

The real magic is the way you choose to reorder the selected list using the saved ordering string. this will depend on the application you are building. here is an example again from jQuery to reorder the list of items: http://ovisdevelopment.com/oramincite/?p=155

Allare answered 15/9, 2008 at 18:2 Comment(4)

Can you elaborate more, like an example of what the order be like? and the query to find the first 10 items from it? – Latticework 17/4, 2015 at 13:50

Assuming my "sorted list o' data" is of reasonable length, I could store the data records in a table, and store the desired key order as a string somewhere else in the database, then sort the data in-memory after I fetch it from the database. For example, if I have data records with primary keys (in order) of [100, 205, 95, 96], I would store a string "100,205,95,96" in some database field, and when I query the database to retrieve the 4 records, I would also retrieve the string "sort order", parse it to convert it to an array of integers, and use that as the record order. – Rolypoly 9/11, 2018 at 20:22

Of course, my previous comment assumes that the keys in the string match the actual keys in the database. You'd need to pay attention to data concurrency (i.e. deal with the case where two users update the same list at about the same time). – Rolypoly 9/11, 2018 at 20:28

Sure, you don't re-index, but you're still sorting the string and you can't write a SQL query to give you the items sorted. You could basically do the same thing here with a Hashtable and the linked list methodology, but you'd never be able to write query that sorted quickly and you'd be relying on being able to store the ids of the list in memory. – Gentilism 6/6, 2019 at 3:10

M

2

There are a few approaches I can think of right off, each with differing levels of complexity and flexibility. I'm assuming your goal is to preserve an order in retrieval, rather than requiring storage as an actual linked list.

The simplest method would be to assign an ordinal value to each record in the table (e.g. 1, 2, 3, ...). Then, when you retrieve the records, specify an order-by on the ordinal column to get them back in order.

This approach also allows you to retrieve the records without regard to membership in a list, but allows for membership in only one list, and may require an additional "list id" column to indicate to which list the record belongs.

An slightly more elaborate, but also more flexible approach would be to store information about membership in a list or lists in a separate table. The table would need 3 columns: The list id, the ordinal value, and a foreign key pointer to the data record. Under this approach, the underlying data knows nothing about its membership in lists, and can easily be included in multiple lists.

Madalynmadam answered 15/9, 2008 at 18:27 Comment(0)

C

1

Increment the SERIAL 'index' by 100, but manually add intermediate values with an 'index' equal to Prev+Next / 2. If you ever saturate the 100 rows, reorder the index back to 100s.

This should maintain sequence with primary index.

Collodion answered 15/9, 2008 at 18:2 Comment(0)

C

0

in this method you will avoid updating a lot rows every insertion or move to top/bottom, or moving node along the "list".

pre:
add new column to your table from type Double (for a big precision).

ALTER TABLE <table_name> ADD COLUMN custom_order FLOAT(53);

declare a integer variable STEP. (for example STEP=100;).

insert to top:
get the minimal value of custom_order in your table
and insert the new row with this custom_order = MIN(custom_order) - STEP

SELECT MIN(custom_order) FROM <table_name> WHERE ...  
INSERT/UPDATE ...

insert to bottom:
max_value equal to Max value of custom_order in your table
and insert the new row with custom_order = max_value + STEP

SELECT MAX(custom_order) FROM <table_name> WHERE ...
INSERT/UPDATE ...

insert between 2 rows:
take the id of 2 rows you want to insert between them, and calculating the custom_order by get the AVG/2

custom_order_AVG = (above_custom_order_row - below_custom_order_row)/2 + below_custom_order_row

(i doing it by safe AVG)
and insert / update your row with custom_order = custom_order_AVG

INSERT/UPDATE custom_order = custom_order_AVG ...

Casilde answered 15/9, 2008 at 18:2 Comment(0)

P

0

I think its much simpler adding a created column of Datetime type and a position column of int, so now you can have duplicate positions, at the select statement use the order by position, created desc option and your list will be fetched in order.

Parton answered 15/9, 2008 at 18:2 Comment(2)

Should write out the SQL statement for this answer. – Stultz 14/11, 2012 at 21:5

This would fail. Insert 2 records between positions n and n+1, you'll have to give both the position n but different time stamps. Now try to insert a third record in between the last inserted two! You can't because the time stamp is bigger than both. – Thyrse 5/1, 2013 at 5:15

T

-1

You could implement it like a double ended queue (deque) to support fast push/pop/delete(if oridnal is known) and retrieval you would have two data structures. One with the actual data and another with the number of elements added over the history of the key. Tradeoff: This method would be slower for any insert into the middle of the linked list O(n).

create table queue (
    primary_key,
    queue_key
    ordinal,
    data
)

You would have an index on queue_key+ordinal

You would also have another table which stores the number of rows EVER added to the queue...

create table queue_addcount (
    primary_key,
    add_count
)

When pushing a new item to either end of the queue (left or right) you would always increment the add_count.

If you push to the back you could set the ordinal...

ordinal = add_count + 1

If you push to the front you could set the ordinal...

ordinal = -(add_count + 1)

update

add_count = add_count + 1

This way you can delete anywhere in the queue/list and it would still return in order and you could also continue to push new items maintaining the order.

You could optionally rewrite the ordinal to avoid overflow if a lot of deletes have occurred.

You could also have an index on the ordinal to support fast ordered retrieval of the list.

If you want to support inserts into the middle you would need to find the ordinal which it needs to be insert at then insert with that ordinal. Then increment every ordinal by one following that insertion point. Also, increment the add_count as usual. If the ordinal is negative you could decrement all of the earlier ordinals to do fewer updates. This would be O(n)

Torture answered 15/9, 2008 at 18:2 Comment(0)

B

-1

A list can be stored by having a column contain the offset (list index position) -- an insert in the middle is then incrementing all above the new parent and then doing an insert.

Blastopore answered 15/9, 2008 at 18:5 Comment(1)

That's not a linked list, that's an indexed array – Peonir 13/5, 2015 at 0:58

Recommended topics

Hot tags