Best representation of an ordered list in a database?

I know that this sort of goes against the principles of a relational database but let me describe the situation.

I have a page where the user will place a number of items.

 ________________
| -Item1         |
| -Item2         |
| -Item3         |
| -Item4         |
|________________|

These items must stay in the order the user gives them. However, this order may be changed an arbitrary number of times by the user.

 ________________
| -Item1         |
| -Item4         |
| -Item2         |
| -Item3         |
|________________|

Approach 1

My original thought was to give each item an index to represent its place in the list.

Page           Item
-----------    ---------------
FK | pid       FK | pid 
   | name      PK | iid 
                  | index
                  | content 

With this solution you can select items where pid = Page.pid and order by index, which is convenient. However, every time you change the order you have to update anywhere from one other item (best case) to all the other items (worst case).
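
To make that update cost concrete, here is a small Python sketch (not from the original post) that counts how many rows would need an UPDATE when one item is moved in a contiguous 0..n-1 index scheme:

```python
def rows_updated_by_move(ids, from_pos, to_pos):
    """Simulate moving ids[from_pos] to to_pos when the DB index column
    mirrors the list position; return how many rows change (= UPDATEs)."""
    reordered = ids[:from_pos] + ids[from_pos + 1:]
    reordered.insert(to_pos, ids[from_pos])
    # Every position whose occupant changed is a row whose index must change.
    return sum(1 for pos, item in enumerate(reordered) if ids[pos] != item)
```

Swapping two neighbours touches two rows; moving the last item to the front touches every row.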

Approach 2

I also considered making a "linked list" like data structure where each item points to the next item in the list.

Page           Item
-----------    ---------------
FK | pid       FK | pid 
   | name      PK | iid 
                  | next
                  | content 

This potentially makes changing the order less expensive, but we would have to rely on front-end programming to extract the order.
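
For illustration, reconstructing the order from such next-pointers in application code is straightforward; a minimal Python sketch, assuming one page's rows are loaded as (iid, next_iid) pairs with None marking the tail:

```python
def list_order(rows):
    """rows: (iid, next_iid) pairs for one page; the tail's next_iid is None.
    Returns the iids in list order."""
    next_of = dict(rows)
    pointed_to = set(next_of.values())
    # The head is the only item nothing points to.
    head = next(iid for iid in next_of if iid not in pointed_to)
    order, cur = [], head
    while cur is not None:
        order.append(cur)
        cur = next_of[cur]
    return order
```

Note that this walk is O(n) and cannot be expressed as a simple ORDER BY, which is the trade-off this approach makes.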

Is there an approach that I haven't thought of? Please let me know.

Distorted answered 2/3, 2012 at 15:51 Comment(6)
Why not just delete/insert the new list with a new sequential index?Cost
In practice, how many times does a user change the order? And how many items do you expect per user?Vestavestal
Data that records the order of something is in no way "against the principles of a relational database". The point is that in a relational database the ordering is not inherent in the structure of the database itself. In a relational database therefore, such information must always be stored as values of attributes within tuples within relations - which is exactly what you are proposing.Guadalajara
Possible duplicate of What would be the best way to store records order in SQLShanks
My question might seem very naive, but why not store it directly as an array (the column type is an array)? For example using postgresqlBrachypterous
@Brachypterous That's a good idea. Assuming the order of the SQL results is not important and we can do some additional processing in the back-end or front-end application after data retrieval, we can store a list of primary keys in a single varchar column in CSV format, maintained by the application. That way all the ordering configs live in an ordering_config table and we don't need a column just for ordering: data tables only keep data, and configs are stored elsewhere.Rakel

I think @a1ex07 is on the right track here (+1). I don't think gaps in itemOrder violate 3NF, but I do worry about a different violation of 3NF (more on this below). We also have to watch out for bad data in the itemOrder field. Here's how I'd start:

create table pages (
  pid int,
  primary key (pid)
);

create table users (
  uid int,
  primary key (uid)
);

create table items (
  iid int,
  primary key (iid)
);

create table details (
  pid int not null references pages(pid),
  uid int not null references users(uid),
  iid int not null references items(iid), 
  itemOrder int,
  primary key (pid, uid, iid),
  unique (pid, uid, itemOrder)
);

The primary key ensures that for each page, for each user, there are unique items. The unique constraint ensures that for each page, for each user, there are unique itemOrders. Here's my worry about 3NF: in this scenario, itemOrder is not fully dependent on the primary key; it depends only on the (pid, uid) parts. That's not even 2NF; and that's a problem. We could include itemOrder in the primary key, but then I worry that it might not be minimal, as PKs need to be. We might need to decompose this into more tables. Still thinking . . .
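
A quick way to see those two constraints in action is an in-memory SQLite session (an illustration added here, not part of the original answer; foreign keys omitted for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table details (
      pid int not null,
      uid int not null,
      iid int not null,
      itemOrder int,
      primary key (pid, uid, iid),
      unique (pid, uid, itemOrder)
    )
""")
con.execute("insert into details values (1, 1, 10, 1)")
con.execute("insert into details values (1, 1, 11, 2)")

# The same item twice for one (page, user) is rejected by the primary key:
try:
    con.execute("insert into details values (1, 1, 10, 3)")
except sqlite3.IntegrityError:
    pass  # primary key (pid, uid, iid) violated

# Two items sharing an itemOrder for one (page, user) is rejected too:
try:
    con.execute("insert into details values (1, 1, 12, 2)")
except sqlite3.IntegrityError:
    pass  # unique (pid, uid, itemOrder) violated
```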


[ EDIT - More thinking on the topic . . . ]

Assumptions

  1. There are users.

  2. There are pages.

  3. There are items.

  4. (page, user) identifies a SET of items.

  5. (page, user) identifies an ordered LIST of slots in which we can store items if we like.

  6. We do not wish to have duplicate items in a (page,user)'s list.

Plan A

Kill the details table, above.

Add a table, ItemsByPageAndUser, to represent the SET of items identified by (page, user).

create table ItemsByPageAndUser (
   pid int not null references pages(pid),
   uid int not null references users(uid),
   iid int not null references items(iid),
  primary key (pid, uid, iid)   
)

Add table, SlotsByPageAndUser, to represent the ordered LIST of slots that might contain items.

create table SlotsByPageAndUser (
   pid       int not null references pages(pid),
   uid       int not null references users(uid),
   slotNum   int not null,
   iidInSlot int          references items(iid),
 primary key (pid, uid, slotNum),   
 foreign key (pid, uid, iidInSlot) references ItemsByPageAndUser(pid, uid, iid),
 unique (pid, uid, iidInSlot)
)

Note 1: iidInSlot is nullable so that we can have empty slots if we want to. But if there is an item present it has to be checked against the items table.

Note 2: We need the last FK to ensure that we don't add any items that are not in the set of possible items for this (user,page).

Note 3: The unique constraint on (pid, uid, iidInSlot) enforces our design goal of having unique items in the list (assumption 6). Without it we could add as many items from the set identified by (page, user) as we like, so long as they are in different slots.

Now we have nicely decoupled the items from their slots while preserving their common dependence on (page, user).

This design is certainly in 3NF and might be in BCNF, though I worry about SlotsByPageAndUser in that regard.

The problem is that because of the unique constraint in table SlotsByPageAndUser the cardinality of the relationship between SlotsByPageAndUser and ItemsByPageAndUser is one-to-one. In general, 1-1 relationships that are not entity subtypes are wrong. There are exceptions, of course, and maybe this is one. But maybe there's an even better way . . .

Plan B

  1. Kill the SlotsByPageAndUser table.

  2. Add a slotNum column to ItemsByPageAndUser.

  3. Add a unique constraint on (pid, uid, slotNum) to ItemsByPageAndUser.

Now it's:

create table ItemsByPageAndUser (
   pid     int not null references pages(pid),
   uid     int not null references users(uid),
   iid     int not null references items(iid),
   slotNum int,
 primary key (pid, uid, iid),   
 unique (pid, uid, slotNum)
)

Note 4: Leaving slotNum nullable preserves our ability to specify items in the set that are not in the list. But . . .

Note 5: Putting a unique constraint on an expression involving a nullable column might cause "interesting" results in some databases. I think it will work as we intend in Postgres. (See this discussion here on SO.) For other databases, your mileage may vary.

Now there is no messy 1-1 relationship hanging around, so that's better. It's still 3NF as the only non-key attribute (slotNum) depends on the key, the whole key, and nothing but the key. (You can't ask about slotNum without telling me what page, user, and item you are talking about.)

It's not BCNF because [ (pid, uid, iid) -> slotNum ] and [(pid,uid,slotNum) -> iid ]. But that's why we have the unique constraint on (pid, uid, slotNum) which prevents the data from getting into an inconsistent state.
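
As an illustration of how Plan B behaves (an in-memory SQLite sketch added here, foreign keys omitted; SQLite, like Postgres, treats NULLs as distinct in unique constraints, so multiple unslotted items are allowed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table ItemsByPageAndUser (
       pid     int not null,
       uid     int not null,
       iid     int not null,
       slotNum int,
     primary key (pid, uid, iid),
     unique (pid, uid, slotNum)
    )
""")
# Two listed items plus two "in the set but not in the list" items:
con.executemany("insert into ItemsByPageAndUser values (?, ?, ?, ?)",
                [(1, 1, 10, 1), (1, 1, 11, 2), (1, 1, 12, None), (1, 1, 13, None)])

# Two items cannot share a slot:
try:
    con.execute("insert into ItemsByPageAndUser values (1, 1, 14, 2)")
except sqlite3.IntegrityError:
    pass  # unique (pid, uid, slotNum) violated

# Reading the list back in order, skipping unlisted items:
order = [iid for (iid,) in con.execute(
    "select iid from ItemsByPageAndUser "
    "where pid = 1 and uid = 1 and slotNum is not null order by slotNum")]
print(order)  # [10, 11]
```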

I think this is a workable solution.

Gut answered 3/3, 2012 at 18:29 Comment(4)
Plan B is the same as the original post? At least that's what I see.Thistledown
Third normal form (3NF) is the third step in normalizing a database and it builds on the first and second normal forms, 1NF and 2NF.Snowwhite
Nothing in this violates 3NF or any other NF or has to do with affecting NFs, because itemorder is a CK. (NF definitions involve all CKs, not PKs, & it doesn't matter what CK might get called PK.) This post does not reflect an understanding of normalization & NFs. @Snowwhite 3NF was the 3rd found. Normalization to such NFs based on FDs & JDs does not involve going through less strict ones.Shanks
Slots in a separate table is indeed the correct way of avoiding integrity check failures when updating indexesDeprecative

Solution: make index a string (because strings, in essence, have infinite "arbitrary precision"). Or if you use an int, increment index by 100 instead of 1.

The performance problem is this: there are no "in between" values between two sorted items.

item      index
-----------------
gizmo     1
              <<------ Oh no! no room between 1 and 2.
                       This requires incrementing _every_ item after it
gadget    2
gear      3
toolkit   4
box       5

Instead, do it like this (a better solution follows below):

item      index
-----------------
gizmo     100
              <<------ Sweet :). I can re-order 99 (!) items here
                       without having to change anything else
gadget    200
gear      300
toolkit   400
box       500

Even better: here is how Jira solves this problem. Their "rank" (what you call index) is a string value that allows a ton of breathing room in between ranked items.

Here is a real example from a Jira database I work with:

   id    | jira_rank
---------+------------
 AP-2405 | 0|hzztxk:
 ES-213  | 0|hzztxs:
 AP-2660 | 0|hzztzc:
 AP-2688 | 0|hzztzk:
 AP-2643 | 0|hzztzs:
 AP-2208 | 0|hzztzw:
 AP-2700 | 0|hzztzy:
 AP-2702 | 0|hzztzz:
 AP-2411 | 0|hzztzz:i
 AP-2440 | 0|hzztzz:r

Notice the example hzztzz:i. The advantage of a string rank is that if you run out of room between two items, you still don't have to re-rank anything else. You just append more characters to the string to narrow the focus.

EDIT: as mentioned in the comments, you can't insert anything between 0|hzztzz: and 0|hzztzz:a. I suspect that's why Jira's database regularly appends :i at the end rather than :a, to avoid that scenario. If you really want to prevent the problem, you can change your algorithm so that (for example) every time you would insert :a at the end, you instead insert :ai. This way you logically guarantee that no ranking ends with the letter a, which means you will always have "room" to insert more items without having to re-order anything.
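
One way to sketch the "find a rank between two string ranks" operation (a simplified illustration restricted to lowercase a-z, not Jira's actual LexoRank algorithm) is to read each rank as a base-26 fraction with 'a' as the zero digit, as suggested in the comments:

```python
from fractions import Fraction

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)

def to_fraction(rank: str) -> Fraction:
    """Read a rank as the base-26 fraction 0.d1d2... with 'a' = 0, 'z' = 25."""
    return sum((Fraction(ALPHABET.index(ch), BASE ** i)
                for i, ch in enumerate(rank, start=1)), Fraction(0))

def between(lo: str, hi: str) -> str:
    """Return a rank strictly between lo and hi.
    Assumes lo < hi and hi is not lo plus trailing 'a's
    (that is the 'nothing fits between "b" and "ba"' case)."""
    mid = (to_fraction(lo) + to_fraction(hi)) / 2
    digits = []
    while mid:          # halving a base-26 fraction always yields a
        mid *= BASE     # finite base-26 expansion, since 2 divides 26
        d = int(mid)
        digits.append(ALPHABET[d])
        mid -= d
    return "".join(digits).rstrip(ALPHABET[0]) or ALPHABET[0]
```

For example, between("b", "d") is "c", while between("az", "b") is "azn": the string only grows when no shorter rank fits, which is exactly the "breathing room" property described above.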

Glyceric answered 21/4, 2018 at 13:12 Comment(14)
Do you have any reference material re: the string approach?Manufacturer
Best I could find quickly: confluence.atlassian.com/jirakb/…Glyceric
you can't insert anything between 0|hzztzz: and 0|hzztzz:a, thoughWhiplash
lol, @njzk, just yesterday I was trying to implement this myself and discovered that. I assume that's why jira's rank always starts with "i" after colon. That and similar algorithmic tricks should hopefully avoid that edge case from happening. I guess then there's some tradeoff to consider between this and using floating point or int.Glyceric
@AlexanderBird I'm looking into getting a working implementation, and I'd say by inserting a second letter everytime you reach a it should mitigate the issueWhiplash
@Whiplash any luck?Kruller
@SamLevin I have a basic implementation here that works as far as I have tested it: github.com/smaspe/taskman/tree/master/src/tools Sometime when I have time I'll separate it from that repo and make it into a libraryWhiplash
@njzk2: The solution to this problem is straightforward. We treat those strings as fractions base B, with an infinite number of trailing zeros. Accordingly all strings we insert should end with a non-zero digit, because the zeros are implied. Lexicographical ordering of the strings agrees with the numeric ordering. We can always find a B-adic number between any two B-adic fractions (e.g. (X+Y)/2, though it's better to compute the one with the smallest denominator). If [a-z] is our alphabet with a being the zero digit, then between(aax, aaxb) == aaxab.Survance
@ybungalobill yes. That's essentially what I tried to do last year. I haven't had much time to refine it though.Whiplash
In the Jira example, what is the purpose of 0| in the beginning and the colon in the end of jira_rank? Wouldn't a string of characters be sufficient?Lynxeyed
@Redline JIRA uses that to refer to the bucket number. They have 3 buckets - 0, 1 & 2. When a bucket exhausts all rank values, JIRA uses the next bucket for an operation known as 'balancing'.Latrena
For anyone wondering how to implement the alphabetical rank solution, @m69 had an amazing answer complete with the needed algorithm in js and CLynxeyed
There is a nice implementation of this in TypeScript here: @wewatch/lexorank. Demo showing arbitrary precision in action: codesandbox.io/s/lexorank-demo-bwf1pf?file=/src/index.ts (naturally, you would not normally insert items like this; it's just to demonstrate the arbitrary precision)Ministrant
Another potentially simpler to implement solution is to space integer ranks by 100 and just check for collisions after reordering (if another item already has the rank you were about to assign to this item), and run a simple "respacing" process. Many use cases will mostly be rearranged near the end of the list, making this process not too expensive.Adherence

You could add a new character (nvarchar) column to the Page table called order that contains a delimited list of iids in your preferred order, i.e. 1,4,3,2. The advantage is just one field in one table to maintain; the obvious disadvantage is the need to write utility functions to convert between the character and numeric types, which in reality probably wouldn't take too long.
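
A minimal sketch of the kind of utility function this would need (hypothetical helper, Python for illustration):

```python
def move_to(order_csv: str, iid: int, new_pos: int) -> str:
    """Move item `iid` to position `new_pos` in a CSV-encoded order
    such as '1,4,3,2', returning the updated CSV string to store back."""
    ids = order_csv.split(",")
    ids.remove(str(iid))
    ids.insert(new_pos, str(iid))
    return ",".join(ids)
```

Reordering is then a single-column UPDATE on the Page row, whatever the number of items moved.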

Electron answered 2/3, 2012 at 16:7 Comment(2)
I don't understand why this is such a bad idea. There are many problems with reordering items in relational databases; this takes a page right out of NoSQL and solves it quickly. I agree it might not be so nice, but it does solve the problem quickly.Stinkwood
This is definitely the right answer if the list size is bounded and "small". If the list has the potential to be 100,000 items or 100,000,000 items this doesn't work.Adherence

If you expect that the number of items won't be huge, you can use a slightly modified version of your first approach: just leave a gap between consecutive indexes. For example, the first item has index 100, the second 200, etc. This way you don't have to update all the indexes every time, only when you cannot find a gap.
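
A sketch of the gap bookkeeping this implies (illustrative Python; the gap size of 100 is arbitrary):

```python
GAP = 100

def index_between(prev_index: int, next_index: int):
    """Index for an item inserted between two neighbours, or None when
    the gap is exhausted and a full renumbering is needed."""
    if next_index - prev_index < 2:
        return None
    return (prev_index + next_index) // 2

def renumber(count: int):
    """Fresh indexes 100, 200, 300, ... for `count` items, in list order."""
    return [GAP * (i + 1) for i in range(count)]
```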

Vestavestal answered 2/3, 2012 at 16:18 Comment(2)
This is something I have considered (an idea left over from programming BASIC in high-school), something feels wrong about it though. I think that it breaks 3NF to have gaps like this since the meaning of the number is now dependent on the values of all the other items.Distorted
@Greg Guida: I'm not sure I follow. Meaning of number is still the same . 1 is less than 10, and 5 is less than 10.Vestavestal

Use Approach 1 and live with the performance implications of index updates. Unless you are dealing with millions of items per page, you are unlikely to find the performance lacking, and you retain all the power of SQL for dealing with sets of data.
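
For example, moving one item can be done with two set-based statements: shift the indexes between the old and new positions by one, then place the moved item. An illustrative SQLite sketch (added here, not from the original answer):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table item (iid int primary key, pid int, idx int, content text)")
con.executemany("insert into item values (?, 1, ?, ?)",
                [(1, 1, "Item1"), (2, 2, "Item2"), (3, 3, "Item3"), (4, 4, "Item4")])

def move(con, pid, iid, new_idx):
    """Move one item to new_idx, updating only the rows in between."""
    (old_idx,) = con.execute("select idx from item where iid = ?", (iid,)).fetchone()
    if new_idx < old_idx:    # moving up: bump the displaced block down by one
        con.execute("update item set idx = idx + 1 "
                    "where pid = ? and idx >= ? and idx < ?",
                    (pid, new_idx, old_idx))
    elif new_idx > old_idx:  # moving down: pull the displaced block up by one
        con.execute("update item set idx = idx - 1 "
                    "where pid = ? and idx > ? and idx <= ?",
                    (pid, old_idx, new_idx))
    con.execute("update item set idx = ? where iid = ?", (new_idx, iid))

move(con, 1, 4, 2)  # move Item4 to the second position
print([r[0] for r in con.execute(
    "select content from item where pid = 1 order by idx")])
# ['Item1', 'Item4', 'Item2', 'Item3']
```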

In addition to being much harder to work with from pure non-procedural SQL, Approach 2 would still require you to traverse the list to find the right place to reconnect the "links" when reordering an item.

Hardship answered 2/3, 2012 at 17:32 Comment(5)
About your second statement, there is no need to traverse the whole list to exchange two items. All you need is these two items, the two items pointing to them and the two items pointed by them. All of these can be selected with very simple conjunctive queries. (Sorry for necro-posting, I don't know if this is bad practice here ...)Ddt
@FabianPijcke Sure, if you have a doubly-linked list. I was commenting on the original poster's "Approach 2" which appears to be single-linked. That being said, I probably should have mentioned doubly-linked list... Maybe you can post that answer?Hardship
@FabianPijcke But in both cases, ordering the list (e.g. if you want to display it in-order) would be inefficient.Hardship
I don't agree, even with a classic linked list, given a node N, you can get the predecessor of N with something like "SELECT id FROM node WHERE successor = N" and its successor with "SELECT successor FROM node WHERE id = N". I agree with the fact that printing the list is not efficient.Ddt
@FabianPijcke Sure, but you'd have to index successor to make it efficient. That index acts as a reverse pointer, making the list doubly-linked. Whether it's better to implement the reverse pointer as index or as explicit field (without index) is another matter...Hardship

Why not build an ordered list like a linked list in C, as you suggested?

Create an "OrderedList" table that represents your different lists. Then create a "ListElement" table with a (nullable) self-reference pointing to the next list element. Each OrderedList row would reference a ListElement row as its "StartingElement".

When you need to reorder, simply update the "NextElement" references of the involved nodes ;)

In case some visuals could help : https://www.geeksforgeeks.org/data-structures/linked-list/

Clever ordering of the ListElement table might help with the performance of select operations (when rebuilding the list, for example; recursion might be an idea as well). I don't know exactly what you mean by front-end programming, but you could create functions in SQL to help you get the results before retrieving the data.

Un answered 16/11, 2022 at 14:3 Comment(2)
One reason not to use this approach is when items are filtered. The order had to always be applied before the filtering. Filtering may be a permission functionality (where not all list items are available to a user), thus the order breaks for all successors.Vibraharp
Indeed... My first intuition would be to add a column per ordering, but that seems a bit excessive and hard to maintain... Unfortunatly, while I find the question interesting, I don't have much time to look into itUn
Page           Item
-----------    ---------------
PK | pid       PK, FK | pid 
   | name      PK     | index 
                      | content 

where index could be a string (lexicographic order), or an int if you wish to order numerically (with gaps or not, depending on the specific use case).

The composite primary key makes sure that you can have local indexing relative to any given pid, rather than the global "iid" idea referred to in the question.
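
An illustrative sketch of that schema in SQLite (index is a reserved word in SQL, so the column is named idx here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Page (pid integer primary key, name text);
create table Item (
  pid     int  not null references Page(pid),
  idx     text not null,   -- a string key sorts lexicographically
  content text,
  primary key (pid, idx)   -- idx is local to each page
);
""")
con.executemany("insert into Item values (?, ?, ?)",
                [(1, "b", "first"), (1, "d", "second"), (2, "b", "other page")])

# The same idx on another page is fine; on the same page it is rejected:
try:
    con.execute("insert into Item values (1, 'b', 'duplicate')")
except sqlite3.IntegrityError:
    pass  # primary key (pid, idx) violated

print([c for (c,) in con.execute(
    "select content from Item where pid = 1 order by idx")])
# ['first', 'second']
```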

Stonedeaf answered 23/4, 2021 at 3:30 Comment(0)

Small lists: store a separate index

For lists that are likely to remain "small" (you choose your definition here, but for me I'd say something on the order of 100 or fewer items), an ordered index of item IDs is by far the simplest solution:

// Starting order
itemOrder = [5,12,93,1,22]
// Move item 22 up to the second position:
itemOrder = [5,22,12,93,1]

However, if your list has the potential to become 10,000 or 100,000 or 100,000,000 items long, this approach will very quickly perform poorly. You will be constantly writing a very large array to your database, and parsing a very long array (likely stored as a string in your database), for what should be a simple reordering move.

Large lists: assign each item a rank

In this case, assigning each item a rank attribute is the best solution. Moving one item to another location in the list usually only requires changing that one item's rank, and the list is easily sorted using the rank attribute of each item. The last n items can be pulled from the database using order-based queries. See @alexander-bird's solution for a good summary of how to implement a string-based arbitrary-precision rank system.

Something about the string-based arbitrary-precision solution feels overly complicated to me; I prefer to use integer ranks. Integers make for simple database operations, sorting, and assignment of new numbers, and they sit more cleanly in my head with fewer edge cases. So my solution uses integers spaced by 100, and after any reorder operation I check whether any other list items are "too close" to the new position and, if so, initiate a respacing operation.

# Pseudocode, written as runnable Python. Note the proximity check must
# exclude the moved item itself, otherwise it would always trigger a respace:
def reorder(item, new_rank, items):
    # Set the new rank. New items should be set to max item rank + 100.
    item.rank = new_rank
    if any(o is not item and new_rank - 2 <= o.rank <= new_rank + 2 for o in items):
        # We just placed an item too close to some neighbors; respace them.
        respace_items(items, new_rank - 2)

def respace_items(items, starting_rank):
    items_to_respace = sorted((i for i in items if i.rank >= starting_rank),
                              key=lambda i: i.rank)
    for item in items_to_respace:
        item.rank = starting_rank
        starting_rank += 100

In practice I have found that most reordering of my large lists tends to happen near the end of the list, so the "re-writing the rank of all items" problem is a theoretical one rather than a realistic one.

Adherence answered 2/3, 2024 at 15:17 Comment(0)
