Storing item positions (for ordering) in a database efficiently
Asked Answered
F

6

34

Scenario:

There is a database of movies a user owns, movies are displayed on a page called "my-movies", the movies can be displayed in the order that the user desires. For example "Fight Club" in position #1, "Drive" in position #3 and so on and so forth.

The obvious solution is to store a position with each item, for example:

movieid, userid, position
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3

Then when outputting the data is ordered by the position. This method works fine for output however it has a problem when updating: the position of an item all the other positions need to be updated because positions are relative. If movie #3 is now in position number 2 then movie #3 now needs to be updated to position #2. If the database contains 10,000 movies and a movie is moved from position #1 to position #9999 that's almost 10,000 rows to be updated!

My only solution is to store positioning separately, instead of having an individual field for each items position it's just one big data dump of positions that are taken in run time and associated with each item (json, xml, whatever) but that feels... inefficient because the database can't be left to do the sorting.

My summarised question: What's the most efficient way of storing items positions in a list that is friendly to fetching and updating?

Fevre answered 19/6, 2012 at 4:25 Comment(1)
Similar question: #9536762Shoplifter
F
18

August 2022: Note that the below is flawed and doesn't work when moving a movie down the list. I've posted a new answer here which fixes this issue.

If you use a combination of the position and a timestamp that the user put a movie in a given position rather than trying to maintain the actual position, then you can achieve a fairly simple means of both SELECTing and UPDATEing the data. For example; a base set of data:

create table usermovies (userid int, movieid int, position int, positionsetdatetime datetime)

insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 99, 1, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 98, 2, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 97, 3, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 96, 4, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 95, 5, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (123, 94, 6, getutcdate())

insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 99, 1, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 98, 2, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 97, 3, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 96, 4, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 95, 5, getutcdate())
insert into usermovies (userid, movieid, position, positionsetdatetime)
values (987, 94, 6, getutcdate())

If you query the user's movies using a query like this:

;with usermovieswithrank as (
  select userid
  , movieid 
  , dense_rank() over (partition by userid order by position asc, positionsetdatetime desc) as movierank
  from usermovies
)
select * from usermovieswithrank where userid=123 order by userid, movierank asc

Then you'll get the expected result:

USERID  MOVIEID     MOVIERANK
123     99          1
123     98          2
123     97          3
123     96          4
123     95          5
123     94          6

To move one of the rankings of the movies we need to update the position and the positionsetdatetime columns. For example, if userid 123 moves movie 95 from rank 5 to rank 2 then we do this:

update usermovies set position=2, positionsetdatetime=getutcdate() 
where userid=123 and movieid=95 

Which results in this (using the SELECT query above following the update):

USERID  MOVIEID     MOVIERANK
123     99          1
123     95          2
123     98          3
123     97          4
123     96          5
123     94          6

Then if userid 123 moves movie 96 to rank 1:

update usermovies set position=1, positionsetdatetime=getutcdate()
where userid=123 and movieid=96 

We get:

USERID  MOVIEID     MOVIERANK
123     96          1
123     99          2
123     95          3
123     98          4
123     97          5
123     94          6

Of course you'll end up with duplicate position column values within the usermovies table, but with this method you'll never show that column, you simply use it along with positionsetdatetime to determine a sorted rank for each user and the rank you determine is the real position.

If at some point you want the position column to properly reflect the movie rankings without reference to the positionsetdatetime you can use the movierank from the select query above to update the usermovies position column value, as it wouldn't actually affect the determined movie rankings.

Facula answered 13/6, 2014 at 15:45 Comment(9)
Just noticed this question is a year old - oops! Never mind, maybe my suggestion helps someone :-)Facula
This doesn't work if the user movies a movie down the list though. For instance, if they move movie 98 from position 4 to position 6, there will be two movies with position 6, but movie 98 will be shown first (in position 5) because of its more recent positionsetdatetime.Gehlenite
@Gehlenite You are correct - I'm rather sorry I missed that! I suspect it might be easily remedied by adding 1 to the desired position on a downward move; so in your example setting movie 98 from position 4 to position 7 (i.e. the desired position 6, plus 1) would do it, I think?Facula
No, forget that suggestion; it needs more attention than that. I'll see what I can come up with!Facula
@Facula Almost 4 years later, have you come up with anything? Great answers anyway :)Hornbeam
one way would be that before updating a position with an index, check if there are other movies with the same index, and then update the time stamp of only those movies (of index 6) in the same order they should be - updating the movie 98 later. This might add complexity so the one single string solution sounds better, so preferably only for a really large dataset - else keep it simple and simply update all the indexes for a smaller one.Elective
This solution does not work. Simple proof: Move movie 98 to position 1, which causes 99 to be ranked as the second movie, and 98 as the first movie. Now there are two movies with position 1, namely 99 and 98. Say I want to move movie 97 to the second position (after 98, before 99), using the method above, which is to set the position of 97 as 2. The result will be that 97 is still ranked as the third movie instead of the second movie.Airdrie
@YanaAgunSiswanto - Sorry about my extremely tardy reply! I have finally worked on this and have posted a new answer which builds on the previous approach.Facula
@WongJiaHau - thank you for your comment, it caused me to look at this again and to attempt to solve the issue you described.Facula
O
13

I've been struggling with what best to do with this situation and have come to the realisation that BY FAR the best solution is a list/array of the movies in the order you want them eg;

userId, moviesOrder

1 : [4,3,9,1...]

obviously you will serialise your array.

'that feels... inefficient'?

consider the user had a list of 100 movies. Searching by position will be one database query, a string to array conversion and then moviesOrder[index]. Possibly slower than a straight DB lookup but still very very fast.

OTOH, consider if you change the order;

with a position stored in the db you need up to 100 row changes, compared to an array splice. The linked list idea is interesting but doesn't work as presented, would break everything if a single element failed, and looks a hell of a lot slower too. Other ideas like leaving gaps, use float are workable although a mess, and prone to failure at some point unless you GC.

It seems like there should be a better way to do it in SQL, but there really isn't.

Omora answered 11/10, 2012 at 21:44 Comment(3)
I like this because if you think about it, a child ordering belongs to a parent. In a vacuum what does it mean for a row to have an "order" property of "5"? You have to see all of the other rows for this to mean anything.Goodhumored
I think I prefer this myself too... and we need to make sure we update it if a movie gets deleted as wellElective
the only issue is a single query with an order by.... we have to order the data post-query... ( unless there's some fancy sql that can split the string and do all that )Elective
C
6

Store the order linked-list style. Instead of saving the absolute position, save the ID of the previous item. That way any change only requires you to update two rows.

movieid | userid  | previousid
   1    |    1    | 
   2    |    1    |    1
   3    |    1    |    4
   4    |    1    |    2

To get the movies in order ...

SELECT movieid WHERE userid = 1 ORDER BY previousid

-> 1, 2, 4, 3

To (say) move #4 up a space:

DECLARE @previousid int, @currentid int
SET @previousid = SELECT previousid FROM movies WHERE movieid = @currentid

-- current movie's previous becomes its preceding's preceding
UPDATE movies SET previousid = 
    (SELECT previousid FROM movies WHERE movieid = @previousid)
WHERE movieid = @currentid

-- the preceding movie's previous becomes the current one's previous
UPDATE movies SET previousid = @currentid WHERE movieid = @previousid

That's still 1 read + 2 writes, but it beats 10,000 writes.

Cynewulf answered 19/6, 2012 at 4:30 Comment(7)
and what would be the SQL query to list movies?Billionaire
@Billionaire Selecting should be fairly straightforward... updating is a bit tricker, but I think that works.Cynewulf
As per my test, it is resulting in duplicate previousid !!Billionaire
That query to get the movies in order doesn't quite work. Consider: (id,prev):(1,2),(2,3),(3,_). That query would return 3, 1, 2. It should be 3, 2, 1. There doesn't seem to be a great (non-recursive, one-scan) way to do this query in pure SQL given your schema. If you instead do SELECT movieid, previousid WHERE userid = 1, it's trivial to sort them out in whatever other programming language, however.Melano
@Cynewulf The UPDATE looks easy, but there is no easy way to SELECTVivanvivarium
The order has to be implemented in the client AppNational
@Cynewulf Isn't that more like Stack style? Wouldn't an actually linked-list style be more efficient and functional? (id, prev, next)Erastatus
E
2

Really interesting solutions here. Another possibility might be to store positions with some space, say multiples of 10 or 100.

ID   NAME  POSITION
7     A       100
9     B       200
13    C       300
15    D       400
21    F       500

This multiple of 100 can be done for every new addition. Then moving a row C to position 1, would be -1 the current value or +1 after the current value. Or even -50 so that the same can be possible in future.

ID   NAME  POSITION
7     A       100
9     B       200
13    C       50
15    D       400
21    F       500

This can be continued, and in cases of so many movements that it is not possible, then a reorder of all the rows is done once again.

Elective answered 27/10, 2021 at 11:50 Comment(3)
another similar answer I saw which Atlassian uses in Jira is that they use alphabets instead of numbers for lexicographic sorting... and altering the order is easy by prefixing or appending characters at a particular position. There's more info if we google.Elective
Just a note, a little more efficient way would be starting from 0 and using a 2's power increments like 128 or 1024. This way you can maximize update count without renumbering, given that you always evaluate half between existing values. It should be sufficient for all user-ordering scenarios, because given int reaching ~2G, ordering by 1024 gives you ~2M items before order value overflows. This is far bigger than user-managable count (possibly thousands?). Usage of negative values is also possible in case of moving something to top.Turley
"Atlassian uses in Jira is that they use alphabets instead of numbers for lexicographic sorting" They call it LexoRank → confluence.atlassian.com/adminjiraserver/….Scraper
B
1
ID   NAME  POSITION
7     A       1
9     B       2
13    C       3
15    D       4
21    F       5

Given the current scenario if we want to move item D to position 2 we can search for the interval between 2(the position we want to move the item) and 4 (The item's current position) and write a query to ADD +1 to the position of every element inside this interval hence in this case we can do the following steps:

  1. Search for items in the interval where position >= 2 AND position < 4, and add a +1 to its position
  2. Set Item D position to 2.

This will generate that know : A->1, B->3, C-> 4, D->2, F->5

In case we want to move B to D then we need to do the opposite and apply a -1 instead.

  1. Search for items in the interval where position > 2 AND position <= 4 substract -1 from its position
  2. Set item position to 4

When deleting an Item from the table we need to update every item where its position is greater than the position of the element that's being deleted.

And when creating and Item its position is equal to the COUNT of every item +1.

DISCLAIMER: If you have a really big amount maybe this solution is not what you want, but for most cases will do. Normally a user wont move an item from position 10000 to position 2 but if instead the users delete item 1 then the query will substract -1 to the 9999 remaining items. If this is your scenario then maybe the solution with the linked list is probably the best for you, but then ordering will be more challenging because you need to go item by item to see who's next on the list.

Example querys

-- MOVE DOWN
UPDATE movie SET position = position-1  WHERE position <= 18 AND position > 13 AND id > 0;
UPDATE movie SET position = 18 WHERE id = 130;

-- MOVE UP
UPDATE movie SET position = position+1  WHERE position < 18 AND position >= 13 AND id > 0;
UPDATE movie SET position = 13 WHERE id = 130;
Baseborn answered 21/10, 2020 at 23:40 Comment(0)
F
1

Further to my 2014 answer I finally got back to this question and have built upon my previous approach and it's fatal flaw. I've come up with the following solution, using a SQL Server Stored Procedure to show the logic.

First, the table of movies:

CREATE TABLE [dbo].[usermovies]
    ([userid] [int], [movieid] [int], [position] [int], [subposition] [int]) 

And the test data. Note that when we load the data the initial movierank is set in the position column and the subposition is set to 0:

insert into usermovies (userid, movieid, position, subposition)
values (123, 99, 1, 0)
      ,(123, 98, 2, 0)
      ,(123, 97, 3, 0)
      ,(123, 96, 4, 0)
      ,(123, 95, 5, 0)
      ,(123, 94, 6, 0)
      ,(987, 99, 1, 0)
      ,(987, 98, 2, 0)
      ,(987, 97, 3, 0)
      ,(987, 96, 4, 0)
      ,(987, 95, 5, 0)
      ,(987, 94, 6, 0)

It is important to understand that the rank of each movie (movierank) is not determined from the position value, but instead from the rank of the row when the records are sorted by position and then by subposition. I created a view to provide the movierank:

CREATE OR ALTER VIEW vwUserMoviesWithRank 
as
with userMoviesWithRanks as (
  SELECT *
   , dense_rank() over (partition by userid order by position asc, subposition asc) as movierank
  FROM usermovies
)
SELECT * FROM userMoviesWithRanks
GO

Each user can only have one movie with a given position/subposition value, as this provides the unique movierank. Adding a unique clustered index to the table nicely enforces this rule and also, with sufficient data, would make for faster data access.

CREATE UNIQUE CLUSTERED INDEX [IX_usermovies] 
    ON [dbo].[usermovies] ([userid] ASC, [position] ASC, [subposition] ASC)

The below stored procedure performs the updates which allow a user's movie rankings to be changed. I've added comments to help explain the logic:

CREATE OR ALTER PROC proc_ChangeUserMovieRank
@userID int,
@movieID int,
@moveToRank int
as

DECLARE @moveFromRank int

DECLARE @movieIDAtNewRank int
DECLARE @positionAtNewRank int
DECLARE @subpositionAtNewRank int

IF @moveToRank<1 THROW 51000, '@moveToRank must be >= 1', 1;

BEGIN TRAN

-- Get the current rank of the movie being moved
SELECT @moveFromRank=movierank FROM vwUserMoviesWithRank WHERE userid=@userID and movieid=@movieID 

IF @moveFromRank<>@moveToRank BEGIN
  -- Get the position and subposition of the movie we need to shift down the list  
  -- if this move is decreasing the movie rank then we need to shift the movie at @moveToRank
  -- if this move is increasing the movie rank then we need to shift the movie at @moveToRank+1, to accommodate the removal
  SELECT @positionAtNewRank=position, @subpositionAtNewRank=subposition 
  FROM vwUserMoviesWithRank 
  WHERE userid=@userID and movierank=(@moveToRank + CASE WHEN @moveToRank>@moveFromRank THEN 1 ELSE 0 END)

  IF @positionAtNewRank IS NULL BEGIN 
    -- No movie needs to be updated, so we're adding to the end of the list
    -- Our destination is the position+1 of the highest ranked movie (with subposition=0)
    SELECT @positionAtNewRank=max(p.position)+1, @subpositionAtNewRank=0
      FROM vwUserMoviesWithRank p WHERE p.userid=@userID
  END ELSE BEGIN
    -- Move down (increase the subposition of) any movies with the same position value as the destination rank
    UPDATE m
    SET subposition=subposition+1
    FROM usermovies m
    WHERE userid=@userID AND position=@positionAtNewRank and subposition>=@subpositionAtNewRank
  END

  -- Finally move the movie to the new rank
  UPDATE m
  SET position=@positionAtNewRank, subposition=@subpositionAtNewRank
  FROM usermovies m
  WHERE m.userid=@userID AND m.movieid=@movieID
END
COMMIT TRAN
GO

Here's a test run using the test data above. The movies are listed using the following SELECT statement, I haven't repeated this each time below for the sake of brevity. Here's our movie ranking at the beginning:

SELECT movieid, movierank FROM vwUserMoviesWithRank WHERE userid=123 ORDER BY movierank

movieid     movierank
----------- --------------------
99          1
98          2
97          3
96          4
95          5
94          6

Let's move movie 98 to rank 5:

EXEC proc_ChangeUserMovieRank @userID=123, @movieID=98, @moveToRank=5
movieid     movierank
----------- --------------------
99          1
97          2
96          3
95          4
98          5
94          6

Move movie 94 to rank 2:

EXEC proc_ChangeUserMovieRank @userID=123, @movieID=94, @moveToRank=2
movieid     movierank
----------- --------------------
99          1
94          2
97          3
96          4
95          5
98          6

Move movie 95 to rank 1:

EXEC proc_ChangeUserMovieRank @userID=123, @movieID=95, @moveToRank=1
movieid     movierank
----------- --------------------
95          1
99          2
94          3
97          4
96          5
98          6

Move movie 99 to rank 4:

EXEC proc_ChangeUserMovieRank @userID=123, @movieID=99, @moveToRank=4
movieid     movierank
----------- --------------------
95          1
94          2
97          3
99          4
96          5
98          6

Move movie 97 to rank 6:

EXEC proc_ChangeUserMovieRank @userID=123, @movieID=97, @moveToRank=6
movieid     movierank
----------- --------------------
95          1
94          2
99          3
96          4
98          5
97          6

Move movie 97 to rank 4:

EXEC proc_ChangeUserMovieRank @userID=123, @movieID=97, @moveToRank=4
movieid     movierank
----------- --------------------
95          1
94          2
99          3
97          4
96          5
98          6

Move movie 95 to rank 4:

EXEC proc_ChangeUserMovieRank @userID=123, @movieID=95, @moveToRank=4
movieid     movierank
----------- --------------------
94          1
99          2
97          3
95          4
96          5
98          6

Which all looks good I think.

Note that following these operations the position/subposition data now looks like this:

select * from vwUserMoviesWithRank WHERE userid=123 order by movierank

userid      movieid     position    subposition movierank
----------- ----------- ----------- ----------- --------------------
123         94          3           0           1
123         99          4           0           2
123         97          4           1           3
123         95          4           2           4
123         96          4           3           5
123         98          6           0           6

The values are quite different from the determined movierank.

As the movie rankings change the position may become the same across a number of rows, such as position 4 above. When this happens more rows will need to be updated when the rankings change, so it is advisable to periodically reset the position and subposition to the movierank value:

UPDATE usermovies
SET position=vwUserMoviesWithRank.movierank, subposition=0
FROM vwUserMoviesWithRank
INNER JOIN usermovies on usermovies.userid=vwUserMoviesWithRank.userid AND usermovies.movieid=vwUserMoviesWithRank.movieid
WHERE usermovies.position<>vwUserMoviesWithRank.movierank OR usermovies.subposition<>0

This works very efficiently and will scale very well and I think it all works, let me know if you think otherwise and I'll take another look (and this time I won't wait 8 years to do so!)

And just to note that I tried to add a SQL Fiddle link here but it appears that they have no SQL Server hosts presently :-/

Facula answered 31/8, 2022 at 7:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.