Efficient way to store reorderable items in a database [closed]
Asked Answered
P

3

10

So I have a table of user favorites. There's a few million rows of them.

Currently, they have only three columns: id(pk),userId and someFkRef. There's an index on userId to allow me to select a user's favorites quickly.

Currently these are ordered by id which is effectively just the insert order. We'd like to offer the user a chance to re-order their favorites, most likely via some sort of drag and drop interaction.

My first (and I suspect naive) approach to this would be to simply add an order column and a composite index over userId,order. However, upon reflection, when the user moves their item some distance over the list, all intermediate rows between the item's start position and end position will need their order column recalculated and therefore, the index too.

This is (most likely) bad.

Before I spend ages trying to quantify exactly how bad, I'm wondering if there's a better table-based representation that is cheaper to manipulate with the kinds of operations I describe above.

Phonetician answered 27/2, 2013 at 17:16 Comment(5)
I'm not convince you need to index the new field.Budget
Generally, order by ops require an index, no?Phonetician
@Phonetician Require no, but if your table rows are large and you're getting a large result set, sort using an index may generate quite a bit less I/O.Diphyllous
That's a great question. Probably you should've made it more abstract. There are numerous places when you have a reorderable list on a web page and you'd like that order saved into DB.Irreplaceable
Similar: Storing a re-orderable list in a databaseKerakerala
T
6

For a drag and drop interaction, the better bet is a priority. You would start with the priorities being 1, 2, 3, and so on, just like a sort order.

But then, the user wants to move item 5 between 1 and 2. Voila! Give it the value of 1.5. No other values need to change. The index update takes care of the rest.

For this to work, the priority needs to be stored as a floating point number. That might be an issue. Also, a sufficiently large number of changes could result in pushing the limits of floating point. So, if a user tries to take the last element and insert it between the first two, s/he can get away with it about few dozen times or so.

You can fix this with a process that periodically reassigns number for one (or all users, if in batch) starting at 1.

Tracheid answered 27/2, 2013 at 17:21 Comment(2)
This is also a valuable approach. But you still need to have an index on the column someFkRef, so it will still be a bit consuming is the table is VERY large.Flatware
@SamuelEUSTACHI Index is not the worst part. The worst part is that floating-point numbers have finite precision and after some 53 well-designed moves one can break the ordering logic. Yes, you can always have a counter and a trigger to renormalize this list, but I'm quite unsure whether it's going to be a more efficient solution at all.Irreplaceable
F
3

If you don't need to be able to manipulate someFkRef across users (for instance, getting the list of users interested in something), then you could have only one record per user, with an ordered list of someFkRef (refA, refB).

But it's a form of de-normalization, and as it has some drawbacks, it really depends on your needs (and your future needs, that is where comes the trouble)

Flatware answered 27/2, 2013 at 17:21 Comment(1)
Yep, denormalization is going to hit you even when it's the same user: a) you need to limit/offset and manipulate a big list; b) those data are transferred over internet and user sorts a list of a hundred elements in a fast manner (hello, O(N<sup>2</sup>), lags and desynchronization). Denormalization is a great pain and nobody should ever want to do that.Irreplaceable
E
1

Not sure what your dependent references might be to the ID field, but have you thought about over-writing it? I think there is a SET IDENTITY INSERT = ON, or some such that you can do.

I realize this is an odd thing to suggest, but considering what you're trying to do, it might make sense, adn cause the least amount of overhead.

Exasperate answered 27/2, 2013 at 17:26 Comment(1)
@Joachim -- Renumber recordcount = 2 -- the donor and the recipient records only. Reindex is indeterminant -- presumably he's got some padding built-in, and with millions of records, is probably re-indexing off-peak.Exasperate

© 2022 - 2024 — McMap. All rights reserved.