Arbitrary document ordering in CouchDB/PouchDB

Asked 24/8, 2016 at 13:45 Answered 9/6, 2017 at 3:15

I’m building what can be treated as a slideshow app with CouchDB/PouchDB: each “slide” is its own Couch document, and slides can be reordered or deleted, and new slides can be added in between existing slides or at the beginning or end of the slideshow. A slideshow could grow from one to ≲10,000 slides, so I am sensitive to space- and time-efficiency.

I made the slide creation/editing functionality first, completely underestimating how tricky it is to keep track of slide ordering. This is hard because the order of each slide-document is completely independent of the slide-doc itself, i.e., it’s not something I can sort by time or some number contained in the document. I see numerous questions on StackOverflow about how to keep track of ordering in relational databases:

but all these involve either

using a floating-point secondary key for reordering/creation/deletion, with periodic normalization of indexes (i.e., imagine two documents are order-index 1.0 and 2.0, then a third document in between gets key 1.5, then a fourth gets 1.25, …, until ~31 docs are inserted in between and you get floating-point accuracy problems);
a linked list approach where a slide-document has a previous and next field containing the primary key of the documents on either side of it;
a very straightforward approach of updating all documents for each document reordering/insertion/deletion.

None of these are appropriate for CouchDB: #1 incurs a huge amount of incidental complexity in SQL or CouchDB. #2 is unreliable due to lack of atomic transactions (CouchDB might update the previous document with its new next but another client might have updated the new next document meanwhile, so updating the new next document will fail with 409, and your linked list is left in an inconsistent state). For the same reason, #3 is completely unworkable.

One CouchDB-oriented approach I’m evaluating would create a document that just contains the ordering of the slides: it might contain a primary-key-to-order-number hash object as well as an array that converts order-number-to-primary-key, and just update this object when slides are reordered/inserted/deleted. The downside to this is that Couch will keep a copy of this potentially large document for every order change (reorder/insert/delete)—CouchDB doesn’t support compacting just a single document, and I don’t want to run compaction on my entire database since I love preserving the history of each slide-document. Another downside is that after thousands of slides, each change to ordering involves transmitting the entire object (hundreds of kilobytes) from PouchDB/client to Couch.

A tweak to this approach would be to make a second database just to hold this ordering document and turn on auto-compaction on it. It’ll be more work to keep track of two database connections, and I’ll eventually have to put a lot of data down the wire, but I’ll have a robust way to order documents in CouchDB.

So my questions are: how do CouchDB people usually store the order of documents? And can more experienced CouchDB people see any flaws in my approach outlined above?

Fitzhugh answered 24/8, 2016 at 13:45 Comment(6)

Possibly of interest: #38923876 – William 25/8, 2016 at 6:56

@LynHeadley thanks for this—I’m working on a super-charged version of m69’s answer and I think that’ll work really well with CouchDB’s nice support for querying previous/next primary ID! – Fitzhugh 26/8, 2016 at 13:49

awesome! I have been thinking about this problem too and never found any good answers on the net. Maybe we're onto something... – William 26/8, 2016 at 18:4

@LynHeadley I think with a function that takes two strings and returns a string that lexicographically sorts between them (ideally close to their ‘midpoint’) will do the trick. And m69 offers such code, I’m just making it a bit better (base-62 would offer very short keys for tons of documents). This would easily do insertion. Moving documents would be a bit less elegant: copy document to new primary key (lexicographically between new neighbors), then delete old primary key. No big document to track order, nice leveraging of CouchDB’s sensibilities… or am I missing something? – Fitzhugh 26/8, 2016 at 19:16

@LynHeadley it took a bit of time but I wrapped up that library generalizing @m69’s answer and it worked nicely! See my response https://mcmap.net/q/1962938/-arbitrary-document-ordering-in-couchdb-pouchdb – Fitzhugh 9/6, 2017 at 3:15

I'm not sure if any of the advanced techniques using fractions described here can be used in CouchDB but nonetheless a relevant read: begriffs.com/posts/2018-03-20-user-defined-order.html – Fitzhugh 4/4, 2019 at 21:38

Thanks to a tip by @LynHeadley, I wound up writing a library that could subdivide the lexicographical interval between strings: Mudder.js. This allows me to infinitely insert and move around documents in CouchDB, by creating new keys at will, without any overhead of a secondary document to store the ordering. I think this is the right way to solve this problem!

Fitzhugh answered 9/6, 2017 at 3:15 Comment(0)

Based on what I've read, I would choose the "ordering document" approach. (ie: slideshow document that has an array of ids for each slide document) This is really straightforward and accomplishes the use-case, so I wouldn't let these concerns get in the way of clean/intuitive code.

You are right that this document can grow potentially very large, compounded by the write-heavy nature of that specific document. This is why compaction exists and is the solution here, so you should not fight against CouchDB on this point.

It is a common misconception that you can use CouchDB's revision history to keep a comprehensive history to your database. The revisions are merely there to aid in write concurrency, not as a full version control system.

CouchDB has auto-compaction enabled by default, and without it your database will grow in size unchecked. Thus, you should abandon the idea of tracking document history using this approach, and instead adopt another, safer alternative. (a list of these alternatives is beyond the scope of this answer)

Agreed answered 24/8, 2016 at 17:14 Comment(5)

When you say “CouchDB has auto-compaction enabled by default”, you mean the _revs_limit option, which defaults to 1000, i.e., CouchDB will keep no more than 1000 revisions? 1000 is still a lot!, so wouldn’t auto-compaction (immediately discard non-leaf nodes after every write) still be important—hence the need for a second database? – Fitzhugh 24/8, 2016 at 18:2

Please, pretty please, could you comment or give a pointer to “proper” version control on top of (or beyond) CouchDB, if Couch’s revisions system doesn’t provide it 😇? (I am planning on using it as a very gentle “undo” system where, in a catastrophe, I can at least read old versions of documents, but your comments along these lines make me think I shouldn’t expect to be able to do that.) – Fitzhugh 24/8, 2016 at 18:3

I would read their documentation on compaction, specifically this section on automatic compaction – Agreed 25/8, 2016 at 19:41

As for the other question, I will simply refer you to another question where I answered this: https://mcmap.net/q/2031622/-couchdb-view-that-includes-revision-history – Agreed 25/8, 2016 at 19:42

Thanks. wiki.apache.org/couchdb/How_to_design_for_replication was also helpful in understanding the strategies to handle semantic updates. – Fitzhugh 25/8, 2016 at 19:49

Recommended topics

Hot tags