CouchDB: Single document vs "joining" documents together

I'm trying to decide on the best approach for a CouchApp (no middleware). Since there are similarities to my idea, let's assume we have a Stack Overflow page stored in CouchDB. In essence it consists of the actual question at the top, answers, and comments. Those are basically three layers.

There are two ways of storing it: either in a single document containing a suitable JSON representation of the data, or with each part of the entry in a separate document, combining them later through a view (similar to this: http://www.cmlenz.net/archives/2007/10/couchdb-joins).

Now, both approaches may be fine, yet both have major downsides from my current point of view. Storing a busy document (many changes by multiple users are expected) as a single entity would cause conflicts. If user A saves his/her changes to the document, user B would receive a conflict error once he/she finishes typing his/her update. I imagine it's possible to fix this without the users' knowledge by re-downloading the document before retrying.

[Image: multi-user update problem]

But what if the document is rather big? I expect documents to grow considerably over time, which would add a noticeable delay to every save, especially if the retry has to happen several times because many users are updating the same document at once.
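The "re-download and retry" workaround can be sketched as a loop that fetches the latest revision, reapplies the user's edit, and tries again whenever CouchDB answers 409 Conflict. This is a minimal sketch, not a real client: the `db` object with async `get(id)`/`put(doc)` methods, the `err.status === 409` convention, and the names `saveWithRetry`, `applyChange` and `makeFakeDb` are all hypothetical; `makeFakeDb` is only an in-memory stand-in so the loop can run without a server.

```javascript
// Sketch only: save an edit, retrying when the database reports a 409 Conflict.
// Assumed (hypothetical) client interface: db.get(id) resolves to the latest
// doc including _rev; db.put(doc) rejects with err.status === 409 when the
// doc's _rev is no longer the current one.
async function saveWithRetry(db, id, applyChange, maxAttempts = 5) {
  for (let attempt = 1; ; attempt++) {
    const latest = await db.get(id);     // re-download the current revision
    const updated = applyChange(latest); // reapply this user's edit to it
    try {
      return await db.put(updated);      // accepted only if _rev still matches
    } catch (err) {
      // Give up on non-conflict errors or when out of attempts.
      if (err.status !== 409 || attempt >= maxAttempts) throw err;
      // Otherwise someone else saved first: loop and retry on the newer revision.
    }
  }
}

// Minimal in-memory stand-in for one database (illustration only);
// revisions are plain counters here, not CouchDB's "N-hash" strings.
function makeFakeDb(initialDocs) {
  const docs = new Map(initialDocs.map((d) => [d._id, { ...d, _rev: 1 }]));
  return {
    async get(id) {
      return { ...docs.get(id) };
    },
    async put(doc) {
      const current = docs.get(doc._id);
      if (current && current._rev !== doc._rev) {
        throw Object.assign(new Error('Document update conflict.'), { status: 409 });
      }
      const saved = { ...doc, _rev: (current ? current._rev : 0) + 1 };
      docs.set(doc._id, saved);
      return saved;
    },
  };
}
```

Every attempt fetches and re-sends the whole document, so with one large, busy document each retry gets more expensive, which is exactly the concern raised here.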

Another problem I see is editing. Every user should be allowed to edit his/her own contributions. If they're all stored in one document, it might be hard to write a solid auth handler.

OK, now let's look at the multiple-documents approach. Question, answers and comments would each be stored in their own documents. Advantage: only the actual owner of a document can cause conflicts, which won't happen very often. Since each element is a small part of the whole, re-downloading wouldn't take much time. Furthermore, the auth routine should be quite easy to implement.
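With one document per post, the auth routine could be a CouchDB `validate_doc_update` function, which CouchDB calls before every write with the new doc, the old doc (or null for a new one), the user context and the security object. A minimal sketch, assuming each post stores its author's CouchDB user name in a `user` field; the error messages are placeholders. In a design doc the function is stored as a string under the `validate_doc_update` key; it is written as a named function here only so it can be called directly.

```javascript
// Sketch of a validate_doc_update function for the one-doc-per-post layout.
// Assumption: each post stores its author's CouchDB user name in `user`.
function validate_doc_update(newDoc, oldDoc, userCtx, secObj) {
  // Server admins may change anything (e.g. for moderation).
  if (userCtx.roles.indexOf('_admin') !== -1) return;

  if (!userCtx.name) {
    throw { unauthorized: 'Please log in to post or edit.' };
  }
  // Edits and deletions: only the original author may touch an existing post.
  if (oldDoc && oldDoc.user !== userCtx.name) {
    throw { forbidden: 'Only the author may change this post.' };
  }
  // Creations and edits: the post must stay attributed to the current user.
  // A plain DELETE typically sends only _id, _rev and _deleted, so skip it here.
  if (!newDoc._deleted && newDoc.user !== userCtx.name) {
    throw { forbidden: 'Posts must be saved under your own user name.' };
  }
}
```

Because each answer or comment is its own document, the check only ever has to compare one `user` field, which is what makes this approach easier than policing edits inside one shared document.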

Now here's the downside. A single document is really easy to query and display. Having a lot of unsorted snippets lying around seems messy, since I couldn't get the view to give me a ready-to-use JSON object containing the entire item in an ordered, structured format.


I hope I've been able to communicate the actual problem. I'm trying to decide which solution would be more suitable for me, and which problems are easier to overcome. I imagine the first solution to be the prettier one in terms of storage and querying, and the second one the more practical one, solvable through better key management within the view (I'm not entirely into the principle of keys yet).

Thank you very much for your help in advance :)

Parks asked 24/11, 2011 at 10:37

Go with your second option. It's much easier than having to deal with the conflicts. Here are some example docs showing how I might structure the data:

{
   _id: '12345',
   type: 'question',
   slug: 'couchdb-single-document-vs-joining-documents-together',
   markdown: "I'm trying to decide the best approach for a CouchApp (no middleware). Since there are similarities to...",
   user: 'roman-geber',
   date: 1322150148041,
   'jquery.couch.attachPrevRev': true
}
{
   _id: '23456',
   type: 'answer',
   question: '12345',
   markdown: 'Go with your second option...',
   user: 'ryan-ramage',
   votes: 100,
   date: 1322151148041,
   'jquery.couch.attachPrevRev': true
}
{
   _id: '45678',
   type: 'comment',
   question: '12345',
   answer: '23456',
   markdown: 'I really like what you have said, but...',
   user: 'somedude',
   date: 1322151158041,
   'jquery.couch.attachPrevRev': true
}

To store revisions of each one, I would store the old versions as attachments on the doc being edited. If you use the jQuery client for CouchDB, you get this for free by adding 'jquery.couch.attachPrevRev': true to the doc. See "Versioning docs in CouchDB" by jchris.

Create a view like this:

fullQuestion : {
   map : function(doc) {
       if (doc.type == 'question') emit([doc._id, null, null], null);
       if (doc.type == 'answer')   emit([doc.question, doc._id, null], null);
       if (doc.type == 'comment')  emit([doc.question, doc.answer, doc._id], null);
   }
}
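To see why this key layout keeps each question's answers and comments together, the view can be emulated outside CouchDB: run the map function over a few docs, sort the emitted keys, and keep the rows between startkey and endkey. This is a sketch under simplifying assumptions: `collate` only approximates CouchDB's view collation (real CouchDB compares strings with ICU rules and compares objects member by member), `emit` is passed in as a parameter instead of being a global, and the sample docs are made up.

```javascript
// Simplified view collation, enough for these keys:
// null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null || v === undefined) return 0; // undefined in an emitted array becomes null
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === 'number') return 3;
  if (typeof v === 'string') return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects, e.g. the {} used as a "sorts last" marker in endkey
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a prefix sorts before a longer array
  }
  return 0; // same-rank null/booleans, or objects (treated as equal here)
}

// The answer's map function, with emit passed in.
function fullQuestionMap(doc, emit) {
  if (doc.type == 'question') emit([doc._id, null, null], null);
  if (doc.type == 'answer') emit([doc.question, doc._id, null], null);
  if (doc.type == 'comment') emit([doc.question, doc.answer, doc._id], null);
}

// Emulated query: map every doc, sort rows by key, keep startkey..endkey.
function queryView(docs, startkey, endkey) {
  const rows = [];
  for (const doc of docs) {
    fullQuestionMap(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows
    .sort((x, y) => collate(x.key, y.key))
    .filter((row) => collate(row.key, startkey) >= 0 && collate(row.key, endkey) <= 0);
}

const docs = [
  { _id: '45678', type: 'comment', question: '12345', answer: '23456' },
  { _id: '34567', type: 'answer', question: '12345' },
  { _id: '99999', type: 'question' },
  { _id: '23456', type: 'answer', question: '12345' },
  { _id: '12345', type: 'question' },
];
// Logs the ids 12345, 23456, 45678, 34567: the question, then each answer
// followed by its comments; question 99999 falls outside the key range.
console.log(queryView(docs, ['12345'], ['12345', {}, {}]).map((row) => row.id));
```

The question sorts first because `null` collates before any string, and a comment lands right after its answer because both share the same second key element.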

Then query the view like this:

http://localhost:5984/so/_design/app/_view/fullQuestion?startkey=["12345"]&endkey=["12345",{},{}]&include_docs=true

(Note: I have not URL-encoded this query, to keep it readable.)
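When building the URL in code, each key parameter has to be valid JSON (double-quoted strings) and then URL-encoded. A small sketch with a hypothetical `viewUrl` helper; the host, database `so`, design doc `app` and view name are the example values from the query.

```javascript
// Sketch: build a CouchDB view URL whose parameters are JSON-encoded
// and then URL-escaped. viewUrl is a hypothetical helper.
function viewUrl(base, db, ddoc, view, params) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join('&');
  return `${base}/${db}/_design/${ddoc}/_view/${view}?${query}`;
}

const url = viewUrl('http://localhost:5984', 'so', 'app', 'fullQuestion', {
  startkey: ['12345'],
  endkey: ['12345', {}, {}],
  include_docs: true,
});
console.log(url);
```

`JSON.stringify` produces the double quotes CouchDB requires; a single-quoted key such as `['12345']` is not valid JSON.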

This will get you all of the related documents you need to build the page. The only catch is that they will not be sorted by date; you can sort them on the client side (in JavaScript).
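With the first view, that client-side sort could look roughly like this: collect the docs from the rows, order the answers by `date`, and attach each answer's comments, also ordered by `date`. `buildQuestion` is a hypothetical helper; it assumes the example doc fields (`type`, `answer`, `date`) and rows carrying a `doc` property as returned with `include_docs=true`. Comments attached directly to the question are not handled.

```javascript
// Sketch: nest the docs returned by the first fullQuestion view
// (include_docs=true) and sort answers and comments by their `date` field.
function buildQuestion(rows) {
  const docs = rows.map((row) => row.doc);
  const byDate = (a, b) => a.date - b.date; // oldest first

  const question = docs.find((d) => d.type === 'question');
  const answers = docs
    .filter((d) => d.type === 'answer')
    .sort(byDate)
    .map((answer) => ({
      ...answer,
      comments: docs
        .filter((d) => d.type === 'comment' && d.answer === answer._id)
        .sort(byDate),
    }));

  return { ...question, answers };
}
```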

EDIT: Here is an alternative option for the view and query:

Based on your domain, you know some facts: an answer can't exist before its question, and a comment on an answer can't exist before that answer. So let's make a view that might make it faster to build the display page, respecting the order of things:

fullQuestion : {
   map : function(doc) {
       if (doc.type == 'question') emit([doc._id, doc.date], null);
       if (doc.type == 'answer')   emit([doc.question, doc.date], null);
       if (doc.type == 'comment')  emit([doc.question, doc.date], null);
   }
}

This keeps all the related docs together and orders them by date. Here is a sample query:

http://localhost:5984/so/_design/app/_view/fullQuestion?startkey=["12345"]&endkey=["12345",{}]&include_docs=true

This will get back all the docs you need, ordered from oldest to newest. You can now zip through the results, knowing that parent objects will come before their children, like this:

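// displayQuestionInfo, answerTemplate and commentTemplate are page-specific helpers;
// answerTemplate is assumed to render an element whose id is the answer's _id.
function addAnswer(doc) {
   $('.answers').append(answerTemplate(doc));
}

function addCommentToAnswer(doc) {
   // Works because the answer element was appended earlier in the same pass.
   $('#' + doc.answer).append(commentTemplate(doc));
}

// results is the parsed response of the view query above.
$.each(results.rows, function(i, row) {
   if (row.doc.type == 'question') displayQuestionInfo(row.doc);
   if (row.doc.type == 'answer') addAnswer(row.doc);
   if (row.doc.type == 'comment') addCommentToAnswer(row.doc);
});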

So then you don't have to perform any client-side sorting.
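The same single pass can also be written without jQuery, producing a nested object rather than DOM nodes. A sketch with a hypothetical `nestInOrder` helper; like the loop above, it relies on the view returning every parent before its children (oldest first) and assumes the example doc fields. A comment whose answer has not been seen yet is simply skipped.

```javascript
// Sketch: one pass over rows from the second fullQuestion view, which come
// back oldest first, so each parent is expected before its children.
function nestInOrder(rows) {
  let question = null;
  const answersById = new Map(); // answer _id -> nested answer object

  for (const { doc } of rows) {
    if (doc.type === 'question') {
      question = { ...doc, answers: [] };
    } else if (doc.type === 'answer' && question) {
      const answer = { ...doc, comments: [] };
      answersById.set(doc._id, answer);
      question.answers.push(answer); // rows are already in date order
    } else if (doc.type === 'comment') {
      const parent = answersById.get(doc.answer);
      if (parent) parent.comments.push(doc); // skip comments whose answer is unknown
    }
  }
  return question;
}
```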

Hope this helps.

Hypotaxis answered 24/11, 2011 at 16:30
Comment by Parks: Hi Ryan! Thanks a lot for your detailed and understandable answer. You told me quite a few things I didn't know yet. I've managed to implement a first version of the second option rather quickly, and will tweak it based on your input. Thanks a lot! :)
