MongoDB, C# and NoRM + Denormalization

I am trying to use MongoDB, C# and NoRM to work on some sample projects, but at this point I'm having a much harder time wrapping my head around the data model. With an RDBMS, related data is no problem. In MongoDB, however, I'm having a difficult time deciding what to do with it.

Let's use StackOverflow as an example... I have no problem understanding that the majority of data on a question page should be included in one document. Title, question text, revisions, comments... all good in one document object.

Where I start to get hazy is on the question of user data like username, avatar, reputation (which changes especially often)... Do you denormalize and update thousands of document records every time there is a user change or do you somehow link the data together?

What is the most efficient way to accomplish a user relationship without causing tons of queries to happen on each page load? I noticed the DbReference<T> type in NoRM, but haven't found a great way to use it yet. What if I have nullable optional relationships?

Thanks for your insight!

Farrago answered 18/3, 2011 at 14:53 Comment(1)
+1, I've wondered the same thing. – Realty

I think you need to strike a balance.

If I were you, I'd just reference the userid instead of their name/reputation in each post.

Unlike with an RDBMS, though, you would opt to have the comments embedded in the document.
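
For illustration, a minimal sketch of that document shape as C# POCOs. The class and property names are made up for this example, and ObjectId comes from the official MongoDB .NET driver's MongoDB.Bson package rather than NoRM:

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Bson;

// Comments are embedded in the post document; user data is referenced by id only.
public class Post
{
    public ObjectId Id { get; set; }            // the document's _id
    public string Title { get; set; }
    public string Body { get; set; }
    public ObjectId AuthorId { get; set; }      // reference to the user, not a copy
    public List<Comment> Comments { get; set; } = new List<Comment>();
}

public class Comment
{
    public ObjectId UserId { get; set; }        // resolved (ideally in a batch) when rendering
    public string Text { get; set; }
    public DateTime PostedAt { get; set; }
}

public class User
{
    public ObjectId Id { get; set; }
    public string Username { get; set; }
    public string AvatarUrl { get; set; }
    public int Reputation { get; set; }         // changes often, so it lives only here
}
```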

Kaftan answered 5/4, 2011 at 13:15 Comment(1)
I agree. I liked using DBRef, as user data is prone to frequent updates. Comments, on the other hand, are perfectly acceptable within a document. – Farrago

The balance that I have found is using SQL as the normalized database and Mongo as the denormalized copy, with an ESB keeping them in sync. I use a concept that I call "prepared documents" and "stored documents".

Stored documents are data that is kept only in Mongo; they're useful for data that isn't relational. Prepared documents contain data that can be rebuilt from the normalized database. They act as living caches, in a way: they can be rebuilt from scratch if the data ever falls out of sync (for complicated documents this is an expensive process, because they require many queries to rebuild). They can also be updated one field at a time, which is where the service bus comes in: it responds to events sent after the normalized database has been updated and then updates the relevant Mongo prepared documents.

Use each database to its strengths. Allow SQL to be the write database that ensures data integrity. Let Mongo be the read-only database that is blazing fast and can contain sub-documents, so that you need fewer queries.
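
As a rough sketch of that prepared-document idea: the QuestionPage shape, the collection name, and the rebuild/patch methods below are all hypothetical, and the code uses the official MongoDB .NET driver since NoRM's API isn't shown here.

```csharp
using System.Collections.Generic;
using MongoDB.Driver;

// A "prepared" document: a denormalized read model keyed the same way as the SQL row,
// so it can always be rebuilt from the normalized database if it falls out of sync.
public class QuestionPage
{
    public int QuestionId { get; set; }
    public string Title { get; set; }
    public string AuthorName { get; set; }      // denormalized from the SQL Users table
    public List<string> CommentTexts { get; set; }
}

public class QuestionPageCache
{
    private readonly IMongoCollection<QuestionPage> _pages;

    public QuestionPageCache(IMongoDatabase mongo)
    {
        _pages = mongo.GetCollection<QuestionPage>("question_pages");
    }

    // Full rebuild: expensive (many SQL queries to assemble the document),
    // used when the cache has fallen out of sync.
    public void Rebuild(QuestionPage assembledFromSql)
    {
        _pages.ReplaceOne(p => p.QuestionId == assembledFromSql.QuestionId,
                          assembledFromSql,
                          new ReplaceOptions { IsUpsert = true });
    }

    // Cheap incremental patch, triggered by an ESB event after SQL has committed.
    public void UpdateAuthorName(int questionId, string newName)
    {
        _pages.UpdateOne(p => p.QuestionId == questionId,
                         Builders<QuestionPage>.Update.Set(p => p.AuthorName, newName));
    }
}
```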

** EDIT ** I just re-read your question and realized what you were actually asking for. I'm leaving my original answer here in case it's helpful at all.

The way I would handle the Stack Overflow example you gave is to store the user id in each comment. You would load up the post, which would have all of the comments in it. That's one query.

You would then traverse the comment data and pull out an array of the user ids you need to load, then load those as a batch query (using the Q.In() query operator). That's two queries total. You would then need to merge the data together into its final form. There is a balance to strike between when to do it like this and when to use something like an ESB to manually update each document; use whatever works best for each individual scenario in your data structure.
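
A sketch of that two-query-and-merge flow, reusing the Post/Comment/User classes sketched earlier. The answer refers to NoRM's Q.In() operator; since NoRM's exact API isn't reproduced here, this uses the official MongoDB .NET driver's Filter.In as a stand-in:

```csharp
using System.Collections.Generic;
using System.Linq;
using MongoDB.Bson;
using MongoDB.Driver;

public static class CommentLoader
{
    public static List<(Comment Comment, User Author)> Load(IMongoDatabase db, ObjectId postId)
    {
        var posts = db.GetCollection<Post>("posts");
        var users = db.GetCollection<User>("users");

        // Query 1: the post, with its comments embedded.
        var post = posts.Find(p => p.Id == postId).First();

        // Query 2: batch-load every distinct commenter with an $in query.
        var userIds = post.Comments.Select(c => c.UserId).Distinct().ToList();
        var authors = users.Find(Builders<User>.Filter.In(u => u.Id, userIds))
                           .ToList()
                           .ToDictionary(u => u.Id);

        // Merge in memory into the shape the page needs.
        return post.Comments.Select(c => (c, authors[c.UserId])).ToList();
    }
}
```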

Redletter answered 17/9, 2011 at 0:10 Comment(1)
I like this solution. Loading up the batch of user IDs and then assembling the data is a good idea. – Farrago

Why do you want to avoid denormalization and updating 'thousands of document records'? MongoDB is designed for denormalization. Stack Overflow handles millions of pieces of data in the background, and some of that data can be stale for a short period; that's okay.

So the main idea of the above is that you should have denormalized documents in order to display them quickly in the UI.

You can't query on a referenced document's fields, so either way you need denormalization.

I also suggest having a look at the CQRS architecture.
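
To make the "update thousands of documents" part concrete: with the username denormalized into each embedded comment, a rename becomes a single multi-document update rather than thousands of individual writes. This is just a sketch using the official MongoDB .NET driver, not the poster's implementation; the field names are illustrative, and arrayFilters requires MongoDB 3.6 or later.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

public static class UserFanOut
{
    // Push a username change into every embedded comment written by this user.
    public static void RenameUser(IMongoDatabase db, ObjectId userId, string newName)
    {
        var posts = db.GetCollection<BsonDocument>("posts");

        // Match any post that contains at least one comment by the user...
        var filter = Builders<BsonDocument>.Filter.Eq("Comments.UserId", userId);

        // ...and rewrite the denormalized Username on exactly those comments.
        var update = Builders<BsonDocument>.Update.Set("Comments.$[c].Username", newName);
        var options = new UpdateOptions
        {
            ArrayFilters = new[]
            {
                new BsonDocumentArrayFilterDefinition<BsonDocument>(
                    new BsonDocument("c.UserId", userId))
            }
        };

        posts.UpdateMany(filter, update, options);
    }
}
```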

Leoraleos answered 18/3, 2011 at 15:41 Comment(5)
It's not that I want to avoid denormalization, but I want to avoid inherently bad design. Detaching something as common as a user record to the point where I could consistently be updating thousands of records per second seems 1) like overkill and 2) like a bad use of disk space. Are there no other options? – Farrago
It depends on what you want: if you care about disk space and denormalization feels like overkill to you, then my answer is probably not for you; but if you care about performance and you want to reach SO-like speed, then you should go the way described above. – Leoraleos
Not to mention, disk space is cheap. – Heartsease
Thoughts on DBRef? It's my understanding that frequently accessed records are cached and still pretty performant. mongodb.org/display/DOCS/Database+References – Farrago
Yes, sure, DBRefs were the first thing I thought of when I came from the relational database world. DBRef in MongoDB is always a poor fit because it is resolved on the client side (internally in the driver); caching is a bonus, but denormalization is better, because with denormalized data you can run queries that you can't with a DBRef, and a DBRef also means additional requests to the database. – Leoraleos

Try investigating CQRS and event sourcing architectures. They will allow you to update all of this data via a queue.

Dickerson answered 18/3, 2011 at 15:43 Comment(0)