How to maintain denormalized consistency in NoSQL?

Say I have two collections, each valuable on its own but related to the other: photos and users. There is a one-to-many relationship between users and photos.

An example of denormalized data:

users:
{
  "id": "AABC",
  "name": "Donna Smith"
}

photos:
{
  "id": "FAD4",
  "description": "cute dog",
  "user_id": "AABC",  // This is the relationship
  "user_name": "Donna Smith"  // This is the denormalized value from the "users" collection
}

How can I ensure consistency with documents in the photos collection when user "AABC" changes name from "Donna Smith" to "Donna Chang"?

Since the store is non-transactional, I understand that consistency is going to be eventual.

A simple (naive) implementation might trigger a background job after the change to user "AABC" to update all photos where user_id = "AABC". In the case of a single update, that would work well. But this is a multi-user environment, and there are going to be updates flying in all directions concurrently. What if, for example, halfway through the background update of photos to change "Donna Smith" to "Donna Chang", the name of user "AABC" is changed back to "Donna Smith"?
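
To make the race concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in: the in-memory dicts play the role of the two collections, `rename_user` is an invented helper, and generators simulate background jobs so that two of them can be interleaved deterministically. It is not tied to any particular NoSQL product.

```python
# In-memory stand-ins for the two collections (hypothetical, not a real database API).
users = {"AABC": {"id": "AABC", "name": "Donna Smith"}}
photos = {
    pid: {"id": pid, "user_id": "AABC", "user_name": "Donna Smith"}
    for pid in ("FAD1", "FAD2", "FAD3", "FAD4")
}


def rename_user(user_id, new_name):
    """Update the user immediately, then return a naive background job.

    The job is a generator that carries the new name in its payload and
    rewrites one photo per step, so it can be paused and resumed.
    """
    users[user_id]["name"] = new_name

    def job():
        for photo in photos.values():
            if photo["user_id"] == user_id:
                photo["user_name"] = new_name
                yield photo["id"]

    return job()


# Job A starts renaming to "Donna Chang" and gets halfway through.
job_a = rename_user("AABC", "Donna Chang")
next(job_a)
next(job_a)

# The name is changed back; job B runs to completion while job A is paused.
job_b = rename_user("AABC", "Donna Smith")
for _ in job_b:
    pass

# Job A resumes and overwrites the remaining photos with its stale payload.
for _ in job_a:
    pass

stale = sorted(
    p["id"] for p in photos.values() if p["user_name"] != users["AABC"]["name"]
)
print(users["AABC"]["name"], stale)  # Donna Smith ['FAD3', 'FAD4']
```

The user ends up as "Donna Smith", but the last two photos are left saying "Donna Chang", and nothing will ever correct them.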

Searching online, I see a lot of discussion of how to model denormalized data. But any discussion about how to maintain it seems to be trivialised as "you'll also need to update all related records". Are there any NoSQL systems that do the heavy lifting for you in this scenario? Any frameworks or utilities?

I've read Thomas Wanschik's excellent blog articles on the topic of "materialized views" and background updates for exactly this scenario. But I'm left concerned that:

  1. The background jobs must be delayed by a pre-determined amount greater than the maximum time permitted for updates (how do I determine that delay? what if an operation takes longer?), and;
  2. This is the only discussion I've yet found of a practical solution. NoSQL is kind of a big deal, right? Why am I not seeing more discussion of this? What am I missing?
Cadelle answered 28/1, 2016 at 0:40 Comment(1)
Any thoughts? I would have thought that, with the popularity of NoSQL, this would be a "solved" problem. Cadelle

My understanding of NoSQL is that the design comes down to an honest analysis of cost when delivering huge amounts of data back to the user or application.

When your application serves photos, which is likely to happen more frequently: delivering the photos to the user (and perhaps to the friends who are viewing them), or changing the user's name?

Since a name change is the less common event in the application, the claim to fame of NoSQL denormalization is that you can deliver large volumes of photo data to users at high speed, without the expense of the JOINs that a traditional normalized RDBMS would need.

A few tools that are out there these days (since you wrote this a fairly long time ago) can assist with situations like this, but you were essentially correct: you can schedule code to handle the change. It will be slow and it will be expensive, but it will work, and you will still have the benefit of delivering your photos to the application quickly, which is essentially the main purpose of your app.
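
As one possible sketch of that scheduled update, and only under some assumptions of mine, the job can be made safe against the rename-back race from the question by versioning the user. The `version` and `user_version` fields, the helper names, and the in-memory dicts below are all hypothetical; in a real store the check-and-write in `apply_to_photo` would have to be an atomic conditional update on a single document, which I am assuming your database offers.

```python
# In-memory stand-ins for the two collections (hypothetical, not a real database API).
users = {"AABC": {"id": "AABC", "name": "Donna Smith", "version": 1}}
photos = {
    pid: {"id": pid, "user_id": "AABC", "user_name": "Donna Smith", "user_version": 1}
    for pid in ("FAD1", "FAD2", "FAD3", "FAD4")
}


def rename_user(user_id, new_name):
    """Bump the user's version with every change and return the change event."""
    user = users[user_id]
    user["version"] += 1
    user["name"] = new_name
    return {"user_id": user_id, "name": new_name, "version": user["version"]}


def apply_to_photo(photo, event):
    """Write the denormalized name only if the event is newer than the copy.

    Assumption: a real store performs this compare-and-write atomically.
    """
    if photo["user_id"] == event["user_id"] and photo["user_version"] < event["version"]:
        photo["user_name"] = event["name"]
        photo["user_version"] = event["version"]
        return True
    return False


def propagate(event):
    """Background job as a generator: one photo per step, so it can be paused."""
    for photo in photos.values():
        apply_to_photo(photo, event)
        yield photo["id"]


# Job A (rename to "Donna Chang") gets halfway, then is paused.
job_a = propagate(rename_user("AABC", "Donna Chang"))
next(job_a)
next(job_a)

# The name is changed back and job B runs to completion.
for _ in propagate(rename_user("AABC", "Donna Smith")):
    pass

# Job A resumes, but its older version no longer wins on any photo.
for _ in job_a:
    pass

print({p["user_name"] for p in photos.values()})  # {'Donna Smith'}
```

Because a stale event can never overwrite a newer copy, the jobs can run in any order, be retried, or overlap, and the photos still converge on the latest name. That also removes the need to guess a "safe" delay before starting the job.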

This question can grow into an epic novel, with SQL defenders on one side and the "rabble" of NoSQL followers on the other. Traditional DBAs shudder at the thought of compromising structure for speed, but think of NoSQL as the old "Super Table" concept of long ago, where we used to think in terms of what would be returned rather than what needs to be stored. Essentially, this is what gave rise to the NoSQL concept, and it is proving to be very helpful in large-scale applications and big data reporting.

I know this is an old question, but I still hope my answer helps others such as myself demystify the NoSQL benefits when it comes to this type of question.

Prokofiev answered 9/2, 2016 at 13:3 Comment(1)
My question isn't all that old, less than 2 weeks. @Indy-Jones, you raise some good discussion points, but I've seen these points many times before in different places. What I'm looking for is patterns, and/or tools for implementing said patterns, for that "slow, expensive update". I continue to be surprised that this non-trivial engineering task is left to the application developer. Cadelle
