Say I have two collections, each with value independent of each other, but each related to one another. They are photos and users. There is a one-to-many relationship between users and photos.
An example of denormalized data:
users:
{
"id": "AABC",
"name": "Donna Smith"
}
photos:
{
"id": "FAD4",
"description": "cute dog",
"user_id": "AABC", // This is the relationship
"user_name": "Donna Smith" // This is the denormalized value from the "users" collection
}
How can I ensure consistency with documents in the photos collection when user "AABC" changes name from "Donna Smith" to "Donna Chang"?
Since the datastore is non-transactional, I understand that consistency is going to be eventual.
A simple (naive) implementation might trigger a background job after the change to user "AABC" to update all photos where user_id = "AABC". In the case of a single update, that would work well. But this is a multi-user environment, and there are going to be updates flying in all directions concurrently. What if, for example, halfway through the background update of photos to change "Donna Smith" to "Donna Chang", the name of user "AABC" is changed back to "Donna Smith"?
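To make the race concrete, here is a minimal Python sketch of what I mean. It uses in-memory dicts in place of a real datastore, and generators stand in for concurrent background jobs so the interleaving is deterministic. The function names (`rename_user`, `propagate_name`, `run_interleaved`) and the extra photo documents are made up for illustration only.

```python
# In-memory stand-ins for the two collections (hypothetical; no real datastore).
users = {"AABC": {"id": "AABC", "name": "Donna Smith"}}
photos = [
    {"id": "FAD4", "description": "cute dog", "user_id": "AABC", "user_name": "Donna Smith"},
    {"id": "FAD5", "description": "beach", "user_id": "AABC", "user_name": "Donna Smith"},
    {"id": "FAD6", "description": "sunset", "user_id": "AABC", "user_name": "Donna Smith"},
    {"id": "FAD7", "description": "lunch", "user_id": "AABC", "user_name": "Donna Smith"},
]


def propagate_name(user_id, name):
    """Naive background job: write the name captured at enqueue time
    into each matching photo, pausing (yield) after every write."""
    for photo in photos:
        if photo["user_id"] == user_id:
            photo["user_name"] = name
            yield photo["id"]


def rename_user(user_id, new_name):
    """Update the user immediately and return the (not yet run) background job."""
    users[user_id]["name"] = new_name
    return propagate_name(user_id, new_name)


def run_interleaved():
    """Simulate the problematic interleaving of two rename jobs."""
    job_a = rename_user("AABC", "Donna Chang")
    next(job_a)  # job A updates the first photo
    next(job_a)  # job A updates the second photo, then stalls

    job_b = rename_user("AABC", "Donna Smith")  # name changed back mid-way
    for _ in job_b:  # job B runs to completion
        pass

    for _ in job_a:  # job A resumes and overwrites the remaining photos
        pass

    return [p["user_name"] for p in photos]


if __name__ == "__main__":
    print("user:  ", users["AABC"]["name"])
    print("photos:", run_interleaved())
    print("user:  ", users["AABC"]["name"])
```

With this interleaving the user ends up as "Donna Smith", but the last two photos are left as "Donna Chang", and nothing will ever come along to fix them. That is the inconsistency I'm worried about.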
Searching online, I see a lot of discussion of how to model denormalized data. But any discussion about how to maintain it seems to be trivialised as "you'll also need to update all related records". Are there any NoSQL systems that do the heavy lifting for you in this scenario? Any frameworks or utilities?
I've read Thomas Wanschik's excellent blog articles on the topic of "materialized views" and background updates for exactly this scenario. But I'm left concerned that:
- The background jobs must be delayed by a pre-determined amount greater than the maximum time permitted for updates (how do I determine that delay? what if an operation takes longer?), and;
- This is the only discussion I've yet found of a practical solution. NoSQL is kind of a big deal, right? Why am I not seeing more discussion of this? What am I missing?