How do you implement schema changes in a NoSQL storage system?

Asked 30/8, 2011 at 5:42 Answered 31/8, 2011 at 0:39

How do you manage a major schema change when you are using a NoSQL store like SimpleDB?

I know that I am still thinking in SQL terms, but after working with SimpleDB for a few weeks I need to make a change to a running database. I would like to change one of the object classes so that it is identified by a unique id rather than a business name. Because it is referenced by another object, I will also need to update the reference value in those objects.

With a SQL database you would run a set of SQL statements as part of the client software deployment process. Obviously this will not work with something like SimpleDB because:

1. There is no equivalent of a SQL UPDATE statement.
2. Due to the distributed nature of SimpleDB, there is no way of knowing when the changes you have made to the database have 'filtered' out to all the nodes running your client software.

Some solutions I have thought of are:

1. Each domain has a version number. The client software knows which version of the domain it should use. Write some code that copies the data from one domain version to another, making any required changes as you go. You can then install new client software that accesses the new domain version. This approach will not work unless you can 'freeze' all write access during the update process.
2. Each item has a version attribute that indicates the format used when it was stored. The client uses this attribute when loading the object into memory. The object can then be converted to the latest format when it is written back to SimpleDB. The problem with this is that the new software needs to be deployed to all servers before any writes in the new format occur, or clients running the old software will not know how to read the new format.
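To make the first option (versioned domains) concrete, here is a minimal sketch. It uses plain dictionaries as a stand-in for SimpleDB domains rather than any real client library, and the names `domain_name`, `copy_domain`, `add_country` and the `customers` domain are all hypothetical.

```python
# Hypothetical in-memory stand-in for a set of domains:
# {domain_name: {item_name: {attribute: value}}}

def domain_name(base, version):
    """Build the versioned domain name, e.g. 'customers_v2'."""
    return f"{base}_v{version}"

def copy_domain(store, base, old_version, new_version, transform):
    """Copy every item into the new domain, transforming it on the way.

    Only safe while writes to the old domain are frozen.
    """
    source = store[domain_name(base, old_version)]
    target = store.setdefault(domain_name(base, new_version), {})
    for item_name, attributes in source.items():
        new_name, new_attributes = transform(item_name, attributes)
        target[new_name] = new_attributes
    return target

def add_country(item_name, attributes):
    # Example transform: keep the item name, add a default for a new attribute.
    return item_name, {**attributes, "country": "unknown"}

store = {"customers_v1": {"acme": {"city": "Leeds"}}}
copy_domain(store, "customers", 1, 2, add_country)
```

The old domain is left untouched, so old clients keep working against `customers_v1` until they are replaced, but anything they write after the copy is lost unless writes are frozen.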

It is all rather complex, and I am wondering if I am missing something.

Thanks

Richard

Mecklenburg answered 30/8, 2011 at 5:42

I use something similar to your second option, but without the version attribute.

First, try to keep your changes to things that are easy to make backward compatible - changing the primary key is the worst-case scenario for this.

Removing a field is easy - just stop writing to that field once all servers are running a version that doesn't require it.

Adding a field requires that you never write that object using code that won't save that field. If you can't deploy the new version everywhere at once, use an intermediate version that supports saving the field before you deploy a version that requires it.
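One possible shape for such an intermediate version (my assumption, the answer does not prescribe a technique) is code that carries attributes it does not understand through a load/save round trip. The `Customer` class and its field names below are hypothetical.

```python
class Customer:
    # Fields this (intermediate) version of the code knows about.
    KNOWN_FIELDS = ("name", "city")

    def __init__(self, name, city, extra=None):
        self.name = name
        self.city = city
        # Attributes this version does not understand, kept for the round trip.
        self.extra = dict(extra or {})

    @classmethod
    def from_item(cls, item):
        extra = {k: v for k, v in item.items() if k not in cls.KNOWN_FIELDS}
        return cls(item["name"], item["city"], extra)

    def to_item(self):
        # Start from the unknown attributes, then overlay the known fields.
        item = dict(self.extra)
        item.update({"name": self.name, "city": self.city})
        return item

# An item written by newer code that already stores "country".
stored = {"name": "Acme", "city": "Leeds", "country": "UK"}
customer = Customer.from_item(stored)
customer.city = "York"
saved = customer.to_item()
```

Here `saved` still contains `country`, so older servers do not silently drop the new field when they rewrite the item.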

Changing a field is just a combination of these two operations.

With this approach changes are applied as needed - write using the new version, but allow reading of the old version with default or derived values for the new field.
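As a rough illustration of reading old items with default or derived values and always writing the new format, consider the sketch below. Items are plain dictionaries and the field names `country` and `display_name` are made up for the example.

```python
def load_customer(item):
    """Read an item written by either old or new code."""
    return {
        "name": item["name"],
        # Default value for a field that older writers never stored.
        "country": item.get("country", "unknown"),
        # Derived value: computed from existing data when it is missing.
        "display_name": item.get("display_name", item["name"].title()),
    }

def save_customer(customer):
    """Always write the full, current set of fields."""
    return dict(customer)

old_item = {"name": "acme widgets"}
upgraded = save_customer(load_customer(old_item))
```

No version attribute is needed: the absence of a field is what tells the reader that the item is in the old format.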

You can use the same code to update all records at once, though this may not be appropriate on a large dataset.
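A bulk pass of this kind might look like the following sketch, which pushes every item through the ordinary load and save functions. The in-memory `domain` dictionary and the `load`/`save` helpers are placeholders for whatever your application already uses.

```python
def migrate_all(domain, load, save):
    """Rewrite every item through the normal load/save path.

    Returns the number of items whose stored form changed.
    """
    changed = 0
    for item_name in list(domain):
        old_item = domain[item_name]
        new_item = save(load(old_item))
        if new_item != old_item:
            domain[item_name] = new_item
            changed += 1
    return changed

def load(item):
    # Hypothetical reader that supplies a default for a newly added field.
    return {"name": item["name"], "country": item.get("country", "unknown")}

def save(customer):
    return dict(customer)

domain = {
    "a": {"name": "Acme"},                   # old format
    "b": {"name": "Bolt", "country": "UK"},  # already current
}
count = migrate_all(domain, load, save)
```

Because items that are already current are skipped, the pass can be re-run safely if it is interrupted.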

Changing the primary key can be handled the same way, but could get really complex depending on which NoSQL system you are using. You are probably stuck with designing custom migration code in this case.
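For the case in the question (replacing a business name key with a generated id and fixing up references), custom migration code could be sketched roughly like this. The `businesses` and `orders` collections, the `business` reference attribute, and the use of UUIDs as ids are all assumptions for illustration.

```python
import uuid

def rekey(businesses, orders, new_id=lambda: uuid.uuid4().hex):
    """Give each business a generated id and repoint orders at it."""
    # Map every old key (the business name) to a freshly generated id.
    key_map = {name: new_id() for name in businesses}
    # Store each business under its new id, keeping the name as an attribute.
    new_businesses = {
        key_map[name]: {**attrs, "business_name": name}
        for name, attrs in businesses.items()
    }
    # Rewrite the reference held by each order.
    new_orders = {
        order_id: {**attrs, "business": key_map[attrs["business"]]}
        for order_id, attrs in orders.items()
    }
    return new_businesses, new_orders, key_map

businesses = {"Acme Ltd": {"city": "Leeds"}}
orders = {"o1": {"business": "Acme Ltd", "total": "10"}}
new_businesses, new_orders, key_map = rekey(businesses, orders)
```

Keeping `key_map` around is useful while old and new clients coexist, since it lets you translate references written in the old format.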

Nixie answered 31/8, 2011 at 0:39

RavenDB, another NoSQL database, uses migrations to achieve this:

http://ayende.com/blog/66563/ravendb-migrations-rolling-updates

http://ayende.com/blog/66562/ravendb-migrations-when-to-execute

Normally these types of changes are handled by your application, which upgrades the schema on load: when it loads a document in version X, it converts it to version Y and persists it.
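A generic way to express this load, convert, persist cycle is a chain of per-version upgrade steps. The sketch below is not RavenDB's API; the `version` field, the step functions and the document layout are all hypothetical.

```python
def v1_to_v2(doc):
    # Hypothetical change: split "name" into first and last name.
    first, _, last = doc["name"].partition(" ")
    return {"version": 2, "first_name": first, "last_name": last}

def v2_to_v3(doc):
    # Hypothetical change: add a field with a default value.
    return {**doc, "version": 3, "country": "unknown"}

# Each entry upgrades a document from the given version to the next one.
UPGRADES = {1: v1_to_v2, 2: v2_to_v3}
LATEST = 3

def upgrade(doc):
    """Apply upgrade steps one version at a time until the document is current."""
    # Documents without a version field are treated as version 1.
    while doc.get("version", 1) < LATEST:
        doc = UPGRADES[doc.get("version", 1)](doc)
    return doc

doc = upgrade({"name": "Ada Lovelace"})
```

The application would call `upgrade` right after loading a document and write the result back, so each document is converted the first time it is touched.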

Nanananak answered 30/8, 2011 at 5:46
