Handling data maintenance in Object Databases like db4o

Asked 10/3, 2010 at 20:15 Answered 10/3, 2010 at 21:25

Solved jdbc db4o database-schema object-oriented-database

One thing I have continually found very confusing about using an object database like db4o is how you are supposed to handle complex migrations that would normally be handled by SQL/PL-SQL.

For example, imagine you had a table in a relational database called my_users. Originally you had a column named "full_name". Now that your software is in V2, you wish to remove this column, split the full names on a blank space, and put the first part in a column named "first_name" and the second in a column named last_name. In SQL I would simply populate the "first_name" and "second_name" columns, then remove the original column named "full_name".

How would I do this in something like db4o? Do I write a Java program that looks up all objects of User.class, sets full_name to null, and sets first_name and last_name? After my next svn commit there will be no field or bean property corresponding to full_name; would this be a problem? It seems that to use db4o in a production application where my "schema" changes, I would want to write a script that migrates data from version x to version x+1, and then only in version x+2 actually remove the properties I am trying to get rid of in version x+1, because I cannot write Java code that modifies properties that are no longer part of my type.

Part of the problem seems to be that an RDBMS resolves what you are referring to by a simple, case-insensitive, string-based name. In a language like Java, typing is more complicated than that: you cannot refer to a property if its getter, setter, or field is not a member of the class loaded at runtime. So you essentially need to either have two versions of your code in the same script (hmm, custom classloaders sound like a pain), have the new version of your class belong to another package (sounds messy), or use the version x+1/x+2 strategy I mentioned (requires a lot more planning). Perhaps there is some obvious solution I never gleaned from the db4o documents.

Any ideas? Hopefully this makes some sense.

Unscrew answered 10/3, 2010 at 20:15 Comment(0)

First, db4o handles the 'simple' scenarios like adding or removing a field automatically. When you add a field, all existing objects get the default value for it. When you remove a field, the data of existing objects is still in the database and you can still access it. Renaming a field and similar changes are done through special 'refactoring' calls.

For your scenario you would do something like this:

1. Remove the field 'full_name' and add the new fields 'first_name' and 'second_name'.
2. Iterate over all 'Address' objects.
3. Access the old field via the 'StoredClass' API.
4. Split, change, or otherwise update the value. Set the new values on the new fields and store the object.

Let's assume we have an 'Address' class from which the 'full_name' field has been removed, and we now want to copy its value into 'firstname' and 'surname'. Then it could go like this (Java):

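    // 'db' is an open ObjectContainer; fetch every stored Address.
    ObjectSet<Address> addresses = db.query(Address.class);
    // Handle to the removed field, which still exists in the stored class metadata.
    StoredField metaInfoOfField = db.ext().storedClass(Address.class).storedField("full_name", String.class);
    for (Address address : addresses) {
        String fullName = (String) metaInfoOfField.get(address);
        if (fullName == null) {
            continue; // nothing to migrate for this object
        }
        // Split on the first space only, so a single-word name does not throw
        // and everything after the first space ends up in the surname.
        String[] splitName = fullName.split(" ", 2);
        address.setFirstname(splitName[0]);
        address.setSurname(splitName.length > 1 ? splitName[1] : "");
        db.store(address);
    }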
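
Real name data rarely splits cleanly on a single space, so the split is worth isolating and testing on its own before it touches the database. A minimal sketch of such a helper, using only the JDK; `NameSplitter` and its `split` method are hypothetical names for illustration and are not part of db4o:

```java
import java.util.Arrays;

// Hypothetical helper (not part of db4o): splits a full name into two parts.
final class NameSplitter {

    private NameSplitter() {}

    /**
     * Returns a two-element array {first, rest}.
     * Missing parts come back as empty strings, never null.
     */
    static String[] split(String fullName) {
        if (fullName == null || fullName.trim().isEmpty()) {
            return new String[] {"", ""};
        }
        // Limit 2: everything after the first run of whitespace stays together.
        String[] parts = fullName.trim().split("\\s+", 2);
        String first = parts[0];
        String rest = parts.length > 1 ? parts[1] : "";
        return new String[] {first, rest};
    }

    public static void main(String[] args) {
        // Print a few edge cases: two words, three words, one word, null.
        System.out.println(Arrays.toString(split("Ada Lovelace")));
        System.out.println(Arrays.toString(split("Ada King Lovelace")));
        System.out.println(Arrays.toString(split("Cher")));
        System.out.println(Arrays.toString(split(null)));
    }
}
```

Whether a middle name belongs to the first name or the surname is a data decision; this sketch simply keeps everything after the first space together.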

As you suggested, you would write migration code for each version bump. If a field isn't part of your class anymore, you have to access it with the 'StoredField' API as shown above.
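
One way to organize "migration code for each version bump" is a small runner that applies the steps in order, starting from whatever version the database is currently at. The following is only a sketch under assumptions: `MigrationRunner`, `Step`, and the `display_name` field are hypothetical, and plain maps stand in for stored objects whose fields are read by name (in db4o that access would go through the 'StoredField' API instead).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: runs one migration step per version bump, in order.
final class MigrationRunner {

    // One step upgrades the data from version 'from' to 'from + 1'.
    static final class Step {
        final int from;
        final Consumer<List<Map<String, Object>>> action;

        Step(int from, Consumer<List<Map<String, Object>>> action) {
            this.from = from;
            this.action = action;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    // Steps are expected to be registered in ascending version order.
    MigrationRunner add(int from, Consumer<List<Map<String, Object>>> action) {
        steps.add(new Step(from, action));
        return this;
    }

    /** Applies each step that matches the current version; returns the version reached. */
    int migrate(int currentVersion, List<Map<String, Object>> records) {
        int version = currentVersion;
        for (Step step : steps) {
            if (step.from == version) {
                step.action.accept(records);
                version++;
            }
        }
        return version;
    }

    // Example wiring: v1 -> v2 splits full_name, v2 -> v3 adds a made-up display_name.
    static MigrationRunner demo() {
        return new MigrationRunner()
            .add(1, records -> {
                for (Map<String, Object> r : records) {
                    String full = (String) r.remove("full_name");
                    String[] parts = full == null ? new String[0] : full.trim().split("\\s+", 2);
                    r.put("first_name", parts.length > 0 ? parts[0] : "");
                    r.put("last_name", parts.length > 1 ? parts[1] : "");
                }
            })
            .add(2, records -> {
                for (Map<String, Object> r : records) {
                    r.put("display_name", r.get("last_name") + ", " + r.get("first_name"));
                }
            });
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("full_name", "Ada Lovelace");
        List<Map<String, Object>> records = new ArrayList<>();
        records.add(record);

        int reached = demo().migrate(1, records);
        System.out.println("version " + reached + ": " + records);
    }
}
```

Because each step only knows the shape of the data at its own version, a database that is several versions behind can be brought forward by running the same steps it would have seen one release at a time.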

You can get a list of all 'stored' classes with ObjectContainer.ext().storedClasses(). With StoredClass.getStoredFields() you can get a list of all stored fields, no matter whether a field still exists in your class. If a class doesn't exist anymore, you can still get its objects and access them via the 'GenericObject' class.

Update: For more complex scenarios, where a database needs to be migrated over multiple version steps.

For example, suppose that in version v3 the address object looks completely different, so the 'migration script' for v1 to v2 no longer has the fields it requires (firstname and surname in my example). I think there are multiple possibilities for handling this.

(Assuming Java for this idea; there is certainly an equivalent in .NET.) You could write each migration step as a Groovy script, so that the scripts do not interfere with one another. Each script defines the classes it needs for its migration, so every migration has its own migration classes. With aliases you would then bind the Groovy migration classes to the actual Java classes.
Create refactoring classes for complex scenarios, and bind these classes with aliases as well.

Emyle answered 10/3, 2010 at 21:25 Comment(0)

I'm taking a bit of a wild shot here, because I haven't refactored much data in my life.

You're making a strange comparison: If you wanted to 'hot-migrate' the db, you'd probably have to do the x+1, x+2 versioning approach you described, but I don't really know - I wouldn't know how to do this with SQL either since I'm not a db expert.

If you're migrating 'cold', however, you could just do it in one step: for each object in the store, instantiate a new object from the old data, store the new object, and delete the old one. See the db4o reference.
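
A minimal sketch of that cold, one-step copy, assuming the old and new shapes exist side by side as two separate classes. Everything here is hypothetical: `UserV1`, `UserV2`, and the in-memory list that stands in for the object store. With db4o the add and remove would be store and delete calls on the container.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a cold migration: copy each old object into a new one.
final class ColdMigration {

    // Old shape of the data.
    static final class UserV1 {
        final String fullName;

        UserV1(String fullName) {
            this.fullName = fullName;
        }
    }

    // New shape of the data.
    static final class UserV2 {
        final String firstName;
        final String lastName;

        UserV2(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    private ColdMigration() {}

    // 'store' stands in for the object database; other object types are left untouched.
    static void migrate(List<Object> store) {
        // Walk a snapshot, because the store is modified during the loop.
        for (Object o : new ArrayList<>(store)) {
            if (o instanceof UserV1) {
                UserV1 old = (UserV1) o;
                String full = old.fullName == null ? "" : old.fullName.trim();
                String[] parts = full.split("\\s+", 2);
                store.add(new UserV2(parts[0], parts.length > 1 ? parts[1] : ""));
                store.remove(old); // delete the old object once the new one is stored
            }
        }
    }

    public static void main(String[] args) {
        List<Object> store = new ArrayList<>();
        store.add(new UserV1("Ada Lovelace"));
        migrate(store);
        UserV2 migrated = (UserV2) store.get(0);
        System.out.println(migrated.firstName + " / " + migrated.lastName);
    }
}
```

Note that the new objects get new identities, so any other stored object that referenced an old `UserV1` would also need to be re-pointed; the sketch does not cover that.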

But honestly: the same process in an RDBMS is complicated, too, because you will have to deactivate constraint checks (and possibly triggers, etc.) to actually perform the operation. Perhaps not in the example you provided, but in most real-world cases. After all, the string split is so easy that there will be little gain.

In SQL I would simply populate "first_name" and "second_name" columns

Yes, with a simple string split operation, you can simply do that. But in a typical refactoring scenario, you're re-structuring objects based on large and complicated sets of rules that might not be easily expressed in SQL, might need complex calculation, or external data sources.

To do that, you'd have to write code, too.

After all, I don't see too much difference in the two processes. You will always have to be careful with live data, and you will certainly make a backup in both cases. Refactoring is fun, but persistence is tricky so synchronizing it is a challenge in any case.

Moe answered 10/3, 2010 at 21:3 Comment(0)
