I'm using an object database (ZODB) to store complex relationships between many objects, but I'm running into performance issues. As a result I started building indexes to speed up object retrieval and insertion. Here is my story, and I hope you can help.
Initially, when I added an object to the database I inserted it into a branch dedicated to that object type. To prevent multiple objects representing the same entity, I added a method that iterated over the existing objects in the branch looking for duplicates. This worked at first, but as the database grew, loading every object into memory and checking its attributes took longer and longer, until the cost of each insertion became unacceptable.
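Simplified, the naive duplicate check looked roughly like this (the class and branch names here are just illustrative):

```python
from persistent import Persistent
from persistent.list import PersistentList


class Person(Persistent):
    def __init__(self, firstName, lastName):
        self.firstName = firstName
        self.lastName = lastName


def add_person(root, person):
    """Append `person` to the type branch unless an equal object already exists."""
    if 'persons' not in root:
        root['persons'] = PersistentList()
    branch = root['persons']
    for existing in branch:  # loads every stored object into memory
        if (existing.firstName == person.firstName
                and existing.lastName == person.lastName):
            return existing  # duplicate found, reuse it
    branch.append(person)
    return person
```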
To solve that issue I started to create indexes based on the attributes of the object, so that when an object is added it is saved in the type branch as well as in an attribute-value index branch. For example, say I was saving a person object with attributes firstName = 'John' and lastName = 'Smith': the object would be appended to the person type branch and also appended to lists within the attribute index branch under the keys 'John' and 'Smith'.
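Roughly, the index branch is just a mapping from attribute value to a list of matching objects, something like this (again simplified, and building on the Person sketch above):

```python
from BTrees.OOBTree import OOBTree
from persistent.list import PersistentList


def index_person(root, person):
    """Record `person` under each of its attribute values (illustrative layout)."""
    if 'person_index' not in root:
        root['person_index'] = OOBTree()  # attribute value -> list of persons
    index = root['person_index']
    for value in (person.firstName, person.lastName):
        if value not in index:
            index[value] = PersistentList()
        index[value].append(person)
```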
This saved a lot of time on duplicate checking, since only the objects appearing in the intersection of the relevant attribute index entries needed to be loaded and compared.
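The duplicate check then only touches the candidates in that intersection, along these lines (using the index layout sketched above):

```python
def find_duplicate(root, firstName, lastName):
    """Only objects listed under *both* attribute values are loaded and compared."""
    index = root.get('person_index')
    if index is None:
        return None
    # Intersect the two candidate lists (identity-based, which is what we want here).
    candidates = set(index.get(firstName, ())) & set(index.get(lastName, ()))
    for obj in candidates:
        if obj.firstName == firstName and obj.lastName == lastName:
            return obj
    return None
```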
However, I quickly ran into another issue when updating objects. Whenever an object's attributes change, the indexes need to be updated so they don't become stale. That requires either remembering the old values, so that the relevant index entries can be accessed directly and the object removed from them, or iterating over every value in the index for that attribute to find and then remove the object. Either way, performance quickly begins to degrade again, and I can't figure out a way around it.
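The first option, unindexing while the old values are still in hand and only then mutating the object, looks roughly like this (again a simplified sketch built on the functions above):

```python
def update_person(root, person, new_first, new_last):
    """Remove the object from its old index entries, apply the change, then reindex."""
    index = root['person_index']
    # Unindex using the object's *current* (soon to be old) values.
    for old_value in (person.firstName, person.lastName):
        bucket = index.get(old_value)
        if bucket is not None and person in bucket:
            bucket.remove(person)
    # Apply the change and index the new values.
    person.firstName = new_first
    person.lastName = new_last
    index_person(root, person)
```

The catch is that every update has to go through a function like this (or the old values have to be stored somewhere), otherwise I'm back to scanning the whole index to find stale entries.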
Have you had this kind of issue before? What did you do to solve it, or is this just something I have to live with when using an OODBMS?
Thanks in advance for the help.