Why does MongoDB seem to save some binary objects and not others?
I'm not sure where to start or what information is relevant, so please let me know what additional information would be useful in solving this problem.

I am developing a simple CometD application and I'm using MongoDB as my storage backend. I obtain a single Mongo instance when the application starts and I use this instance for all queries. This is in fact recommended by the Mongo Java driver documentation, as stated here: http://www.mongodb.org/display/DOCS/Java+Driver+Concurrency. I was grasping at straws thinking that the issue had something to do with thread safety, but according to that link the driver's Mongo object is completely thread safe.

Here's where it gets interesting. I have a class that extends BasicDBObject.

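import java.util.Map;

import com.mongodb.BasicDBObject;

public class MyBasicDBObject extends BasicDBObject {

    private static final String MAP = "map";

    // Replaces map.<submap>.<key> with (old value AND value).
    @SuppressWarnings("unchecked")
    public boolean updateMapAnd(String submap, String key, byte[] value) {
         Map<String, Object> topMap = (Map<String, Object>) this.get(MAP);
         Map<String, Object> embeddedMap = (Map<String, Object>) topMap.get(submap);
         byte[] oldValue = (byte[]) embeddedMap.get(key);

         byte[] newValue = UtilityClass.binaryAnd(oldValue, value);

         embeddedMap.put(key, newValue);
         topMap.put(submap, embeddedMap);
         this.put(MAP, topMap);
         return true; // the original post omits the return value; true is a placeholder
    }

    // Replaces map.<submap>.<key> with (old value XOR value).
    @SuppressWarnings("unchecked")
    public boolean updateMapXor(String submap, String key, byte[] value) {
         Map<String, Object> topMap = (Map<String, Object>) this.get(MAP);
         Map<String, Object> embeddedMap = (Map<String, Object>) topMap.get(submap);
         byte[] oldValue = (byte[]) embeddedMap.get(key);

         byte[] newValue = UtilityClass.binaryXor(oldValue, value);

         embeddedMap.put(key, newValue);
         topMap.put(submap, embeddedMap);
         this.put(MAP, topMap);
         return true; // the original post omits the return value; true is a placeholder
    }
}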

Next, two skeleton classes that extend MyBasicDBObject.

public class FirstDBObject extends MyBasicDBObject { /* no code */ }

public class SecondDBObject extends MyBasicDBObject { /* no code */ }

The only reason I've set up my classes this way is to improve code readability in dealing with these two objects within the same scope. This lets me do the following...

//a cometd service callback
public void updateMapObjectsFoo(ServerSession remote, Message message) {

    //locate the objects to update...
    FirstDBObject first = (FirstDBObject) firstCollection.findOne({ ... });
    SecondDBObject second = (SecondDBObject) secondCollection.findOne({ ... });

    //update them as follows
    first.updateMapAnd("default", "someKey1", newBinaryData1);
    second.updateMapAnd("default", "someKey2", newBinaryData2);

    //save (update) them to their respective collections
    firstCollection.save(first);
    secondCollection.save(second);
}

public void updateMapObjectsBar(ServerSession remote, Message message) {

    //locate the objects to update...
    FirstDBObject first = (FirstDBObject) firstCollection.findOne({ ... });
    SecondDBObject second = (SecondDBObject) secondCollection.findOne({ ... });

    /** 
     * the only difference is these two calls 
     */
    first.updateMapXor("default", "someKey1", newBinaryData1);
    second.updateMapXor("default", "someKey2", newBinaryData2);

    //save (update) them to their respective collections
    firstCollection.save(first);
    secondCollection.save(second);
}

The UtilityClass does exactly as the methods are named, bitwise & and bitwise ^ by iterating over the passed byte arrays.
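The post does not show UtilityClass itself, so the following is only a minimal sketch of what such a helper might look like. The method names binaryAnd and binaryXor come from the code above; the equal-length check and the choice to return a new array rather than modify the input are assumptions.

```java
// Hypothetical stand-in for the UtilityClass described in the post;
// the real implementation is not shown there.
final class UtilityClass {

    private UtilityClass() {}

    // Bitwise AND of two equal-length byte arrays, returned as a new array.
    static byte[] binaryAnd(byte[] a, byte[] b) {
        requireSameLength(a, b);
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] & b[i]);
        }
        return out;
    }

    // Bitwise XOR of two equal-length byte arrays, returned as a new array.
    static byte[] binaryXor(byte[] a, byte[] b) {
        requireSameLength(a, b);
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    // Assumption: inputs must have the same length; the post does not say.
    private static void requireSameLength(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "length mismatch: " + a.length + " vs " + b.length);
        }
    }
}
```

One property that may matter when comparing the two methods: applying the same AND mask twice gives the same result as applying it once, while applying the same XOR mask twice restores the original bytes. If an update were ever applied more than once, the AND path would hide it and the XOR path would not. The post does not confirm that this is what happened.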

This is where I'm totally lost. updateMapObjectsFoo() works exactly as expected, both first and second reflect the changes in the database. updateMapObjectsBar() on the other hand only manages to properly update first.

Inspection via debugging updateMapObjectsBar() shows that the binary objects are in fact updated properly on both objects, but when I head over to the mongo shell to investigate the problem I see that first is updated in the DB and second is not. Where did I get the idea that thread safety had anything to do with it? The only difference that bugs me is that secondCollection is used by other CometD services while firstCollection is not. That seems relevant on one hand, but not on the other, since Foo works and Bar does not.

I have torn the code apart and put it back together and I keep coming back to this same problem. What in the world is going on here?

It seems I left out the most relevant part of all, which is the nightmare of Java generics and the MongoDB driver's reliance on this feature of the language. BasicDBObject is essentially a wrapper for a Map<String, Object>. The problem is that once you store an object in that map, you must cast it back to what it was when you put it in there. Yes, that may seem completely obvious, and I knew that full well before posting this question.
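As an illustration of the casting problem, here is a minimal sketch that uses a plain Map<String, Object> in place of a driver object. The helper getAs is hypothetical and not part of the MongoDB driver; it checks the runtime type before casting, so a wrong assumption fails with a message naming the key and the actual type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TypedGet {

    private TypedGet() {}

    // Hypothetical helper: fetch a value and verify its runtime type
    // before casting, instead of casting blindly at the call site.
    static <T> T getAs(Map<String, Object> doc, String key, Class<T> type) {
        Object value = doc.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing key: " + key);
        }
        if (!type.isInstance(value)) {
            throw new IllegalArgumentException("key '" + key + "' holds "
                    + value.getClass().getName()
                    + ", expected " + type.getName());
        }
        return type.cast(value);
    }

    public static void main(String[] args) {
        // Stand-in for a document read from a collection.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("objId", "abc");
        doc.put("bits", new byte[] {1, 2});

        String objId = getAs(doc, "objId", String.class);
        byte[] bits = getAs(doc, "bits", byte[].class);
        System.out.println(objId + " has " + bits.length + " bytes");
    }
}
```

Asking for the wrong type, for example getAs(doc, "objId", byte[].class), throws an IllegalArgumentException at the point of the lookup rather than a ClassCastException further down the line.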

I cannot pinpoint what happened exactly, but I will offer this advice to Java + MongoDB users. You will be casting, A LOT, and the more complicated your data structures, the more casts you will need. Long story short, don't do this:

DBObject obj = (DBObject) collection.findOne(new BasicDBObject("_id", new ObjectId((String)anotherObj.get("objId"))));

One-liners are tempting when you are doing rapid prototypes, but when you start doing that over and over you are bound to make mistakes. Write more code now, and suffer less frustration later:

DBObject query = new BasicDBObject(); // DBObject is an interface and cannot be instantiated
String objId = (String) anotherObj.get("objId");
query.put("_id", new ObjectId(objId));
DBObject obj = (DBObject) collection.findOne(query);

I think this is annoyingly verbose but I should expect as much interacting directly with Mongo instead of using some kind of library to make my life easier. I have made a fool of myself on this one, but hopefully someone will learn from my mistake and save themselves a lot of frustration.

Thanks to all for your help.

Dropsy answered 4/9, 2012 at 2:17 Comment(7)
The problem, of course, is with the phrase "essentially identical method," which clearly it is not if it has different behavior. You might want to include the code for that method as well. - Compartmentalize
It would be helpful if you could provide the definition of the updateMapObjectsBar() method. If they are similar, then it's improbable that one works and the other does not. - Preternatural
You both are probably right that something is different, but for brevity's sake I cannot include the entire body of both methods. I just feel like I am somehow misusing Mongo since the only visible difference is that the second method fails to store the second object. - Dropsy
You mention the "nightmare of java generics", but wouldn't you have to do just as much casting for a plain Map as you're doing now for Map<String, Object>? Afaics, the only difference is that your keys are guaranteed at compile time to be Strings. - Acrodrome
Can you turn on profiling to see what is being sent to the DB? mongodb.org/display/DOCS/Database+Profiler - Glengarry
@kristina, thank you for the info on the profiling tool for Mongo, I had no idea that it existed. It should be invaluable for future troubleshooting. Much appreciated. - Dropsy
It looks to me like this code is extremely thread unsafe. You are doing two things non-atomically: reading an object from the database, then later writing a different version of the object. If another thread is also reading/updating this collection, you may stomp over its changes or it may stomp over yours. You should be updating just the fields that you changed, not calling save(entireObject), which unnecessarily overwrites fields you did not change and causes major bugs when those fields are meanwhile changed by another thread/process... - Ramirez

It could very easily be a multi-threading issue. While you are correct that the Mongo, DB, and DBCollection objects are threadsafe if there is only one Mongo instance, DBObjects are not threadsafe. But even if they were threadsafe, your updateMapObjectsFoo/Bar methods do nothing to ensure that they are atomic operations on the database.

Unfortunately, the changes you would need to make to your code are more intense than just sprinkling a few "synchronized" keywords around. See if http://www.mongodb.org/display/DOCS/Atomic+Operations doesn't help you understand the scope of the problem and some potential solutions.
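To make the lost-update risk concrete, here is a minimal sketch that simulates it with plain Java maps rather than the MongoDB driver. The class and method names are hypothetical, and the interleaving of the two handlers is written out sequentially so the result is deterministic. The field-level variant is only an analogy for updating a single field in place (for instance with MongoDB's $set operator) instead of saving the whole document.

```java
import java.util.HashMap;
import java.util.Map;

final class LostUpdateDemo {

    private LostUpdateDemo() {}

    private static Map<String, Integer> freshDocument() {
        Map<String, Integer> doc = new HashMap<>();
        doc.put("someKey1", 0);
        doc.put("someKey2", 0);
        return doc;
    }

    // Two handlers each read the whole document, change one key,
    // and save the whole document back.
    static Map<String, Integer> wholeDocumentSave() {
        Map<String, Integer> stored = freshDocument();

        Map<String, Integer> copyA = new HashMap<>(stored); // handler A reads
        Map<String, Integer> copyB = new HashMap<>(stored); // handler B reads

        copyA.put("someKey1", 1); // A changes only someKey1
        copyB.put("someKey2", 2); // B changes only someKey2

        stored = copyA; // A saves its whole copy
        stored = copyB; // B saves its whole copy, discarding A's change
        return stored;
    }

    // Each handler writes only the key it changed.
    static Map<String, Integer> fieldLevelUpdate() {
        Map<String, Integer> stored = freshDocument();
        stored.put("someKey1", 1); // A updates one field in place
        stored.put("someKey2", 2); // B updates one field in place
        return stored;
    }

    public static void main(String[] args) {
        System.out.println("whole-document save: " + wholeDocumentSave());
        System.out.println("field-level update:  " + fieldLevelUpdate());
    }
}
```

In the whole-document variant, someKey1 ends up back at 0 because handler B saved a copy it read before handler A's write. In the field-level variant both changes survive. Note that a value computed from the old one, such as the XOR in the question, still involves a read followed by a write, so field-level updates alone do not make it atomic.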

Peptize answered 22/10, 2012 at 18:21 Comment(3)
Thank you for the link. It was very helpful in better understanding updates in MongoDB and the importance of atomic operations. I'm very new to document storage and the NoSQL scene. Coming from RDBMS and mostly using ORMs (even though I have extensive knowledge of SQL), I am poorly educated on best practices for database transactions in programming. Might you have suggestions for good resources on this topic? - Dropsy
Most of my knowledge stems from a session at a local Mongo conference. I don't know of any online resource other than the one given in my answer. The Mongo community and staff are your best resource by far. I would recommend the mongo-user Google group for more involved help than you can get from a Stack Overflow answer. - Peptize
Much appreciated. I believe your answer satisfies my original question very well. - Dropsy
