Core Data Multithreading Import (Duplicate Objects)

I have an NSOperationQueue that imports objects into Core Data that I get from a web api. Each operation has a private child managedObjectContext of my app's main managedObjectContext. Each operation takes the object to be imported and checks whether it already exists; if it does, the operation updates the existing object, and if it doesn't, the operation creates a new one. These changes on the private child contexts are then propagated up to the main managed object context.

This setup has worked very well for me, but there is a duplicates issue.

When the same object is being imported in two different concurrent operations, I get duplicate objects that have the exact same data. (Both operations check whether the object exists, and to each of them it appears not to.) The reason I'll have two of the same object importing at around the same time is that I'll often be processing a "new" api call as well as a "get" api call. Given the concurrent, asynchronous nature of my setup, it's hard to guarantee that I won't ever have duplicate objects attempting to import.

So my question is: what is the best way to solve this particular issue? I thought about limiting the import queue to a maximum of one concurrent operation (I don't like this because of the performance cost). Similarly, I've considered requiring a save after every import operation and handling the merging of contexts. I've also considered grooming the data afterwards to occasionally clean up duplicates. And finally, I've considered just handling the duplicates in all fetch requests. But none of these solutions seem great to me, and perhaps there is an easy solution I've overlooked.
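
To make the race concrete, each operation's existence check looks roughly like the following sketch (the "Item" entity, "remoteID" key, and method name are illustrative, not my actual model):

```objc
//Sketch of the per-operation check described above; names are illustrative.
- (void)importItemDictionary:(NSDictionary *)json
                 intoContext:(NSManagedObjectContext *)context
{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
    request.predicate = [NSPredicate predicateWithFormat:@"remoteID == %@", json[@"id"]];
    request.fetchLimit = 1;

    NSError *error = nil;
    NSManagedObject *item = [[context executeFetchRequest:request error:&error] lastObject];
    if (item == nil) {
        //Both concurrent operations can reach this branch for the same
        //remoteID, because neither sees the other's unsaved insert.
        item = [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                             inManagedObjectContext:context];
        [item setValue:json[@"id"] forKey:@"remoteID"];
    }
    //...update the remaining attributes from json...
}
```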

Confession answered 14/8, 2013 at 21:11 Comment(1)
Excellent question. I'm facing the same problem and was about to ask this question myself.Sexed

So the problem is:

  • contexts are a scratchpad — unless and until you save, changes you make in them are not pushed to the persistent store;
  • you want one context to be aware of changes made on another that hasn't yet been pushed.

To me it doesn't sound like merging between contexts is going to work, because contexts are not thread safe: for a merge to occur, nothing else can be ongoing on the thread/queue of the other context. You're therefore never going to be able to eliminate the risk that a new object is inserted while another context is partway through its own insertion process.
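
For context, the cross-context merging being ruled out here is the standard did-save notification flow, sketched below (the observer setup and the `self.mainContext` property are assumptions for illustration):

```objc
//The standard cross-context merge flow: observe another context's saves
//and replay them into this one. Note it only fires on save, so it can
//never surface a sibling context's *unsaved* inserts.
- (void)startObservingContext:(NSManagedObjectContext *)otherContext
{
    [[NSNotificationCenter defaultCenter] addObserver:self
                                             selector:@selector(contextDidSave:)
                                                 name:NSManagedObjectContextDidSaveNotification
                                               object:otherContext];
}

- (void)contextDidSave:(NSNotification *)note
{
    //Merge on the receiving context's own queue; contexts are not thread safe.
    [self.mainContext performBlock:^{
        [self.mainContext mergeChangesFromContextDidSaveNotification:note];
    }];
}
```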

Additional observations:

  • SQLite is not thread safe in any practical sense;
  • hence all trips to the persistent store are serialised regardless of how you issue them.

Bearing in mind the problem and the SQLite limitations, my app adopts a framework in which the web calls are naturally concurrent, as per NSURLConnection; the subsequent parsing of the results (JSON parsing plus some fishing into the result) also occurs concurrently; and the find-or-create step is then channelled into a serial queue.

Very little processing time is lost by the serialisation because the SQLite trips would be serialised anyway, and they're the overwhelming majority of the serialised stuff.
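
A sketch of that pipeline, assuming a serial `self.importQueue` created once with `dispatch_queue_create`, and an illustrative `-findOrCreateObjectFromDictionary:` helper:

```objc
//Downloads and JSON parsing stay concurrent; only the store-touching
//find-or-create step is funnelled through one serial queue.
- (void)handleResponseData:(NSData *)data
{
    //Parsing can happen on whichever queue the connection delivered us to.
    NSArray *payloads = [NSJSONSerialization JSONObjectWithData:data options:0 error:NULL];

    //self.importQueue is created once, e.g. in -init:
    //  self.importQueue = dispatch_queue_create("com.example.import", DISPATCH_QUEUE_SERIAL);
    dispatch_async(self.importQueue, ^{
        for (NSDictionary *payload in payloads) {
            //Only one of these checks can ever be in flight at a time,
            //so the existence test can't race with another insert.
            [self findOrCreateObjectFromDictionary:payload];
        }
    });
}
```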

Diatribe answered 14/8, 2013 at 21:48 Comment(3)
Yes, this is useful information. I've solved some of the SQL issues you are talking about by how I've set up my Core Data stack. I have a private context of type NSPrivateQueueConcurrencyType this context's only job is to write to the persistent store. From this context, i have a child context of type NSMainQueueConcurrencyType that i use as my app's main context. The beauty of this setup is I can control when I write to my persistent store. My setup follows this setup if anyone is interested : cocoanetics.com/2012/07/multi-context-coredataConfession
On this line of thought, it's not really what the functionality is meant to be there for but I've found that using UIApplication -beginBackgroundTaskWithExpirationHandler: when the application resigns active and doing expensive blocking Core Data work in there is perfectly acceptable from Apple's point of view. Just make sure it's interruptible in case your application becomes active again. That's where we do our deletions. If you're able to defer writing to disk then that's probably a really good opportunity.Diatribe
This is wonderful advice! I hadn't thought about this for deletions. Thanks.Confession

Start by creating dependencies between your operations. Make sure one can't start until its dependency has finished.

Check out http://developer.apple.com/library/mac/documentation/Cocoa/Reference/NSOperation_class/Reference/Reference.html#//apple_ref/occ/instm/NSOperation/addDependency:
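
For example, assuming a hypothetical `ImportOperation` subclass, the wiring would look like this:

```objc
//Hypothetical ImportOperation subclass; the point is only the
//dependency wiring, which NSOperationQueue honours even across queues.
NSOperation *importNew = [[ImportOperation alloc] initWithPayload:newPayload];
NSOperation *importGet = [[ImportOperation alloc] initWithPayload:getPayload];

//importGet will not start until importNew has finished (and, if the
//operation saves at the end of -main, persisted its objects).
[importGet addDependency:importNew];

[operationQueue addOperation:importNew];
[operationQueue addOperation:importGet];
```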

Each operation should call save when it finishes. Next, I would try the find-or-create methodology suggested here:

https://developer.apple.com/library/ios/documentation/Cocoa/Conceptual/CoreData/Articles/cdImporting.html

It'll solve your duplicates problem, and can probably result in you doing fewer fetches (which are expensive and slow, and thus drain the battery quickly).
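
The core of that pattern is a single batched fetch per import rather than one fetch per object, roughly as follows (the "Item" entity and "remoteID" key are illustrative):

```objc
//One fetch for the whole batch, per Apple's find-or-create guide.
NSArray *importIDs = [[payloads valueForKey:@"id"]
                          sortedArrayUsingSelector:@selector(compare:)];

NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
request.predicate = [NSPredicate predicateWithFormat:@"remoteID IN %@", importIDs];
request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"remoteID"
                                                          ascending:YES]];

NSArray *existing = [context executeFetchRequest:request error:NULL];
//Walk both sorted lists in step: a matching remoteID means update the
//fetched object; a miss means insert a new one.
```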

You could also create a global child context to handle all of your imports, then merge the whole huge thing at the end, but it really comes down to how big the data set is and your memory considerations.

Pozzuoli answered 14/8, 2013 at 21:16 Comment(6)
So I believe I'm following the pattern suggested in the find-or-create methodology. My issue with duplicates only ever occurs because I'm doing two of these imports at the same time. I should have been more clear, but a single operation in my setup could be handling an entire array of a particular object. For example, one operation would be handling an array returned from a "get" request, and another operation would be handling a single object returned from a "new" request.Confession
See edit above, but I think using addDependency: on your NSOperation subclasses is critical if you want to be able to process more than one item at a time. You could, of course just go down to 1 concurrent operation, but what you really have here is a dependency problem.Pozzuoli
So you are suggesting making operations that are importing the same entity type to be dependent operations? This would still allow for concurrent operations for when objects are of different type, but require an order where there could be an issue? I think I like this suggestion. I'll explore this some more. Thanks for your suggestions!Confession
So one downside to using a dependent operation solution is that if you have more complicated objects you are importing (Ones that have relationships of other object types), you would have to additionally walk those relationships to determine operation dependence. This can start to get pretty hairy.Confession
In Regards to your global child context suggestion. I think this may have issues because managed object contexts are not thread safe. And you'd be accessing the same context from multiple threads. Remember each operation is on a different thread. (Unless of course I'm mistaken, in which case this solution would likely solve my exact problem.)Confession
It's not the most elegant solution, but yes, you can store another managedObjectContext somewhere (say on your App Delegate). You could even put it in an accessor method that does an NSAssert to ensure you're never crossing thread boundaries. Yes, figuring out the dependencies is VERY hairy, which makes me wonder if there's a way to optimize your API. Perfect REST compliance is great, unless the client becomes super convoluted to consume it ;) But you're definitely not short on optionsPozzuoli

I've been struggling with the same issue for a while now. The discussion on this question so far has given me a few ideas, which I will share now.

Please note that this is essentially untested since in my case I only see this duplicate issue very rarely during testing and there's no obvious way for me to reproduce it easily.

I have the same CoreData stack setup: a master MOC on a private queue, which has a child on the main queue that is used as the app's main context. Finally, bulk import operations (find-or-create) are passed off to a third MOC on a background queue. Once an operation is complete, saves are propagated up to the PSC.

I've moved all my Core Data stack from the AppDelegate to a separate class (AppModel) that provides the app with access to the aggregate root object of the domain (the Player) and also a helper function for performing background operations on the model (performBlock:onSuccess:onError:).

Luckily for me, all the major CoreData operations are funnelled through this method so if I can ensure that these operations are run serially then the duplicate problem should be solved.

- (void) performBlock: (void(^)(Player *player, NSManagedObjectContext *managedObjectContext)) operation onSuccess: (void(^)()) successCallback onError:(void(^)(id error)) errorCallback
{
    //Add this operation to the NSOperationQueue to ensure that 
    //duplicate records are not created in a multi-threaded environment
    [self.operationQueue addOperationWithBlock:^{

        NSManagedObjectContext *managedObjectContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
        [managedObjectContext setUndoManager:nil];
        [managedObjectContext setParentContext:self.mainManagedObjectContext];

        [managedObjectContext performBlockAndWait:^{

            //Retrieve a copy of the Player object attached to the new context
            id player = [managedObjectContext objectWithID:[self.player objectID]];
            //Execute the block operation
            operation(player, managedObjectContext);

            NSError *error = nil;
            if (![managedObjectContext save:&error])
            {
                //Call the error handler
                dispatch_async(dispatch_get_main_queue(), ^{
                    NSLog(@"%@", error);
                    if(errorCallback) return errorCallback(error);
                });
                return;
            }

            //Save the parent MOC (mainManagedObjectContext) - WILL BLOCK MAIN THREAD BRIEFLY
            [managedObjectContext.parentContext performBlockAndWait:^{
                NSError *error = nil;
                if (![managedObjectContext.parentContext save:&error])
                {
                    //Call the error handler
                    dispatch_async(dispatch_get_main_queue(), ^{
                        NSLog(@"%@", error);
                        if(errorCallback) return errorCallback(error);
                    });
                    return;
                }
            }];

            //Attempt to clear any retain cycles created during operation
            [managedObjectContext reset];

            //Call the success handler
            dispatch_async(dispatch_get_main_queue(), ^{
                if (successCallback) return successCallback();
            });
        }];
    }];
}

What I've added here, and what I hope will resolve the issue for me, is wrapping the whole thing in addOperationWithBlock:. My operation queue is simply configured as follows:

single.operationQueue = [[NSOperationQueue alloc] init];
[single.operationQueue setMaxConcurrentOperationCount:1];

In my API class, I might perform an import on my operation as follows:

- (void) importUpdates: (id) methodResult onSuccess: (void (^)()) successCallback onError: (void (^)(id error)) errorCallback
{
    [_model performBlock:^(Player *player, NSManagedObjectContext *managedObjectContext) {
        //Perform bulk import for data in methodResult using the provided managedObjectContext
    } onSuccess:^{
        //Call the success handler
        dispatch_async(dispatch_get_main_queue(), ^{
            if (successCallback) return successCallback();
        });
    } onError:errorCallback];
}

Now with the NSOperationQueue in place it should no longer be possible for more than one batch operation to take place at the same time.

Sexed answered 15/8, 2013 at 11:46 Comment(1)
Yes, this is a confirmed solution to the specific issue that I'm having. For me, not allowing the queue more than one concurrent operation is too much of a performance hit. But for many people this may be a great solution.Confession
