Implementing Fast and Efficient Core Data Import on iOS 5

Question: How do I get my child context to see changes persisted on the parent context so that they trigger my NSFetchedResultsController to update the UI?

Here's the setup:

You've got an app that downloads and adds lots of XML data (about 2 million records, each roughly the size of a normal paragraph of text). The .sqlite file grows to about 500 MB. Adding this content into Core Data takes time, but you want the user to be able to use the app while the data loads into the data store incrementally. It's got to be invisible and imperceptible to the user that large amounts of data are being moved around, so no hangs, no jitters: scrolls like butter. Still, the app becomes more useful the more data is added to it, so we can't wait forever for the data to be added to the Core Data store. In code, this means I'd really like to avoid code like this in the import code:

[[NSRunLoop currentRunLoop] runUntilDate:[NSDate dateWithTimeIntervalSinceNow:0.25]];

The app is iOS 5 only, so the slowest device it needs to support is an iPhone 3GS.

Here are the resources I've used so far to develop my current solution:

Apple's Core Data Programming Guide: Efficiently Importing Data

  • Use Autorelease Pools to keep the memory down
  • Relationships Cost. Import flat, then patch up relationships at the end
  • Don't query if you can help it; it slows things down in an O(n^2) manner
  • Import in Batches: save, reset, drain and repeat
  • Turn off the Undo Manager on import
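For the "Don't query" bullet, Apple's guide describes a find-or-create pattern that replaces one fetch per item with a single IN fetch over a sorted batch of IDs. Below is a minimal sketch under ARC, assuming a hypothetical entity Record with a unique string attribute recordID (neither name comes from the question), and assuming the store sorts the IDs the same way -compare: does, which holds for plain ASCII IDs but is worth checking for anything else.

```objc
#import <CoreData/CoreData.h>

// Hypothetical entity "Record" with a unique NSString attribute "recordID".
// Returns one object per incoming ID, in sorted ID order, using a single fetch.
static NSArray *FindOrCreateRecords(NSArray *recordIDs,
                                    NSManagedObjectContext *context,
                                    NSError **error)
{
    // Sort the incoming IDs so they can be walked in step with the sorted fetch.
    NSArray *sortedIDs = [recordIDs sortedArrayUsingSelector:@selector(compare:)];

    // One query for the whole batch instead of one query per item.
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Record"];
    [request setPredicate:[NSPredicate predicateWithFormat:@"recordID IN %@", sortedIDs]];
    [request setSortDescriptors:[NSArray arrayWithObject:
        [NSSortDescriptor sortDescriptorWithKey:@"recordID" ascending:YES]]];

    NSArray *existing = [context executeFetchRequest:request error:error];
    if (existing == nil) {
        return nil;
    }

    NSMutableArray *results = [NSMutableArray arrayWithCapacity:[sortedIDs count]];
    NSUInteger existingIndex = 0;
    for (NSString *recordID in sortedIDs) {
        NSManagedObject *record = nil;
        // Both lists are sorted and IDs are unique, so a match can only be
        // at the current position in the fetched results.
        if (existingIndex < [existing count]) {
            NSManagedObject *candidate = [existing objectAtIndex:existingIndex];
            if ([[candidate valueForKey:@"recordID"] isEqualToString:recordID]) {
                record = candidate;
                existingIndex++;
            }
        }
        if (record == nil) {
            // Not in the store yet: create it without issuing another query.
            record = [NSEntityDescription insertNewObjectForEntityForName:@"Record"
                                                   inManagedObjectContext:context];
            [record setValue:recordID forKey:@"recordID"];
        }
        [results addObject:record];
    }
    return results;
}
```

The cost is one sort plus one fetch per batch, so running this once per 1000-item batch keeps the number of round trips to the store proportional to the number of batches rather than the number of items.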

iDeveloper TV - Core Data Performance

  • Use 3 Contexts: Master, Main and Confinement context types

iDeveloper TV - Core Data for Mac, iPhone & iPad Update

  • Running saves on other queues with performBlock makes things fast.
  • Encryption slows things down; turn it off if you can.

Importing and Displaying Large Data Sets in Core Data by Marcus Zarra

  • You can slow down the import by giving time to the current run loop, so things feel smooth to the user.
  • Sample Code proves that it is possible to do large imports and keep the UI responsive, but not as fast as with 3 contexts and async saving to disk.

My Current Solution

I've got 3 instances of NSManagedObjectContext:

masterManagedObjectContext - This is the context that has the NSPersistentStoreCoordinator and is responsible for saving to disk. I do this so my saves can be asynchronous and therefore very fast. I create it on launch like this:

masterManagedObjectContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
[masterManagedObjectContext setPersistentStoreCoordinator:coordinator];

mainManagedObjectContext - This is the context the UI uses everywhere. It is a child of the masterManagedObjectContext. I create it like this:

mainManagedObjectContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
[mainManagedObjectContext setUndoManager:nil];
[mainManagedObjectContext setParentContext:masterManagedObjectContext];

backgroundContext - This context is created in my NSOperation subclass that is responsible for importing the XML data into Core Data. I create it in the operation's main method and link it to the master context there.

backgroundContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSConfinementConcurrencyType];
[backgroundContext setUndoManager:nil];
[backgroundContext setParentContext:masterManagedObjectContext];

This actually works very, VERY fast. Just by doing this 3 context setup I was able to improve my import speed by over 10x! Honestly, this is hard to believe. (This basic design should be part of the standard Core Data template...)

During the import process I save in two different ways. Every 1000 items, I save the background context:

BOOL saveSuccess = [backgroundContext save:&error];
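The operation described above, with its 1000-item save, can be sketched roughly like this. It is a minimal ARC sketch, not the author's actual class: ImportOperation, the Record entity, its text attribute, and the assumption that parsed items arrive as an array of dictionaries are all hypothetical (the real app would stream from its XML parser). It applies the guide's batching advice: an autorelease pool per batch, a save that pushes the batch into the master context, then -reset to release the saved objects.

```objc
#import <CoreData/CoreData.h>

// Hypothetical import operation. Items are assumed to be NSDictionary objects
// with a "text" key; the "Record" entity and its "text" attribute are invented.
@interface ImportOperation : NSOperation
- (id)initWithMasterContext:(NSManagedObjectContext *)masterContext items:(NSArray *)items;
@end

@implementation ImportOperation {
    NSManagedObjectContext *_masterContext;
    NSArray *_items;
}

- (id)initWithMasterContext:(NSManagedObjectContext *)masterContext items:(NSArray *)items
{
    if ((self = [super init])) {
        _masterContext = masterContext;
        _items = [items copy];
    }
    return self;
}

- (void)main
{
    // Confinement context: created and used only on this operation's thread.
    NSManagedObjectContext *backgroundContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSConfinementConcurrencyType];
    [backgroundContext setUndoManager:nil];
    [backgroundContext setParentContext:_masterContext];

    const NSUInteger batchSize = 1000;
    NSUInteger total = [_items count];

    for (NSUInteger start = 0; start < total; start += batchSize) {
        if ([self isCancelled]) {
            return;
        }
        // Drain temporary objects after every batch to keep memory flat.
        @autoreleasepool {
            NSUInteger end = MIN(start + batchSize, total);
            for (NSUInteger i = start; i < end; i++) {
                NSDictionary *item = [_items objectAtIndex:i];
                NSManagedObject *record =
                    [NSEntityDescription insertNewObjectForEntityForName:@"Record"
                                                  inManagedObjectContext:backgroundContext];
                [record setValue:[item objectForKey:@"text"] forKey:@"text"];
            }
            // Push this batch up into the master context (in memory, not yet on disk).
            NSError *error = nil;
            if (![backgroundContext save:&error]) {
                NSLog(@"Batch save failed: %@", error);
                return;
            }
            // The batch now lives in the master; forget it here.
            [backgroundContext reset];
        }
    }
}

@end
```

Resetting after each save is safe only because the objects have already moved into the parent; any NSManagedObject pointers from a finished batch must not be used afterwards.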

Then at the end of the import process, I save on the master/parent context which, ostensibly, pushes modifications out to the other child contexts including the main context:

[masterManagedObjectContext performBlock:^{
   NSError *parentContextError = nil;
   BOOL parentContextSaveSuccess = [masterManagedObjectContext save:&parentContextError];
}];

Problem: The problem is that my UI will not update until I reload the view.

I have a simple UIViewController with a UITableView that is being fed data by an NSFetchedResultsController. When the import process completes, the NSFetchedResultsController sees no changes from the parent/master context, so the UI doesn't automatically update like I'm used to seeing. If I pop the UIViewController off the stack and load it again, all the data is there.

Question: How do I get my child context to see changes persisted on the parent context so that they trigger my NSFetchedResultsController to update the UI?

I have tried the following which just hangs the app:

- (void)saveMasterContext {
    NSNotificationCenter *notificationCenter = [NSNotificationCenter defaultCenter];    
    [notificationCenter addObserver:self selector:@selector(contextChanged:) name:NSManagedObjectContextDidSaveNotification object:masterManagedObjectContext];

    NSError *error = nil;
    BOOL saveSuccess = [masterManagedObjectContext save:&error];

    [notificationCenter removeObserver:self name:NSManagedObjectContextDidSaveNotification object:masterManagedObjectContext];
}

- (void)contextChanged:(NSNotification*)notification
{
    if ([notification object] == mainManagedObjectContext) return;

    if (![NSThread isMainThread]) {
        [self performSelectorOnMainThread:@selector(contextChanged:) withObject:notification waitUntilDone:YES];
        return;
    }

    [mainManagedObjectContext mergeChangesFromContextDidSaveNotification:notification];
}
Brainy asked 10/5, 2012 at 21:3 | Comments (6):
+1000000 for the best formed, most prepared question ever. I have an answer too... It will take a few minutes to type it up though... - Reprisal
When you say the app is hung, where is it? What's it doing? - Reprisal
Sorry to bring this up after so long. Can you please clarify what "Import flat, then patch up relationships at the end" means? Don't you still have to have those objects in memory in order to establish relationships? I'm trying to implement a solution very similar to yours and I could really use some help lowering the memory footprint. - Sailing
See the Apple docs linked at the top of this question. They explain this. Good luck! - Brainy
That's a great setup involving the 3 MOCs, and I implemented it in my project. However, may I ask how you avoid blocking the persistent store while the masterManagedObjectContext is saving? I'd really appreciate your insight #14341117 - Whirlybird
Really good question, and I picked up a few neat tricks from the description you provided of your setup - Lacielacing

You should probably save the master MOC in strides as well. No sense having that MOC wait until the end to save. It has its own thread, and it will help keep memory down as well.
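Saving the master in strides can be layered on top of the per-batch background save. A minimal ARC sketch, where the helper name SaveBatchAndKickMaster is hypothetical: it is called on the import thread that owns the confinement context, the background save pushes the batch into the master synchronously, and the master's disk write is queued on the master's own private queue so the import thread does not wait for it.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper, called on the import thread after each batch.
static BOOL SaveBatchAndKickMaster(NSManagedObjectContext *backgroundContext,
                                   NSManagedObjectContext *masterContext,
                                   NSError **error)
{
    // BG -> MASTER: moves the batch into the parent, in memory.
    if (![backgroundContext save:error]) {
        return NO;
    }
    // MASTER -> disk: queued on the master's private queue; returns immediately.
    [masterContext performBlock:^{
        NSError *masterError = nil;
        if (![masterContext save:&masterError]) {
            NSLog(@"Master save failed: %@", masterError);
        }
    }];
    return YES;
}
```

Because the master's queue is serial, queued saves run one after another, and the next background save into the master simply waits its turn on that queue.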

You wrote:

Then at the end of the import process, I save on the master/parent context which, ostensibly, pushes modifications out to the other child contexts including the main context:

In your configuration, you have two children (the main MOC and the background MOC), both parented to the "master."

When you save on a child, it pushes the changes up into the parent. Other children of that MOC will see the data the next time they perform a fetch... they are not explicitly notified.

So, when BG saves, its data is pushed to MASTER. Note, however, that none of this data is on disk until MASTER saves. Furthermore, any new items will not get permanent IDs until the MASTER saves to disk.

In your scenario, you are pulling the data into the MAIN MOC by merging from the MASTER save during the DidSave notification.

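That should work, so I'm curious as to where it is "hung." I will note that you are not running on the main MOC's thread in the canonical way (at least not for iOS 5), which is to call performBlock: on the main MOC rather than performSelectorOnMainThread:withObject:waitUntilDone:.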

Also, you are probably only interested in merging changes from the master MOC (though your registration looks like it is only for that anyway). If I were to use the update-on-did-save notification, I'd do this...

- (void)contextChanged:(NSNotification*)notification {
    // Only interested in merging from master into main.
    if ([notification object] != masterManagedObjectContext) return;

    [mainManagedObjectContext performBlock:^{
        [mainManagedObjectContext mergeChangesFromContextDidSaveNotification:notification];

        // NOTE: our MOC should not be updated, but we need to reload the data as well
    }];
}

Now, for what may be your real issue regarding the hang: you show two different calls to save on the master. The first is well protected in its own performBlock, but the second is not (though you may be calling saveMasterContext from within a performBlock).

However, I'd also change this code...

- (void)saveMasterContext {
    NSNotificationCenter *notificationCenter = [NSNotificationCenter defaultCenter];    
    [notificationCenter addObserver:self selector:@selector(contextChanged:) name:NSManagedObjectContextDidSaveNotification object:masterManagedObjectContext];

    // Make sure the save runs on the master context's own private queue...
    [masterManagedObjectContext performBlock:^{
        NSError *error = nil;
        BOOL saveSuccess = [masterManagedObjectContext save:&error];
        // Handle error...
        [notificationCenter removeObserver:self name:NSManagedObjectContextDidSaveNotification object:masterManagedObjectContext];
    }];
}

However, note that the MAIN is a child of MASTER. So, it should not have to merge the changes. Instead, just watch for the DidSave on the master, and just refetch! The data is sitting in your parent already, just waiting for you to ask for it. That's one of the benefits of having the data in the parent in the first place.
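The "just refetch" approach might look like the following sketch (ARC, iOS 5). RecordListController and its two properties are hypothetical names standing in for whatever view controller owns the NSFetchedResultsController. It listens only for the master's DidSave notification, then refetches on the main context's queue using performBlock: rather than performBlockAndWait:, for the deadlock reason discussed in the comments.

```objc
#import <UIKit/UIKit.h>
#import <CoreData/CoreData.h>

// Hypothetical controller that owns an NSFetchedResultsController built on the
// main-queue context (a child of the master) and refetches after master saves.
@interface RecordListController : NSObject
@property (nonatomic, strong) NSFetchedResultsController *fetchedResultsController;
@property (nonatomic, weak) UITableView *tableView;
- (id)initWithFetchedResultsController:(NSFetchedResultsController *)frc
                         masterContext:(NSManagedObjectContext *)masterContext;
@end

@implementation RecordListController

@synthesize fetchedResultsController = _fetchedResultsController;
@synthesize tableView = _tableView;

- (id)initWithFetchedResultsController:(NSFetchedResultsController *)frc
                         masterContext:(NSManagedObjectContext *)masterContext
{
    if ((self = [super init])) {
        _fetchedResultsController = frc;
        // Only the master's saves matter: by then the BG data sits in the parent.
        [[NSNotificationCenter defaultCenter] addObserver:self
                                                 selector:@selector(masterContextDidSave:)
                                                     name:NSManagedObjectContextDidSaveNotification
                                                   object:masterContext];
    }
    return self;
}

- (void)dealloc
{
    [[NSNotificationCenter defaultCenter] removeObserver:self];
}

- (void)masterContextDidSave:(NSNotification *)notification
{
    NSManagedObjectContext *mainContext = [self.fetchedResultsController managedObjectContext];
    // The notification arrives on the master's private queue; hop onto the main
    // context's queue asynchronously instead of waiting on it.
    [mainContext performBlock:^{
        NSError *error = nil;
        // The main context is a child of the master, so this fetch sees the new data.
        if (![self.fetchedResultsController performFetch:&error]) {
            NSLog(@"Refetch failed: %@", error);
            return;
        }
        [self.tableView reloadData];
    }];
}

@end
```

If the fetched results controller was created with a cacheName, the cache may need clearing with +[NSFetchedResultsController deleteCacheWithName:] before the refetch.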

Another alternative to consider (and I'd be interested to hear about your results -- that's a lot of data)...

Instead of making the background MOC a child of the MASTER, make it a child of the MAIN.

Get this. Every time the BG saves, it automatically gets pushed into the MAIN. Now, the MAIN has to call save, and then the master has to call save, but all those are doing is moving pointers... until the master saves to disk.

The beauty of that method is that the data goes from the background MOC straight into your application's MOC (then passes through to get saved).

There is some penalty for the pass-through, but all the heavy lifting gets done in the MASTER when it hits the disk. And if you kick off those saves on the master with performBlock, then the main thread just sends off the request and returns immediately.
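A sketch of the MASTER -> MAIN -> BG variant under ARC, for anyone who wants to try it. Both function names are hypothetical, and the contexts are assumed to be configured as in the question (master on a private queue holding the coordinator, main on the main queue). Each level's save is started only after the level below has pushed its changes up.

```objc
#import <CoreData/CoreData.h>

// Hypothetical: the import context is parented to MAIN instead of MASTER.
static NSManagedObjectContext *MakeBackgroundContext(NSManagedObjectContext *mainContext)
{
    NSManagedObjectContext *backgroundContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSConfinementConcurrencyType];
    [backgroundContext setUndoManager:nil];
    [backgroundContext setParentContext:mainContext];   // parent is MAIN, not MASTER
    return backgroundContext;
}

// Hypothetical: call on the import thread after each batch.
static void SaveThroughHierarchy(NSManagedObjectContext *backgroundContext,
                                 NSManagedObjectContext *mainContext,
                                 NSManagedObjectContext *masterContext)
{
    // 1. BG -> MAIN: the new objects become visible to the UI context.
    NSError *error = nil;
    if (![backgroundContext save:&error]) {
        NSLog(@"Background save failed: %@", error);
        return;
    }
    // 2. MAIN -> MASTER: runs on the main queue and mostly moves objects in memory.
    [mainContext performBlock:^{
        NSError *mainError = nil;
        if (![mainContext save:&mainError]) {
            NSLog(@"Main save failed: %@", mainError);
            return;
        }
        // 3. MASTER -> disk: the expensive part, on the master's private queue.
        [masterContext performBlock:^{
            NSError *masterError = nil;
            if (![masterContext save:&masterError]) {
                NSLog(@"Master save failed: %@", masterError);
            }
        }];
    }];
}
```

Note that step 1 still runs briefly on the main queue, because a child's save is performed on its parent's queue; that is the same kind of UI blocking the later comments report for fetches from BG in this layout.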

Please let me know how it goes!

Reprisal answered 11/5, 2012 at 2:37 | Comments (12):
Excellent answer. I'll try these ideas today and see what I discover. Thank you! - Brainy
Awesome! That worked perfectly! Still, I'm going to try your suggestion of MASTER -> MAIN -> BG and see how it performs; it seems like a very interesting idea. Thank you for the great ideas! - Brainy
Great! Please post back with your results. Also, note the recent edit: use performBlockAndWait in the DidSave notification handler instead of performBlock... - Reprisal
performBlockAndWait hangs in mergeChangesFromContextDidSaveNotification but performBlock works, so I'm sticking with what works. :-) - Brainy
performBlockAndWait is, itself, re-entrant. I'd like to see your call stack when it is hung there. You can use NSLog(@"%@",[NSThread callStackSymbols]); to dump the current call stack. - Reprisal
Updated to change the performBlockAndWait to performBlock. Not sure why this popped up again in my queue, but when I read it this time, it was obvious... not sure why I let it go before. Yes, performBlockAndWait is re-entrant. However, in a nested environment like this, you cannot call the synchronous version on a child context from within a parent context. The notification can be (and in this case is) sent from the parent context, which can cause a deadlock. I hope that is clear to anyone who comes along and reads this later. Thanks, David. - Reprisal
@DavidWeiss Have you tried MASTER -> MAIN -> BG? I'm interested in this design pattern and would like to know if it works well for you. Thank you. - Assemble
One suggestion I'd make, though: since you are setting the masterManagedObjectContext as the notification object when registering for the notification, you don't need the check in contextChanged to see if it's the masterManagedObjectContext before calling performBlock. - Etz
Nobody has mentioned which version of iOS they are targeting. I'd be interested to know if this approach works well on iOS 5 and on 6. - Phidias
The issue with the MASTER -> MAIN -> BG pattern is that when you fetch from the BG context, the fetch also goes through MAIN, which blocks the UI and makes your app unresponsive. - Deflation
Another issue (which I've had) with MASTER -> MAIN -> BG is that anything you create in BG will not have a permanent ID until MASTER is saved. - Semiotic
@iOSDevil, as NSManagedObjectContextConcurrencyType was introduced in iOS 5, this approach is for iOS 5 and higher. - Eye
