Performance of NSManagedObjectContext save degrades dramatically

I am having issues with a Core Data-based iOS app when it tries to build its initial database from data sent from the server. Basically, the server sends down 1 MB chunks of objects (about 3,000 per chunk), and the iOS client deserializes them and writes them to disk.

What I'm seeing is that everything goes pretty well for about the first 8 chunks (out of 44), then performance drops off dramatically and each chunk starts taking longer and longer, as in the image below. Pretty much all the time is spent in [NSManagedObjectContext save], as you can see in the Instruments profiling data, but the app also appears to no longer be running at 100% CPU, as if it's waiting on disk I/O or something.

profiling data showing performance degradation

A few important facts about how I'm doing this (a rough sketch of the per-chunk loop follows the list):

  • Each chunk is processed in its own NSManagedObjectContext with its own NSAutoreleasePool, so there is no object build-up in a non-flushed context between processing of chunks.

  • There is no NSUndoManager set on any of the contexts.

  • There is no mergeChangesFromContextDidSaveNotification: going on (i.e. the chunk contexts aren't pushing their changes into a "master" context).

  • I'm using a SQLite-based datastore on iOS 4.3.

  • The records being written do have indexes on them.

  • The entire sync job is processed on a single GCD background thread (i.e. dispatch_queue_create() and dispatch_async()).
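
For reference, here's a simplified sketch of what that per-chunk loop looks like. The queue name, the chunks collection, sharedCoordinator, and the deserializeChunk:intoContext: helper are placeholder names standing in for the real code, not the actual implementation:

    // Simplified sketch of the per-chunk import loop (pre-ARC, iOS 4.3 era).
    // networkQueue, chunks, sharedCoordinator and deserializeChunk:intoContext:
    // are illustrative names, not the real project's identifiers.
    dispatch_queue_t networkQueue = dispatch_queue_create("com.example.sync", NULL);

    dispatch_async(networkQueue, ^{
        for (NSData *chunk in chunks) {
            NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

            // Fresh context per chunk: no undo manager, no merge notifications.
            NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
            [context setPersistentStoreCoordinator:sharedCoordinator];
            [context setUndoManager:nil];

            [self deserializeChunk:chunk intoContext:context]; // ~3,000 inserts

            NSError *error = nil;
            if (![context save:&error]) {
                NSLog(@"Chunk save failed: %@", error);
            }

            [context release];
            [pool drain];
        }
    });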

I have no idea why the performance suddenly drops off like that or what can be done to address it. I have poked around and read a number of related posts, but nothing has jumped out at me yet.

Any ideas or pointers for making this app scale up to 100,000 records in the database would be much appreciated.

Edit - extra stats

This Instruments graph shows the same simulation as above (on an iPad 2), but includes the disk activity stats. You can see pretty plainly that all of the "not running at 100% CPU" time seems to be taken up with writing to disk.

Disk activity for original test

I also ran the same sync attempt on the iOS Simulator. Overall memory usage is more or less constant for each chunk, except for a dictionary of object IDs that grows slightly over time (but these are just NSNumbers, not Core Data objects or anything that would affect saves). This dictionary is a small amount of memory compared to the total heap, so the problem is not running out of memory.

What is interesting about this test is that the CoreData Save instrument reports that the successive saves take roughly the same amount of time, which obviously conflicts with the CPU profiling information from the first set of results. It seems like CoreData thinks it is taking the same amount of time to push changes to the DB, but the DB itself (i.e. SQLite) suddenly takes a lot longer to actually stream those changes to disk.

Glaser answered 16/6, 2011 at 15:11 Comment(4)
Not enough information. Having the rest of that sample available would help tremendously, but you've also omitted some important details such as the type of device you're running on, type of persistent store you're using, and what threading model you're using.Tita
Hi there - the thread that is processing this data is created using the dispatch_async() GCD stuff, i.e. I create a separate "network" queue with dispatch_queue_create() and run this task using dispatch_async(). The profile collected above was run on an iPad 2, and as mentioned in the original post, I'm using a SQLite-based datastore (e.g. NSSQLiteStoreType). Thanks for the follow up.Glaser
That doesn't sound inordinately bad, I think it might be time for bugreport.apple.com. File a radar with the instrument trace attached (the disk monitor is particularly interesting when run against the simulator).Tita
Thanks for the input guys, I'll repost here when I hear back from Apple.Glaser

I know this is an old issue, so this is probably no longer relevant for you, but it may be to someone else.

I've seen performance issues seeding a Core Data database over iCloud and discovered that if you have inverse relationships in your data model, you can be hurt incredibly badly performance-wise. The way iCloud transaction logging has been implemented, it actually seems to be an inevitable problem. Each transaction sent to iCloud (have a look at them on developer.icloud.com - they're just zipped-up plists) records every relationship that is affected by a change. Unlike in Core Data itself, where you modify one end of a relationship and it takes care of the inverse end for you, the transaction log ends up recording the changes at BOTH ends rather than working the inverse out.

So if you have a one-to-many relationship and you create another record that will hang off the 'many' end, the record at the 'one' end will also be updated to reflect the fact that a new record now hangs off it. If your architecture has a 'type' object that lots of 'data' objects hang off, then every time you add a new data object, a transaction gets written for the type object as well - and here's the kicker: because the iCloud Core Data transactions record the ENTIRE state of an edited entity, not just the changes, EVERY relationship already recorded against it is also added to the log, not just the one pointing at the new subordinate record. This can quickly spiral out of control, as the amount of data written grows with the number of relationships between entities - batches end up taking longer and longer to save.
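
To make this concrete, here's a tiny illustration. ItemType and DataItem are made-up entity names (not anything from the question), and context and existingType are assumed to be an existing managed object context and a previously fetched ItemType:

    // Hypothetical model: ItemType <-->> DataItem, with inverse relationships.
    DataItem *item = [NSEntityDescription insertNewObjectForEntityForName:@"DataItem"
                                                   inManagedObjectContext:context];
    [item setType:existingType];   // only the to-one end is touched here...

    // ...but Core Data maintains the inverse, so the ItemType is dirtied too,
    // and its entire state (all of its relationships) lands in the next
    // iCloud transaction log alongside the new DataItem.
    [context processPendingChanges];
    NSLog(@"updated: %@", [context updatedObjects]);   // contains existingType
    NSLog(@"inserted: %@", [context insertedObjects]); // contains item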

I've answered a question a bit like this before here on the Apple dev forums, which might be useful, as I never seem to be able to describe this succinctly.

The easiest way to improve seeding performance, if this scenario is what is hurting you, is to switch inverse relationships off - but that isn't always an option.
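
For anyone wondering what "switching inverse relationships off" looks like: in the model editor it just means leaving the relationship's Inverse set to "No Inverse" (Core Data warns about it but allows it). In a programmatically built model, a rough sketch - again with invented names, and assuming typeEntity and dataItemEntity are NSEntityDescription objects you've already created - would be:

    // DataItem gets a to-one "type" relationship to ItemType with no inverse,
    // so inserting a DataItem never dirties its ItemType.
    NSRelationshipDescription *typeRel =
        [[[NSRelationshipDescription alloc] init] autorelease];
    [typeRel setName:@"type"];
    [typeRel setDestinationEntity:typeEntity];
    [typeRel setMinCount:0];
    [typeRel setMaxCount:1];                  // max count of 1 makes it to-one
    [typeRel setDeleteRule:NSNullifyDeleteRule];
    // Note: -setInverseRelationship: is deliberately never called.

    [dataItemEntity setProperties:[NSArray arrayWithObject:typeRel]];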

Brasher answered 17/5, 2013 at 21:15 Comment(0)

More information about your implementation would help. For example, do you run this on the main thread or on a background thread? That said, I have seen this behavior before: extensive batch operations with Core Data can slow down if memory isn't managed properly. Have you checked memory usage? Have you checked for leaks? Another thing to try is to make sure you are using an NSAutoreleasePool where needed - draining the pool periodically may help performance.
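
For what it's worth, the periodic-drain pattern being suggested looks something like this sketch (the batch size of 500 is arbitrary, and records and insertRecord:intoContext: are hypothetical placeholders):

    // Drain the autorelease pool every N records during a large import.
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSUInteger count = 0;

    for (NSDictionary *record in records) {        // 'records' is illustrative
        [self insertRecord:record intoContext:context];

        if (++count % 500 == 0) {
            NSError *error = nil;
            [context save:&error];                 // flush pending inserts
            [pool drain];                          // release temporaries
            pool = [[NSAutoreleasePool alloc] init];
        }
    }

    NSError *error = nil;
    [context save:&error];
    [pool drain];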

Fetiparous answered 16/6, 2011 at 22:53 Comment(4)
Thanks for the response. Please see above for edits, but in short, yes this entire operation is run on a background GCD thread with dispatch_async() and yes each chunk has its own NSAutoreleasePool and there are no reported leaks by Instruments, nor does memory usage grow in any significant way from one chunk to the next.Glaser
Wow. I have not seen that before. That feels like a bug in iOS to me. Without seeing your code, it is hard to answer specifically. What specific iOS version are you running on the iPad (you mention 4.3, but 4.3.x?)? Have you tried resetting the iPad?Fetiparous
Yeah I'm on 4.3.3 on the iPad. I haven't tried resetting the iPad in terms of restoring it to factory defaults, but I have definitely powered it down and up in between tests without impact on the performance. Also, the same trend is exhibited on the iOS Simulator (although everything runs about 10x faster) so I don't think it's something on the device that's buffering up or whatever. Thanks for the input thus far though!Glaser
I agree with @Tita that you should probably file a bug report at this point.Fetiparous
