Memory leak with large Core Data batch insert in Swift

I am inserting tens of thousands of objects into a Core Data entity. I have a single NSManagedObjectContext, and I call save() on it every time I add an object. This works, but while the import is running, memory use climbs from about 27 MB to 400 MB, and it stays at 400 MB even after the import has finished.


There are a number of SO questions about batch insert and everyone says to read Efficiently Importing Data, but it's in Objective-C and I am having trouble finding real examples in Swift that solve this problem.

Ptosis answered 16/8, 2015 at 10:34 Comment(0)

There are a few things you should change:

  • Create a separate NSPrivateQueueConcurrencyType managed object context and do your inserts asynchronously in it.
  • Don't save after inserting every single entity object. Insert your objects in batches and then save each batch. A batch size might be something like 1000 objects.
  • Wrap each batch in autoreleasepool and call reset() on the context after each save, so that the objects from that batch are released from memory.

Here is how this might work:

let managedObjectContext = NSManagedObjectContext(concurrencyType: .PrivateQueueConcurrencyType)
managedObjectContext.persistentStoreCoordinator = (UIApplication.sharedApplication().delegate as! AppDelegate).persistentStoreCoordinator // or wherever your coordinator is

managedObjectContext.performBlock { // runs asynchronously on the context's private queue

    var moreBatches = true
    while moreBatches { // loop through each batch of inserts

        autoreleasepool {
            // getNextBatchOfObjects() is a placeholder; it returns nil when there is no more data
            guard let array = getNextBatchOfObjects() else {
                // `break` cannot exit the outer loop from inside a closure, so use a flag
                moreBatches = false
                return
            }
            for item in array {
                let newObject = NSEntityDescription.insertNewObjectForEntityForName("MyEntity", inManagedObjectContext: managedObjectContext) as! MyManagedObject
                newObject.attribute1 = item.whatever
                newObject.attribute2 = item.whoever
                newObject.attribute3 = item.whenever
            }

            // only save once per batch insert
            do {
                try managedObjectContext.save()
            } catch {
                print(error)
            }

            // drop the objects inserted in this batch from the context
            managedObjectContext.reset()
        }
    }
}

Applying these principles kept my memory usage low and also made the mass insert faster.
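
The code above uses the Swift 2 API names from when this was written. For Swift 3 and later, the same batching pattern might be sketched roughly as follows. This is only a sketch under assumptions: ImportItem and importItems are hypothetical names, the entity is assumed to be called "MyEntity" with a single string attribute "name", and the stack is assumed to be an NSPersistentContainer (iOS 10 / macOS 10.12 or later).

```swift
import CoreData

// Hypothetical plain value type standing in for one parsed record.
struct ImportItem {
    let name: String
}

// Inserts `items` into the assumed "MyEntity" entity, saving once per batch.
// `completion` is called on the context's private queue when the import ends.
func importItems(_ items: [ImportItem],
                 into container: NSPersistentContainer,
                 batchSize: Int = 1000,
                 completion: @escaping () -> Void) {
    // A private-queue context, separate from the main-queue viewContext.
    let context = container.newBackgroundContext()

    context.perform { // runs asynchronously on the context's queue
        var start = 0
        while start < items.count {
            let end = min(start + batchSize, items.count)

            // Drain temporary objects once per batch rather than once per import.
            autoreleasepool {
                for item in items[start..<end] {
                    let object = NSEntityDescription.insertNewObject(
                        forEntityName: "MyEntity", into: context)
                    object.setValue(item.name, forKey: "name") // assumed attribute
                }

                // Save once per batch, not once per object.
                do {
                    try context.save()
                } catch {
                    print("Batch save failed: \(error)")
                }

                // Forget the objects from this batch so they can be released.
                context.reset()
            }
            start = end
        }
        completion()
    }
}
```

A call might look like importItems(parsedItems, into: persistentContainer) { print("done") }, where parsedItems and persistentContainer come from your own app. The batch size of 1000 is the same starting point suggested above and is worth tuning against your own data.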


Further reading

  • Efficiently Importing Data (old Apple docs link is broken. If you can find it, please help me add it.)
  • Core Data Performance
  • Core Data (General Assembly post)

Update

The answer above has been completely rewritten. Thanks to @Mundi and @MartinR in the comments for pointing out a mistake in my original answer, and thanks to @JodyHagins in this answer for helping me understand and solve the problem.

Ptosis answered 16/8, 2015 at 10:34 Comment(8)
It seems in your code you are using the same managed object context, not a new one. – Zarla
The managed object context gets recreated at every while loop in my example above. The while loop represents one batch of inserts, so a single batch uses the same managed object context, but the next batch creates a new one. My problem in the past was that I made the context a class property and never changed it. – Ptosis
@Suragch: That depends on how the managedObjectContext property is implemented in the Application delegate, but the "usual" implementation is a lazy property which creates the context once for the lifetime of the app. In that case you are reusing the same context as Mundi said. – Zebra
I wanted to ask more about the meaning of these comments so I opened a new question: Where should NSManagedObjectContext be created? – Ptosis
Fantastic - I have everything except the autorelease pool. Thanks. – Raynell
hi can you help me with this #50056453 thanks – Tinsley
@IraniyaNaynesh, I stopped using CoreData because I am developing for Android and iOS and it is easier to just use SQLite directly with both of them. – Ptosis
"every block submitted through the perform(_:) method gets wrapped in a autorelease pool." You only need the autorelease pool for performAndWait. – Painless
