Core Data: "Allows External Storage" performance on large files

I'm trying to understand the behavior of the Allows External Storage property of Core Data attributes and see whether it would save me from storing files in the file system manually. I wanted to see how it would perform with really large files, so I created a dummy project and stored a large file (2 GB) using Core Data. Then I monitored the memory usage as I fetched and processed the data, and to my surprise, it did not exceed 48 MB! Why is that? Does it fetch the data in chunks? If so, how? Does the Data struct have APIs that allow Core Data to do that?

More details of what I did:

  1. Created an entity File with only two attributes, fileName(String) and data(Data).

  1. Checked the Allows External Storage property for the data attribute.
  2. Stored a 2 GB file in the File entity. I put this code in the viewDidLoad method to do that.

    do {
        // Store file
        let fileURL = Bundle.main.url(forResource: "RawData/LargeFile", withExtension: nil)!
        let file = File(context: AppDelegate.viewContext)
        file.name = fileName
        file.data = try Data(contentsOf: fileURL)
        try AppDelegate.viewContext.save()
    
    } catch {
        print(error.localizedDescription)
    }
    
  3. Closed the app and relaunched it with new code in viewDidLoad that fetches and processes the large file's data.

    let fileData = File.files(named: name).first!.data!
    DispatchQueue.global(qos: .userInteractive).async {
        let result = self.process(data: fileData)
        print("The result: \(result)") 
    }
    
    • The files static method of the File class returns all files in the File entity.

    And here is the process method, which loops through the data byte by byte, XORing each byte into an accumulator and returning the result. It could be any method really; the important thing here is that it reads all the bytes of the data.

    private func process(data: Data) -> UInt8 {
        print("Processing \(data.count) bytes")
        var accumulator: UInt8 = 0
        for byte in data {
            accumulator ^= byte
        }
        return accumulator
    }
    
  4. I monitored the memory usage.

I'm pretty sure it has something to do with Core Data and not Data, since following the same steps but loading the data from disk (Data(contentsOf: URL)) results in 3+ GB of memory usage (also, why the additional 1 GB?).
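For comparison, here is a minimal sketch of what reading the file "in chunks" by hand could look like, using only Foundation. It is not a claim about what Core Data does internally; the function name `chunkedXOR` and the 1 MiB default chunk size are assumptions made for illustration.

```swift
import Foundation

/// Hypothetical helper: XORs a file's bytes while holding roughly
/// `chunkSize` bytes in memory at any one time.
func chunkedXOR(ofFileAt url: URL, chunkSize: Int = 1 << 20) throws -> UInt8 {
    let handle = try FileHandle(forReadingFrom: url)
    defer { handle.closeFile() }

    var accumulator: UInt8 = 0
    while true {
        // Read the next block; an empty result signals end of file.
        let chunk = handle.readData(ofLength: chunkSize)
        if chunk.isEmpty { break }
        for byte in chunk {
            accumulator ^= byte
        }
    }
    return accumulator
}
```

Memory use here is bounded by the chunk size rather than the file size, which is the kind of behavior the 48 MB figure would be consistent with if the bytes were being delivered incrementally.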

Lastly, is there any reason not to use the Allows External Storage feature and to store files manually in the file system instead? I know this question has been discussed a lot, but most of the answers I have read that recommend the manual approach cite performance issues with Core Data, even though my little experiment suggests that Core Data performs well.

Any help would be appreciated!

Fargone answered 6/8, 2019 at 8:59 Comment(2)
If you are saving all of the data to a variable, how do you expect not to use 3 GB of RAM? - Spermiogenesis
@Spermiogenesis Well, maybe you are right, but I would expect a little more than the 2 GB file size to cover the overhead of managing the large data, not 1 GB more! - Fargone

Relational databases in general are not good at storing and retrieving large blobs of data. As a rule of thumb, anything larger than about a megabyte shouldn't be stored in a database. Core Data makes the problem worse. If you were accessing a database directly, you could fetch only particular columns, but because Core Data turns rows into objects and columns into properties, you have less of that control by default. When you set an attribute to Allows External Storage, Core Data can store the large blob in the file system and only load it when you access the property. This is great for many cases, as it is easy and can greatly improve performance.

The problem is that accessing such a property may carry the large, unexpected cost of loading a big file, and that cost is not obvious from a plain property access. If you instead stored a filename and had an explicit second step that loads the file from disk, it would be clear when you are loading the data. Also, if the data can be recovered from the Internet (for example, images downloaded from an imageURL), it might be better to manage it outside of Core Data, because you can then manage it as a cache, which would be hard to do in Core Data.
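A minimal sketch of the "store a filename, load explicitly" approach, assuming a hypothetical `BlobStore` helper (the type, its method names, and the UUID-based file naming are all illustrative and not part of Core Data). The entity would keep only the returned name in a String attribute.

```swift
import Foundation

/// Hypothetical helper: keeps blobs in a directory and hands back only a
/// file name, which is what the Core Data entity would persist.
struct BlobStore {
    let directory: URL

    init(directory: URL) throws {
        self.directory = directory
        try FileManager.default.createDirectory(at: directory,
                                                withIntermediateDirectories: true)
    }

    /// Writes the bytes to disk and returns the generated file name.
    func save(_ data: Data) throws -> String {
        let name = UUID().uuidString
        try data.write(to: directory.appendingPathComponent(name), options: .atomic)
        return name
    }

    /// The expensive step is explicit: nothing is read until the caller asks.
    func load(named name: String) throws -> Data {
        return try Data(contentsOf: directory.appendingPathComponent(name),
                        options: .mappedIfSafe)
    }

    /// Removing a blob is a plain file deletion, which makes cache eviction easy.
    func delete(named name: String) throws {
        try FileManager.default.removeItem(at: directory.appendingPathComponent(name))
    }
}
```

The trade-off is that you now have to keep the files and the database rows in sync yourself, for example by deleting the file when the managed object is deleted.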

Clorindaclorinde answered 18/8, 2019 at 16:14 Comment(2)
Thank you for your answer! "The problem is that accessing such a property may have a large unexpected cost of loading a large file" But as I showed in my question, loading and processing a 2 GB file does not use more than 48 MB of memory. - Fargone
I understood that you just faulted the entity. Did you access the property that contained this large data? - Clorindaclorinde

The observed behaviour of Core Data can be explained by:

  1. Memory-mapped files
  2. Lazy loading

When you simply fetch a managed object from Core Data, its contents are not actually read from storage until you access them (Core Data calls this faulting).
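As a loose analogy only (this is not how Core Data implements faulting), a plain Swift lazy property shows the same idea: the object exists, but the bytes are not read until first access. `LazyBlob` and `loadCount` are hypothetical names used for illustration.

```swift
import Foundation

/// Hypothetical stand-in for a faulted attribute: the file is read on
/// first access to `data`, not when the object is created.
final class LazyBlob {
    let url: URL
    /// Counts how many times the file was actually read.
    private(set) var loadCount = 0

    init(url: URL) {
        self.url = url
    }

    /// Looks like a cheap property, but the first access performs file I/O.
    lazy var data: Data? = {
        self.loadCount += 1
        return try? Data(contentsOf: self.url)
    }()
}
```

Creating a `LazyBlob` costs almost nothing; only touching `data` triggers the read, and later accesses reuse the value that was loaded the first time.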

But when you finally access the property that contains the large data, Core Data needs to load the data stored in the external file. I think Core Data is doing a smart optimisation here by using a memory-mapped file. Instead of reading the file byte by byte (or block by block) from disk into a large UInt8 array and then wrapping a Data object around it, Core Data asks the operating system to map the entire content of the file to a memory region. Core Data then wraps a Data object around that memory region.

When a memory-mapped file is created, the operating system does not load the entire file into memory. It uses the same paging mechanism it uses for virtual memory: at any point in time, it loads only those portions of the file that are being read or written.

But from the point of view of the application, or any code you write, the underlying buffer of the Data object behaves as if the entire file were in the buffer. Even the Data object cannot tell that the buffer does not contain all the bytes of the file.

For more information, read about Memory-Mapped Files on Wikipedia and Mapping Files Into Memory in the Apple documentation.
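You can try the same mechanism yourself with Foundation's `Data.ReadingOptions`. The sketch below is a minimal example, assuming a hypothetical function name `xorChecksum`; whether Core Data uses exactly this option internally is not something I can confirm.

```swift
import Foundation

/// Hypothetical helper: XORs every byte of a file while letting the
/// operating system page the contents in on demand.
func xorChecksum(ofFileAt url: URL) throws -> UInt8 {
    // .alwaysMapped asks Foundation to memory-map the file instead of
    // copying its contents into a heap buffer up front.
    let mapped = try Data(contentsOf: url, options: .alwaysMapped)

    // Iterating touches the pages one after another; the code is the same
    // as it would be for an ordinary in-memory Data value.
    return mapped.reduce(0) { $0 ^ $1 }
}
```

Running the question's experiment with `Data(contentsOf:options:)` and `.alwaysMapped` instead of plain `Data(contentsOf:)` would be a quick way to check whether the memory footprint drops in the same way.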

Ylla answered 21/6, 2023 at 7:24 Comment(0)
