Unexpectedly large Realm file size

This question is about two different ways of inserting objects into a Realm. I noticed that the first method is a lot faster, but the resulting file is huge compared with the second method. The difference between the two approaches is that the write transaction sits outside vs. inside the for loop.

// Create realm file
let realm = try! Realm(fileURL: banco_url!)

When I add objects like this, the Realm file grows to 75.5MB:

try! realm.write {
    for i in 1...40000 {
        let new_realm_obj = realm_obj(value: ["id" : incrementID(),
                                              "a": "123",
                                              "b": 12.12,
                                              "c": 66,
                                              "d": 13.13,
                                              "e": 0.6,
                                              "f": "01100110",
                                              "g": DateTime,
                                              "h": 3])

        realm.add(new_realm_obj)
        print("🔹 \(i) Added")
    }
}

When I add objects like this, the Realm file only grows to 5.5MB:

for i in 1...40000 {
    let new_realm_obj = realm_obj(value: ["id" : incrementID(),
                                          "a": "123",
                                          "b": 12.12,
                                          "c": 66,
                                          "d": 13.13,
                                          "e": 0.6,
                                          "f": "01100110",
                                          "g": DateTime,
                                          "h": 3])
    try! realm.write {
        realm.add(new_realm_obj)
        print("🔹 \(i) Added")
    }
}

My class that gets added to the Realm file:

class realm_obj: Object {
    dynamic var id = Int()
    dynamic var a = ""
    dynamic var b = 0.0
    dynamic var c = Int8()
    dynamic var d = 0.0
    dynamic var e = 0.0
    dynamic var f = ""
    dynamic var g = Date()
    dynamic var h = Int8()
}

Auto-increment function:

func incrementID() -> Int {
    let realm = try! Realm(fileURL: banco_url!)
    return (realm.objects(realm_obj.self).max(ofProperty: "id") as Int? ?? 0) + 1
}

Is there a better or correct way to do this? Why do I get such different file sizes in these cases?

Titled answered 14/9, 2017 at 20:59 Comment(5)
Can you please add the code for `incrementID()`? – Matt
@Matt just added – Titled
incrementID() edited... sorry for my copy and paste habit. – Titled
I guess you could try adding the items in batches of 5000 and see what happens – Strand
You can optimize it a bit if you call incrementID() once before the "for" and then use the result + i for the id when you create each object. Unless you're using multithreading and running other imports at the same time, this approach should save you a lot of calls to the Realm database. – Bangka

The large file size when adding all of the objects in a single transaction is due to an unfortunate interaction between Realm's transaction log subsystem and Realm's memory allocation algorithm for large blobs. Realm's memory layout algorithm requires that the file size be at least 8x the size of the largest single blob stored in the Realm file. Transaction log entries, summarizing the modifications made during a single transaction, are stored as blobs within the Realm file.

When you add 40,000 objects in one transaction, you end up with a single transaction log entry that's around 5MB in size. This means that the file has to be at least 40MB in size in order to store it. (I'm not quite sure how it ends up being nearly twice that size again. It might be that the blob size is rounded up to a power of two somewhere along the line…)

When you add one object per transaction, 40,000 times, you still end up with a single transaction log entry, only this time it's a hundred or so bytes in size. This happens because when Realm commits a transaction, it first attempts to reclaim unused transaction log entries before allocating space for new ones. Since the Realm file is not open anywhere else, the previous entry can be reclaimed as each new commit is performed.

realm/realm-core#2343 tracks improving how Realm stores transaction log entries to avoid the significant overallocation you're seeing.

For now my suggestion would be to split the difference between the two approaches and add groups of objects per write transaction. This will trade off a little performance by increasing the number of commits but will reduce the impact of the memory layout algorithm by reducing the size of the largest transaction log entry you create. From a quick test, committing every 2,000 objects results in a file size of around 4MB, while being significantly quicker than adding each object in a separate write transaction.
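
A minimal sketch of that batching approach, assuming the realm_obj class, incrementID() and banco_url from the question, with Date() standing in for the question's DateTime placeholder and an illustrative batch size of 2,000:

import RealmSwift

let realm = try! Realm(fileURL: banco_url!)
let batchSize = 2_000        // illustrative; tune for your data
let total = 40_000
var nextID = incrementID()   // query the current maximum id once

for batchStart in stride(from: 0, to: total, by: batchSize) {
    // One commit per batch keeps the largest transaction log entry
    // (and therefore the 8x blob requirement) small, while avoiding
    // the overhead of 40,000 separate commits.
    try! realm.write {
        for _ in batchStart..<min(batchStart + batchSize, total) {
            let obj = realm_obj(value: ["id": nextID,
                                        "a": "123",
                                        "b": 12.12,
                                        "c": 66,
                                        "d": 13.13,
                                        "e": 0.6,
                                        "f": "01100110",
                                        "g": Date(),
                                        "h": 3])
            realm.add(obj)
            nextID += 1
        }
    }
}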

Zahara answered 14/9, 2017 at 22:16 Comment(0)

In most cases you should try to minimize the number of write transactions. A write transaction has significant overhead, so if you start a new write transaction for every object you add to the Realm, your code will be significantly slower than if you added all objects in a single write transaction.

In my experience, the best way to add several objects to a Realm is to create them, append them to an array, and then add the array as a whole inside a single write transaction.

So this is what you should be doing:

// Query the current maximum id once; calling incrementID() inside the loop
// would return the same value every time, because nothing is persisted
// until the final write transaction commits.
var nextID = incrementID()

var objects = [realm_obj]()
for _ in 1...40000 {
    let newRealmObj = realm_obj(value: ["id": nextID, "a": "123", "b": 12.12,
                                        "c": 66, "d": 13.13, "e": 0.6,
                                        "f": "01100110", "g": DateTime, "h": 3])
    objects.append(newRealmObj)
    nextID += 1
}
try! realm.write {
    realm.add(objects)
}

As for the size issue, see the Limitations - File Size section of the Realm documentation. I am not 100% sure of the cause, but I would say it comes from running code inside the write transaction that doesn't need to happen there. My guess is that this causes Realm to create a lot of intermediate versions of your objects, and since releasing reserved storage capacity is quite an expensive operation, it hasn't happened yet by the time you check the file size.
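
One documented way to reclaim space after the file has already grown is to write a compacted copy of it. A minimal sketch, assuming the banco_url from the question and a hypothetical destination path next to it:

import RealmSwift

// Hypothetical destination next to the original file; writeCopy(toFile:)
// throws if a file already exists at that URL.
let compacted_url = banco_url!
    .deletingLastPathComponent()
    .appendingPathComponent("banco_compacted.realm")

let realm = try! Realm(fileURL: banco_url!)
do {
    try realm.writeCopy(toFile: compacted_url)
} catch {
    print("Compaction failed: \(error)")
}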

Keep in mind that creating objects doesn't need to happen inside a write transaction. You only need a write transaction for modifying persisted data in Realm (which includes adding new objects, deleting persisted objects, and modifying persisted objects directly).

Twomey answered 14/9, 2017 at 21:12 Comment(1)
I did it exactly the way you said. 75.5MB again! But it was even faster to write. I'm just wondering why it takes so many MB to hold this. – Titled

Thanks everyone. I found an optimized way to do the task using your tips. I simply did the write in batches instead of sending all the content in a single operation. Here is some data to compare:

Batch Size (Objects) | File Size (MB)

10,000 | 23.1
 5,000 | 11.5
 2,500 |  5.8
 1,250 |  4.2
   625 |  3.7
   300 |  3.7
   100 |  3.1
    50 |  3.1
    10 |  3.4
     5 |  3.1

So, in my humble opinion, working with batches of around 1,000 objects gives the best size/speed trade-off for this case.

Here is the code I used for this test. The only thing that changed between runs was the batch size in the for 1...XXX loop.

let realm = try! Realm(fileURL: banco_url!)

var objects = [realm_obj]()
var ids = incrementID()

while ids < 40000 {
    for _ in 1...5 {
        let new_realm_obj = realm_obj(value: ["id": ids,
                                              "a": "123",
                                              "b": 12.12,
                                              "c": 66,
                                              "d": 13.13,
                                              "e": 0.6,
                                              "f": "01100110",
                                              "g": someDateTime,
                                              "h": 3])
        objects.append(new_realm_obj)
        ids += 1
    }

    try! realm.write {
        realm.add(objects)
    }
}
Titled answered 15/9, 2017 at 12:40 Comment(3)
It sounds like my answer solved your problem, so I'd ask that you please mark it as accepted. – Zahara
@bdash.. Thanks! – Titled
I think the code needs a correction. You should be re-initializing the objects array inside the while loop (or at the beginning of the for loop); otherwise it will keep adding and writing the same objects again. – Reginiaregiomontanus
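
Following up on the last comment, a minimal corrected sketch of the loop above, with the batch array re-created on each pass (Date() stands in for the undefined someDateTime placeholder):

import RealmSwift

let realm = try! Realm(fileURL: banco_url!)
var ids = incrementID()

while ids < 40000 {
    // A fresh array per batch, so objects written in earlier
    // batches are not appended and added a second time.
    var objects = [realm_obj]()

    for _ in 1...5 {
        let new_realm_obj = realm_obj(value: ["id": ids,
                                              "a": "123",
                                              "b": 12.12,
                                              "c": 66,
                                              "d": 13.13,
                                              "e": 0.6,
                                              "f": "01100110",
                                              "g": Date(),
                                              "h": 3])
        objects.append(new_realm_obj)
        ids += 1
    }

    try! realm.write {
        realm.add(objects)
    }
}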
