Adding a PrimaryKey to a Realm with tons of duplicate data
I need to add a @PrimaryKey to two Realm models that are missing it due to idiocy. The models are referenced in multiple other models via direct relationships or RealmLists, and one of the two models also references the other.

My first thought was to rename the schemas in a migration and copy over the data by hand, but then Realm complains that the schema is linked in other schemas and can't be renamed.

Both schemas contain around 15000 objects that can be condensed to about 100. The duplicates are absolutely identical and only exist because of the missing @PrimaryKey.

The models themselves are kinda simple:

class ModelA extends RealmObject {
     String primaryKey; // Is missing the @PrimaryKey annotation
     String someField;
     String someOtherField;
     Date someDate;
     ModelB relationToTheOtherProblematicModel;
}

class ModelB extends RealmObject {
    String primaryKey; // Is also missing the @PrimaryKey annotation
    // this class only contains String fields and one Date field
}

How can I migrate the data when I add @PrimaryKey to both classes' primaryKey field?

Edit to clarify:

Both schemas contain multiple completely identical items.

primaryKey | someField | someOtherField
------     | ------    | ------
A          | foo       | bar
A          | foo       | bar
A          | foo       | bar
A          | foo       | bar
B          | bar       | foo
B          | bar       | foo
B          | bar       | foo
C          | far       | boo
C          | far       | boo
C          | far       | boo

These duplicates can be removed since primaryKey uniquely identifies them. When I add the @PrimaryKey annotation and run a migration, Realm obviously complains about the duplicate values. I need to remove those duplicates without destroying the links in other models.
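Before deleting anything, it may be worth confirming that rows sharing a primaryKey really are identical. The following is a minimal plain-Java sketch of that check, not Realm code: `Row`, `DuplicateCheck` and `collapse` are hypothetical names, and the rows are assumed to have been read out of the Realm into a list first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch; Row is a hypothetical stand-in for a ModelA row, not a Realm class.
final class DuplicateCheck {

    static final class Row {
        final String primaryKey;
        final String someField;
        final String someOtherField;

        Row(String primaryKey, String someField, String someOtherField) {
            this.primaryKey = primaryKey;
            this.someField = someField;
            this.someOtherField = someOtherField;
        }

        // True when every field matches, i.e. the two rows are real duplicates.
        boolean sameContentAs(Row other) {
            return primaryKey.equals(other.primaryKey)
                    && someField.equals(other.someField)
                    && someOtherField.equals(other.someOtherField);
        }
    }

    // Keeps the first row seen for each key; throws if a later row with the same key differs.
    static Map<String, Row> collapse(List<Row> rows) {
        Map<String, Row> keepers = new LinkedHashMap<>();
        for (Row row : rows) {
            Row keeper = keepers.putIfAbsent(row.primaryKey, row);
            if (keeper != null && !keeper.sameContentAs(row)) {
                throw new IllegalStateException("Rows with key " + row.primaryKey + " are not identical");
            }
        }
        return keepers;
    }

    public static void main(String[] args) {
        // The ten rows from the table above.
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 4; i++) rows.add(new Row("A", "foo", "bar"));
        for (int i = 0; i < 3; i++) rows.add(new Row("B", "bar", "foo"));
        for (int i = 0; i < 3; i++) rows.add(new Row("C", "far", "boo"));
        System.out.println(rows.size() + " rows collapse to keys " + collapse(rows).keySet());
    }
}
```

For the ten rows in the table this leaves three keepers, one each for A, B and C. If the check throws, the duplicates are not as identical as assumed and need a closer look before any of them is deleted.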

Holmquist answered 26/8, 2016 at 8:38 Comment(2)
Did you solve this anyway? - Interpenetrate
Nope, I never found a way to solve this. - Holmquist

Did you try something like this:

RealmConfiguration config = new RealmConfiguration.Builder(this)
            .schemaVersion(6) //the new schema version
            .migration(new RealmMigration() {
                @Override
                public void migrate(DynamicRealm realm, long oldVersion, long newVersion) {

                    RealmSchema schema = realm.getSchema();
                    // Class names must match the models from the question
                    schema.get("ModelA").addPrimaryKey("primaryKey");
                    schema.get("ModelB").addPrimaryKey("primaryKey");
                }
            })
            .build();
Realm.setDefaultConfiguration(config);

Edit:
I made an edit based on this. The following steps should solve it:
1. Create the new field, but do not mark it as a primary key yet.
2. Set the new field to a unique value for each instance using transform.
3. Add an index to the new field.
4. Make the new field a primary key.

RealmConfiguration config = new RealmConfiguration.Builder(this)
            .schemaVersion(6) //the new schema version
            .migration(new RealmMigration() {
                @Override
                public void migrate(DynamicRealm realm, long oldVersion, long newVersion) {

                    RealmSchema schema = realm.getSchema();
                    schema.get("ModelA").addField("newKey", String.class)
                        .transform(new RealmObjectSchema.Function() {
                            @Override
                            public void apply(DynamicRealmObject obj) {
                                obj.set("newKey", obj.getString("primaryKey"));
                            }
                        })
                        .addIndex("newKey")
                        .addPrimaryKey("newKey");

                    schema.get("ModelB").addField("newKey", String.class)
                        .transform(new RealmObjectSchema.Function() {
                            @Override
                            public void apply(DynamicRealmObject obj) {
                                obj.set("newKey", obj.getString("primaryKey"));
                            }
                        })
                        .addIndex("newKey")
                        .addPrimaryKey("newKey");
                }
            })
            .build();
Realm.setDefaultConfiguration(config);
Hellenistic answered 26/8, 2016 at 8:56 Comment(8)
This won't work because there is duplicate data: D/REALM: jni: ThrowingException 3, Field "primaryKey" cannot be a primary key, it already contains duplicate values: 001CAED7931EAB6C96CEC7DE4EC7F5C1FA08B7D0, . - Holmquist
That is only relevant when each item in the schema can get a new, unique primary key because the existing field sometimes has null as a value. I already have items in my schema with a filled "primary key" that can't be changed, but that appears multiple times in completely identical items. Those duplicate items should be removed so that only one instance of each remains. - Holmquist
But the goal here is to have newKey as the primary key; you can delete the primaryKey field afterwards. obj.set("newKey", obj.getString("primaryKey")); reads the old primaryKey value and copies it to newKey, so it won't be changed. The only difference is that you'll have to use newKey as the primary key. - Hellenistic
But how does that help? After transforming the first row, the second row will fail because of a duplicate value in "newKey". - Holmquist
Based on this, adding an index would solve it and it won't fail. - Hellenistic
I don't see where you are going with this... .addPrimaryKey("newKey") at the end of the migration will fail because newKey will contain all the old primaryKey values, which include many duplicates. "3. Set the new field to a unique value for each instance using transform" <- I can't do that, because primaryKey contains duplicates. - Holmquist
"Is there a way to update RealmObject and delete duplicate values generated by migration?" This was the question in the link that I provided. This solution helped there. Indexing the field somehow deleted the duplicates (I don't know how). You can definitely give it a try. - Hellenistic
Okay, just to be safe I tried this. Sadly, Realm complains about duplicate values when doing addPrimaryKey("newKey"), so the indexing doesn't somehow delete duplicates. The last issue you linked doesn't have the problem with duplicate values; they just can't directly add a new field as a primary key because it would contain null for each row, causing the duplicate value error. Because of this they must add the field, then add the unique values, and only then make it a primary key. - Holmquist

Have you tried deleting the duplicated objects before migrating? For example in your migration class you could do something like...

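RealmMigration migration = new RealmMigration() {
    @Override
    public void migrate(DynamicRealm realm, long oldVersion, long newVersion) {

        // A migration receives a DynamicRealm, so classes are queried by name
        RealmResults<DynamicRealmObject> modelAs = realm.where("ModelA")
                                                        .equalTo("primaryKey", "whatever")
                                                        .findAll();

        // Keep index 0 and delete the rest back to front, so that a
        // shrinking result list cannot make the loop skip elements
        for (int i = modelAs.size() - 1; i > 0; i--) {
            modelAs.get(i).deleteFromRealm();
        }

        // Migration code...
        if (oldVersion == 1) {
            ...
        }
    }
};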

Then you would have only 1 element with each primaryKey so migration could be performed.
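One thing to watch with index-based deletion: if the query result is a live view that shrinks as objects are deleted (which is my understanding of how RealmResults behaved at the time, but treat that as an assumption), a forward loop skips every other element. A small sketch with an ArrayList standing in for the live result shows the difference; `LiveDeletionSketch` and its methods are hypothetical names and no Realm API is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// ArrayList stands in for a live result list that shrinks when an element is deleted.
final class LiveDeletionSketch {

    // Forward loop: after removing index i, the next element slides into i and is skipped.
    static List<String> deleteForward(List<String> source) {
        List<String> live = new ArrayList<>(source);
        for (int i = 1; i < live.size(); i++) {
            live.remove(i);
        }
        return live;
    }

    // Backward loop: removing from the end never shifts the elements still to be visited.
    static List<String> deleteBackward(List<String> source) {
        List<String> live = new ArrayList<>(source);
        for (int i = live.size() - 1; i > 0; i--) {
            live.remove(i);
        }
        return live;
    }

    public static void main(String[] args) {
        List<String> duplicates = Arrays.asList("a1", "a2", "a3", "a4");
        System.out.println("forward:  " + deleteForward(duplicates));
        System.out.println("backward: " + deleteBackward(duplicates));
    }
}
```

With four duplicates the forward loop leaves two elements behind, while the backward loop leaves only the first one, which is what the migration needs.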

Suburbanize answered 26/8, 2016 at 9:50 Comment(2)
This would work when I have a distinct list of primary keys to query. Doable, but I guess this would destroy the relations in other models referencing those entries. I wasn't able to find any documentation on how Realm actually handles those relationships. - Holmquist
I don't know Realm, but it seems you have a few steps for each set of duplicates. Pick one to keep. For each of the others, find all references and update them to point at the keeper. Then delete the row that now has no references. When that is complete for all rows, set the primary key index. - Krp
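The steps from the last comment can be sketched in plain Java. This is an in-memory illustration of the idea only, not a tested Realm migration: `RelinkSketch`, `Owner` and `deduplicate` are hypothetical names, `Owner` stands in for any other model that links to ModelA directly or through a list, and every referenced object is assumed to be present in its table. Inside a real migration the same steps would have to go through the DynamicRealm and DynamicRealmObject API instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of "pick a keeper, repoint references, delete the rest".
// All classes are hypothetical in-memory stand-ins; no Realm API is used.
final class RelinkSketch {

    static final class ModelB {
        final String primaryKey;
        ModelB(String primaryKey) { this.primaryKey = primaryKey; }
    }

    static final class ModelA {
        final String primaryKey;
        ModelB relation; // the ModelA -> ModelB link from the question
        ModelA(String primaryKey, ModelB relation) {
            this.primaryKey = primaryKey;
            this.relation = relation;
        }
    }

    // Stand-in for any other model linking to ModelA directly or via a list.
    static final class Owner {
        ModelA direct;
        final List<ModelA> list = new ArrayList<>();
    }

    static void deduplicate(List<ModelA> tableA, List<ModelB> tableB, List<Owner> owners) {
        // ModelB first, because ModelA rows point at it.
        Map<String, ModelB> keepersB = new HashMap<>();
        for (ModelB b : tableB) keepersB.putIfAbsent(b.primaryKey, b);
        for (ModelA a : tableA) {
            if (a.relation != null) a.relation = keepersB.get(a.relation.primaryKey);
        }
        // Every non-keeper is now unreferenced and can be dropped.
        tableB.removeIf(b -> keepersB.get(b.primaryKey) != b);

        // Then the same three steps for ModelA and everything that links to it.
        Map<String, ModelA> keepersA = new HashMap<>();
        for (ModelA a : tableA) keepersA.putIfAbsent(a.primaryKey, a);
        for (Owner owner : owners) {
            if (owner.direct != null) owner.direct = keepersA.get(owner.direct.primaryKey);
            owner.list.replaceAll(a -> keepersA.get(a.primaryKey));
        }
        tableA.removeIf(a -> keepersA.get(a.primaryKey) != a);
    }

    public static void main(String[] args) {
        ModelB b1 = new ModelB("X"), b2 = new ModelB("X");
        ModelA a1 = new ModelA("A", b1), a2 = new ModelA("A", b2);
        List<ModelB> tableB = new ArrayList<>(List.of(b1, b2));
        List<ModelA> tableA = new ArrayList<>(List.of(a1, a2));
        Owner owner = new Owner();
        owner.direct = a2;
        deduplicate(tableA, tableB, new ArrayList<>(List.of(owner)));
        System.out.println(tableA.size() + " ModelA, " + tableB.size() + " ModelB, link kept: " + (owner.direct == a1));
    }
}
```

Only after these steps does each key appear once, which is the point where adding the primary key should no longer hit the duplicate value error. Note that a list which referenced several duplicates ends up containing the same keeper several times; whether that is acceptable depends on the model.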
