Core Data find-or-create: most efficient way

I have around 10000 objects of the entity 'Message'. When I add a new 'Message' I want to first check whether it already exists. If it does, I just update its data; if it doesn't, I create it.

Right now the "find-or-create" algorithm works by saving the 'objectID' of every Message object in one array, then filtering through them and getting the messages with existingObjectWithID:error:

This works fine, but in my case when I fetch a 'Message' using existingObjectWithID:, set a property on the 'Message' object, and call save: on its context, the change isn't saved properly. Has anyone come across a problem like this?

Is there a more efficient way to implement a find-or-create algorithm?

Locris answered 7/3, 2014 at 10:5 Comment(8)
So for each message you're looping through all objectIDs and calling existingObjectWithID:error:, then checking the message content until you find a match? - Lophobranch
Do you save the context where you create the missing objects before you call the existingObjectWithID: method? - Astyanax
@Wain, yes I do it like this. - Locris
@dariaa, I call existingObjectWithID: after I create the missing objects (that way I predict whether there is a missing object or it already exists). - Locris
And do you use the same context to create objects and to fetch them with existingObjectWithID:? - Astyanax
You should look at (batch) fetching messages rather than iterating and loading all messages. - Lophobranch
@wain I'm fetching their objectIDs so it's fine. - Locris
You said you were looping through all object IDs. Fetching specific messages would be faster than faulting all messages to check contents... - Lophobranch

First, Message is a "bad" name for a Core Data entity, as Apple uses it internally and it can cause problems later in development.
You can read a little more about it HERE

I've noticed that all suggested solutions here use an array or a fetch request.
You might want to consider a dictionary based solution ...

In a single-threaded, single-context application this can be done without much overhead: pre-populate a cache (a dictionary) that maps each existing key to its object ID, and add newly inserted objects (of type Message) to that cache.

Consider this interface:

@interface UniquenessEnforcer : NSObject

@property (readonly,nonatomic,strong) NSPersistentStoreCoordinator* coordinator;
@property (readonly,nonatomic,strong) NSEntityDescription* entity;
@property (readonly,nonatomic,strong) NSString* keyProperty;
@property (nonatomic,readonly,strong) NSError* error;

- (instancetype) initWithEntity:(NSEntityDescription *)entity
                    keyProperty:(NSString*)keyProperty
                    coordinator:(NSPersistentStoreCoordinator*)coordinator;

- (NSArray*) existingObjectIDsForKeys:(NSArray*)keys;
- (void) unregisterKeys:(NSArray*)keys;
- (void) registerObjects:(NSArray*)objects;//objects must have permanent objectIDs
- (NSArray*) findOrCreate:(NSArray*)keys
                  context:(NSManagedObjectContext*)context
                    error:(NSError* __autoreleasing*)error;
@end

flow:

1) on application start, allocate a "uniqueness enforcer" and populate your cache:

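//private method of uniqueness enforcer
//assumes _cache is an NSMutableDictionary ivar mapping key -> NSManagedObjectID
- (void) populateCache
{
    NSManagedObjectContext* context = [[NSManagedObjectContext alloc] init];
    context.persistentStoreCoordinator = self.coordinator;

    //fetch plain dictionaries instead of managed objects (nothing gets faulted in)
    NSFetchRequest* r = [NSFetchRequest fetchRequestWithEntityName:self.entity.name];
    [r setResultType:NSDictionaryResultType];

    //expression that yields each row's objectID under the "objectID" key
    NSExpressionDescription* objectIdDesc = [NSExpressionDescription new];
    objectIdDesc.name = @"objectID";
    objectIdDesc.expression = [NSExpression expressionForEvaluatedObject];
    objectIdDesc.expressionResultType = NSObjectIDAttributeType;

    r.propertiesToFetch = @[self.keyProperty,objectIdDesc];

    NSError* error = nil;

    NSArray* results = [context executeFetchRequest:r error:&error];
    _error = error;//the property is readonly, so set the backing ivar
    if (results) {
        _cache = [NSMutableDictionary dictionaryWithCapacity:[results count]];
        for (NSDictionary* dict in results) {
            id key = dict[self.keyProperty];
            if (key) {//rows with a nil key cannot be cached
                _cache[key] = dict[@"objectID"];
            }
        }
    } else {
        _cache = nil;//fetch failed, see self.error
    }
}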

2) when you need to test existence simply use:

- (NSArray*) existingObjectIDsForKeys:(NSArray *)keys
{
    return [_cache objectsForKeys:keys notFoundMarker:[NSNull null]];
}

3) when you actually need the objects, creating the missing ones:

- (NSArray*) findOrCreate:(NSArray*)keys
                  context:(NSManagedObjectContext*)context
                    error:(NSError* __autoreleasing*)error
{
    NSMutableArray* fullList = [[NSMutableArray alloc] initWithCapacity:[keys count]];
    NSMutableArray* needFetch = [[NSMutableArray alloc] initWithCapacity:[keys count]];

    NSManagedObject* object = nil;
    for (id<NSCopying> key in keys) {
        NSManagedObjectID* oID = _cache[key];
        if (oID) {
            object = [context objectWithID:oID];
            if ([object isFault]) {
                [needFetch addObject:oID];
            }
        } else {
            object = [NSEntityDescription insertNewObjectForEntityForName:self.entity.name
                                                   inManagedObjectContext:context];
            [object setValue:key forKey:self.keyProperty];
        }
        [fullList addObject:object];
    }

    if ([needFetch count]) {
        NSFetchRequest* r = [NSFetchRequest fetchRequestWithEntityName:self.entity.name];
        r.predicate = [NSPredicate predicateWithFormat:@"SELF IN %@",needFetch];
        if([context executeFetchRequest:r error:error] == nil) {//load the missing faults from store
            fullList = nil;
        }
    }

    return fullList;
}

In this implementation you need to keep track of object deletion and creation yourself.
You can use the register/unregister methods (trivial implementation) for this after a successful save.
You could make this a bit more automatic by observing the context's "did save" notification and updating the cache with the relevant changes.
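
As a rough sketch of that bookkeeping, assuming `_cache` is the `NSMutableDictionary` ivar filled by `populateCache` and that everything runs on the single thread that owns the context. The methods are meant to live inside the enforcer's `@implementation`; `startObservingSaves` and `contextDidSave:` are hypothetical names, not part of the interface above.

```objc
#import <CoreData/CoreData.h>

// Place inside @implementation UniquenessEnforcer.

- (void)registerObjects:(NSArray *)objects
{
    for (NSManagedObject *object in objects) {
        // Temporary IDs change on save, so only cache permanent ones.
        if (object.objectID.isTemporaryID) { continue; }
        id key = [object valueForKey:self.keyProperty];
        if (key) { _cache[key] = object.objectID; }
    }
}

- (void)unregisterKeys:(NSArray *)keys
{
    [_cache removeObjectsForKeys:keys];
}

// Hypothetical helper: call once after populateCache.
- (void)startObservingSaves
{
    [[NSNotificationCenter defaultCenter] addObserver:self
                                             selector:@selector(contextDidSave:)
                                                 name:NSManagedObjectContextDidSaveNotification
                                               object:nil];
}

// Hypothetical handler: mirrors saved inserts and deletes into the cache.
- (void)contextDidSave:(NSNotification *)note
{
    NSSet *inserted = note.userInfo[NSInsertedObjectsKey];
    NSSet *deleted = note.userInfo[NSDeletedObjectsKey];

    NSMutableArray *added = [NSMutableArray array];
    for (NSManagedObject *object in inserted) {
        if ([object.entity.name isEqualToString:self.entity.name]) {
            [added addObject:object];
        }
    }
    [self registerObjects:added]; // IDs are permanent once the save has succeeded

    for (NSManagedObject *object in deleted) {
        // Look up by objectID rather than reading properties of a deleted object.
        [_cache removeObjectsForKeys:[_cache allKeysForObject:object.objectID]];
    }
}
```

This sketch ignores updates that change the key property of an existing object; if keys can change, the handler would also need to look at `NSUpdatedObjectsKey`.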

The multi-threaded case is much more complex (same interface, but a different implementation altogether once you take performance into account).
For instance, your enforcer must save new items to the store before returning them to the requesting context, because otherwise they don't have permanent IDs. Even if you call "obtain permanent IDs", the requesting context might never save.
You will also need some kind of dispatch queue (concurrent or serial) to guard access to your cache dictionary.
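
One possible way to guard the dictionary, sketched with a concurrent GCD queue where reads run in parallel and writes go through a barrier. `KeyCache` and its method names are hypothetical and not part of the interface above. Note that this only protects the dictionary itself; it does not make the whole find-then-create sequence atomic.

```objc
#import <Foundation/Foundation.h>

// Hypothetical thread-safe wrapper around the key -> objectID dictionary.
@interface KeyCache : NSObject
- (id)objectIDForKey:(id<NSCopying>)key;
- (void)setObjectID:(id)objectID forKey:(id<NSCopying>)key;
@end

@implementation KeyCache {
    NSMutableDictionary *_cache;
    dispatch_queue_t _queue;
}

- (instancetype)init
{
    if ((self = [super init])) {
        _cache = [NSMutableDictionary dictionary];
        _queue = dispatch_queue_create("KeyCache.queue", DISPATCH_QUEUE_CONCURRENT);
    }
    return self;
}

// Reads may run concurrently with each other.
- (id)objectIDForKey:(id<NSCopying>)key
{
    __block id result = nil;
    dispatch_sync(_queue, ^{
        result = self->_cache[key];
    });
    return result;
}

// Writes wait for in-flight reads and block new ones until they finish.
- (void)setObjectID:(id)objectID forKey:(id<NSCopying>)key
{
    dispatch_barrier_async(_queue, ^{
        self->_cache[key] = objectID;
    });
}

@end
```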

Some math:

Given:
10K (10 * 1024) unique keys
an average key length of 256 bytes
an objectID length of 128 bytes
we are looking at:
10K * (256 + 128) bytes = 3,932,160 bytes, or about 3.75 MB (roughly 4 MB) of memory

This might be a high estimate, but you should take this into account ...

Lowtension answered 11/3, 2014 at 18:55 Comment(0)

OK, many things can go wrong here. This is how to do it:

  1. Create NSManagedObjectContext -> MOC
  2. Create NSFetchRequest with the right entity
  3. Create the NSPredicate and attach it to the fetch request
  4. Execute the fetch request on the newly created context
  5. fetch request will return an array of objects matching the predicate (you should have only one object in that array if your ids are distinct)
  6. cast first element of an array to NSManagedObject
  7. change its property
  8. save context

The most important thing of all is that you use the same context for fetching and saving, and you must do both on the same thread, because an MOC is not thread safe. Getting this wrong is the most common mistake people make.
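
The steps above could be sketched roughly as follows. This is only an illustration: the helper name, the `uid` and `subject` attribute names, and the choice of a private-queue context are assumptions. Running everything inside `performBlockAndWait:` keeps the fetch and the save on the same context and on that context's own queue.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: updates one Message identified by uid.
// Returns NO with a nil error when no matching object exists.
static BOOL UpdateMessageSubject(NSPersistentStoreCoordinator *coordinator,
                                 NSString *uid,
                                 NSString *subject,
                                 NSError **error)
{
    // 1. Create the context.
    NSManagedObjectContext *moc = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    moc.persistentStoreCoordinator = coordinator;

    __block BOOL ok = NO;
    __block NSError *localError = nil;
    [moc performBlockAndWait:^{
        // 2-3. Fetch request for the entity, with a predicate attached.
        NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Message"];
        request.predicate = [NSPredicate predicateWithFormat:@"uid == %@", uid];
        request.fetchLimit = 1;

        // 4-6. Execute on the same context and take the first match.
        NSArray *matches = [moc executeFetchRequest:request error:&localError];
        NSManagedObject *message = matches.firstObject;
        if (message == nil) { return; } // fetch failed or nothing matched

        // 7-8. Change the property and save the same context.
        [message setValue:subject forKey:@"subject"];
        ok = [moc save:&localError];
    }];

    if (!ok && error) { *error = localError; }
    return ok;
}
```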

Tyree answered 7/3, 2014 at 12:57 Comment(0)

Currently you say you maintain an array of `objectID`s. When you need to, you:

filter through them and get the messages with existingObjectWithID:error:

and after this you need to check if the message you got back:

  1. exists
  2. matches the one you want

This is very inefficient. It is inefficient because you are always fetching objects back from the data store into memory. You are also doing it individually (not batching). This is basically the slowest way you could possibly do it.

Why changes to that object aren't saved properly isn't clear. You should get an error of some kind. But, you should really change your search approach:

Instead of looping and loading, use a single fetch request with a predicate:

NSFetchRequest *request = ...;
NSPredicate *filterPredicate = [NSPredicate predicateWithFormat:@"XXX == %@", YYY];

[request setPredicate:filterPredicate];
[request setFetchLimit:1];

where XXX is the name of the attribute in the message to test, and YYY is the value to test it against.

When you execute this fetch on the MOC you should get one or zero responses. If you get zero, create + insert a new message and save the MOC. If you get one, update it and save the MOC.
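
A minimal sketch of that find-or-create flow, assuming the entity is named `Message` and the attribute being tested is `uid` (both taken from the question's comments; the function name is hypothetical). It must be called on the queue or thread that owns the context, and it leaves saving to the caller.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: returns the existing Message with this uid,
// or inserts a new one. Returns nil only if the fetch itself fails.
static NSManagedObject *FindOrCreateMessage(NSManagedObjectContext *moc,
                                            NSString *uid,
                                            NSError **error)
{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Message"];
    request.predicate = [NSPredicate predicateWithFormat:@"uid == %@", uid];
    request.fetchLimit = 1; // zero or one result is expected

    NSArray *matches = [moc executeFetchRequest:request error:error];
    if (matches == nil) {
        return nil; // fetch failed, *error describes why
    }

    NSManagedObject *message = matches.firstObject;
    if (message == nil) {
        // Nothing found: create and insert a new object with this key.
        message = [NSEntityDescription insertNewObjectForEntityForName:@"Message"
                                                inManagedObjectContext:moc];
        [message setValue:uid forKey:@"uid"];
    }
    return message; // caller updates it and saves the context
}
```

Each call costs one round trip to the store, so for large imports it is usually worth comparing this against a batched fetch or a cache such as the one discussed in the other answer.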

Lophobranch answered 9/3, 2014 at 19:53 Comment(5)
Thanks for the answer. That's how I'm doing it currently: searching for specific properties with a fetch request on an MOC. Having saved all the objectIDs is way more efficient in my case, by more than 60% (measured with Time Profiler). - Locris
So you're saving the IDs against message info so you don't load all? What you're saying isn't clear... - Lophobranch
I use MagicalRecord and here is my code: `_allMessages = [[Message MR_findAll] valueForKeyPath:@"objectID"];` Then here is how I 'find-or-create': `Message *message = nil; for (Message *cachedMsg in allMessages) { if ([cachedMsg.uid isEqualToString:uid] && [cachedMsg.folderTitle isEqualToString:folderId]) { message = cachedMsg; } }` - Locris
Moreover, if I use what you suggest, which is a method like [Message MR_findFirstWithPredicate:], it performs much slower. - Locris
Here is where I got the idea of saving objectIDs: objc.io/issue-4/importing-large-data-sets-into-core-data.html - Locris
