How to prune history right in a CoreData+CloudKit app?
My app uses CoreData with iCloud as backend. Multiple devices can access the iCloud database which is thus .public.
The local CoreData store is synchronized with iCloud using an NSPersistentCloudKitContainer.
I use history tracking according to Apple’s suggestions.
There, Apple suggests pruning history when possible. They say:

Because persistent history tracking transactions take up space on disk, determine a clean-up strategy to remove them when they are no longer needed. Before pruning history, a single gatekeeper should ensure that your app and its clients have consumed the history they need.

Originally this was also suggested in the WWDC 2017 talk starting at 26:10.

My question is: How do I implement this single gatekeeper?

I assume the idea is that a single instance knows when every user of the app last synchronized their device. If so, the history of transactions before the earliest of these dates can be pruned.
But what if a user synchronizes the local data and then does not use the app for a long time? In this case the history cannot be pruned until this user synchronizes the local data again, so the history data could grow arbitrarily large. This seems to me to be a central problem that I don’t know how to solve.

The Apple docs cited above suggest:

Similar to fetching history, you can use deleteHistory(before:) to delete history older than a token, a transaction, or a date. For example, you can delete all transactions older than seven days.

But this does not solve the problem to my mind.
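
For concreteness, date-based pruning as described in the quote would look roughly like the sketch below. The helper names `historyCutoff` and `pruneHistory` and the `daysToKeep` parameter are my own and purely illustrative; only `NSPersistentHistoryChangeRequest.deleteHistory(before:)` and `execute(_:)` are Core Data API.

```swift
import CoreData

/// Cutoff date: history older than `daysToKeep` days before `now` may be deleted.
func historyCutoff(daysToKeep: Int, now: Date = Date()) -> Date {
    now.addingTimeInterval(-Double(daysToKeep) * 86_400)
}

/// Deletes persistent history older than the cutoff (hypothetical helper).
func pruneHistory(daysToKeep: Int, in context: NSManagedObjectContext) {
    context.performAndWait {
        let request = NSPersistentHistoryChangeRequest.deleteHistory(
            before: historyCutoff(daysToKeep: daysToKeep))
        do {
            _ = try context.execute(request)
        } catch {
            print("Pruning history failed: \(error)")
        }
    }
}
```

This only fixes a retention period; it does not check whether every consumer has actually read the history.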

Aside from this general problem, my idea is to have a record type in the public iCloud database that stores, for every device directly (i.e. without CoreData), the last date when the local database was updated. Since all devices can read these records, it is easy to identify the earliest of these dates, i.e. the point up to which all local databases have been updated, and I could prune the history before this date.
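
A minimal sketch of the date calculation I have in mind. `DeviceSyncRecord` is a hypothetical stand-in for one such iCloud record, and fetching the records from CloudKit is left out:

```swift
import Foundation

/// Hypothetical stand-in for one per-device record in the public database.
struct DeviceSyncRecord {
    let deviceID: String
    let lastUpdated: Date
}

/// History before the returned date has been seen by every known device.
/// Returns nil when no device has reported yet, so nothing is pruned.
func safePruneDate(for records: [DeviceSyncRecord]) -> Date? {
    records.map(\.lastUpdated).min()
}
```

Note that a single device that never updates its record again pins this minimum forever, which is exactly the general problem described above.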

Is this the right way to handle the problem?

EDIT:

The problem has recently been addressed in this post. The author demonstrates with tests with Apple's demo app that there is indeed a problem, if the history is purged too early. My answer there indicates that with the suggested delay of 7 days, an error is probably extremely rare.

Prospectus answered 29/9, 2020 at 17:48 Comment(0)

UPDATE:

In this post from a WWDC22 Core Data Lab, an Apple Core Data framework engineer answers the question "Do I ever need to purge the persistent history tracking data?" as follows:

No. We don’t recommend it. NSPersistentCloudKitContainer uses the persistent history token to track what to sync. If you delete history the cloud sync is reset and has to upload everything from scratch. It will recover but it’s not a good customer experience. It shouldn’t normally be necessary to delete history. For example, the Apple Photos app doesn’t trim its history, so unless you’re generating massive amounts of history don’t do it.

By now I think my question was partly based on a misunderstanding:

In CoreData, a persistent store is handled by one or more persistent store coordinators. If there is only one, the coordinator has complete control over the store, and there is no need for history tracking.

If there is more than one coordinator, one coordinator can change the store while another is not aware of the changes. For this reason, persistent history tracking records all transactions in the store. The store can then notify its other users by sending an NSPersistentStoreRemoteChange notification. Upon receiving this notification, a user of the store can fetch and process the transaction history. After processing a transaction, the user that processed it no longer needs it.
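
The mechanism described above can be sketched as follows. The function names are illustrative, not from Apple's sample code; the option keys and `NSPersistentHistoryChangeRequest.fetchHistory(after:)` are Core Data API.

```swift
import CoreData

/// Turns on history tracking and remote change notifications for a store.
func enableHistoryTracking(on description: NSPersistentStoreDescription) {
    description.setOption(true as NSNumber, forKey: NSPersistentHistoryTrackingKey)
    description.setOption(true as NSNumber,
                          forKey: NSPersistentStoreRemoteChangeNotificationPostOptionKey)
}

/// Fetches the transactions recorded after `token` (nil fetches all history).
/// Call this from within the context's queue, e.g. inside `perform`.
func transactions(after token: NSPersistentHistoryToken?,
                  in context: NSManagedObjectContext) throws -> [NSPersistentHistoryTransaction] {
    let request = NSPersistentHistoryChangeRequest.fetchHistory(after: token)
    let result = try context.execute(request) as? NSPersistentHistoryResult
    return result?.result as? [NSPersistentHistoryTransaction] ?? []
}
```

A typical consumer would observe `.NSPersistentStoreRemoteChange` on the coordinator, call something like `transactions(after:in:)` with the last token it stored, and then remember the token of the last transaction it processed.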

In a CoreData + CloudKit situation, a persistent store is mirrored to iCloud.
In the simplest situation, this means there is one persistent store coordinator of the app and, invisible to the app, one persistent store coordinator that does the mirroring.
Since both coordinators can change the store independently, history tracking is required.

If the app changes the store, I assume that Apple’s mirroring software receives the NSPersistentStoreRemoteChange notifications, processes the transactions and forwards them to iCloud. Normally, i.e. if there is an iCloud connection, this takes only seconds, so the transaction history is only needed for a short time.
If iCloud changes are mirrored to the store, the app receives the NSPersistentStoreRemoteChange notifications and has to process the transactions.
After they have been processed, they are no longer needed by either the app or the mirroring software and can be pruned.
This means that if there is only one user of the persistent store on the app’s device, pruning can indeed be done a short time after processing the notification.
If the device is offline, e.g. in flight mode or switched off, it will not receive NSPersistentStoreRemoteChange notifications and will not prune the transaction history. So it is indeed safe to prune the persistent history, say, seven days after it has been processed.

The situation is different if there is more than one user of the store on a device, e.g. an additional app extension. In this case one has to ensure that other targets than the app have also processed the transactions before the history is pruned. This can indeed be done by a single gatekeeper. How this can be done is e.g. described in this post.
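
A minimal sketch of such a gatekeeper, not taken from the linked post: every target records how far it has processed the history, and only history older than the minimum over all targets is pruned. The target names in `StoreClient` and the key prefix are hypothetical, and `defaults` is assumed to be a `UserDefaults` suite in an app group shared by all targets.

```swift
import Foundation

/// Targets that read the shared store (hypothetical names).
enum StoreClient: String, CaseIterable {
    case app, shareExtension
}

/// Records, per target, the date up to which history has been processed.
struct HistoryGatekeeper {
    let defaults: UserDefaults   // assumed: an app-group suite shared by all targets

    private func key(for client: StoreClient) -> String {
        "lastProcessedHistoryDate.\(client.rawValue)"
    }

    func markProcessed(upTo date: Date, by client: StoreClient) {
        defaults.set(date, forKey: key(for: client))
    }

    /// Oldest "processed up to" date among all targets, or nil if any target
    /// has not reported yet (in which case nothing should be pruned).
    func pruneDate() -> Date? {
        var dates: [Date] = []
        for client in StoreClient.allCases {
            guard let date = defaults.object(forKey: key(for: client)) as? Date else {
                return nil
            }
            dates.append(date)
        }
        return dates.min()
    }
}
```

The date returned by `pruneDate()` could then be passed to `NSPersistentHistoryChangeRequest.deleteHistory(before:)`. Storing dates rather than history tokens is a simplification chosen for this sketch.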

Prospectus answered 28/7, 2021 at 16:33 Comment(3)
Hi, thank you for mentioning the invisible coordinator that does the mirroring. What if the user only launches the app again on the 10th day? Then the time difference between the last and the current NSPersistentStoreRemoteChange will be more than 7 days. I am able to simulate the problem by changing 7 days to 2 minutes for experimenting purposes - #72557560. Perhaps I am testing the wrong way? – Parr
Hmmm... I have to think about it - my post is 1 year old... And thanks for your excellent post with your tests. It will take me a while to check it. – Mimamsa
Instead of picking an arbitrary value like 7 days, I think we might be able to generalize it to something like max(current planned timestamp to run purge - last timestamp to run purge, N), where N is the duration taken by the CloudKit coordinator to sync. Having N as 7 days is quite a comfortable value. By using such a generalized function, we are also able to handle cases like "user launches app on the 10th day". I did a quick test using N as 1 minute. It seems to work so far... – Parr
