Does the automatic use of caching in NDB, the Google App Engine Datastore library for Python, invalidate the transaction model?
A major selling point of Google Cloud Datastore is that it provides strong consistency within an entity group.

Cloud Datastore ensures that entity lookups by key and ancestor queries always receive strongly consistent data.

[Datastore is good for] Transactions based on ACID properties, for example, transferring funds from one bank account to another.

The NDB library is the documented way to access the Datastore from Google App Engine for Python.

However, by default, the NDB library uses caching to speed up reads. Two caches are involved: an "in-context cache" and memcache. Neither of these caches can be updated transactionally with the Datastore. It therefore seems that important consistency properties have to be given up (emphasis mine):

when the transaction is committed, its context will attempt to delete all such entities from memcache. Note, however, that some failures may prevent these deletions from happening.
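The failure mode that sentence describes can be illustrated with a toy model in plain Python (no App Engine APIs; `datastore` and `cache` are just dicts standing in for the real services). If the datastore commit succeeds but the subsequent cache deletion fails, a later cached read returns stale data:

```python
# Toy lookaside-cache model of the documented failure mode: the datastore
# commit succeeds, but the cache deletion afterwards fails, so a later
# cached read returns the old value.

datastore = {"account": 100}   # authoritative store
cache = {"account": 100}       # memcache-like lookaside cache

def transactional_update(key, value, cache_delete_fails=False):
    datastore[key] = value               # commit succeeds
    if not cache_delete_fails:
        cache.pop(key, None)             # normal path: invalidate the cache

def cached_get(key):
    if key in cache:
        return cache[key]                # may be stale
    value = datastore[key]
    cache[key] = value
    return value

transactional_update("account", 50, cache_delete_fails=True)
print(cached_get("account"))    # 100 -- stale, even though the store holds 50
print(datastore["account"])     # 50
```

This is of course only a sketch of the race, not NDB's actual code paths, but it shows why a cache that is not updated transactionally with the store can serve values that the committed transaction has already overwritten.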

Is my understanding of this correct? That is, when using the NDB library in the default configuration, there is no consistency guarantee for access even within an entity group?

If I am right, this is a big problem.

It sacrifices pretty much the biggest property of the Datastore. All this documentation about consistency and ACID transactions. Talks at Google IO about how to use entity groups to get consistency. Even research papers. And quietly, in a small corner of the documentation, in the most casual of sentences, I learn that I don't get these properties in the default configuration.

This is incredibly misleading. I'm sure most people have not seen this. Most implementations are probably expecting ACID transactions within entity groups, but they are not getting it. These are serious bugs in production code.

This is a major failure of implementation and documentation. The default should never have sacrificed consistency for speed. Consistency was the whole point of entity groups. And if the implementation did this unexpected thing that changes the semantics so dramatically, then the documentation should have made it deafeningly clear.

Modernistic answered 11/7, 2017 at 20:8 Comment(3)
That section indeed appears inconsistent with the rest of the documentation. We will try to get it updated. Peoria
@YannickMG Any updates on this? Is the documentation really going to be modified? At the moment I have turned off caching and I am evaluating the impact on performance. I would really like to know if I can keep caching on and still have some reasonable properties. Also it just boggles my mind that something like this was allowed to happen. It reflects very poorly on Google. It breaks trust. Modernistic
A review process has been started for this documentation page, but I do not have any ETA to share regarding it. As this site isn't the proper place to track and discuss fixing platform-specific bugs, I suggest you open an issue on the relevant Issue Tracker, where it can be linked to internal efforts. I do not yet have an authoritative answer to provide, but looking at the source code gives me the impression that the documentation is the issue here, more so than the library. Peoria
As far as I am aware, if you get entities within a transaction, the cache is not used, so you are OK on data modifications.

Direct Datastore reads by key are consistent. So if you want strongly consistent results on reads, you need to disable the NDB cache where needed. Otherwise you get eventual consistency, depending on whether cache invalidation succeeded and on when the cache entry expires or is evicted.
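As a sketch (runnable only inside the App Engine Python runtime; `"Account"` and `"alice"` are hypothetical names), a single read can bypass both caches via NDB's per-call context options, so the value comes straight from the Datastore:

```python
from google.appengine.ext import ndb

# Hypothetical key; substitute your own entity's key.
account_key = ndb.Key("Account", "alice")

# Skip both the in-context cache and memcache for this one read,
# forcing a strongly consistent lookup against the Datastore.
account = account_key.get(use_cache=False, use_memcache=False)
```

Passing the options per call, rather than changing the context's caching policy globally, keeps the performance benefit of caching for reads that can tolerate staleness.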

You may also want to manually remove entities from the cache after the transaction completes, using an NDB delete with _use_datastore=False, to make sure the cache is clean.
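A sketch of that cleanup (again, only runnable in the App Engine Python runtime, and the key is hypothetical). NDB's context option controlling whether the Datastore itself is touched is `use_datastore`; the underscore-prefixed `_use_datastore` spelling mentioned above is the variant used where option names could clash with model property names:

```python
from google.appengine.ext import ndb

# Hypothetical key of an entity that was modified in a completed transaction.
account_key = ndb.Key("Account", "alice")

# With use_datastore=False, the delete only removes the entity from the
# in-context cache and memcache; the entity itself stays in the Datastore.
account_key.delete(use_datastore=False)
```

This manual invalidation covers the case where the automatic post-commit memcache deletion failed, as described in the documentation excerpt quoted in the question.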

Depolarize answered 19/7, 2017 at 13:44 Comment(0)