Simple explanation of Google App Engine NDB Datastore

Asked 9/2, 2014 at 5:53 Answered 9/2, 2014 at 12:24

Solved python google-app-engine app-engine-ndb

I'm creating a Google App Engine application (python) and I'm learning about the general framework. I've been looking at the tutorial and documentation for the NDB datastore, and I'm having some difficulty wrapping my head around the concepts. I have a large background with SQL databases and I've never worked with any other type of data storage system, so I'm thinking that's where I'm running into trouble.

My current understanding is this: The NDB datastore is a collection of entities (analogous to DB records) that have properties (analogous to DB fields/columns). Entities are created using a Model (analogous to a DB schema). Every entity has a key that is generated for it when it is stored. This is where I run into trouble because these keys do not seem to have an analogy to anything in SQL DB concepts. They seem similar to primary keys for tables, but those are more tightly bound to records, and in fact are fields themselves. These NDB keys are not properties of entities, but are considered separate objects from entities. If an entity is stored in the datastore, you can retrieve that entity using its key.

One of my big questions is where do you get the keys for this? Some of the documentation I saw showed examples in which keys were simply created. I don't understand this. It seemed that when entities are stored, the put() method returns a key that can be used later. So how can you just create keys and define ids if the original keys are generated by the datastore?

Another thing that I seem to be struggling with is the concept of ancestry with keys. You can define parent keys of whatever kind you want. Is there a predefined schema for this? For example, if I had a model subclass called 'Person', and I created a key of kind 'Person', can I use that key as a parent of any other type? Like if I wanted a 'Shoe' key to be a child of a 'Person' key, could I also then declare a 'Car' key to be a child of that same 'Person' key? Or will I be unable to after adding the 'Shoe' key?

I'd really just like a simple explanation of the NDB datastore and its API for someone coming from a primarily SQL background.

Bioclimatology answered 9/2, 2014 at 5:53 Comment(2)

Forget everything you know about SQL when thinking about the datastore. The datastore stores whole entities by key, including property names. Separately, it creates indexes matching certain criteria, some automatic and some manually defined. Do some reading: developers.google.com/appengine/docs/python/datastore/entities – Yaekoyael 9/2, 2014 at 8:13

Personally I think it was a huge mistake to include GQL, because people from a SQL background see "select * from MyEntity" and immediately start thinking in SQL terms, which is really a recipe for poorly performing applications. That's just my opinion though. – Yaekoyael 9/2, 2014 at 8:15

I think you're overcomplicating things in your mind. When you create an entity, you can either give it a named key that you've chosen yourself, or leave that out and let the datastore choose a numeric ID. Either way, when you call put, the datastore will return the key, which is stored in the form [<entity_kind>, <id_or_name>] (actually this also includes the application ID and any namespace, but I'll leave that out for clarity).
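To make the two ways of getting a key concrete, here is a minimal pure-Python sketch. It is not the ndb API: `ToyDatastore` and its methods are hypothetical names invented for illustration, and the sequential counter is an assumption (the real datastore does not promise sequential numeric IDs). It only shows that `put` hands back a `(kind, id_or_name)` key whether the caller or the store picked the identifier.

```python
import itertools


class ToyDatastore(object):
    """In-memory stand-in for the datastore (hypothetical, not the ndb API)."""

    def __init__(self):
        self._entities = {}                 # key -> dict of properties
        self._next_id = itertools.count(1)  # source of automatic numeric IDs

    def put(self, kind, properties, id=None):
        # No id supplied: the store chooses a numeric one for the caller.
        if id is None:
            id = next(self._next_id)
        key = (kind, id)
        self._entities[key] = dict(properties)
        return key

    def get(self, key):
        # Return the stored properties, or None if nothing lives at this key.
        return self._entities.get(key)


store = ToyDatastore()
named_key = store.put('Person', {'age': 30}, id='alice')  # caller-chosen name
auto_key = store.put('Person', {'age': 41})               # store-chosen numeric ID
```

Because the key is just the kind plus the identifier, a caller who chose the name can rebuild the key later without having kept the value that `put` returned.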

You can make entities members of an entity group by giving them an ancestor. That ancestor doesn't actually have to refer to an existing entity, although it usually does. All that happens with an ancestor is that the entity's key includes the key of the ancestor: so it now looks like [<parent_entity_kind>, <parent_id_or_name>, <entity_kind>, <id_or_name>]. You can now only get the entity by including its parent key. So, in your example, the Shoe entity could be a child of the Person, whether or not that Person has previously been created: it's the child that knows about the ancestor, not the other way round.

(Note that that ancestry path can be extended arbitrarily: the child entity can itself be an ancestor, and so on. In this case, the group is determined by the entity at the top of the tree.)
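The ancestor path described above can be modelled as a tuple of (kind, id) pairs. This is a sketch in plain Python, not ndb itself; `make_key`, `parent` and `root` are hypothetical helper names. In the real library the corresponding calls would be along the lines of `ndb.Key('Person', 'alice', 'Shoe', 1)` and `key.parent()`.

```python
def make_key(*flat):
    """Build a key path from a flat sequence: kind1, id1, kind2, id2, ..."""
    if not flat or len(flat) % 2:
        raise ValueError('expected alternating kind, id values')
    return tuple(zip(flat[0::2], flat[1::2]))


def parent(key):
    """Return the key one level up, or None when the key is already a root."""
    return key[:-1] or None


def root(key):
    """Return the top-of-tree key, which identifies the entity group."""
    return key[:1]


shoe = make_key('Person', 'alice', 'Shoe', 1)
lace = make_key('Person', 'alice', 'Shoe', 1, 'Lace', 7)
```

Nothing here checks that a Person named 'alice' has ever been stored: the child's key simply carries the ancestor's path, which is why the ancestor does not have to exist.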

Saving entities as part of a group has advantages in terms of consistency: a query inside an entity group is always guaranteed to be fully consistent, whereas a query that is not restricted to a single entity group is only eventually consistent. However, there are also disadvantages, in that the write rate of an entity group is limited to 1 per second for the whole group.

Squalid answered 9/2, 2014 at 12:24 Comment(3)

Ok, I think I'm getting it now. How can I assign a key to an entity before it is stored? If that doesn't make sense, the reason I ask is that I'm wondering how I can assign my own id to an entity's key if I cannot retrieve that key until I store the entity. – Bioclimatology 9/2, 2014 at 13:20

I just glanced at the API and it looks like the way I would do that is entity = Entity(properties=..., id='<my_id>'). Then calling put() returns a key with this id. Is that correct? – Bioclimatology 9/2, 2014 at 13:26

Yes, that's right. Or you can pass in a full ndb.Key object as the key parameter. – Squalid 9/2, 2014 at 13:30

Datastore keys are a little more analogous to internal SQL row identifiers, but of course not entirely. Identifiers in AppEngine are a bit like SQL primary keys. To support decentralised concurrent creation of new keys by many application instances in a cloud of servers, AppEngine internally generates the keys to guarantee uniqueness. Your application defines parameters (application identifier, optional namespace, kind and optional entity identifier) which AppEngine uses to seed its key generator. If you do not provide an identifier, AppEngine will generate a unique numeric identifier that you can read.

Eventual consistency takes time, so it is occasionally more efficient to request multiple new keys in bulk. AppEngine then reserves a range of numeric entity identifiers for you. Once a key has been built from one of them, you can read the identifier back with the key's id() method.
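As a rough illustration of reserving identifiers in bulk, here is a toy allocator in plain Python. It is an assumption-laden sketch, not how AppEngine does it internally: `ToyIdAllocator` is a hypothetical class and the simple counter is invented for the example. In ndb the corresponding call is the `allocate_ids` class method on a model, which returns the first and last reserved IDs.

```python
class ToyIdAllocator(object):
    """Hands out non-overlapping ranges of numeric IDs (hypothetical sketch)."""

    def __init__(self):
        self._next = 1  # first ID that has not been reserved yet

    def allocate(self, size):
        # Reserve `size` consecutive IDs and return (first, last), inclusive.
        if size < 1:
            raise ValueError('size must be at least 1')
        first = self._next
        self._next += size
        return first, self._next - 1


allocator = ToyIdAllocator()
first_range = allocator.allocate(3)   # three IDs reserved in one request
second_range = allocator.allocate(2)  # a later request never overlaps the first
```

The point of reserving a range is that the application can then build keys for several new entities locally, without a round trip per entity.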

Ancestry is used to group together writes of related entities of all kinds for the purpose of transactions and isolation. There is no predefined schema for this but you are limited to one parent per child.

In your example, one particular Shoe might have a particular Person as parent. Another particular Shoe could have a Horse as parent. And another Shoe might have no parent. Many entities of all kinds can have the same parent, so several Car entities could also have that initial Person as parent. The Datastore is schemaless, so it's up to your application to allow or forbid a Car to have a Horse as parent.

Note that a child knows its parent, but a parent does not know its children, because implementing that would impact scalability.
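A small sketch of why the relationship only runs one way, using plain tuples of (kind, id) pairs as keys. None of this is the ndb API; `is_descendant` and the sample keys are hypothetical. A child's key embeds its parent's path, so going from child to parent is free, while finding children means looking for keys that start with the parent's path. In ndb that search would be an ancestor query, along the lines of `Shoe.query(ancestor=person_key)`.

```python
def is_descendant(key, ancestor):
    """True if `key` sits somewhere below `ancestor` in the key path."""
    return len(key) > len(ancestor) and key[:len(ancestor)] == ancestor


alice = (('Person', 'alice'),)

# Hypothetical keys mirroring the Shoe / Car / Horse example above.
keys = [
    alice,
    (('Person', 'alice'), ('Shoe', 1)),      # a Shoe whose parent is a Person
    (('Person', 'alice'), ('Car', 'mini')),  # a Car with the same Person parent
    (('Horse', 'ed'), ('Shoe', 2)),          # a Shoe whose parent is a Horse
    (('Shoe', 3),),                          # a Shoe with no parent at all
]

# The parent key itself lists no children; they are found by prefix match.
descendants = [k for k in keys if is_descendant(k, alice)]
```

Shoes and Cars can share the same Person parent because nothing is recorded on the parent side; each child only carries the path in its own key.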

Victimize answered 9/2, 2014 at 7:54 Comment(0)
