Effective implementation of a one-to-many relationship with Python NDB

Asked 6/2, 2013 at 21:22 Answered 7/2, 2013 at 10:37

I would like your opinion on how to implement a one-to-many relationship effectively with Python NDB (e.g. Person (one) to Tasks (many)).

As I understand it, there are three ways to implement it:

1. Use the 'parent' argument
2. Use a 'repeated' StructuredProperty
3. Use a 'repeated' KeyProperty

I usually choose among them using the logic below. Does it make sense to you? If you have a better approach, please share it.

1. Use the 'parent' argument when:
- Transactional operations are required across these entities
- Bidirectional references are required between these entities
- The relationship is clearly a parent-child one
2. Use a 'repeated' StructuredProperty when:
- The 'many' entities are never used on their own (they are always used together with the 'one' entity)
- The 'many' entities are referenced only by the 'one' entity
- The number of repeated items is less than 100
3. Use a 'repeated' KeyProperty when:
- The 'many' entities need to be used on their own
- The 'many' entities can be referenced by other entities
- The number of repeated items is more than 100
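The rules of thumb above can be written down as a small decision function, which makes their precedence explicit. This is a plain-Python sketch, not NDB code: `Requirements` and `choose_relationship` are hypothetical names, and checking the options in the order listed (parent first) is an assumption, since the question does not say which rule wins when several apply. The 100-item threshold is kept only because the question uses it; the answer below argues that entity size, not item count, is what matters.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical flags mirroring the criteria listed in the question."""
    needs_transaction: bool = False    # option 1: transactional operations across the entities
    needs_bidirectional: bool = False  # option 1: navigate from child to parent and back
    is_parent_child: bool = False      # option 1: a clear parent-child relationship
    many_used_alone: bool = False      # option 3: 'many' entities are used on their own
    many_shared: bool = False          # option 3: other entities also reference the 'many' side
    expected_count: int = 0            # how many 'many' items one 'one' entity holds


def choose_relationship(req: Requirements, threshold: int = 100) -> str:
    """Apply the question's rules in the order listed; `threshold` is the asker's 100."""
    if req.needs_transaction or req.needs_bidirectional or req.is_parent_child:
        return "parent"
    if req.many_used_alone or req.many_shared or req.expected_count >= threshold:
        return "repeated KeyProperty"
    # Otherwise embed: the 'many' items are only ever used with the 'one' entity.
    return "repeated StructuredProperty"


print(choose_relationship(Requirements(expected_count=20)))
```

In NDB terms, the three return values correspond to creating Task entities with `parent=person.key`, declaring `ndb.StructuredProperty(Task, repeated=True)` on Person, and declaring `ndb.KeyProperty(kind='Task', repeated=True)` on Person.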

Option 2 increases the size of the entity, but it saves datastore operations (though we need to use projection queries to reduce the CPU time spent on deserialization). Therefore, I use this approach whenever I can.

I would really appreciate your opinion.

Smiley asked 6/2, 2013 at 21:22

What about giving Task a KeyProperty that points to Person, and querying to find a person's tasks? – Explicable 6/2, 2013 at 21:46

That's option #4 in @dragonx's answer, isn't it? I use that option when I need to query tasks per person and can assume that persons have a lot of tasks. I also use it when I need to retrieve only part of the values. – Smiley 7/2, 2013 at 8:21

A key thing you are missing: How are you reading the data?

If each request displays all the tasks for a given person, #2 makes sense: you fetch the person and show all of their tasks.

However, if you need to query, say, a list of all tasks due at a certain time, querying on repeated structured properties is terrible. You will want individual entities for your Tasks.

There's a fourth option, which is to use a KeyProperty in your Task that points to your Person. When you need a list of Tasks for a person, you can issue a query.

If you need to search for individual Tasks, then you probably want to go with #4. You can use it in combination with #3 as well.
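The difference between options #3 and #4 is mostly about which side holds the reference. The sketch below uses plain Python lists in place of the datastore to show the data shape only; in NDB the back-reference would typically be declared as `person = ndb.KeyProperty(kind='Person')` on Task and read with `Task.query(Task.person == person_key)`, while option #3's key list would be read with `ndb.get_multi(...)`. The `Task` and `Person` dataclasses and the helper functions are hypothetical and do not model NDB's indexes or costs.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Task:
    task_id: int
    person_key: str  # option #4: back-reference to the owning Person
    title: str
    due: date


@dataclass
class Person:
    key: str
    # option #3 (optional): the Person also keeps the ids of its tasks
    task_ids: List[int] = field(default_factory=list)


def tasks_for_person(tasks: List[Task], person_key: str) -> List[Task]:
    """Equality filter on the back-reference (option #4)."""
    return [t for t in tasks if t.person_key == person_key]


def tasks_by_ids(tasks: List[Task], ids: List[int]) -> List[Task]:
    """Fetch by stored ids (option #3), analogous to a batch get by key."""
    by_id = {t.task_id: t for t in tasks}
    return [by_id[i] for i in ids]


def tasks_due_on(tasks: List[Task], day: date) -> List[Task]:
    """Cross-person query; straightforward because each task is its own record."""
    return [t for t in tasks if t.due == day]


tasks = [
    Task(1, "alice", "write report", date(2013, 2, 8)),
    Task(2, "bob", "review code", date(2013, 2, 8)),
    Task(3, "alice", "book flights", date(2013, 2, 9)),
]
alice = Person("alice", task_ids=[1, 3])  # options #3 and #4 combined

print([t.title for t in tasks_for_person(tasks, "alice")])
print([t.title for t in tasks_due_on(tasks, date(2013, 2, 8))])
```

With the tasks embedded in each Person instead (option #2), the due-date lookup would have to go through every Person's list, which is the access pattern the answer warns against.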

Also, how many repeated items you can store has nothing to do with 100. It has everything to do with the size of your Person and Task entities and how much will fit into the 1 MB entity limit. This is potentially dangerous: if your Task entities can be large, you might run out of space in your Person entity faster than you expect.
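The size argument is easy to check with back-of-the-envelope arithmetic. This sketch assumes a 1 MiB cap (the answer says "1MB"; the exact byte limit should be checked in the Datastore limits documentation) and an average serialized size per task that you would have to measure yourself; `max_embedded_tasks` is a hypothetical helper. Real encodings add per-property overhead, so treat the result as an optimistic upper bound.

```python
# Approximate per-entity cap from the answer ("1MB"); the exact figure is an assumption.
ENTITY_LIMIT_BYTES = 1024 * 1024


def max_embedded_tasks(person_bytes: int, avg_task_bytes: int,
                       limit: int = ENTITY_LIMIT_BYTES) -> int:
    """Rough count of embedded tasks that fit next to the Person's own data."""
    if avg_task_bytes <= 0:
        raise ValueError("avg_task_bytes must be positive")
    remaining = max(limit - person_bytes, 0)  # space left after the Person's own fields
    return remaining // avg_task_bytes


# The same 2,000-byte Person holds very different numbers of tasks depending on task size.
for task_size in (200, 1_000, 10_000, 20_000):
    print(task_size, max_embedded_tasks(2_000, task_size))
```

With a 2,000-byte Person, about 1,046 tasks of 1,000 bytes fit, but only about 52 tasks of 20,000 bytes, so a fixed threshold of 100 can be far too conservative for small tasks and too permissive for large ones.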

Entwistle answered 6/2, 2013 at 22:25

Hi @dragonx, thank you for the answer! Yes, "how to read the data" is very important. If I need only part of the repeated structured property's values (e.g. tasks tagged 'important'), I may want individual entities because of the CPU time spent on deserialization. I also hadn't considered the 1 MB limit; thank you for pointing it out. I want to clarify one point: querying with repeated structured properties is NOT terrible, is it? App Engine uses an index to query the entities, so the query cost is the same as querying on other properties. Am I wrong? – Smiley 7/2, 2013 at 8:29

The query cost isn't the issue. There's some odd behavior when querying on repeated structured properties, and you'll have to pay attention and work around it. Read the docs on ndb properties and queries carefully: developers.google.com/appengine/docs/python/ndb/… developers.google.com/appengine/docs/python/ndb/… – Entwistle 7/2, 2013 at 18:56

Thank you for your answer, @dragonx! Definitely. Querying on repeated structured properties is a little tricky; I need to be careful when such a query is required. – Smiley 7/2, 2013 at 19:47

Sorry, there's one other thing I forgot to mention. Beyond the 1 MB limit per entity, there's a limit on the number of indexed properties per entity, I think 5000. If you're using repeated structured properties, I believe you can chew through those 5000 indexed properties more quickly than you'd expect, so be careful of that too. – Entwistle 7/2, 2013 at 20:58
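The index budget can be estimated the same way. As I understand legacy NDB, a repeated StructuredProperty is stored as one list-valued property per sub-property (for example `tasks.title` and `tasks.due`), so every indexed sub-property of every embedded task adds an indexed value. The 5000 default is the commenter's recollection, not a confirmed limit, and both helper names are hypothetical.

```python
def indexed_values(num_tasks: int, indexed_fields_per_task: int,
                   person_indexed_values: int = 0) -> int:
    """Rough count of indexed values on a Person that embeds its tasks."""
    # Each indexed sub-property of each embedded task contributes one list value.
    return person_indexed_values + num_tasks * indexed_fields_per_task


def max_tasks_within_limit(indexed_fields_per_task: int,
                           person_indexed_values: int = 0,
                           limit: int = 5000) -> int:
    """Largest task count that keeps indexed_values() at or under `limit`."""
    if indexed_fields_per_task <= 0:
        raise ValueError("indexed_fields_per_task must be positive")
    return max(limit - person_indexed_values, 0) // indexed_fields_per_task


# Four indexed fields per task, two indexed properties on the Person itself.
print(max_tasks_within_limit(4, person_indexed_values=2))
```

Under these assumptions the budget runs out at 1,249 tasks, regardless of how few bytes those tasks occupy, so for small tasks the index limit can bind before the size limit does.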

Thank you for the information, @dragonx. I had completely forgotten about the index limit, and I did not realize it could be used up so quickly. I will recheck the documentation carefully. Your suggestion is really helpful. – Smiley 8/2, 2013 at 18:12

One thing that most GAE users will come to realize (sooner or later) is that the datastore does not encourage design according to the formal normalization principles that would be considered a good idea in relational databases. Instead, it often seems to encourage design that is unintuitive and anathema to established norms. Although relational database design principles have their place, they just don't work here.

I think datastore design instead comes down to two questions:

How am I going to read this data and how do I read it with the minimum number of read operations?
Is storing it that way going to lead to an explosion in the number of write and indexing operations?
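The first question can be made concrete by counting reads for a typical page that shows one Person and all of their tasks. This sketch assumes the read accounting App Engine used around the time of this thread (one read per entity fetched by key; one read per query plus one per entity returned); current pricing may differ, and `reads_to_show_person_with_tasks` and its layout labels are hypothetical.

```python
def reads_to_show_person_with_tasks(layout: str, num_tasks: int) -> int:
    """Datastore reads to display one Person and all of its tasks.

    Assumed billing model: one read per entity fetched by key, and one read
    per query plus one per entity the query returns.
    """
    if layout == "structured":  # option 2: tasks embedded in the Person
        return 1
    if layout == "key_list":  # option 3: get the Person, then batch-get its task keys
        return 1 + num_tasks
    if layout in ("parent", "task_key_property"):  # options 1 and 4: get the Person, then query
        return 1 + (1 + num_tasks)
    raise ValueError(f"unknown layout: {layout}")


for layout in ("structured", "key_list", "parent", "task_key_property"):
    print(layout, reads_to_show_person_with_tasks(layout, 50))
```

For 50 tasks this gives 1 read for option #2, 51 for #3, and 52 for #1 and #4, which is the saving the question refers to. The second question needs the same exercise for writes: with option #2, adding a single task means rewriting the whole Person entity.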

If you answer these two questions with as much foresight and actual tests as you can, I think you're doing pretty well. You could formalize other rules and specific cases, but these questions will work most of the time.

Zinck answered 7/2, 2013 at 10:37

Thank you for the answer, @sudhir-jonathan. These two questions are very meaningful, and I should always keep them in mind. Understanding the characteristics and usage of the data is essential for datastore modeling. – Smiley 7/2, 2013 at 19:55
