ndb modelling one-to-many: merits of repeated KeyProperty vs foreign key

My question is about modelling one-to-many relations in ndb. I understand that this can be done in (at least) two different ways: with a repeated property or with a 'foreign key'. I have created a small example below. Basically we have an Article which can have an arbitrary number of Tags. Let's assume that a Tag can be removed but cannot be changed after it has been added. Let's also assume that we don't worry about transactional safety.

My question is: what is the preferred way of modelling these relationships?

My considerations:

  • Approach (A) requires two writes for every tag that is added to an article (one for the Article and one for the Tag) whereas approach (B) only requires one write (just the Tag).
  • Approach (A) leverages ndb's caching mechanism when fetching all Tags for an Article, whereas approach (B) requires a query (plus some custom caching if the result should be cached).
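To make the write/read trade-off in these two bullets concrete, here is a minimal pure-Python sketch. It does not use ndb: `CountingStore` is a hypothetical in-memory stand-in for the datastore that only counts writes, batch lookups and queries, and the `add_tag_*`/`get_tags_*` helpers are illustrative names that mirror `create_tag`/`get_tags` in the examples.

```python
from collections import Counter


class CountingStore:
    """Hypothetical in-memory stand-in for the datastore that counts operations."""

    def __init__(self):
        self.entities = {}      # key -> dict of properties
        self.ops = Counter()    # counts "write", "get", "query"
        self._next_id = 1

    def put(self, kind, data, key=None):
        if key is None:         # allocate a new key for a new entity
            key = (kind, self._next_id)
            self._next_id += 1
        self.entities[key] = dict(data)
        self.ops["write"] += 1
        return key

    def get_multi(self, keys):
        self.ops["get"] += 1    # one batch lookup by key
        return [self.entities[k] for k in keys]

    def query(self, kind, **filters):
        self.ops["query"] += 1  # scans entities of `kind` matching every filter
        return [e for (k_kind, _), e in self.entities.items()
                if k_kind == kind
                and all(e.get(name) == value for name, value in filters.items())]


# Approach (A): the Article holds a repeated list of Tag keys.
def add_tag_a(store, article_key, name):
    tag_key = store.put("Tag", {"name": name})       # write 1: the Tag
    article = store.entities[article_key]            # assume Article already loaded
    article["tags"].append(tag_key)
    store.put("Article", article, key=article_key)   # write 2: the Article


def get_tags_a(store, article_key):
    return store.get_multi(store.entities[article_key]["tags"])


# Approach (B): each Tag points back at its Article.
def add_tag_b(store, article_key, name):
    store.put("Tag", {"name": name, "article": article_key})  # single write


def get_tags_b(store, article_key):
    return store.query("Tag", article=article_key)


store_a, store_b = CountingStore(), CountingStore()
art_a = store_a.put("Article", {"title": "A", "tags": []})
art_b = store_b.put("Article", {"title": "B"})
for tag_name in ["x", "y", "z"]:
    add_tag_a(store_a, art_a, tag_name)
    add_tag_b(store_b, art_b, tag_name)

print(store_a.ops["write"], store_b.ops["write"])  # 7 4
```

Approach (A) pays one extra write per tag; in exchange, reading an article's tags is a single lookup by keys rather than a query.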

Am I missing anything here? Are there any other considerations that should be taken into account?

Thanks very much for your help.

Example (A):

from google.appengine.ext import ndb

class Article(ndb.Model):
    title = ndb.StringProperty()
    # some more properties
    tags = ndb.KeyProperty(kind="Tag", repeated=True)

    def create_tag(self):
        # requires two writes: the new Tag, then the updated Article
        tag = Tag(name="my_tag")
        tag.put()
        self.tags.append(tag.key)  # store the Tag's key, not the entity
        self.put()

    def get_tags(self):
        # batch lookup by key; can be served from ndb's caches
        return ndb.get_multi(self.tags)

class Tag(ndb.Model):
    name = ndb.StringProperty()
    user = ndb.KeyProperty(kind="User")  # User that created the tag
    # some more properties

Example (B):

from google.appengine.ext import ndb

class Article(ndb.Model):
    title = ndb.StringProperty()
    # some more properties

    def create_tag(self):
        # requires one write
        tag = Tag(name="my_tag", article=self.key)
        tag.put()

    def get_tags(self):
        # obviously we could cache this query in memcache
        return Tag.gql("WHERE article = :1", self.key).fetch()

class Tag(ndb.Model):
    name = ndb.StringProperty()
    article = ndb.KeyProperty(kind="Article")
    user = ndb.KeyProperty(kind="User")  # User that created the tag
    # some more properties
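The `get_tags` comment in (B) suggests caching the query result in memcache. Below is a minimal sketch of that cache-aside pattern, assuming a plain dict stands in for memcache and `run_query` is a hypothetical callable standing in for the `Tag` query. The point it illustrates is that every tag creation or removal must invalidate the cached entry for that article.

```python
class TagQueryCache:
    """Cache-aside wrapper; a dict stands in for memcache (hypothetical)."""

    def __init__(self, run_query):
        self._run_query = run_query   # callable: article_key -> list of tag names
        self._cache = {}
        self.misses = 0

    def get_tags(self, article_key):
        if article_key not in self._cache:       # miss: run the query, remember it
            self.misses += 1
            self._cache[article_key] = self._run_query(article_key)
        return self._cache[article_key]

    def invalidate(self, article_key):
        # call after every tag creation or removal for this article
        self._cache.pop(article_key, None)


tags_by_article = {"article-1": ["python"]}      # stands in for the Tag entities
cache = TagQueryCache(lambda key: list(tags_by_article.get(key, [])))

cache.get_tags("article-1")                      # miss: runs the query
cache.get_tags("article-1")                      # hit: served from the cache
tags_by_article["article-1"].append("ndb")       # create_tag() writes a new Tag...
cache.invalidate("article-1")                    # ...and drops the stale entry
```

One caveat worth checking against the datastore documentation: non-ancestor queries are eventually consistent, so a query run immediately after `tag.put()` may not yet see the new Tag, and caching that result would keep it stale until the next invalidation.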
Adam answered 18/12, 2012 at 10:10 Comment(5)
Consider checking performance with Appstats: while your specific question might have a general answer, the right choice probably depends more on your actual usage, and Appstats can tell you which of the options above is more efficient in practice. developers.google.com/appengine/docs/python/tools/appstats - Starflower
Would you create new tags for each article even if it's the same tag? I would go for option A, because you would be able to reuse the same Tag for every article and to query Articles by tag. - Ultramontanism
@PaulC Thanks. I did check with Appstats, and in my case option B is more efficient (1 write vs 2). However, since the gain is small, I'm unsure whether it is worth giving up the documented way (i.e. option A) of modelling a one-to-many relation. - Adam
@Ultramontanism Yes, I would create a new Tag for each Article. This is unclear from the question and I will edit it accordingly. Does that change your answer? Thanks. - Adam
I would still go for A, but at this point it depends on how many Tags each Article will have. Why would you create separate tags for each article even if they have the same name? Also look at @kasavbere's answer... - Ultramontanism
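The point about querying Articles by tag relies on how equality filters treat repeated properties: an entity matches if any one of its values equals the filter value (in ndb, something like `Article.query(Article.tags == tag_key)`). A minimal pure-Python emulation of that matching rule, with plain tuples standing in for Tag keys (hypothetical data):

```python
# Hypothetical keys: plain tuples stand in for ndb Keys shared across Articles.
python_tag = ("Tag", "python")
gae_tag = ("Tag", "gae")

articles = [
    {"title": "Intro to ndb", "tags": [python_tag, gae_tag]},
    {"title": "Python tips", "tags": [python_tag]},
    {"title": "Untagged", "tags": []},
]


def articles_with_tag(articles, tag_key):
    # A repeated property matches if ANY of its values equals the filter value.
    return [a["title"] for a in articles if tag_key in a["tags"]]


print(articles_with_tag(articles, python_tag))  # ['Intro to ndb', 'Python tips']
```

This only works when Articles share Tag entities; with a separate Tag per Article, each tag key matches exactly one Article.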

Have you looked at the documentation on Structured Properties (https://developers.google.com/appengine/docs/python/ndb/properties#structured)? Its short discussion of Contact and Address may simplify your problem. Also look at https://developers.google.com/appengine/docs/python/ndb/queries#filtering_structured_properties. Both discussions are very short.
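For comparison, a structured-property design embeds the tag values in the Article itself (in ndb, roughly `tags = ndb.StructuredProperty(Tag, repeated=True)`), so adding or removing a tag is a single write of the Article, and filtering on a sub-property corresponds to a query like `Article.query(Article.tags.name == "python")`. The sketch below only emulates that shape with dataclasses; `TagValue`, `writes` and `put` are hypothetical stand-ins, not ndb API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TagValue:
    name: str
    user: str          # stands in for the creating User's key


@dataclass
class Article:
    title: str
    tags: List[TagValue] = field(default_factory=list)
    writes: int = 0    # counts simulated datastore writes

    def put(self):
        self.writes += 1

    def add_tag(self, name, user):
        self.tags.append(TagValue(name, user))
        self.put()     # one write: the tags live inside the Article

    def remove_tag(self, name):
        self.tags = [t for t in self.tags if t.name != name]
        self.put()


def articles_tagged(articles, name):
    # emulates filtering on a structured sub-property (tags.name == name)
    return [a.title for a in articles if any(t.name == name for t in a.tags)]


first, second = Article("Intro to ndb"), Article("Python tips")
first.add_tag("python", "alice")
first.add_tag("gae", "bob")
second.add_tag("python", "carol")
first.remove_tag("gae")
```

The trade-off: embedded tags have no keys of their own, so they cannot be fetched or referenced independently, and all of them count toward the Article's entity size limit (1 MB).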

Also, looking ahead to the fact that joins are not allowed, option A looks better.

Tutu answered 18/12, 2012 at 16:48 Comment(2)
Thanks for your answer. Structured properties could do the job, but in my specific case I don't think they would be the best solution. What do you mean by "looking ahead to the fact that joins are not allowed"? Is that a GAE policy? - Adam
Yes, that is a limitation of the datastore. See developers.google.com/appengine/docs/python/datastore/queries: "in particular, joins and aggregate queries aren't supported within the Datastore query engine." The datastore has other restrictions that you should probably be familiar with: developers.google.com/appengine/docs/python/datastore/… - Tutu

As stated before, there are no joins in the Datastore, so the whole "foreign key" notion doesn't really apply. What you can do instead is use the Query class to query the datastore for the matching Tags.

For example, if you are using Endpoints, then:

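from google.appengine.ext import ndb
import endpoints

class Tag(ndb.Model):
    user = ndb.UserProperty()

Then, during the request, filter on the current user:

# ndb queries are immutable: filter() returns a new query, so keep the result
query = Tag.query().filter(Tag.user == endpoints.get_current_user())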
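Since the datastore cannot join Tag to Article, an application that needs, say, the current user's tags together with their articles' titles has to do it in two steps: one query for the Tags, then one batch lookup of the distinct Article keys (in ndb, `ndb.get_multi`). A pure-Python sketch of those two steps, with dicts and tuple keys as hypothetical stand-ins for entities and Keys:

```python
def tags_with_article_titles(tags, articles_by_key, user):
    # Step 1: the "query", i.e. Tags whose user property equals the current user
    mine = [t for t in tags if t["user"] == user]
    # Step 2: one batch lookup of each distinct Article key, in first-seen order
    article_keys = list(dict.fromkeys(t["article"] for t in mine))
    fetched = {key: articles_by_key[key] for key in article_keys}
    # The "join" happens here, in application code
    return [(t["name"], fetched[t["article"]]["title"]) for t in mine]


articles_by_key = {
    ("Article", 1): {"title": "Intro to ndb"},
    ("Article", 2): {"title": "Python tips"},
}
tags = [
    {"name": "python", "user": "alice", "article": ("Article", 1)},
    {"name": "gae", "user": "bob", "article": ("Article", 1)},
    {"name": "python", "user": "alice", "article": ("Article", 2)},
]
print(tags_with_article_titles(tags, articles_by_key, "alice"))
# [('python', 'Intro to ndb'), ('python', 'Python tips')]
```

Deduplicating the Article keys before the lookup keeps the second step to one round trip per distinct Article rather than one per Tag.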
Emsmus answered 25/4, 2014 at 21:20 Comment(0)

Approach (A) should be preferred in most situations. Although adding a tag requires two writes, that probably happens much less often than reading the tags. As long as an article doesn't have a huge number of tags, their keys should all fit into the repeated KeyProperty (the whole entity is limited to 1 MB).

As you mentioned, fetching the tags by their keys is much faster than performing a query, and unlike a query, key lookups benefit from ndb's automatic caching. Also, if you only need the tag's name and the user, you could create the tag with the User key as its parent and the name as the tag's id:

User (parent key) -> Tag (id = name)

To create this tag, you would use:

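tag = Tag(id=name, parent=user_key, ...)
article.tags.append(tag.key)  # the key is complete because id and parent are given
ndb.put_multi([tag, article])

Then, when you read the tags back:

for tag_key in article.tags:  # Keys only; no Tag entity is fetched
    user_key = tag_key.parent()
    name = tag_key.id()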

Each key stored in Article.tags then contains both the User key and the tag name, which saves you from reading the Tag entity to get those values.
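The trick works because a complete key carries its whole ancestor path. A pure-Python sketch of the idea, where `Key` is a hypothetical stand-in (in ndb the same information comes from `key.parent()` and `key.id()`, which are methods rather than fields):

```python
from typing import NamedTuple, Optional


class Key(NamedTuple):
    """Hypothetical stand-in for an ndb Key: kind, id and optional parent key."""
    kind: str
    ident: str
    parent: Optional["Key"] = None


user_key = Key("User", "alice")
# Tag keys built as Tag(id=name, parent=user_key) would carry this information:
article_tags = [Key("Tag", "python", user_key), Key("Tag", "gae", user_key)]

for tag_key in article_tags:
    # Both values are read from the key itself; no Tag entity is fetched.
    print(tag_key.parent.ident, tag_key.ident)
```

One consequence to keep in mind: because the name is the id, a given user can have only one Tag per name. Building `Key("Tag", "python", user_key)` again yields a key equal to the first one, and putting a new Tag with that key would overwrite the existing entity. That suits shared tags, but not the question's plan of creating a separate Tag for each Article.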

Hove answered 18/9, 2015 at 23:0 Comment(0)
