Most Efficient One-To-Many Relationships in Google App Engine Datastore?

Sorry if this question is too simple; I'm only entering 9th grade.

I'm trying to learn about NoSQL database design. I want to design a Google Datastore model that minimizes the number of reads and writes.

Here is a toy example: blog posts and comments in a one-to-many relationship. Which is more efficient: storing all of the comments in a StructuredProperty on the BlogPost, or giving the Comment model a KeyProperty that points to its post?

Again, the objective is to minimize the number of reads and writes to the datastore. You may make the following assumptions:

  • Comments will not be retrieved independently of their respective blog post. (I suspect this makes the StructuredProperty preferable.)
  • Comments will need to be sortable by date, rating, author, etc. (Subproperties in the datastore cannot be indexed, so perhaps this could affect performance?)
  • Both blog posts and comments may be edited (or even deleted) after they are created.

Using StructuredProperty:

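from google.appengine.ext import ndb

class Comment(ndb.Model):
    # Example properties; substitute your own.
    author = ndb.StringProperty()
    text = ndb.TextProperty()
    date = ndb.DateTimeProperty(auto_now_add=True)
    rating = ndb.IntegerProperty()

class BlogPost(ndb.Model):
    # Every comment is stored inside the BlogPost entity itself.
    comments = ndb.StructuredProperty(Comment, repeated=True)
    title = ndb.StringProperty()
    body = ndb.TextProperty()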

Using KeyProperty:

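from google.appengine.ext import ndb

class BlogPost(ndb.Model):
    # Example properties; substitute your own.
    title = ndb.StringProperty()
    body = ndb.TextProperty()

class Comment(ndb.Model):
    # Each comment is its own entity that points back to its post.
    blogPost = ndb.KeyProperty(kind=BlogPost)
    author = ndb.StringProperty()
    text = ndb.TextProperty()
    date = ndb.DateTimeProperty(auto_now_add=True)
    rating = ndb.IntegerProperty()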

Feel free to bring up any other considerations about efficiently representing a one-to-many relationship while minimizing the number of reads and writes to the datastore.

Thanks.

Esbjerg asked 31/7, 2012 at 20:43. Comments:
Consider how you would handle the combined size of the comments and the blog post exceeding 1 MB. Can that ever happen? If it can, and you don't have a good way of dealing with it, then from a pure functionality point of view you wouldn't even bother with a single entity containing both. – Harod
An alternative might be to store just the keys of the comments in the blog post. You can then retrieve all of the comments with a single ndb.get_multi(keys) call while allowing significantly more comments per post, and if you still go over 1 MB, you can fall back to retrieving the comments individually. – Harod

I could be wrong, but as I understand it, a StructuredProperty is just a property within an entity that has sub-properties of its own.

This means reading a BlogPost and all its comments costs only one read, so rendering your page needs just one read op.

Each write would be cheaper, too. Adding a comment needs one read op to get the BlogPost and, as long as you don't update any indexed properties, just one write op.

You can handle the comment sorting on your own after you read the entity out of the datastore.
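Sorting in memory after that single read can be as simple as Python's built-in sorted. The sketch below is a minimal illustration, assuming a plain dataclass named Comment (hypothetical) standing in for the ndb sub-entities; ndb model instances also expose their properties as attributes, so the same attrgetter key should work on them.

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter


@dataclass
class Comment:
    # Hypothetical stand-in for the Comment sub-entities read along with the BlogPost.
    author: str
    date: datetime
    rating: int


def sort_comments(comments, field="date", descending=False):
    """Return a new list ordered by one attribute; no datastore index is involved."""
    return sorted(comments, key=attrgetter(field), reverse=descending)


comments = [
    Comment("carol", datetime(2012, 7, 31, 21, 0), 2),
    Comment("alice", datetime(2012, 7, 31, 20, 0), 5),
    Comment("bob", datetime(2012, 7, 31, 22, 0), 3),
]
by_rating = sort_comments(comments, "rating", descending=True)
print([c.author for c in by_rating])  # ['alice', 'bob', 'carol']
```

Because sorting happens after the read, switching the sort order (date, rating, author) costs no extra datastore operations.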

You'll have to synchronize comment additions and edits with transactions so that one comment doesn't overwrite another, since every comment modifies the same entity. If many people comment on and edit the same blog post at the same time, you may run into contention you can't work around, because all of those writes go to one entity.
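Why the transaction matters can be shown with a small simulation. This is a minimal sketch of optimistic concurrency (read, modify, commit only if nobody else committed in between, otherwise re-read and retry), which is the general idea behind Datastore transactions; in real ndb code you would instead wrap the read-modify-write in a function decorated with @ndb.transactional. InMemoryStore, VersionConflict and add_comment are hypothetical names for this illustration only.

```python
import threading


class VersionConflict(Exception):
    """Raised when the stored entity changed after it was read."""


class InMemoryStore:
    """Hypothetical stand-in for the datastore: key -> (version, value)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def put_if_unchanged(self, key, expected_version, value):
        # Commit only if no one else wrote since our read (the optimistic check).
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(key)
            self._data[key] = (current_version + 1, value)


def add_comment(store, post_key, comment, retries=3):
    """Append a comment to the post's embedded list, retrying on conflict."""
    for _ in range(retries + 1):
        version, post = store.get(post_key)
        post = dict(post or {"comments": []})
        # Build a new list instead of mutating the one we read.
        post["comments"] = post["comments"] + [comment]
        try:
            store.put_if_unchanged(post_key, version, post)
            return post
        except VersionConflict:
            continue  # someone else committed first: re-read and try again
    raise VersionConflict(post_key)
```

Without the version check, two writers that read the same post would each save their own copy of the comment list, and the second save would silently drop the first writer's comment.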

When optimizing for cost, though, you'll hit a wall at the 1 MB maximum entity size, which limits how many comments you can store per blog post.
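A back-of-the-envelope estimate shows how quickly that limit bites. This sketch assumes "1 MB" means 1,048,576 bytes and that you can guess an average encoded size per comment; the datastore's real encoding adds per-property overhead, so treat the result as a rough and probably optimistic figure. max_comments and json_size are hypothetical helpers.

```python
import json

# Assumption: treat the 1 MB entity limit as 1 MiB; check current docs for the exact figure.
ENTITY_LIMIT_BYTES = 1024 * 1024


def json_size(obj):
    """Approximate encoded size of a comment; the datastore's own encoding differs."""
    return len(json.dumps(obj).encode("utf-8"))


def max_comments(post_bytes, avg_comment_bytes, limit=ENTITY_LIMIT_BYTES):
    """Roughly how many comments fit next to the post in one entity."""
    if avg_comment_bytes <= 0:
        raise ValueError("avg_comment_bytes must be positive")
    return max(0, (limit - post_bytes) // avg_comment_bytes)


# A 10 KB post with comments averaging 500 bytes leaves room for about 2,000 of them.
print(max_comments(10 * 1024, 500))  # 2076
```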

Going with the KeyProperty would be quite a bit more expensive.

To display a post, you'll need one read to get the blog post, plus one query, plus one small read op for each comment.

Every comment is a new entity, so adding one costs at least 4 write ops. If you index properties for sort order, each comment costs even more write ops.
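The operation counts in this answer can be tallied side by side. This minimal sketch simply encodes the answer's own 2012-era estimates (one read per page for the StructuredProperty design; one read, one query and one small op per comment for the KeyProperty design; at least 4 write ops per new Comment entity). It is not a model of current Datastore pricing, and page_view_ops and add_comment_writes are hypothetical names.

```python
def page_view_ops(design, num_comments):
    """Operations to render one post with its comments, per the answer's estimates."""
    if design == "structured":
        # Post and comments live in one entity: a single get.
        return {"reads": 1, "queries": 0, "small_ops": 0}
    if design == "key_property":
        # Get the post, query its comments, plus one small op per comment.
        return {"reads": 1, "queries": 1, "small_ops": num_comments}
    raise ValueError("unknown design: %r" % design)


def add_comment_writes(design, extra_index_writes=0):
    """Minimum write ops to add one comment, per the answer's estimates."""
    if design == "structured":
        # Rewrite the BlogPost entity (after one read to fetch it);
        # assumes no indexed properties change.
        return 1 + extra_index_writes
    if design == "key_property":
        # A brand-new Comment entity: at least 4 write ops.
        return 4 + extra_index_writes
    raise ValueError("unknown design: %r" % design)


for design in ("structured", "key_property"):
    print(design, page_view_ops(design, 50), add_comment_writes(design))
```

The extra_index_writes parameter stands in for the additional write ops the answer mentions when you index comment properties for sorting.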

On the plus side, you can have unlimited comments per blog post, and you don't have to worry about synchronizing new comments. You might need to worry about synchronizing comment edits, but if only a comment's creator can edit it, that shouldn't really be a problem. You don't have to do the sorting yourself, either.

It's a cost vs features tradeoff.

Sams answered 31/7, 2012 at 21:53

What about:

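from google.appengine.ext import ndb

class Comment(ndb.Model):
    # Example properties; substitute your own.
    author = ndb.StringProperty()
    text = ndb.TextProperty()
    date = ndb.DateTimeProperty(auto_now_add=True)

class BlogPost(ndb.Model):
    # Keys of this post's comments; each comment is a separate entity.
    comments = ndb.KeyProperty(kind=Comment, repeated=True)
    title = ndb.StringProperty()
    body = ndb.TextProperty()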

This way, you can store up to 5000 comments per blog post (the maximum number of values in a repeated property), and the comments' contents don't count toward the blog post's 1 MB entity limit. You won't need a query to fetch a blog post's comments; you can just call ndb.get_multi(blog_post.comments). For this operation you may also be able to rely on ndb's automatic memcache caching, though whether that is a good assumption depends on your use case.

Pastorate answered 24/5, 2013 at 10:45

Be aware of this caveat when using a repeated StructuredProperty:

Do not use repeated properties if you have more than 100-1000 values. (1000 is probably already pushing it.) They weren't designed for such use.

See Guido's answer in GAE ndb design, performance and use of repeated properties.

So while you may not hit the 1 MB entity limit with StructuredProperty, you may easily hit the 100-1000 suggested max.

Artefact answered 1/10, 2015 at 21:38. Comments:
So what's the alternative? Just a KeyProperty, like in the question? Or is there a better way? – Phenomenology
A KeyProperty could work if you stay below the 100-1000 value limit, but if you need more than that, I'd reconsider your data model and the features you actually need. In the blog example, storing the comments in a single JsonProperty could also work. That way you only read and update one entity (the BlogPost) and its JsonProperty of comments. This is similar to the StructuredProperty solution, without the 100-1000 value limitation. Of course, if the entity grows beyond 1 MB this won't work (unless you shard it into multiple entities...), but it might work for your use case. – Artefact
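Sharding JSON-encoded comments across several entities, as this reply suggests, mostly comes down to splitting the list into chunks that each stay under a byte budget. A minimal sketch, assuming each chunk would be stored in its own entity (for example in an ndb.JsonProperty) and that the JSON length is a fair proxy for the stored size; shard_comments is a hypothetical helper, and in practice you would pick max_bytes well below 1 MB to leave room for the rest of the entity.

```python
import json


def shard_comments(comments, max_bytes):
    """Split comments into consecutive lists whose JSON encoding fits in max_bytes."""
    shards, current, current_size = [], [], 2  # an empty list encodes as "[]"
    for comment in comments:
        item_size = len(json.dumps(comment).encode("utf-8"))
        if 2 + item_size > max_bytes:
            raise ValueError("a single comment is larger than max_bytes")
        # json.dumps joins list items with ", ", i.e. 2 extra bytes per item after the first.
        added = item_size + (2 if current else 0)
        if current_size + added > max_bytes:
            shards.append(current)  # this shard is full; start a new one
            current, current_size, added = [], 2, item_size
        current.append(comment)
        current_size += added
    if current:
        shards.append(current)
    return shards


comments = [{"id": i, "author": "user%d" % i, "text": "x" * 40} for i in range(100)]
shards = shard_comments(comments, max_bytes=1000)
print(len(shards), [len(s) for s in shards])
```

Keeping the chunks in posting order means reading the shards back in sequence reproduces the original comment list.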
