Should DynamoDB adjacency lists use discrete partition keys to model each type of relationship?

Context

I am building a forum and investigating modeling the data with DynamoDB and adjacency lists. Some top-level entities (like users) might have multiple types of relationships with other top-level entities (like comments).

Requirements

For example, let's say we want to be able to do the following:

  • Users can like comments
  • Users can follow comments
  • Comments can display users that like it
  • Comments can display users that follow it
  • User profiles can show comments they like
  • User profiles can show comments they follow

So we essentially have a many-to-many relationship (user <=> comment) that can itself be one of many types (like or follow).

Note: This example is deliberately stripped down. In practice there will be many more relationships to model, so I'm trying to think of something extensible here.

Baseline

The following top-level data would likely be common in any adjacency list representation:

First_id(Partition key)         Second_id(Sort Key)         Data
-------------                   ----------                  ------
User-Harry                      User-Harry                  User data
User-Ron                        User-Ron                    User data
User-Hermione                   User-Hermione               User data
Comment-A                       Comment-A                   Comment data
Comment-B                       Comment-B                   Comment data
Comment-C                       Comment-C                   Comment data

Furthermore, for each table below, there would be an equivalent Global Secondary Index with the partition and sort keys swapped.
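As a rough sketch of that layout, the parameters for DynamoDB's CreateTable operation could look like the following. The attribute names `First_id` and `Second_id` come from the table above; the table name `Forum`, the index name `InvertedIndex`, and the on-demand billing mode are assumptions made only for illustration.

```python
# Hypothetical names: "Forum" (table) and "InvertedIndex" (GSI) are assumptions.
def create_table_params(table_name="Forum", index_name="InvertedIndex"):
    """Build keyword arguments for the DynamoDB CreateTable operation."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "First_id", "AttributeType": "S"},
            {"AttributeName": "Second_id", "AttributeType": "S"},
        ],
        # Base table: First_id is the partition key, Second_id the sort key.
        "KeySchema": [
            {"AttributeName": "First_id", "KeyType": "HASH"},
            {"AttributeName": "Second_id", "KeyType": "RANGE"},
        ],
        # GSI: the same two attributes with their roles swapped.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [
                    {"AttributeName": "Second_id", "KeyType": "HASH"},
                    {"AttributeName": "First_id", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


params = create_table_params()
```

With boto3 installed and credentials configured, these could be passed along as `boto3.client("dynamodb").create_table(**params)`.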

Example Data

This is what I would like to model in DynamoDB:

  1. Harry likes comment A
  2. Harry likes comment B
  3. Harry follows comment A
  4. Ron likes comment B
  5. Hermione follows comment C

Option 1

Use a third attribute to define the type of relationship:

First_id(Partition key)         Second_id(Sort Key)         Data
-------------                   ----------                  ------
Comment-A                       User-Harry                  "LIKES"
Comment-B                       User-Harry                  "LIKES"
Comment-A                       User-Harry                  "FOLLOWS"
Comment-B                       User-Ron                    "LIKES"
Comment-C                       User-Hermione               "FOLLOWS"

The downside to this approach is that query results contain redundant information: they return extra items you may not care about. For example, to query all the users that like a given comment, you also have to process all the users that follow that comment. Likewise, to query all the comments that a user likes, you also have to process all the comments that the user follows.

Option 2

Modify the keys to represent the relationship:

First_id(Partition key)         Second_id(Sort Key)
-------------                   ----------
LikeComment-A                   LikeUser-Harry
LikeComment-B                   LikeUser-Harry
FollowComment-A                 FollowUser-Harry
LikeComment-B                   LikeUser-Ron
FollowComment-C                 FollowUser-Hermione

This makes it efficient to query independently:

  1. Comment likes
  2. Comment follows
  3. User likes
  4. User follows

The downside is that the same top-level entity now has multiple keys, which might make things complex as more relationships are added.
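To make the trade-off concrete, here is a minimal sketch of how the prefixed keys and the two query directions might be built. It only constructs the parameter dictionaries for the low-level Query operation; the table name `Forum`, the index name `InvertedIndex`, and the helper names are assumptions, not part of any existing schema.

```python
# Hypothetical helpers for Option 2; the table and index names are assumptions.
def relationship_keys(relation, comment, user):
    """Key pair for one relationship, e.g. ("Like", "A", "Harry")."""
    return {
        "First_id": f"{relation}Comment-{comment}",
        "Second_id": f"{relation}User-{user}",
    }


def users_for_comment_query(relation, comment, table_name="Forum"):
    """Query parameters for: which users have this relation to a comment?"""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "First_id = :pk",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"{relation}Comment-{comment}"}
        },
    }


def comments_for_user_query(relation, user, table_name="Forum",
                            index_name="InvertedIndex"):
    """Same idea in the other direction, using the swapped-key GSI."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "Second_id = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"{relation}User-{user}"}},
    }


keys = relationship_keys("Like", "A", "Harry")
likes_of_comment_b = users_for_comment_query("Like", "B")
follows_of_harry = comments_for_user_query("Follow", "Harry")
```

Note that every new relationship type adds another key prefix per entity, which is the complexity mentioned above.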

Option 3

Skip adjacency lists altogether and use separate tables, maybe one for Users, one for Likes, and one for Follows.

Option 4

Traditional relational database. While I'm not planning on going this route because this is a personal project and I want to explore DynamoDB, if this is the right way to think about things, I'd love to hear why.

Conclusion

Thanks for reading this far! If there is anything I can do to simplify the question or clarify anything, please let me know :)

I've looked at the AWS best practices and this many-to-many SO post, and neither appears to address the many-to-many (with many) relationship, so any resources or guidance would be greatly appreciated.

Cardiganshire answered 18/11, 2018 at 6:17 Comment(0)

Your Option 1 is not possible because it does not have unique primary keys. In your sample data, you can see that you have two entries for (Comment-A, User-Harry).
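The collision can be illustrated without touching AWS at all. The sketch below models a table as a plain Python dictionary keyed by (partition key, sort key), which mirrors how PutItem replaces an existing item that has the same primary key. The `put_item` helper is a stand-in written for this example, not the real API.

```python
# In-memory stand-in for a table: one item per (partition key, sort key) pair.
def put_item(table, item):
    """Store an item, replacing any item that has the same primary key."""
    table[(item["First_id"], item["Second_id"])] = item


# The five rows from the Option 1 table in the question, in order.
option_1_rows = [
    ("Comment-A", "User-Harry", "LIKES"),
    ("Comment-B", "User-Harry", "LIKES"),
    ("Comment-A", "User-Harry", "FOLLOWS"),
    ("Comment-B", "User-Ron", "LIKES"),
    ("Comment-C", "User-Hermione", "FOLLOWS"),
]

table = {}
for first_id, second_id, data in option_1_rows:
    put_item(table, {"First_id": first_id, "Second_id": second_id, "Data": data})

print(len(table))                                  # 4, not 5
print(table[("Comment-A", "User-Harry")]["Data"])  # FOLLOWS (the like was overwritten)
```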

Solution 1

The way to implement what you are looking for is by using slightly different attributes for your table and the GSI. If Harry likes Comment A, then your attributes should be:

hash_key: User-Harry
gsi_hash_key: Comment-A
sort_key_for_both: Likes-User-Harry-Comment-A

Now you have only one partition key value for your top level entities in both the table and the GSI, and you can query for a specific relationship type by using the begins_with operator.
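A minimal sketch of what those items and queries might look like follows. It builds parameter dictionaries for the low-level Query operation using the attribute names above; the table name `Forum`, the index name `ByComment`, and the function names are assumptions for illustration.

```python
# Hypothetical helpers for Solution 1; table and index names are assumptions.
def relationship_item(relation, user, comment):
    """One item per relationship, e.g. ("Likes", "Harry", "A")."""
    return {
        "hash_key": f"User-{user}",
        "gsi_hash_key": f"Comment-{comment}",
        "sort_key_for_both": f"{relation}-User-{user}-Comment-{comment}",
    }


def comments_by_user_query(relation, user, table_name="Forum"):
    """Base-table query: comments that a user likes (or follows)."""
    return {
        "TableName": table_name,
        "KeyConditionExpression":
            "hash_key = :pk AND begins_with(sort_key_for_both, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"User-{user}"},
            ":prefix": {"S": f"{relation}-"},
        },
    }


def users_by_comment_query(relation, comment, table_name="Forum",
                           index_name="ByComment"):
    """GSI query: users that like (or follow) a comment."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression":
            "gsi_hash_key = :pk AND begins_with(sort_key_for_both, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"Comment-{comment}"},
            ":prefix": {"S": f"{relation}-"},
        },
    }


item = relationship_item("Likes", "Harry", "A")
harry_likes = comments_by_user_query("Likes", "Harry")
comment_a_follows = users_by_comment_query("Follows", "A")
```

Because the relationship type is the leading part of the sort key, the prefix match happens in the key condition, so items of other relationship types are never read.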

Solution 2

You could make the relationship a top-level entity. For example, Likes-User-Harry-Comment-A would have two entries in the database because it is “adjacent to” both User-Harry and Comment A.

This allows you flexibility if you want to model more complex information about the relationships in the future (including the ability to describe the relationship between relationships, such as Likes-User-Ron-User-Harry Causes Follows-User-Ron-User-Harry).

However, this strategy requires more items to be stored in the database, and it means that saving a “like” (so that it can be queried) is not an atomic operation. (But you can work around that by writing only the relationship entity and then using DynamoDB Streams + Lambda to write the two entries I mentioned at the beginning of this solution.)

Update: using DynamoDB Transactions, saving a "like" in this manner can actually be a fully ACID operation.
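One way this might look, as a sketch only: the function below builds the parameters for the TransactWriteItems operation so that the relationship entity and its two adjacency entries are written together or not at all. The key layout (the question's `First_id`/`Second_id` attributes, with each adjacency entry keyed by the neighboring entity and the relationship ID) and the table name `Forum` are assumptions, since the exact item shape is not spelled out above.

```python
# Hypothetical item layout for Solution 2; "Forum" is an assumed table name.
def save_like_transaction(user, comment, table_name="Forum"):
    """Build TransactWriteItems parameters for saving one "like"."""
    relationship = f"Likes-User-{user}-Comment-{comment}"

    def put(first_id, second_id):
        # One Put action in the low-level, typed attribute format.
        return {
            "Put": {
                "TableName": table_name,
                "Item": {
                    "First_id": {"S": first_id},
                    "Second_id": {"S": second_id},
                },
            }
        }

    return {
        "TransactItems": [
            put(relationship, relationship),         # the relationship entity
            put(f"User-{user}", relationship),       # adjacent to the user
            put(f"Comment-{comment}", relationship), # adjacent to the comment
        ]
    }


params = save_like_transaction("Harry", "A")
```

With boto3, these could be sent as `boto3.client("dynamodb").transact_write_items(**params)`; if any of the three writes fails, none of them is applied.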

Intestinal answered 18/11, 2018 at 10:1 Comment(1)
This is great! Huge thanks for the detailed response. I really like the possibilities your Causes idea opens up! - Cardiganshire