DynamoDB adjacency list primary key

Asked 26/6, 2018 at 21:31 Answered 30/6, 2018 at 12:1

Solved nosql amazon-dynamodb adjacency-list

I am completing an exercise using DynamoDB to model a many to many relationship. I need to allow a many to many relationship between posts and tags. Each post can have many tags and each tag can have many posts.

I have a primary key on id and primary sort key on type and then another global index on id and data, I added another global index on id and type again but I think this is redundant.

Here is what I have so far.

id(Partition key)      type(Sort Key)       target       data
-------------          ----------           ------       ------
1                      post                 1            cool post
tag                    tag                  tag          n/a
1                      tag                  tag          orange

---------------------------------------------
---- inserting another tag will overwrite ---
---------------------------------------------

1                      tag                  tag          green

I am taking advice from this awesome talk https://www.youtube.com/watch?v=jzeKPKpucS0 and these not so awesome docs https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html

The issue I am having is that if I try to add another tag with an id "1" and type "tag" it will overwrite the existing tag because it would have the same composite key. What am I missing here? It seems like the suggestion is to make the primary key and sort key be the id and type. Should I have my type be more like "tag#orange"? In that case I could put a global index on the target with a sort key on the type. This way I could get all posts with a certain tag by querying target = "tag" and type starts with "tag".

Just looking for some advice on handling this sort of adjacency list data with Dynamo as it seems very interesting. Thanks!

Paediatrician answered 26/6, 2018 at 21:31 Comment(2)

I'm a bit confused. Does Tag have an integer ID or just Posts have? Do you need to save a tag as {"id": 1, "type": "tag", "data": "this is a tag"} or {"id": "this is a tag", "type": "tag", "data": "this is a tag"}? – Microscope 29/6, 2018 at 23:19

tag could have an integer id as well but I chose to use a string for clarity. – Paediatrician 30/6, 2018 at 1:2

Basic guidelines for an adjacency-list

You need a few modifications to the way you're modeling. In an adjacency-list you have two types of items:

Top-level (those are your Posts and Tags)
Association (expresses which Tags are associated with each Post and vice-versa)

To build this adjacency-list, you must follow two simple guidelines (which I think are missing in your example):

Each top-level item (in your case a Post or a Tag) must be represented using the primary-key. Also, those items should have the same value in the sort-key and the primary-key
For associations, use the primary-key to represent the source (or parent) and the sort-key to represent the target (or child).

From what I see in your examples, you set the primary-key of your Posts and Tags as just the item ID, while you should also use its type; e.g. Post-1 or Tag-3. In items that represent associations, I also don't see you storing the target ID.

Example

Let's say you have:

Three Posts: "hello world", "foo bar" and "Whatever..."
And three tags: "cool", "awesome", "great"
Post "hello world" has one tag: "cool"
Post "foo bar" has two tags: "cool" and "great"
Post "Whatever..." doesn't have any tags

You'd need to model this way in Dynamo:

PRIMARY-KEY   | SORT-KEY    | SOURCE DATA  | TARGET DATA
--------------|-------------|--------------|-------------
Post-1        | Post-1      | hello world  |
Post-2        | Post-2      | foo bar      |
Post-3        | Post-3      | Whatever...  |
Tag-1         | Tag-1       | cool         |
Tag-2         | Tag-2       | awesome      |
Tag-3         | Tag-3       | great        |
Post-1        | Tag-1       | hello world  | cool
Post-2        | Tag-1       | foo bar      | cool
Post-2        | Tag-3       | foo bar      | great
Tag-1         | Post-1      | cool         | hello world
Tag-1         | Post-2      | cool         | foo bar
Tag-3         | Post-2      | great        | foo bar

How you query this adjacency list

1) You need a particular item, say Post-1:

Query primary-key == "Post-1" & sort-key == "Post-1" - returns: only Post-1

2) You need all tags associated with Post-2:

Query by primary-key == "Post-2" & sort-key BEGINS_WITH "Tag-" - returns: Tag-1 and Tag-3 associations.

Check the documentation about the begin_with key condition expression.

3) You need all Posts associated with, say Tag-1:

Query by primary_key == "Tag-1" & sort-key BEGINS_WITH "Post-" - returns: Post-1 and Post-2 associations.

Note that, if you change the contents of a given post, you need to change the value in all association items as well.

You can also don't store the post and tag content in association items, which saves storage space. But, in this case, you'd need two queries in the example queries 2 and 3 above: one to retrieve associations, another to retrieve each source item data. Since querying is more expensive than storing data, I prefer to duplicate storage. But it really depends if your application is read-intensive or write-intensive. If read-intensive, duplicating content in associations gives you benefit of reducing read queries. If write-intensive, not duplicating content saves write queries to update associations when the source item is updated.

Hope this helps! ;)

Microscope answered 30/6, 2018 at 12:1 Comment(5)

In your example, if you add post-tag association then you will have to add tag-post association as well. Shouldn't we use GSI ? for eg. where partition key is TAG-1 and sort key is POST-1.. – Gownsman 8/4, 2019 at 7:57

Yes, you could use GSI for that. It would duplicate data the same way, though. To use a GSI, I'd suggest taking advantage of sparse indexes. Add an attribute only to the association item, so that the top-level items aren't replicated in the GSI. – Microscope 10/4, 2019 at 5:49

Obviously, you would also need to adjust your code to switch from querying the table or GSI appropriately. If we need to add one association item, I don't think it's too much trouble to go ahead and add two, with the advantage of having everything in a single table, which simplifies a few things. – Microscope 10/4, 2019 at 5:56

And of top of this, would you still have a separate table for Post and a separate table for Tags (if Posts and Tags were more complex objects that contained more data that were irrelevant to the associations?) Leaving this table to contain the data specific to the association? – Margarettmargaretta 29/4, 2020 at 19:23

I don't think it's necessary to separate in different tables. Unless you have different access patterns. For example: if access to Posts is variable and unpredictable, an on-demand table might help reducing throttling; if access to Tags is more predictable, reserved throughput will reduce costs. Even if that's the case, I'd hesitate to split them up. – Microscope 30/4, 2020 at 22:36

I don't think you are missing anything. The idea is that ID is unique for the type of item. Typically you would generate a long UUID for the ID rather than using sequential numbers. Another alternative is to use the datetime you created the item, probably with an added random number to avoid collisions when items are being created.

This answer I have previously provided may help a little DynamoDB M-M Adjacency List Design Pattern

Don't remove the sort key - this wont help make your items more unique.

Ahoy answered 27/6, 2018 at 7:10 Comment(2)

I am going by this diagram youtu.be/jzeKPKpucS0?t=2286, It seems like I would need the id to be 1 for the sports tag in order to know what post that is attached to. – Paediatrician 27/6, 2018 at 15:10

Looking at your answer it seems like it is suggesting that the type should be "tag:sports" to keep things unique. I suggested this in my question but wasn't sure. – Paediatrician 27/6, 2018 at 15:12