Is there a DynamoDB max partition size of 10GB for a single partition key value?

I've read lots of DynamoDB docs on designing partition keys and sort keys, but I think I must be missing something fundamental.

If you have a bad partition key design, what happens when the data for a SINGLE partition key value exceeds 10GB?

The 'Understand Partition Behaviour' section states:

"A single partition can hold approximately 10 GB of data"

How can it partition the data for a single partition key value?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Partitions

The docs also mention that tables with a local secondary index are limited to 10 GB of data per item collection, after which you start getting errors.

"The maximum size of any item collection is 10 GB. This limit does not apply to tables without local secondary indexes; only tables that have one or more local secondary indexes are affected."

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html#LSI.ItemCollections

That I can understand. So does it have some other magic for partitioning the data for a single partition key value if it exceeds 10 GB? Or does it just keep growing that partition? And what are the implications of that for your key design?

The background to the question is that I've seen lots of examples of using something like a TenantId as a partition key in a multi-tenant environment. But that seems limiting if a specific TenantId could have more than 10 GB of data.
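
For concreteness, this is the kind of schema I mean - a hypothetical boto3 sketch where the table and attribute names (TenantData, TenantId, ItemId) are just made up for illustration:

    # Hypothetical boto3 sketch of the kind of multi-tenant schema I mean.
    # "TenantData", "TenantId" and "ItemId" are made-up names for illustration.
    import boto3

    dynamodb = boto3.client("dynamodb")

    dynamodb.create_table(
        TableName="TenantData",
        AttributeDefinitions=[
            {"AttributeName": "TenantId", "AttributeType": "S"},
            {"AttributeName": "ItemId", "AttributeType": "S"},
        ],
        KeySchema=[
            {"AttributeName": "TenantId", "KeyType": "HASH"},   # partition key
            {"AttributeName": "ItemId", "KeyType": "RANGE"},    # sort key
        ],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )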

What am I missing?

Urushiol answered 26/10, 2016 at 21:44 Comment(0)

TL;DR - items can be split even if they have the same partition key value by including the range key value into the partitioning function.


The long version:

This is a very good question, and it is addressed in the documentation here and here. As the documentation states, items in a DynamoDB table are partitioned based on their partition key value (which used to be called hash key) into one or multiple partitions, using a hashing function. The number of partitions is derived based on the maximum desired total throughput, as well as the distribution of items in the key space. In other words, if the partition key is chosen such that it distributes items uniformly across the partition key space, the partitions end up having approximately the same number of items each. This number of items in each partition is approximately equal to the total number of items in the table divided by the number of partitions.
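
To make that a bit more concrete, here is a rough sketch of the idea in Python. It is only a toy model - DynamoDB's actual hash function and partition metadata are internal and not publicly documented - but it shows the general shape: each partition owns a contiguous range of the hash space, and an item's partition is found by hashing its partition key value.

    import hashlib
    from bisect import bisect_right

    # Toy model only - DynamoDB's real hash function and partition metadata
    # are internal and not publicly documented.
    HASH_SPACE = 2**128

    def h(partition_key: str) -> int:
        """Map a partition key value to a point in the hash space."""
        return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")

    # Each partition owns a contiguous hash range [boundaries[i], boundaries[i+1]).
    boundaries = [0, HASH_SPACE // 3, 2 * HASH_SPACE // 3]   # 3 partitions

    def partition_for(partition_key: str) -> int:
        """Index of the partition whose range contains this key's hash."""
        return bisect_right(boundaries, h(partition_key)) - 1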

The documentation also states that each partition is limited to about 10GB of space. And that once the sum of the sizes of all items stored in any partition grows beyond 10GB, DynamoDB will start a background process that will automatically and transparently split such partitions in half - resulting in two new partitions. Once again, if the items are distributed uniformly, this is great because each new sub-partition will end up holding roughly half the items in the original partition.
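
Continuing the toy model above (again, a sketch of the idea rather than DynamoDB's real algorithm), a split amounts to inserting a new boundary at the midpoint of the overgrown partition's hash range:

    # Continuation of the toy model above; not DynamoDB's real algorithm.
    HASH_SPACE = 2**128                  # same hash space as in the sketch above
    MAX_PARTITION_BYTES = 10 * 1024**3   # the documented ~10 GB limit

    def maybe_split(boundaries: list[int], sizes: list[int], i: int) -> None:
        """If partition i has outgrown ~10 GB, split its hash range in half."""
        if sizes[i] <= MAX_PARTITION_BYTES:
            return
        low = boundaries[i]
        high = boundaries[i + 1] if i + 1 < len(boundaries) else HASH_SPACE
        mid = (low + high) // 2
        boundaries.insert(i + 1, mid)          # the old range becomes two ranges
        sizes[i:i + 1] = [sizes[i] // 2] * 2   # with uniform keys, roughly half the data each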

An important aspect to splitting is that the throughput of the split-partitions will each be half of the throughput that would have been available for the original partition.

So far we've covered the happy case.

On the flip side, it is possible to have one, or a few, partition key values that correspond to a very large number of items. This can happen if the table schema uses a sort key and many items share the same partition key value. In that case, a single partition key value could be responsible for items that together take up more than 10 GB, and this will result in a split. DynamoDB will still create two new partitions, but instead of using only the partition key to decide which sub-partition an item should be stored in, it will also use the sort key.

Example

Without loss of generality and to make things easier to reason about, imagine that there is a table where partition keys are letters (A-Z), and numbers are used as sort keys.

Imagine that the table has about 9 partitions, so letters A, B, C would be stored in partition 1, letters D, E, F would be in partition 2, etc.

In the diagram below, the partition boundaries are marked h(A0), h(D0), etc. to show that, for instance, the items stored in the first partition are the items whose partition key hashes to a value between h(A0) and h(D0) - the 0 is intentional, and comes in handy next.

[ h(A0) ]-------[ h(D0) ]---------[ h(G0) ]---------[ h(J0) ]---------[ h(M0) ]- ..
  |   A    B    C   |   D    E    F   |   G    H    I   |   J    K    L   |
  |   1    1    1   |   1    1    1   |   1    1    1   |   1    1    1   |
  |   2    2    2   |   2    2    2   |        2    2   |        2        |
  |   3         3   |   3         3   |             3   |                 |
  ..                ..                ..                ..                ..
  |                 |  100       500  |                 |                 |
  +-----------------+-----------------+-----------------+-----------------+-- ..

Notice that for most partition key values, there are between 1 and 3 items in the table, but there are two partition key values: D and F that are not looking too good. D has 100 items while F has 500 items.

If items with a partition key value of F keep getting added, eventually the partition [h(D0)-h(G0)) will split. To make it possible to split items that share the same partition key value, the sort key values have to be used as well, so we'll end up with the following situation:

..[ h(D0) ]----------------/ [ h(F500) ] / ----------[ h(G0) ]- ..
      |     D       E       F      |          F          |
      |     1       1       1      |         501         |
      |     2       2       2      |         502         |
      |     3               3      |         503         |
      ..                           ..                    ..
      |    100             500     |        1000         |
.. ---+----------------------------+---------------------+--- ..

The original partition [h(D0)-h(G0)) was split into [h(D0)-h(F500)) and [h(F500)-h(G0)).
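
One rough way to picture that routing rule in code (purely illustrative - DynamoDB does not document how it encodes this split metadata) is to treat the boundary h(F500) as the pair (hash of F, sort key 500), so that items sharing the partition key F are compared by sort key to pick a side:

    import hashlib

    # Purely illustrative - DynamoDB does not document how split metadata is
    # encoded; this just mirrors the diagram above. Note that a real hash does
    # not preserve alphabetical order of keys; the letters here are idealized.

    def h(partition_key: str) -> int:
        return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")

    def routing_key(partition_key: str, sort_key: int) -> tuple[int, int]:
        """Order items by hashed partition key first, then by sort key,
        which is what lets a partition boundary fall *inside* F."""
        return (h(partition_key), sort_key)

    # The boundary labelled h(F500) in the diagram:
    split_boundary = routing_key("F", 500)

    def side_of_split(partition_key: str, sort_key: int) -> str:
        return "left" if routing_key(partition_key, sort_key) < split_boundary else "right"

    assert side_of_split("F", 100) == "left"    # F items with low sort keys stay left
    assert side_of_split("F", 900) == "right"   # F items with high sort keys move right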

I hope this helps to visualize that items are generally mapped to partitions based on a hash value obtained by applying a hashing function to their partition key value, but if need be, the value being hashed can include the partition key + a sort key value as well.
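
Note that none of this changes how you use the table: you still query by partition key (optionally with a sort key condition) and DynamoDB routes the request internally. A purely illustrative boto3 query, with made-up table and attribute names (MyTable, pk, sk), looks the same regardless of how the data is physically partitioned:

    # Purely illustrative boto3 query - table and attribute names are made up.
    # No matter how the items are physically partitioned or split, the caller
    # still just queries by partition key.
    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("MyTable")

    response = table.query(
        KeyConditionExpression=Key("pk").eq("F") & Key("sk").between(1, 1000),
        ScanIndexForward=True,   # return results in sort key order
    )
    items = response["Items"]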

Grapnel answered 27/10, 2016 at 6:0 Comment(10)
While this is a good answer, it doesn't cover the worst-case scenario when you have only the partition key and no sort key for your skewed data. What would happen in this case regarding the split process?Intestate
Hi @Intestate - by design, in the case when you have no sort key, the partitioning scheme ensures uniform distribution of data across all partitions. As items are distributed uniformly, all partitions get filled uniformly, so when splits occur they will occur across all partitions. Does that make sense?Grapnel
I don't completely understand how the sort key is included into the hash function. If we do that, elements with consecutive sort keys may end up in different partitions. If that happens, we can't answer queries efficiently. Right?Eldoree
@GonzaloSolera correct. The sort key is not generally included in the hash function, but it may be included in the partition function in cases where a single item needs to be split.Grapnel
Thanks! What do you mean with a single item? Do you mean single partition key? And how is it used in the partition function? The only way I can think about is by using O(log(number of partitions for a partition key)) to find the correct partition in such cases, losing the O(1) cost. Is that correct?Eldoree
@GonzaloSolera sorry - I meant single hash key. Exactly how this metadata is factored into the partition function is not publicly documented but it is definitely not as efficient as when the table consists of well distributed items (i.e. relatively low and uniform number of items per hash key). Having millions of items for the same hash key is essentially a degenerate case.Grapnel
@GonzaloSolera When using sorting, having the data split in two different tables including the sort key for hash would not impact the sorting within the split tables. DynamoDB can still find the largest/smallest etc. record by running a query on both tables (which are sorted) in parallel, not unlike how a merge sort would work. Also, this is only possible if you did not create an LSI on your table.Respite
Looking at the documentation linked, it doesn't seem to refer to splitting a partition by its sort key. Is this still accurate and is there a different link that details this?Deferential
Great answer! Follow up question, if a partition is split in 2 like that, and you query by just the partition key, would results still be ordered by the sort key? Or is there now a risk of getting two ordered sets back, without an overall order?Estellestella
@Otto, in that case, DDB will run separately on each partition, which will be less efficient, but the results would still be one ordered set.Hannigan
