How are consumed read capacity units calculated in DynamoDB query

3

20

I've read the AWS documentation and understand that 1 RCU covers reading one item of up to 4KB.

If I have a table with 50 items, I've read that a scan will read the full 50 items and use 50 RCU. But let's say I did a query and my table is 10 by 5: will it still use 50 RCU?

Understanding answered 4/5, 2018 at 15:21 Comment(1)
A query only consumes capacity for the items that are returned (assuming there is no filter, since filtering is applied after the read, and the total size is less than 1MB)Catmint
53

Scanning a table that contains 50 items will consume 50 RCU only if the combined size of the 50 items is 200KB for a strongly consistent read (or 400KB for an eventually consistent read). Most items are not that big, so 50 items typically require only about 10KB to store, meaning a full scan of a 50-item table would cost about 3 RCU with strong consistency, or about half that with eventual consistency.

The consumed Read Capacity Units (RCU) depend on multiple factors:

If an item is read using a GetItem operation, then the consumed capacity is billed in increments of 4KB, based on the size of the item (i.e. a 200B item and a 3KB item would each consume 1 RCU, while a 5KB item would consume 2 RCU).

If you read multiple items using a Query or Scan operation, then the capacity consumed depends on the cumulative size of the items being accessed (you get billed even for items filtered out of a query or scan when using filters). So, if your query or scan accesses 10 items that are approximately 200 bytes each, it will consume only 1 RCU. If you read 10 items but each item is about 5KB in size, then the total consumed capacity will be 13 RCU (50KB / 4KB = 12.5, rounded up, is 13).

What's more, if you perform an eventually consistent read, each capacity unit covers twice as much data, so the cost is halved. It would only cost 6.5 RCU (13 / 2) to read the 10 5KB items.
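The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rounding rule described in this answer, not an AWS API: the function name `consumed_rcu` and its parameters are made up for the example, and the one-block minimum for an empty read is an assumption.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one strongly consistent RCU covers up to 4KB


def consumed_rcu(item_sizes_bytes, strongly_consistent=True):
    """Estimate the RCUs for ONE request (GetItem, Query or Scan).

    The sizes of all accessed items are summed first and only then
    rounded up to the next 4KB block. Eventually consistent reads
    cost half as much.
    """
    total_bytes = sum(item_sizes_bytes)
    # Assumption: a request that reads nothing is still billed one block.
    blocks = max(1, math.ceil(total_bytes / RCU_BLOCK_BYTES))
    return blocks if strongly_consistent else blocks / 2


# GetItem is the single-item case.
print(consumed_rcu([200]))        # 200B item -> 1
print(consumed_rcu([5 * 1024]))   # 5KB item  -> 2

# Query/Scan: sizes are cumulative across the accessed items.
print(consumed_rcu([200] * 10))                                  # -> 1
print(consumed_rcu([5 * 1024] * 10))                             # -> 13
print(consumed_rcu([5 * 1024] * 10, strongly_consistent=False))  # -> 6.5
```

Passing a list with a single 400KB entry gives 100, which matches the maximum item size mentioned in the notes that follow.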

You can read more about throughput capacity here.

A couple of things to note:

  • a single item may be as large as 400KB, so reading an item could consume as much as 100 RCU.
  • when calculating item size, attribute names count towards the item size as well, not just their values!
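To make the second note concrete, here is a rough sketch of how attribute names add to the item size. It assumes every attribute is a string, whose size is the UTF-8 length of the name plus the UTF-8 length of the value; numbers, sets, lists and maps follow other rules and are not handled. The helper `estimate_item_bytes` and the sample attribute names are hypothetical.

```python
def estimate_item_bytes(item):
    """Rough size of an item whose attributes are all strings.

    Adds the UTF-8 byte length of each attribute name to the UTF-8
    byte length of its value. Other attribute types are not covered.
    """
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in item.items()
    )


short_names = {"id": "42", "n": "Ann"}
long_names = {"customer_identifier": "42", "customer_first_name": "Ann"}

print(estimate_item_bytes(short_names))  # 8
print(estimate_item_bytes(long_names))   # 43
```

Both items hold the same values, yet the second is several times larger purely because of its attribute names.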
Ku answered 5/5, 2018 at 20:28 Comment(13)
Useful summary. However, it's unclear to me what "accessed" means. If I query based on the Hash Key, would my query access only items with that key? How about the sort key?Loch
Correct. A query will only access items of a particular hash keyKu
Thanks. If I also set constraint on the sort key, would all items of the HashKey be accessed, or only the ones matching the constraint on the sort key as well?Loch
Not sure what you mean. A query requires a hash key. It is that hash key that gets accessed in that query.Ku
I'm asking about composite keys (consisting of a hash key + a sort key): multiple Items may have the same hash key. When running a query where I specify the hash key + a constraint on the sort key (e.g. a BETWEEN condition), which items get accessed? All items with the same Hash Key, or only the ones matching the constraint on the sort key?Loch
You can verify this by asking for the consumed capacity to be returned in the query response, but only the items matched by the key constraint should count towards the consumed capacityKu
@MikeDinescu if you performed 4 rapid queries in succession (as is often the case with geoqueries), are those 4 queries guaranteed to be calculated individually? Or might they be calculated twice, for example, if each query made it to DynamoDB within half a second? In other words, if the first and second query hit the API within 1 second, would the RCU calculation be on their combined item size and treated as one API call?Lamoree
This would be better asked as a separate question, but the TL;DR is that each query is a separate request, so capacity utilization is billed per requestKu
@MikeDinescu Good idea #54468874Lamoree
"Most items are not that big, so a 50 items typically only require about 10KB to store meaning a full scan for a table of 50 items, with eventual consistency, would only cost about 3 RCU." Is this really correct? According to the AWS docs, "One read request unit represents one strongly consistent read request, or two eventually consistent read requests, for an item up to 4 KB in size." Nowhere in the docs does it say read capacity is cumulative...Howbeit
Capacity consumed is calculated for each operation (request), based on the amount of data accessed, not per item. Make sense?Ku
@Loch To answer your question, all the items with the same hash key are accessed and then the filter is applied on top of them. Capacity consumption is also for all items accessed, not just the ones returned.Forecast
This needs to be so much more clear in the documentation... Maybe the pricing pagePrisoner
7

Query—Reads multiple items that have the same partition key value. All items returned are treated as a single read operation, where DynamoDB computes the total size of all items and then rounds up to the next 4 KB boundary. For example, suppose your query returns 10 items whose combined size is 40.8 KB. DynamoDB rounds the item size for the operation to 44 KB. If a query returns 1500 items of 64 bytes each, the cumulative size is 96 KB.

Ref: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/ProvisionedThroughput.html

Shit answered 15/3, 2020 at 20:47 Comment(0)
4

I smoke tested this with the following entries, using a composite primary key, provisioned capacity, and eventually consistent reads:

  • entry#1 (size ~ 200B): hash key = foo, range key = foobar

  • entry#2 (size ~ 5KB): hash key = foo, range key = foojar

Queries to the table & reported consumption of RCUs:

  1. hash key EQUALS "foo" AND range key BEGINS_WITH "foo" --> both entries returned and 1 RCU consumed
  2. hash key EQUALS "foo" AND range key BEGINS_WITH "foobar" --> entry with size ~ 200B returned and 0.5 RCU consumed
  3. hash key EQUALS "foo" AND range key BEGINS_WITH "foojar" --> entry with size ~ 5KB returned and 1 RCU consumed

As already speculated, this indicates that the accessed items are those matching the whole composite key condition, not just the hash key.

By comparison, if you queried the items by hash key only and then filtered down to a single item, the query would access all items in the partition and still consume 1 RCU.
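For anyone who wants to repeat the test, a minimal sketch with boto3 might look like the following. The table name and the attribute names `pk` and `sk` are placeholders, not the ones used in my test, and the helper functions are just for illustration. `ReturnConsumedCapacity` is the request parameter that makes DynamoDB report the capacity used by the call.

```python
def build_query_params(table_name, hash_value, sort_prefix):
    """Build the arguments for a low-level DynamoDB Query call.

    "pk" and "sk" are placeholder attribute names for the hash key
    and the range key; replace them with your table's key names.
    """
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "pk", "#sk": "sk"},
        "ExpressionAttributeValues": {
            ":pk": {"S": hash_value},
            ":prefix": {"S": sort_prefix},
        },
        "ConsistentRead": False,           # eventually consistent read
        "ReturnConsumedCapacity": "TOTAL"  # report the RCUs used
    }


def run_query(params):
    """Run the query and return (items, consumed capacity units)."""
    import boto3  # imported here so the builder works without AWS access

    response = boto3.client("dynamodb").query(**params)
    return response["Items"], response["ConsumedCapacity"]["CapacityUnits"]


params = build_query_params("my-table", "foo", "foobar")
print(params["KeyConditionExpression"])
# With AWS credentials and a real table:
# items, rcu = run_query(params)
```

Running it once per sort key prefix and comparing the reported units shows which items each query actually accessed.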

Fracas answered 14/12, 2021 at 9:30 Comment(2)
Point 3 would be 2 RCU since the size is >4KBVolkan
Nope, the tests were performed using eventual consistency. docs.aws.amazon.com/amazondynamodb/latest/developerguide/…Fracas
