Reliability of atomic counters in DynamoDB

I'm considering using Amazon DynamoDB in my application, and I have a question about the reliability of its atomic counters.

I'm building a distributed application that needs to concurrently, and consistently, increment/decrement a counter stored in a DynamoDB attribute. I was wondering how reliable DynamoDB's atomic counters are in a heavily concurrent environment, where the concurrency level is extremely high (let's say, for example, an average rate of 20k hits per second - to get an idea, that would be almost 52 billion increments/decrements per month).
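To make this concrete, the kind of increment I mean is a single UpdateItem with an ADD action. Here is a minimal sketch using the current boto3 API (the table name, key, and attribute name are just placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")

def increment(delta=1):
    # ADD is applied atomically on the server, so concurrent callers
    # never clobber each other; a negative delta decrements.
    dynamodb.update_item(
        TableName="hits",                      # placeholder table name
        Key={"id": {"S": "global"}},           # placeholder counter key
        UpdateExpression="ADD #c :d",
        ExpressionAttributeNames={"#c": "counter"},
        ExpressionAttributeValues={":d": {"N": str(delta)}},
    )
```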

The counter should be super-reliable and never miss a hit. Has anybody tested DynamoDB in such critical environments?

Thanks

Julio answered 20/2, 2012 at 20:56 Comment(0)

DynamoDB gets its scaling properties by splitting the keys across multiple servers, similar to how other distributed databases like Cassandra and HBase scale. Increasing the provisioned throughput on DynamoDB just spreads your data across more servers, so each server can handle roughly (total concurrent connections) / (number of servers). Take a look at their FAQ for an explanation of how to achieve maximum throughput:

Q: Will I always be able to achieve my level of provisioned throughput?

Amazon DynamoDB assumes a relatively random access pattern across all primary keys. You should set up your data model so that your requests result in a fairly even distribution of traffic across primary keys. If you have a highly uneven or skewed access pattern, you may not be able to achieve your level of provisioned throughput.

When storing data, Amazon DynamoDB divides a table into multiple partitions and distributes the data based on the hash key element of the primary key. The provisioned throughput associated with a table is also divided among the partitions; each partition's throughput is managed independently based on the quota allotted to it. There is no sharing of provisioned throughput across partitions. Consequently, a table in Amazon DynamoDB is best able to meet the provisioned throughput levels if the workload is spread fairly uniformly across the hash key values. Distributing requests across hash key values distributes the requests across partitions, which helps achieve your full provisioned throughput level.

If you have an uneven workload pattern across primary keys and are unable to achieve your provisioned throughput level, you may be able to meet your throughput needs by increasing your provisioned throughput level further, which will give more throughput to each partition. However, it is recommended that you consider modifying your request pattern or your data model in order to achieve a relatively random access pattern across primary keys.

This means that having one key that is incremented directly will not scale, since that key must live on one server. There are other ways to handle this problem: for example, in-memory aggregation with periodic flush increments to DynamoDB (though this can have reliability issues), or a sharded counter, where the increments are spread over multiple keys and the total is read back by fetching all of the shard keys (http://whynosql.com/scaling-distributed-counters/); a sketch of the sharded approach follows.
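For illustration, a sharded counter along those lines might look like the sketch below, written with boto3; the table name, key layout, and shard count are my own assumptions for the example, not taken from the linked article.

```python
import random
import boto3

dynamodb = boto3.client("dynamodb")
NUM_SHARDS = 16  # more shards = writes spread over more partitions

def increment(counter_name, delta=1):
    # Pick a random shard so concurrent writers hit different keys
    # (and therefore, with high probability, different partitions).
    shard_key = f"{counter_name}#{random.randrange(NUM_SHARDS)}"
    dynamodb.update_item(
        TableName="counters",
        Key={"id": {"S": shard_key}},
        UpdateExpression="ADD #v :d",
        ExpressionAttributeNames={"#v": "value"},  # VALUE is a reserved word
        ExpressionAttributeValues={":d": {"N": str(delta)}},
    )

def read_total(counter_name):
    # Reading the counter means summing every shard key.
    total = 0
    for shard in range(NUM_SHARDS):
        resp = dynamodb.get_item(
            TableName="counters",
            Key={"id": {"S": f"{counter_name}#{shard}"}},
            ConsistentRead=True,
        )
        if "Item" in resp:
            total += int(resp["Item"]["value"]["N"])
    return total
```

The trade-off is that a read now costs one GetItem per shard (or a single BatchGetItem): more shards mean more write headroom but more expensive reads.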

Blum answered 23/2, 2012 at 5:19 Comment(2)
Sadly, link rot has set in for the second link in this answerJimmy
web.archive.org/web/20120621195522/https://whynosql.com/…Tague

In addition to gigq's answer about scalability, DynamoDB's atomic increments are not idempotent and therefore are not reliable on their own: if the connection drops after issuing an UpdateItem ADD request, you have no way of knowing whether the add was committed, so you don't know whether you should retry.

DynamoDB conditional updates fix this, at the cost of making the system even less scalable: you have to retry every time two changes to the attribute are attempted simultaneously, even in the absence of an error. A sketch of such a conditional update follows.
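A conditional read-modify-write increment of that kind might look like this boto3 sketch (placeholder table and attribute names as before); the retry loop is exactly where the extra cost under contention shows up:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def conditional_increment(delta=1):
    while True:
        # Read the current value, then write back only if nobody else
        # changed it in the meantime (optimistic concurrency control).
        # Assumes the counter item already exists.
        resp = dynamodb.get_item(
            TableName="hits",
            Key={"id": {"S": "global"}},
            ConsistentRead=True,
        )
        current = int(resp["Item"]["counter"]["N"])
        try:
            dynamodb.update_item(
                TableName="hits",
                Key={"id": {"S": "global"}},
                UpdateExpression="SET #c = :new",
                ConditionExpression="#c = :old",
                ExpressionAttributeNames={"#c": "counter"},
                ExpressionAttributeValues={
                    ":new": {"N": str(current + delta)},
                    ":old": {"N": str(current)},
                },
            )
            return current + delta
        except ClientError as e:
            if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise
            # Someone else updated the counter first; retry the whole cycle.
```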

Boschvark answered 23/2, 2012 at 23:6 Comment(3)
"DynamoDB conditional updates fix this" - not really: if the client hits a network error after the write was applied but before it learns the outcome, what should the client do?Neeley
The docs say it must retry because conditional updates are idempotent, but I don't agree. E.g. the client reads a counter, its value is 10, and it must be incremented by 1. It performs the first call: set the counter to 11 if its value is 10. The update is executed and the connection drops. The client catches the network exception and retries: the condition is now false. The client then doesn't know whether it should try to increment from 11 or not: the problem is that after a network error the client has no way to distinguish its own increment from an increment made concurrently by othersDonadonadee
What if you used ReturnValues on the update request? That way you get the value once the update is done, and the returned value is strongly consistent. Then you don't need to read and then update; if your network drops, you retry. Worst case is you skip a number in the sequence. docs.aws.amazon.com/amazondynamodb/latest/APIReference/…Burnette
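For reference, the ReturnValues approach from the previous comment could look like this sketch (same placeholder names as in the other examples):

```python
import boto3

dynamodb = boto3.client("dynamodb")

def increment_and_read(delta=1):
    # UPDATED_NEW returns the attribute's value after the update, so
    # the increment and the read happen in one atomic call; per the
    # comment above, on a network error you simply retry, at worst
    # skipping a number in the sequence.
    resp = dynamodb.update_item(
        TableName="hits",
        Key={"id": {"S": "global"}},
        UpdateExpression="ADD #c :d",
        ExpressionAttributeNames={"#c": "counter"},
        ExpressionAttributeValues={":d": {"N": str(delta)}},
        ReturnValues="UPDATED_NEW",
    )
    return int(resp["Attributes"]["counter"]["N"])
```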

If you are going to write to a single DynamoDB key, you will suffer from the hot-partition issue. The hot-partition issue starts at around 300 TPS per index, so if you have 5 indexes on the table, you may see it at around 300/5 ~ 60 TPS.

Otherwise, DynamoDB is scalable to about 10-40K TPS, depending on your use case.

Imbecility answered 20/3, 2018 at 23:31 Comment(2)
There's a great article by Segment about their problems with hot-partitions called The Million Dollar Engineering ProblemPenchant
That's an old issue, and it was addressed with adaptive capacity. A single partition can now run at 3,000 RCU and 1,000 WCU. Quote: "If your application drives consistently high traffic to a single item, adaptive capacity might rebalance your data so that a partition contains only that single, frequently accessed item. In this case, DynamoDB can deliver throughput up to the partition maximum of 3,000 RCUs and 1,000 WCUs to that single item's primary key." Docs: docs.aws.amazon.com/amazondynamodb/latest/developerguide/…Scat
