Aerospike Hot Key error

Based on this link, I understand that a hot key error occurs when there are too many concurrent requests for the same key.

My current scenario:

I have a record that is updated every 5-10 seconds, and around 20 machines, each issuing about 10K queries per second, that try to read that record.

  • Question 1 : Does the hot key error occur only with concurrent update transactions, or can it also occur with concurrent reads?
  • Question 2 : Is the transaction-pending-limit mentioned in the above link a per-node limit or a limit for the whole cluster?
  • Question 3 : From my reading, we should not increase transaction-pending-limit because it will impact performance. Can you give some performance numbers for comparison? And what is the maximum value that can be set for transaction-pending-limit?
  • Question 4 : Is there a workaround for my scenario that does not impact performance, other than caching the record in the server?
Ursola answered 16/4, 2016 at 5:2 Comment(0)

1- Both reads/updates.

2- Per node. For updates, all transactions go to the node holding the master partition for that record. Reads also go to that node, unless you have a client policy that allows reading from the node(s) holding the replica partition(s) as well.

3- Hard to give numbers. It will cause more client connections to the nodes where the hotkey is, which in turn can degrade performance, depending on the setup.

4- The easiest option, if the use case permits, is to use the read replica client policy to spread the reads across master and replica partitions. Otherwise, create multiple keys.
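
To illustrate the "multiple keys" idea in point 4, here is a minimal sketch. It assumes the writer can afford to update every copy on each change, which seems plausible at one update per 5-10 seconds. The names `NUM_COPIES`, `copy_keys`, `write_all_copies` and `read_any_copy` are hypothetical, and a plain dict stands in for the cluster so the example runs without a client library.

```python
import random

NUM_COPIES = 8  # hypothetical; choose based on the read load per key


def copy_keys(base_key, num_copies=NUM_COPIES):
    """Return the user keys for every copy of a hot record."""
    return [f"{base_key}:{i}" for i in range(num_copies)]


def write_all_copies(store, base_key, value, num_copies=NUM_COPIES):
    """Writer side: update every copy of the record."""
    for key in copy_keys(base_key, num_copies):
        store[key] = value


def read_any_copy(store, base_key, num_copies=NUM_COPIES, rng=random):
    """Reader side: pick one copy at random so reads spread across keys."""
    return store[f"{base_key}:{rng.randrange(num_copies)}"]


# A dict stands in for the cluster here; with a real client these
# functions would issue put/get calls for the same keys instead.
store = {}
write_all_copies(store, "config", {"version": 1})
print(read_any_copy(store, "config"))
```

Because different keys generally hash to different partitions, the copies tend to land on different nodes. The trade-off is that readers may briefly see the old value on some copies while an update is in progress.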

Demount answered 16/4, 2016 at 21:53 Comment(0)

Based on your question details, are you saying that you have 20 servers that each look up the same record 10k times/sec?

We set the transaction-pending-limit to 0 to remove the limit on pending operations and were able to do roughly 30k operations/sec on the same key in-memory. If you want to have 200k ops/sec, you can use a cluster with more nodes and use read-from-replica settings to get the throughput.
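
As a back-of-the-envelope check on those numbers, the calculation below estimates how many nodes would need to serve the record. It assumes the roughly 30k ops/sec figure carries over to each node holding a copy and that reads spread evenly, neither of which is guaranteed.

```python
import math

machines = 20
reads_per_machine = 10_000   # reads/sec per machine, from the question
per_key_limit = 30_000       # ops/sec on one key on one node, figure quoted above

total_reads = machines * reads_per_machine
copies_needed = math.ceil(total_reads / per_key_limit)

print(total_reads)    # 200000
print(copies_needed)  # 7
```

With read-from-replica, only the nodes holding the master or a replica of that record can serve it, so the replication factor bounds how far this scales for a single key.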

If the record is only changing every 5-10 seconds though, then why not read the record once per second and cache the result within your application? Even if it's different keys, smart caching within your app will greatly reduce the operations and network traffic required and let your system scale much better. This is the best option.
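
A minimal sketch of that kind of application-side cache is shown below. The `TTLCache` class and the `load_record` function are hypothetical names. A real loader would read the record through the database client, while here it only counts calls so the example runs on its own.

```python
import threading
import time


class TTLCache:
    """Caches one value and reloads it at most once per `ttl` seconds."""

    def __init__(self, loader, ttl=1.0, clock=time.monotonic):
        self._loader = loader      # function that performs the real read
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._expires_at = None    # None means nothing is cached yet

    def get(self):
        # The lock ensures only one thread reloads when the entry expires.
        with self._lock:
            now = self._clock()
            if self._expires_at is None or now >= self._expires_at:
                self._value = self._loader()
                self._expires_at = now + self._ttl
            return self._value


# Stand-in loader that counts how often the "database" is hit.
calls = []


def load_record():
    calls.append(1)
    return {"version": len(calls)}


cache = TTLCache(load_record, ttl=1.0)
for _ in range(10_000):
    cache.get()
print(len(calls))  # number of real reads performed for 10,000 lookups
```

With a 1-second TTL, each application server performs about one real read per second regardless of how many lookups it serves, at the cost of data that can be up to one second stale.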

Test answered 18/4, 2016 at 10:48 Comment(2)
Yes, the caching option is there, but I want to use it only when I don't have any other alternative. I was wondering whether it is possible to make some config changes in Aerospike to fix this hot key error. - Ursola
@Ursola Caching is the best option, not an alternative. This is proper application architecture. Why would you do 200k queries/sec when you can do 20/sec instead and have the same result with fewer resources and faster performance? - Test
