Comparison : Aerospike vs Cassandra [closed]
Z

4

43

Aerospike and Cassandra each claim to outperform the other in their own benchmarks.

Reference: http://java.dzone.com/articles/benchmarking-cassandra-right and a few others.

Has anyone used both of them?
Is Aerospike as good as claimed?
Finally, is it advisable to replace Cassandra with Aerospike?

Zacharyzacherie answered 22/8, 2014 at 9:37 Comment(4)
Under what scenario? What level of consistency do you want? How much money will you throw at SSD? Are you read heavy or write heavy?Shavers
Currently we have about 100 nodes holding our data, spread across different datacenters. We have a read:write ratio of about 2:1. The answer below also gives some insight from a financial point of view. I guess it will be good to try out Aerospike. Thanks!Zacharyzacherie
Good answers. Thanks for question. I was just starting to evaluate Cassandra, but will look at Aerospike first now.Shavers
Hello. Has anybody compared Aerospike to Hyperdex in terms of speed or features?Appreciable
C
71

Choosing between Cassandra and Aerospike depends on your use case more than anything. I have used both in production for the same project, and for us Aerospike was the clear winner. Our workload is highly concurrent, low-latency, transactional small updates to billions of entries, with roughly 10x more read than write volume. This is what Aerospike excels at: it has the lowest latency I have seen in a database of its kind, even when using an SSD namespace.

On the other hand, Cassandra is better for high write volume and can handle larger records. Everything is page based, so it operates well on non-SSDs, but it cannot match Aerospike's extremely low latency unless your records fit into the cache. It's also worth noting that Cassandra is much harder to maintain from an operations perspective than Aerospike. For us it was an operations nightmare, and I know that Netflix has to employ a sizable team of operations engineers solely to manage their Cassandra clusters. Also, while the system may have matured since, when we were using it (around version 1.0) we would occasionally hit strange assertion errors and exceptions that stopped internal database actions from taking place, and each time we typically had to wipe the data from the affected nodes to fix it.

Another factor is cost, which may or may not play into your decision depending on your application. The larger the keyspace, the more expensive your Aerospike cluster will be from a hardware perspective. All keys need to be stored in memory regardless of whether it is an in-memory or SSD namespace. Once you get into the billions of keys range you will need terabytes of RAM in your cluster to support that with a replication factor of 2. Cassandra does not have this issue, since keys and values are both stored on disk.
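To make the memory cost concrete, here is a rough back-of-the-envelope sketch. It assumes 64 bytes of primary index per record copy, which is the figure I understand Aerospike documents for its index entries; treat it as an assumption and check it against your version. The function names are made up for illustration. It counts the index only, so any data held in an in-memory namespace comes on top of this.

```python
# Rough estimate of the RAM needed just for the key index.
# ASSUMPTION: each record copy costs 64 bytes of index in RAM.
INDEX_BYTES_PER_RECORD = 64


def index_ram_bytes(num_keys: int, replication_factor: int = 2) -> int:
    """Total index RAM across the cluster, in bytes."""
    return num_keys * replication_factor * INDEX_BYTES_PER_RECORD


def index_ram_per_node_bytes(num_keys: int, replication_factor: int, num_nodes: int) -> float:
    """Index RAM each node carries if records are spread evenly."""
    return index_ram_bytes(num_keys, replication_factor) / num_nodes


if __name__ == "__main__":
    keys = 5_000_000_000  # 5 billion keys, as an example
    total = index_ram_bytes(keys, replication_factor=2)
    per_node = index_ram_per_node_bytes(keys, 2, num_nodes=20)
    print(f"cluster index RAM: {total / 10**9:.0f} GB")
    print(f"per-node index RAM: {per_node / 10**9:.0f} GB")
```

Doubling the key count or raising the replication factor scales the figure linearly, which is why the keyspace size dominates the hardware bill.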

To answer your last two questions: yes, it is as good as it claims. We store about 5B keys and do ~1M TPS at peak load, and it does so without breaking a sweat (although it takes almost 20 nodes per cluster, with 120 GB of RAM each, to do this). As for whether it is advisable to replace Cassandra with Aerospike: for us it was a definite win and the right decision. If your application fits the design of Aerospike and it works out to be cost effective, then it is definitely advisable to make the switch. When it comes down to it, though, it's about your use case. If it's not clear which one is the better fit for you, try them both and see how they play out. Good luck.

Edit:

One current reason to choose Cassandra over Aerospike is when an application needs certain consistency guarantees. For applications such as counters, for example, Aerospike can end up in an inconsistent state due to a network partition, whereas Cassandra can handle these cases through the use of conflict-free replicated data types (CRDTs). On a good network, and for many use cases in general, this isn't an issue, but as stated earlier the performance of Aerospike can't be beaten, and that's typically why it is chosen.
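For readers unfamiliar with CRDTs, here is a minimal sketch of a grow-only counter (G-Counter), the textbook example of the idea. It is purely illustrative and is not how Cassandra or Aerospike implement counters internally; the class and node names are hypothetical.

```python
class GCounter:
    """Grow-only counter: each node only ever bumps its own slot."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}  # node id -> increments seen from that node

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Taking the per-node maximum makes merge commutative,
        # associative and idempotent, so replicas converge.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


if __name__ == "__main__":
    a, b = GCounter("node-a"), GCounter("node-b")
    # During a partition each side accepts writes independently.
    a.increment(3)
    b.increment(2)
    # After the partition heals, the replicas exchange state.
    a.merge(b)
    b.merge(a)
    print(a.value(), b.value())
```

Because each replica only ever raises its own slot, no increment is lost when the two sides merge, regardless of the order in which the merges happen.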

Edit 2:

Aerospike v4 has now introduced its own consistent mode (verified by Jepsen: https://jepsen.io/analyses/aerospike-3-99-0-3). Aerospike implements it as strong consistency, whereas Cassandra only offers eventual consistency through the use of CRDTs, so it's still possible to read stale data there. Also, from personal testing I can say that performance during normal operation did not suffer for our use case when using the strongly consistent mode.

Crust answered 22/8, 2014 at 19:34 Comment(10)
Thanks! Some great points that I was looking for. It will be great to reduce operational overhead. It really is difficult to maintain Cassandra. I guess we will try out Aerospike. Cheers!Zacharyzacherie
Very interesting, thank you. Hearing from people who have used things in anger is very valuable.Shavers
I've heard verbally that Netflix uses a small support team for Cassandra. Is there anywhere documented as to how many they're actually using?Commendatory
What issues did you run into supporting Cassandra that made it an "operations nightmare"? I'm headed down the Cassandra path, so I'd like to know. :)Commendatory
So it's not that their support team is that large, it's that they have to employ a team purely to manage Cassandra and nothing else, because it is a full-time job. As for the Cassandra issues, I'm not personally in operations so I don't know specifics, but I remember the migrations being a pain and degrading performance while they happened. The worst was the internal errors we would hit that would stop compactions; I often found myself digging into the source code and constantly using JMX to force internal operations to happen. But like I said, we abandoned it around 1.0/1.1 and it's at 2.0 now.Crust
In 2012, Netflix had 3 people managing 30 Cassandra clusters, with twelve of them spanning multiple DCs. They've added many more clusters since then, and the ratio of employees to clusters has only improved. Cassandra really can be the most boring system in your datacenter, requiring virtually no maintenance.Junto
I agree. I go in every few months to do maintenance on Cassandra IFF our monitoring system alerts me of a failure.Ables
Hello. Did you compare Aerospike vs Hyperdex in terms of speed, features or consistency?Appreciable
Nobody has mentioned the column data store advantage of Cassandra. If your use case is batch queries of large timeseries, Cassandra is a perfect and very cost-effective fit, since it does not need super-expensive SSDs to give you high-speed access to timeseries ranges. The key-value store model can work for this, but you're spending zillions on random-access optimization via SSDs / RAM that you don't need for series data like this. We ingest 10k ticks per second of financial markets data and I can guarantee you, I don't want to pay for RAM to index that when the queries are usually columns.Hedges
And no one has mentioned the benchmark from June 2016 in which Cassandra and Aerospike were both run for 12 hours. Aerospike showed 14x the throughput and 42x lower read latency vs. Cassandra 3.5. So basically with Aerospike, you can reduce your cluster size by 14x and avoid the tuning and devops nightmare that so many talk about with Cassandra.Babbage
N
15

If you need stable, predictable performance with low latency and no hassle with maintenance, go with Aerospike. If you want to play games, go with Cassandra. I brought Cassandra into my company more than 4 years ago with no regrets, but today, for the reasons above, I choose Aerospike, which is open source, more available than a year ago, and built like a Russian tank, with reason.

You just have to know the limits of both platforms. Play with both, choose wisely.

Nefen answered 22/8, 2014 at 18:3 Comment(1)
Thanks! We have been using Cassandra and have come to know its merits and demerits. We will try out Aerospike and then come to a decision.Zacharyzacherie
S
10

While many people deploy Aerospike as a pure in-memory database, it also supports a hybrid memory configuration, spreading the database across RAM, SSD/Flash, and spinning disk. Here are links with shorter and longer answers on the issue. Certainly people want the best of both worlds: more persistent data stored on cheaper disk, and faster, more ephemeral data stored in RAM or SSD, which cost more per GB.

https://www.aerospike.com/products/features/hybrid-memory-architecture/

http://www.aerospike.com/docs/architecture/storage.html

I'd be eager to hear feedback on folks' experience with such deployments.

Sinapism answered 18/11, 2014 at 0:25 Comment(0)
S
9

Which product is better depends on your use case, but I would not hesitate to say that Aerospike can scale better than Cassandra, and in a cost-effective way, with SSDs and fewer nodes to maintain.

Also, regarding memory usage with a large number of keys in Aerospike, you could bucket your records into different sets/bins in your namespace. For example, if you have 10 billion records, you can bucket them into 5 sets and 5 bins inside the namespace, using a hash of the key as the lookup value. That way you have just 2 billion records in the namespace, which reduces the number of keys in memory.
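One way to read the bucketing idea is sketched below. This is my interpretation rather than a tested Aerospike recipe: a plain dict stands in for the database, and `bucket_id` and `BucketedStore` are hypothetical names. Many logical keys hash to the same bucket, and only the bucket itself is an indexed record.

```python
import hashlib

NUM_BUCKETS = 1024  # assumed bucket count, tune for your data


def bucket_id(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a logical key to a stable bucket number."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


class BucketedStore:
    """Stores many logical records inside each indexed bucket record."""

    def __init__(self, num_buckets: int = NUM_BUCKETS):
        self.num_buckets = num_buckets
        self._records = {}  # stand-in for the database: bucket id -> map

    def put(self, key: str, value) -> None:
        bucket = self._records.setdefault(bucket_id(key, self.num_buckets), {})
        bucket[key] = value

    def get(self, key: str, default=None):
        bucket = self._records.get(bucket_id(key, self.num_buckets), {})
        return bucket.get(key, default)

    def indexed_keys(self) -> int:
        """Number of records the database would have to index in RAM."""
        return len(self._records)


if __name__ == "__main__":
    store = BucketedStore(num_buckets=8)
    for i in range(1000):
        store.put(f"user:{i}", i)
    print(store.indexed_keys(), store.get("user:42"))
```

The trade-off is that each bucket record becomes larger and every update touches a whole bucket, so this only pays off when index RAM is the real bottleneck.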

Stunning answered 5/11, 2014 at 18:37 Comment(0)
