Has anyone worked with Aerospike? How does it compare to MongoDB? [closed]
P

4

57

Can anyone say if Aerospike is as good as they claim it to be? I'm a bit skeptical since it's a commercial enterprise. As far as I understand they just released an open source version, but the claims on their website could still be exaggerated.

I'm especially interested in how Aerospike compares to MongoDB.

Pinnatiped answered 8/8, 2014 at 17:23 Comment(1)
Try Tarantool: its ingest speed is a bit slower than Aerospike's, but the read speed is the same. Photoplay
C
104

I have used Aerospike, MongoDB and Redis and have tested many other NoSQL databases. I would say Aerospike is very good at what it does, but it is different from MongoDB. Everything depends on what you plan to use a database for. I can give you an example of what I use my different databases for, go over the differences between them and discuss the benefits of Aerospike.

MongoDB

I am using MongoDB as a SQL alternative. In my MongoDB database I have many different fields. The fields often change, and I regularly need to run ad hoc queries on various fields. It is a very unstructured database and MongoDB is amazing at that. I have also used MongoDB as a standard key-value store. It performs well, but I have had MongoDB perform sub-optimally at both transaction scale and database size scale. Admittedly, the database could probably have been tuned better, but I find it very hard to find documentation on configuring MongoDB correctly for different situations.

Redis

Redis is a pure key-value store. Redis' biggest problem is that it is purely in-memory (it will use disk as a backup, but you cannot store more data than you have memory available). It is extremely fast for what it is used for. I personally use it for a small transactional database: I do very simple operations on keys, like counting how many times an event happened for a certain user. I also do quick in-memory lookups where I need keys mapped to different values. Redis is a great tool for a small dataset and it is extremely fast. Configuration is very easy as well.
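
As a rough illustration of the per-user event counting mentioned above, the sketch below uses a plain Python dictionary as a stand-in for the key-value store. The key layout (`events:<user>:<event>`) and the function names are made up for illustration; against a real Redis server the increment would be a single `INCR` command on the same kind of key.

```python
from collections import defaultdict

# In-memory stand-in for a key-value store; this is NOT a real Redis client.
store = defaultdict(int)


def event_key(user_id: str, event: str) -> str:
    """Build a hypothetical key such as 'events:alice:login'."""
    return f"events:{user_id}:{event}"


def record_event(user_id: str, event: str) -> int:
    """Increment the counter for this user/event pair and return the new count."""
    key = event_key(user_id, event)
    store[key] += 1
    return store[key]


def event_count(user_id: str, event: str) -> int:
    """Return how many times the event was recorded (0 if never seen)."""
    return store.get(event_key(user_id, event), 0)


record_event("alice", "login")
record_event("alice", "login")
record_event("bob", "login")
print(event_count("alice", "login"), event_count("bob", "login"))
```

Because every operation touches exactly one key, this access pattern stays cheap no matter how many users there are, as long as the whole key space fits in memory.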

Aerospike

I personally use Aerospike to replace Redis when it's time to scale. From my understanding, it can be used for more. Like Redis, Aerospike is a key-value store. I believe the open source edition also supports secondary indexes, which Redis does not (I have not used secondary indexes in production but have done a little testing on them).

Aerospike's best feature is its ability to scale. The biggest problem I needed to solve when looking into Aerospike was scaling my system to handle large data sets while remaining extremely fast. The project I use Aerospike for has very stringent requirements on speed. I usually make 3-4 database lookups plus other processing and need to have sub-50ms transaction times. A few look-ups are on data sets which are 300GB+. I could not find a solution to hold this data and make it accessible in a reasonable amount of time. Redis obviously won't work unless I had a machine which had 300GB+ of RAM. MongoDB started to perform extremely poorly at a size much lower than 300GB. So I gave Aerospike a shot, and it was able to handle everything very well. The best thing about Aerospike: as my data set has grown I have not had to do much more than standing up a new box when needed. The speed has stayed consistent.
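
To make the latency requirement concrete, here is a back-of-the-envelope budget in Python. The 50 ms ceiling and the 3-4 lookups come from the paragraph above; the 20 ms reserved for "other processing" is an assumed figure, chosen purely for illustration.

```python
def per_lookup_budget_ms(total_budget_ms: float,
                         other_processing_ms: float,
                         lookups: int) -> float:
    """Return how long each sequential lookup may take within the total budget."""
    if lookups <= 0:
        raise ValueError("lookups must be positive")
    remaining = total_budget_ms - other_processing_ms
    if remaining < 0:
        raise ValueError("processing alone exceeds the budget")
    return remaining / lookups


TOTAL_BUDGET_MS = 50.0      # from the requirement described above
OTHER_PROCESSING_MS = 20.0  # assumption for illustration only

budget_3 = per_lookup_budget_ms(TOTAL_BUDGET_MS, OTHER_PROCESSING_MS, 3)
budget_4 = per_lookup_budget_ms(TOTAL_BUDGET_MS, OTHER_PROCESSING_MS, 4)
print(f"3 lookups: {budget_3:.1f} ms each, 4 lookups: {budget_4:.1f} ms each")
```

Under that assumption each lookup has to finish in roughly 7.5 to 10 ms, network round trip included, which leaves little room for a store that slows down as the data set grows.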

I also find Aerospike's documentation very good. It isn't too hard to configure and it's pretty easy to find answers for any issue that comes up.

Conclusion

So, is Aerospike as good as they claim? Personally, I have seen nothing less than what has been claimed. I haven't had to scale to 1 million TPS, but I do believe that would be possible with enough hardware. I also believe the numbers showing a speed difference between Aerospike and MongoDB. Aerospike is a much more "configured" and "planned out" database than MongoDB, and because of this it will be much faster at scale. It only has to worry about a single index (or, with secondary indexes, a few hundred), unlike MongoDB, where the fields being indexed and queried can change dynamically. The question you really need to ask is what you are trying to accomplish with your database. Then look into which database fits your needs best. If you need a scalable, fast key-value store, I would say Aerospike is probably the best out there.

Let me know if you have any specific questions or need anything clarified. I would probably be able to help you out.

Coreen answered 8/8, 2014 at 20:29 Comment(5)
Can you share some comments about transactions in Aerospike? I heard that they are not ACID compliant. Perithecium
I suggest taking a look at this, it explains everything much better than I can: aerospike.com/docs/architecture/assets/AerospikeACIDSupport.pdf Basically it supports ACID, but can be configured away from it if need be (at least for consistency). Coreen
@Coreen Can we use pagination for Aerospike queries? Sabol
My gut instinct says no, and some additional research didn't turn up anything. Each primary key must be unique, so a key returns a single record. Secondary indexes might provide a case for pagination, but I have been unable to find any information on pagination support. I believe a query returns all matching results; I do not see any way to limit the number of results and continue from where the previous results left off. Coreen
@Coreen You can use range queries on the Large List data type: aerospike.com/docs/guide/llist.html Picture
P
57

Speed

Aerospike is faster. Almost any system will be quick under low load or simple data access, but Aerospike has stayed consistently fast by optimizing for in-memory and SSD-based storage. Mongo is fast when given lots of RAM for caching but is otherwise slow and has low write performance.

Reliability

Aerospike is very stable, although with simpler data access. MongoDB has historically been problematic with persisting data and failover but is much better now. Because Aerospike has better performance and easier management, there are fewer potential problems when scaling.

Setup/Configuration

Clustering with Aerospike is much easier to set up since all nodes are the same and the client drivers handle connections and failover automatically. MongoDB can be easier if you're setting up a single server, as it runs on more platforms natively and you can start it without any configuration.

MongoDB has two major ways of clustering: replica sets (for availability) and sharding (for scalability). We had 5 shards and each shard had a replica set of 3 servers. That's 15 servers to hold data. Then we had 3 config servers that maintained the cluster configuration, and we had to add 2 arbiter processes after our first major outage to properly handle promoting a slave to master. That's a lot of moving pieces, and it also makes it incredibly hard to change your layout in the future.

In contrast, Aerospike took much less effort but requires more up-front configuration, most of which cannot be changed once the cluster has started, whereas with MongoDB you can create and alter databases anytime.
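
The moving pieces in that layout add up quickly. A minimal tally in Python, using only the counts given above (the variable names are mine, and query routers are left out because no number is given for them):

```python
# Counts taken from the deployment described above.
shards = 5
replicas_per_shard = 3
config_servers = 3
arbiters = 2

# Each shard is a replica set, so data-bearing servers multiply.
data_servers = shards * replicas_per_shard
total_processes = data_servers + config_servers + arbiters

print(f"data servers: {data_servers}, total processes to operate: {total_processes}")
```

That is 20 processes in three different roles, compared to a cluster where every node runs the same software with the same role.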

Aerospike does have the ability to sync multiple clusters (which is complicated to set up), so you can have different active datacenters replicating data and accepting writes, something that MongoDB doesn't really support at all.

Data Access

MongoDB has database/collection/document where each document is just JSON. Aerospike has namespace/set/record where each record is a collection of key-value "bins", which can then have nested key/value structures. Namespaces are pre-configured and are not dynamic, and bin names are limited to 14 characters, which is annoying to work with.

Both have secondary indexes, although MongoDB lets you query immediately by anything while Aerospike requires index setup or custom scripting. Both have built-in aggregation frameworks. Aerospike clients support Lua scripting, while MongoDB supports map-reduce and custom JavaScript functions.

It really depends on what your application needs, but MongoDB wins in flexibility, easier querying and fewer restrictions.
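
To show what the namespace/set/record layout and the name limit look like in practice, here is a small sketch using plain Python data structures rather than a real client. The namespace, set, key, and bin names are hypothetical, and the 14-character limit is the one quoted above, so check the documentation for the server version you run.

```python
MAX_BIN_NAME_LEN = 14  # limit as stated above; may differ between server versions


def too_long_bin_names(bins: dict) -> list:
    """Return the bin names that exceed the length limit."""
    return [name for name in bins if len(name) > MAX_BIN_NAME_LEN]


# A record is addressed by (namespace, set, primary key) and holds named bins.
key = ("test", "users", "user:42")  # hypothetical namespace, set, and key

record = {
    "name": "Alice",
    "prefs": {"theme": "dark", "lang": "en"},  # nested key/value structure in one bin
    "last_login_timestamp": 1407600000,        # 20 characters: over the limit
}

rejected = too_long_bin_names(record)
print("bin names to shorten:", rejected)

# Shortening the offending name (for example to 'last_login_ts') makes it fit.
record["last_login_ts"] = record.pop("last_login_timestamp")
print("still too long:", too_long_bin_names(record))
```

A MongoDB document has no comparable restriction on field names of this length, which is one of the places where the two data models feel different day to day.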

Cost

Both are now open-source and free. Both have enterprise versions with extra features, but licensing is expensive if you have lots of data. Aerospike might be cheaper since it requires fewer machines for the same performance.

Overall

For most scenarios, I would recommend Aerospike. The document-store semantics and flexibility of MongoDB are great but scaling and maintaining it as a distributed database is painful. Aerospike is fast and reliable and can run with fewer nodes that are easier to scale.


January 2016: MongoDB has released MongoDB Cloud Manager which is a paid SaaS service that can provision and manage your clusters. This solves a lot of the trouble with configuring Mongo.

March 2017: Both databases have come a long way. Aerospike now has faster replication and more flexible config settings that can be changed without restarting the whole cluster. MongoDB has new schema enforcement, better performance and even supports joins, along with the MongoDB Atlas managed service to take away all the scaling issues.


I now highly recommend ScyllaDB, which is a Cassandra-compatible open-source database with incredible performance, multi-datacenter replication, and no limits on usage.

Picture answered 8/9, 2014 at 2:4 Comment(13)
Thanks for the excellent and thorough answer, Mani. Would you say that Aerospike is flexible enough to be used as the data store for a CMS, a session store, and analytics for something like an ad network? Is Aerospike a candidate to replace all of Mongo, Redis, ElastiCache? Perfectionism
Those are 3 very different things: ElastiCache is just Redis (or memcached) as a service by AWS, so you're really just comparing Redis vs Mongo vs Aerospike. Redis and Aerospike are comparable in data structure support (just slightly different interfaces, but you can do the same things), but Aero comes with a way better clustering/availability system. Picture
Mongo I would only suggest if you need very quick setup on Windows/Mac/Linux without a lot of performance demands, or really really need that super flexible document-store JSON model. If not, and you can model your data in key/value and lists/sets/stacks, then I'd go with Aerospike. Picture
Please remember this is still a narrow comparison and there are dozens of databases out there so I can't predict what's right for you (sometimes the answer can even be build-your-own depending on the circumstances). Please experiment/test/measure and make sure you make the right decision for your use case. Picture
Whoops. I meant Elasticsearch. Anyhow, thanks for the feedback. I'm going to be doing a tonne of experimenting with Aerospike to gauge how flexible the data model is. I make use of sub-docs and arrays of sub-docs in a few cases where it would be tricky to work around that. I'll spin up a sandbox cluster on AWS and see how it drives. Cheers. Perfectionism
Elasticsearch is great but it's a JSON document store (like Mongo) designed to be a powerful search DB. You can definitely use it but there are big differences in clustering, performance and other features. I've written more about it here: quora.com/… Picture
@ManiGandham How do you manage the replica set feature in Aerospike? As I understand it, its architecture gives you multiple nodes with the same data set, so it transfers load to a different node in case of a node failure. But when that node is in a different geographical location, transferring load to a far location will introduce latency, I think. Thanks, Viren. Aleasealeatory
@Aleasealeatory Aerospike has cross-datacenter replication which syncs data across multiple clusters which are in different locations or data centers. More info here: aerospike.com/docs/architecture/xdr.html Picture
@ManiGandham Any reasons for suggesting ScyllaDB, given that Aerospike still seems to outperform ScyllaDB in their benchmarks? pages.aerospike.com/rs/229-XUE-318/images/… Firehouse
@Firehouse That benchmark is several years old and very skewed. Both databases are very fast, but Aerospike is completely custom, with everything from storage to data model to replication. Scylla is a Cassandra clone and now has 100% feature support, so it has a much bigger ecosystem and a much more flexible data model. Aerospike might win on pure k/v throughput, but Scylla is also very fast while being much more functional, with better replication, secondary indexes, large rows, analytics integrations, etc. Picture
Thanks for the details. Are multi-record transactions possible with Scylla? Firehouse
@Firehouse Scylla is a clone of Cassandra so you should research that data model and operations available. It doesn't have transactions in the typical sense but there are record batches, LWTs (read before write checks), and per-query consistency settings you can use. Scylla does implement these operations better than standard Cassandra. Picture
ScyllaDB would be better suited for high-write workloads with an AP model because it uses an LSM tree for the write path. For reads, it needs to fetch multiple versions from different SSTables and merge them into a single final row view, and for strong consistency the request needs to fan out to a quorum of nodes. Mixed read/write or read-intensive workloads with strong consistency would perform much better on Aerospike, with more predictable latencies, because it does in-place writes with row-level locking, so a read is sent to a single node, only one row is fetched, and indexes are always in memory. Battlement
G
21

I have used MongoDB (2.4) and Aerospike 3 in our production systems. These are a few observations from our team:

1) Read/write throughput with Aerospike is unbeatable. MongoDB usually works up to a certain scale if the workload is mostly reads. If you need concurrent reads/writes at a 95/5 percent ratio, Mongo degrades badly. With Aerospike we have seen very little impact even when this ratio is 90/10. On AWS we have achieved 200k TPS using Aerospike.

2) Latency in Aerospike is very low. Read latency was sub-millisecond at the 99th percentile on the server side. Write latency was sub-millisecond at the 80th percentile and within 8 ms at the 100th percentile. Best of all, we got almost the same numbers in different POCs, so performance is consistent.

3) Very few nodes are sufficient in an Aerospike cluster compared to other solutions. Also, the SSD-based data store gives quite impressive numbers, so it is very cost effective with little maintenance overhead.

4) Aerospike is now open source, so we hope for wider community support :-)

So we are using Aerospike for all new systems and are trying to migrate from MongoDB.
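
For anyone who wants to check percentile figures like the ones in point 2 against their own measurements, here is a minimal nearest-rank percentile sketch in Python. The latency samples are made up for illustration and are not our measured data; the function name is mine.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up write latencies in milliseconds (illustration only).
write_latencies_ms = [0.4, 0.5, 0.6, 0.7, 0.7, 0.8, 0.9, 0.9, 3.2, 7.5]

p80 = percentile(write_latencies_ms, 80)
p100 = percentile(write_latencies_ms, 100)
print(f"p80 = {p80} ms, p100 = {p100} ms")
```

With these sample values the 80th percentile stays under 1 ms while the slowest write is still within 8 ms, which is the shape of distribution described above: most requests are very fast and the tail is bounded.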

Gounod answered 10/8, 2014 at 12:54 Comment(1)
Hey Samir. Thank you for this great answer. Are you storing everything in Aerospike, or just data that you need available very fast? Have you tried implementing range queries etc.? Intransigeance
N
0

MongoDB and Aerospike are not designed for the same kind of data management, just as SQL is not dead either.

We have built cache systems on sharded MongoDB clusters using TokuMX (2.0.0, based on MongoDB 2.4.10). The system is still running well, with only 0.1% of queries taking more than 100 ms at 65 million queries and about 10 million updates per day. We're now trying Aerospike, which seems to be great and is now open source. There is only one problem with this open-source version:

DON'T USE IT IN PRODUCTION SERVERS !

The security management is only available in Enterprise distribution. It means that

YOU CAN NOT SECURE ANYTHING WITH PASSWORD AND USER !

Now, if you don't mind that, you can use it in production, but don't forget that any of your clients can ask for a security audit, and then you'll have to pay a lot.

Neau answered 21/12, 2014 at 7:50 Comment(3)
Whether you'll have to pay after a security audit depends entirely on the certification you claim to implement, or the security promises you made. I assume that in most production cases the cluster is run on an internal private network, so it's not like the open-source version is always accessible from the outside... Visigoth
Security can be achieved with an internal network and firewall rules. You don't have to run the cluster on machines completely open to the internet, nor should you. Picture
There is even more to this. You cannot even remove a record in the community edition; it will be brought back after a restart. Durable deletes are only supported in the enterprise edition. Be aware that you will probably need to buy an enterprise edition of Aerospike. Mu
