High Performance DB for Fast Read and Fast Write. No Update or Delete [closed]
M

3

14

I am looking for a database or storage mechanism where I can both write and read data with high performance.

This storage will hold log-like but important information collected across multiple systems. Because the logged data is critical and will be used to show history, read performance needs to be very fast. Since we never update or delete the data and never do any kind of join, I am looking for the right solution for this access pattern. We will probably archive the data in the long term, but that is something we can deal with.

I have looked at different sources to understand the various NoSQL databases, but an expert's opinion is always better :)

Must Have:
1. Fast Read without fail
2. Fast Write without fail
3. Random access Performance
4. Replication or a similar feature: if one node goes down, another should take over immediately
5. Concurrent reads and writes

Good to Have:
1. Searching content, e.g. analysing the data for auditing, with or without indexes

Not required:
1. Transactions are not required at all
2. Updates never happen
3. Deletes never happen
4. Joins are not required

Referred: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Mcdaniels answered 12/11, 2014 at 10:58 Comment(3)
Have you considered a flat file? I once consulted to a lottery company. They had very stringent requirements. They used flat files, for fast and reliable read, write, and seek.Hoick
I just don't understand how some folks mark legit questions as "off topic"....Lexical
You need something like Hadoop with streaming. A SaaS option is BigQuery, though I would recommend it for experimental purposes only.Hon
W
20

Disclosure: Kevin Porter is a Senior Software Engineer at Aerospike, Inc. since May 2013. (ref)

Be sure to consider Aerospike; Aerospike dominates in the adtech space, where high-throughput reads and writes are required. Aerospike is frequently touted as having "the speed of Redis with the scalability of Cassandra." For searching/querying, see Aerospike's secondary index documentation.

For more information see the discussion/articles below:

  1. Aerospike vs Cassandra
  2. Aerospike vs Redis and Mongo
  3. Aerospike Benchmarks

Lastly, verify the performance for yourself with the One million TPS on EC2 Instructions.

[Edit 2024-07-23] The "One million TPS on EC2 Instructions" link no longer works, but the information was ported to the AWS Deploy Guide.

Warthman answered 13/11, 2014 at 16:21 Comment(5)
Thanks for the suggestion. As I mentioned in my post, read/write/search operations should be fast enough. But from what I have read, Aerospike is an in-memory store whereas Cassandra is disk-based. We won't be able to provide that much RAM, as this data will be part of analytics.Mcdaniels
Actually, Aerospike isn't only an in-memory database. The most widely deployed storage model is hybrid storage, where there is a 64-byte index entry for each record in RAM and the data is stored on flash storage (SSD).Warthman
As per SO rules, you are required to disclose your affiliation with Aerospike. Don't get me wrong, I love it and I'm sure it's the man for the job :)Eminence
@Warthman it's unfortunate that "One million TPS on EC2 Instructions" link is broken nowadays and I was not able to find it on archive.org 🙁, I'm not sure if configure on operations or deploy guide at aws would be enough to replicate it.Germano
@manus, The relevant tuning recommendations should have been ported to the AWS deploy guide. In particular, pay attention to the "Best Practices" section.Warthman
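To put the RAM concern from the comments into numbers, here is a back-of-the-envelope sketch in Python. It uses only the 64-byte-per-record index figure quoted above; the function names, the replication factor parameter, and the record counts are illustrative assumptions, not Aerospike's official sizing formula, so check the vendor's capacity planning guide before relying on it.

```python
INDEX_ENTRY_BYTES = 64  # per-record index size quoted in the comment above


def index_ram_bytes(record_count: int, replication_factor: int = 2) -> int:
    """Estimate cluster-wide RAM needed for the index only.

    Assumption: every replica of a record carries its own index entry,
    so the estimate is multiplied by the replication factor.
    """
    if record_count < 0 or replication_factor < 1:
        raise ValueError("record_count must be >= 0 and replication_factor >= 1")
    return record_count * INDEX_ENTRY_BYTES * replication_factor


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30
```

Under these assumptions, one billion log records kept in two copies would need about 128 GB (roughly 119 GiB) of RAM for the index across the whole cluster, while the record data itself stays on SSD.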
H
6

Let me be the Cassandra sponsor.

Disclaimer: I am not saying Cassandra is better than the others, because I don't know Mongo/Redis/whatever deeply enough, and I don't want to get into that kind of argument.

The reason I suggest Cassandra is that your needs match what Cassandra offers very well, and your "not required" list is a set of features that are either not supported in Cassandra (joins, for instance) or considered an anti-pattern (deletes and, in some situations, updates).

From your "Must Have" list, point by point

  1. Fast Read without fail: Supported. You can choose the consistency level of each read operation, deciding how important it is to retrieve the freshest information versus how important speed is.

  2. Fast Write without fail: Same as point 1; the consistency level is also chosen per write operation.

  3. Random access Performance: In the Cassandra world many parameters affect random access performance, but the most important one that comes to mind is the data model: if you create a data model that scales horizontally (have a look here) and you avoid hotspots, you get what you need. If you model your DB well, each operation should be close to O(1), since the data is structured around the queries you will run.

  4. Replication: Here Cassandra is even better than you might think. If one node goes down, nothing changes for the cluster and everything(*) keeps working perfectly. Cassandra has no single point of failure. I can tell you that with an older Cassandra version I've had an uptime of more than 3 years.

  5. Concurrent write/read data: Cassandra uses the LWW policy (last-write-wins) to handle concurrent writes on the same key. The system supports many concurrent reads and writes and, with newer protocols, also async operations.
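To make point 3 a bit more concrete: one common way to keep a log-style data model free of hotspots is to partition by source system plus a time bucket, so that no single partition grows without bound. The sketch below is plain Python, not Cassandra code; `partition_key`, `bucket_hours`, and the source name used in the usage note are hypothetical, and the right bucket size depends on your write volume.

```python
from datetime import datetime, timezone
from typing import Tuple


def partition_key(source: str, ts: datetime, bucket_hours: int = 1) -> Tuple[str, str]:
    """Return a (source, time-bucket) pair to use as a composite partition key.

    Pass a timezone-aware datetime; it is normalised to UTC so that all
    writers agree on the bucket boundaries.
    """
    if not 1 <= bucket_hours <= 24:
        raise ValueError("bucket_hours must be between 1 and 24")
    ts = ts.astimezone(timezone.utc)
    # Round the hour down to the start of its bucket.
    bucket_start_hour = (ts.hour // bucket_hours) * bucket_hours
    bucket = ts.replace(hour=bucket_start_hour, minute=0, second=0, microsecond=0)
    return source, bucket.strftime("%Y-%m-%dT%H")
```

For instance, an event from a system called "billing" at 2014-11-12 10:58 UTC would land in the partition `("billing", "2014-11-12T10")` with one-hour buckets. Reading the history of one system for a time range then means reading a known, bounded set of partitions by exact key.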

Cassandra offers lots of other interesting features: linear horizontal scaling is the one I appreciate most, but there is also the fact that you can know the instant at which every piece of data was last written (the LWW timestamp), the counters feature, and so on.
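The last-write-wins idea mentioned above can be sketched in a few lines of Python. This is only an illustration of the policy, not Cassandra's implementation: the `Cell` type and `lww_merge` function are made up for this example, and breaking timestamp ties by comparing values is a choice of this sketch so that the merge gives the same result regardless of argument order.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    value: str
    write_ts: int  # write timestamp, e.g. microseconds since the epoch


def lww_merge(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Reconcile two versions of the same key: the later write wins."""
    if a is None:
        return b
    if b is None:
        return a
    # Compare by timestamp first; fall back to the value on a tie so the
    # result does not depend on which replica's version is passed first.
    return max(a, b, key=lambda cell: (cell.write_ts, cell.value))
```

Since your log entries are written once and never updated, conflicting versions of the same key should be rare, which is one reason this simple policy fits an append-only workload well.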

(*) - if you don't use Consistency Level All which, imho, should NEVER be used in such a system.
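The footnote about Consistency Level ALL comes down to simple arithmetic on how many replicas must respond. Here is a minimal Python sketch of that arithmetic for the ONE, QUORUM, and ALL levels; the function names are made up for illustration, and the replication factor of 3 in the usage note is an assumed example value.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def can_serve(level: str, replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are alive to satisfy the consistency level."""
    return live_replicas >= required_replicas(level, replication_factor)
```

With a replication factor of 3 and one replica down, `can_serve("ONE", 3, 2)` and `can_serve("QUORUM", 3, 2)` are both true, while `can_serve("ALL", 3, 2)` is false: a single failed node is enough to make ALL requests fail.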

Homeopathic answered 12/11, 2014 at 18:38 Comment(3)
Presently I am looking at Elasticsearch vs Cassandra. Both have made it onto my final list. Can I get any article/info on the limitations of each one, so that I can look at the future architecture and make the choice?Mcdaniels
They're two different solutions, possibly made to coexist rather than to compete. Cassandra is a storage system while Elasticsearch is a full-text search engine based on Lucene. DataStax Enterprise is a solution similar to the one just described, using Solr as the full-text search engine and Cassandra to persist data and perform exact searches.Homeopathic
I used Cassandra in my solution, but read performance for the same data (fetching data by exact key) degraded as the data size increased, which should not have happened.Algor
B
5

Here are a few more links on how you can span in-memory and disk (DRAM, SSD, and disk storage) with Aerospike:

http://www.aerospike.com/hybrid-memory/

http://www.aerospike.com/docs/architecture/storage.html

I think everyone is right in terms of matching the specific DB to your specific use case. For instance, Aerospike is optimal for key-value data. Other options might be better.

By way of analogy, I'll always remember how, decades ago, a sister of mine once borrowed my computer and wrote her term paper in Microsoft Excel. Line after line was a different row of a spreadsheet. It looked ugly as heck, but, uh, okay. She got the task done. She cursed and swore at how difficult it was to edit the thing. No kidding!

Choosing the right NoSQL database for the right task will either make your job a breeze, or could cause you to curse a blue streak if you decided on the wrong basic tool for the task at hand.

Of course, every vendor's going to defend their product. I think it's best the community answer the question. Here's another Stack Overflow thread answering a similar question:

Has anyone worked with Aerospike? How does it compare to MongoDB?

btw: Do you have any more specific insights for us on what type of problem you are trying to solve?

Balsaminaceous answered 18/11, 2014 at 17:32 Comment(0)
