Picking a database technology
F

3

5

We're setting out to build an online platform (API, Servers, Data, Wahoo!). For context, imagine that we need to build something like Twitter, but with the comments (tweets) organized around a live event. Information about the live event itself must be delivered to clients as fast and consistently as possible, while comments about the event can probably wait a bit longer to be delivered. We'll be read-heavy after the live event finishes.

Scalability is very important. We want to start out renting VPS slices, and scale from there. I'm a big fan of the cloud, and would like to remain there as long as possible. We'll probably be using Ruby.

I'm convinced that I want to try a document store instead of an RDBMS. I like the idea of schema-less storage and the promises of easier scalability by focusing on key-value.

The problem is I don't know which technology is the most appropriate for our platform. I've looked at Couch, Mongo, Tokyo Cabinet, Cassandra, and an RDBMS with blobbed documents. Any help picking the right tool for this particular job?

Farleigh answered 22/1, 2010 at 5:47 Comment(0)
P
7

Check out the NoSQL alternatives comparison by BJ Clark.

Scalability is very important.

Then you should consider these verdicts, excerpted from his blog:

  1. Tokyo Cabinet - Doesn't scale
  2. Redis - Doesn't scale
  3. Project Voldemort - Scales
  4. MongoDB - Limited (sharding is being implemented)
  5. Cassandra - Scales
  6. Amazon S3 - Scales
  7. Couch - Doesn't scale (Clustering & replication)
  8. MySQL - Doesn't scale

Also consider HyperTable, another serious contender among the NoSQL alternatives. It's an open source implementation of Google's BigTable concept. I believe it scales well because it's used extensively by the Chinese search engine Baidu and the entertainment portal Rediff.

You were saying:

Information about the live event itself must be delivered to clients as fast and consistently as possible, while comments about the event can probably wait a bit longer to be delivered. We'll be read-heavy after the live event finishes.

This is similar to Twitter's approach. Your choice of programming language also matters: Twitter initially used Ruby for back-end message delivery, but they have since said it was not the right choice for that job and have moved the entire message delivery system to Scala.

They still use Ruby for their front end. If you want a highly reliable, fault-tolerant system that is well suited to scalable environments, you should consider Scala or Erlang.

Percutaneous answered 22/1, 2010 at 6:10 Comment(3)
Why point 7, "Couch - Doesn't scale"? Take a look at cloudant.com and couchio.com. - Spectra
Yeah, I'm also confused about Couch. There seems to be some serious disagreement about the replication approach to scaling as a whole. The Couch guys list scalability as one of their main features, while the rest of the world seems to blow them off. - Farleigh
CouchDB performance has increased by an order of magnitude in each release. The current trunk performance is nothing like it was in August, when that article was written. Your preferred scaling strategy will depend on your situation. You may need replication or sharding; CouchDB has built-in peer-to-peer replication that works great, and with couchdb-lounge you can do sharding. - Yawn
S
1

Ramesh has a good summary. I would add that Cassandra has a richer data model than vanilla Dynamo clones (like Voldemort or Dynomite): rows with named, sorted columns rather than just key/value. Cassandra is being used by Twitter, Mahalo, Ooyala, SimpleGeo, WebEx, and others (http://n2.nabble.com/Cassandra-users-survey-td4040068.html), at least some of which are running Cassandra clusters on EC2 or Rackspace cloud servers.
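
To make the difference concrete, here is a minimal sketch in plain Python. It is not Cassandra's API: the `Row` class, its `put` and `slice` methods, and the use of timestamp strings as column names are all assumptions made up for illustration. It only shows why keeping a row's columns sorted by name allows cheap range reads, which a plain key/value store does not offer.

```python
from bisect import bisect_left, bisect_right

# Plain key/value (Dynamo-style): one opaque value per key, no range reads.
kv_store = {"event:42": b"serialized blob"}


class Row:
    """Hypothetical model of one row whose columns stay sorted by name."""

    def __init__(self):
        self._names = []   # column names, kept in sorted order
        self._values = {}  # column name -> value

    def put(self, name, value):
        # Insert a new column name at its sorted position; overwrite otherwise.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, end):
        # Return (name, value) pairs with start <= name <= end, in name order.
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Comments for one event, with timestamps as column names (an assumption).
comments = Row()
comments.put("2010-01-22T05:49:10", "Great goal!")
comments.put("2010-01-22T05:47:03", "Kickoff")
comments.put("2010-01-22T06:10:00", "Full time")

first_hour = comments.slice("2010-01-22T05:00:00", "2010-01-22T05:59:59")
```

With this layout, "all comments on an event between two times" is one slice of one row, whereas with a pure key/value store you would have to fetch and deserialize the whole blob or maintain your own index.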

Seventy answered 22/1, 2010 at 15:30 Comment(0)
R
1

If you want to scale horizontally (distribute your data over more than one node) you have to take the CAP theorem into account.

http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

It is not easy material, but you have to choose: there is always some kind of trade-off.
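
As one illustration of that trade-off (my own example, not taken from the linked article): Dynamo-style stores typically let you pick how many of the N replicas must acknowledge a read (R) and a write (W). If R + W > N, every read overlaps the latest write; smaller values are faster but may return stale data. The function name and the settings below are hypothetical.

```python
def quorum_overlaps(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n


N = 3  # assumed replication factor

# Event info must be consistent: pay for overlapping quorums.
event_info_consistent = quorum_overlaps(N, r=2, w=2)

# Comments can lag a bit: single-replica reads and writes are cheaper.
comments_consistent = quorum_overlaps(N, r=1, w=1)
```

In the question's terms, the live event data would use the stricter setting, while comments could use the looser one and accept a short delay.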

Rejoice answered 22/1, 2010 at 22:24 Comment(1)
Thanks... That was the best article on the CAP theorem I'd read. - Farleigh
