Scaling MongoDB on EC2 or should I just switch to DynamoDB?

Asked 19/2, 2012 at 22:57 Answered 29/2, 2012 at 3:54

perl mongodb amazon-simpledb amazon-dynamodb

I currently run my website on a single server with MongoDB. The server has two components: (1) a crawler that runs hourly and appends data to my MongoDB instance, and (2) a website that reads from the crawler index and also writes to a user personalization DB. I am moving to Amazon EC2 so that the web server can auto-scale: I want to increase the number of servers as web traffic increases. I don't need auto-scaling for my crawler. This poses a challenge for how I use MongoDB. I'm wondering what my best option is to optimize for:

Minimal changes to my code (the code is in perl)
Ability to seamlessly add/remove web-servers without worry about losing data in the DB
Low cost

In the short term, the DB will certainly be able to fit in memory on all machines, since it will be under 2 GB. The user personalization DB can't be rebuilt, so it's more important to protect it, while the index can easily be rebuilt. The current MongoDB crawl index has about 100k entries that are keyed on ~15 different columns. This is built for speed, as I am working on an online dating site (that is searchable in many ways).

I can think of a few options

Use SimpleDB for the user personalization store, and MongoDB for the index. Have the index replicate across all machines. However, I don't know much about MongoDB replication.
Move everything to SimpleDB
Move everything to DynamoDB

I don't know much about SimpleDB or DynamoDB. Based on articles, it seems like DynamoDB would be a natural choice, but I'm not sure whether it has good Perl support, whether I can have all my columns, indexes, etc. Does anyone have experience or advice?

Evaporimeter answered 19/2, 2012 at 22:57 Comment(0)

You could host Mongo on a single EC2 server that each of the boxes in the web farm connects to. You can then easily spin up another web instance that uses the same DB box.

We currently have three Mongo servers, as we run a replica set. When we get to the point where we need to scale horizontally with Mongo, we'll spin up some new instances and shard the larger collections.

Toluate answered 19/2, 2012 at 23:30 Comment(5)

Thanks Joe! This is very insightful. So to clarify, is your Mongo data spread across 3 machines or is it replicated across 3 machines? Separating the scaling of my web servers from the scaling of Mongo is a very good idea. Is there any downtime when you scale your Mongo servers? Do you have any pointers/links on good ways of scaling Mongo when you need more capacity? – Evaporimeter 20/2, 2012 at 0:13

We use 3 separate machines with a replica set: one master, two replicas. This is primarily for automatic fail-over, so if our master dies, one of the replicas gets promoted to be the master. The data on each replica is the same. – Toluate 20/2, 2012 at 1:17

When it comes to scaling the data, you can use what is known as sharding. This is where you spread the data from one or more collections across multiple Mongo instances, which allows you to scale horizontally. Your data is spread across multiple physical machines, and a proxy (mongos) routes each request to the right machine based on a shard key. We set up the infrastructure for this to make sure it all works, but don't currently have enough data to warrant using it. Once we do, we'll just boot up those instances. – Toluate 20/2, 2012 at 1:20

Thanks Joe! What instance size do you have, and what is the total cost per month for this 3-machine setup? – Evaporimeter 21/2, 2012 at 8:2

We've actually just upgraded the machines to XL instances. We're expecting pretty serious load when we go live, so I doubt you need the same. – Toluate 22/2, 2012 at 22:15

I currently run my website on a single server with MongoDB.

First off, this is a big red flag. When running in production, it is always recommended to run a replica set with at least three full nodes.

Replication provides automatic redundancy and fail-over.
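As a rough illustration of why three nodes is the usual minimum: a replica set can only elect a primary when a majority of its voting members can reach each other. The sketch below is plain Python arithmetic, not MongoDB code, and the function names are made up for this example.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the set."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many members can be down while a primary can still be elected."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} members: majority={majority(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Under this rule a two-member set tolerates no more failures than a single server does, which is why three members is the smallest set that survives losing one machine.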

Ability to seamlessly add/remove web-servers without worry about losing data in the DB

MongoDB supports a concept called sharding. Sharding provides a way to scale horizontally by automatically partitioning data. The partitioning is done via a shard key.

If you plan to use sharding, please read the sharding documentation very carefully and recognize the limitations. For MongoDB sharding, you have to select a shard key that allows queries to be evenly distributed across the shards.

The current MongoDB crawl index has about 100k entries that are keyed on ~15 different columns.

This is going to be a problem with sharding. Sharding can only scale queries that use the shard key. A query on the shard key can be routed directly to a single machine. A query on a secondary index goes to all machines.

You have 15 different indexes, so basically all of these queries will go to all shards. That will not "auto-scale" very well at all.
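To make the routing difference concrete, here is a toy model in plain Python. It is not MongoDB code: the shard names, the `user_id` shard key, the `age` field, and the hash-based placement are all assumptions chosen for illustration, and MongoDB's real chunk assignment works differently.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names
SHARD_KEY = "user_id"                    # hypothetical shard key


def shard_for(value) -> str:
    """Map a shard-key value to one shard (hash-based, illustration only)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def shards_to_query(query: dict) -> list:
    """Return the shards a router would contact for an equality query."""
    if SHARD_KEY in query:
        # Targeted query: the shard key pins the request to one shard.
        return [shard_for(query[SHARD_KEY])]
    # No shard key in the query: every shard has to be asked.
    return list(SHARDS)


if __name__ == "__main__":
    print(shards_to_query({"user_id": 42}))  # one shard
    print(shards_to_query({"age": 30}))      # all shards
```

In this model, adding shards spreads out the `user_id` lookups, but every query on any other field still lands on every shard, so those queries gain little from the extra machines.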

Paschasia answered 20/2, 2012 at 19:6 Comment(0)

Beware that at the moment EC2 does not have 64-bit small instances, making replication potentially expensive. Because MongoDB memory-maps its data files, a 32-bit OS is not advised: 32-bit builds are limited to roughly 2 GB of data.

Brooksbrookshire answered 21/2, 2012 at 16:41 Comment(1)

Last week Amazon finally announced 64-bit support for all instance sizes and a new medium size. – Brooksbrookshire 12/3, 2012 at 22:50

I've had very bad experiences with SimpleDB and think it's fundamentally flawed, so I would avoid it.

There is a good white paper on how to set up MongoDB on Amazon EC2: http://d36cz9buwru1tt.cloudfront.net/AWS_NoSQL_MongoDB.pdf

I suspect setting up MongoDB on EC2 is the fastest solution versus rewriting-for/migrating-to DynamoDB.

Best of luck!

Uzbek answered 29/2, 2012 at 3:54 Comment(0)
