Will using a Cloud PaaS automatically solve scalability issues?

I'm currently looking for a cloud PaaS that will let me scale an application to handle anything from 1 user to 10 million+ users. I've never worked on anything this big, and the question I can't get a clear answer to is this: if you develop, say, a standard application with a relational database and SOAP web services, will it scale automatically when deployed on a PaaS, or do you still need to build it with failover, redundancy and all those things in mind?

Let's say I deploy a Spring Hibernate application to Amazon EC2 and create a single Ubuntu Server instance with Tomcat installed. Will this application just scale indefinitely, or do I need more Ubuntu instances? If more than one instance is needed, does Amazon take care of running the application across them, or is that the developer's responsibility? What about database storage: can I install a database on EC2 that will scale as the database grows, or do I need to use one of their APIs instead if I want it to scale indefinitely?

CloudFoundry allows you to build locally and deploy straight to their PaaS, but since it's in beta, there's a limit on the amount of resources you can use, and databases are limited to 128MB if I remember correctly, so this is a no-go for now. Some have suggested installing CloudFoundry on Amazon EC2. How does it scale, and how is the database layer handled then?

GAE (Google App Engine): will this allow me to just deploy an app and not have to worry about how it scales and implements redundancy? There appear to be some limitations on what you can and can't run on GAE, and their recent price increase upset quite a large number of developers. Is it really that expensive compared to other providers?

So basically, will it scale and what needs to be done to make it scale?

Ratliff answered 25/3, 2012 at 7:55 Comment(4)
I think that scalability is never magical, and always requires a lot of work (especially when considering many millions of users).Hovey
Let's say you have two months to build such an application and take care of the scalability at the same time, which provider will allow you to demo the app on your mac / desktop and then the next day pull a switch to load 10M+ users seamlessly?Ratliff
@JanVladimirMostert - AFAIK only AppEngine will scale seamlessly, as in: you upload the code and it does the rest.Wirer
@BasileStarynkevitch - GAE does scalability automagically. To achieve this it forces you to write your code against their proprietary and restricted API.Wirer

That's a lot of questions for one post. Anyway:

  1. Amazon EC2 does not scale automatically with load. EC2 is basically just a virtual machine. You can achieve scaling of EC2 instances with Auto Scaling and Elastic Load Balancing.

  2. Relational (SQL) databases are hard to scale horizontally. That's one reason people started using NoSQL databases in the first place. It's best to see which database your cloud provider offers as a managed service: Datastore on GAE and DynamoDB on Amazon.

  3. Installing your own database on EC2 instances is very impractical, as the instance's local storage is ephemeral (it loses all data on "disk" when the instance is stopped or terminated).

  4. GAE Datastore is actually one big database shared by all applications running on it. So it's pretty scalable: your millions of users should not be a problem for it. http://highscalability.com/blog/2011/1/11/google-megastore-3-billion-writes-and-20-billion-read-transa.html

  5. Yes, App Engine scales automatically, both frontend instances and database. There is nothing special you need to do to make it scale, just use their API.

  6. There are limitations on what you can do with App Engine:

    A. No local storage (filesystem) - you need to use Datastore or Blobstore.

    B. Comet is only supported via their proprietary Channels API

    C. Datastore is a NoSQL database: no JOINs, limited queries, limited transactions.

  7. The cost of GAE is not bad. We do 1M requests a day for about 5 dollars a day. The biggest saving comes from the fact that you do not need a system admin on GAE (but you do need one for EC2). Compared to the cost of manpower, GAE is incredibly cheap.
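
To make point 1 a bit more concrete, here is a minimal sketch of the kind of decision an auto-scaling policy and a load balancer make for you. It is plain Python, not the AWS API: the function names, the 60% CPU target and the size limits are all assumptions chosen for illustration, and the real services use their own algorithms and settings.

```python
import math


def desired_instances(current, avg_cpu, target_cpu=60.0, min_size=1, max_size=20):
    """Proportional scaling rule (illustrative only, not AWS's algorithm).

    If the fleet averages avg_cpu percent and we want it near target_cpu,
    scale the instance count by the ratio, then clamp to the allowed range.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    wanted = math.ceil(current * avg_cpu / target_cpu)
    return max(min_size, min(max_size, wanted))


def route(request_id, instances):
    """Round-robin choice of an instance, standing in for a load balancer."""
    return instances[request_id % len(instances)]


if __name__ == "__main__":
    # Two instances running hot: the rule asks for a third one.
    print(desired_instances(current=2, avg_cpu=90.0))
    # Requests are then spread over whatever instances exist.
    print([route(i, ["i-a", "i-b", "i-c"]) for i in range(4)])
```

The point of the sketch is that on EC2 somebody has to define this rule and keep the application stateless enough that any instance can serve any request. On App Engine the equivalent logic is part of the platform.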

Some hints to save money on (and speed up) GAE:

A. Use get instead of query in Datastore (requires carefully crafting natural keys).

B. Use memcache to cache data you got from Datastore. This can be done automatically with Objectify and its @Cached annotation.

C. Denormalize data. Meaning you write data redundantly in various places in order to get to it in as few operations as possible.

D. If you have a lot of REST requests from devices, where you do not use cookies, then switch off session support (or roll your own, as we did). Sessions use Datastore under the hood, and every request does a get and a put.

E. Read about adjusting app settings. Try different settings (depending on how tolerant your app is to request delay and on your traffic patterns/spikes). We were able to cut down frontend instances by 70%.
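
Hints A to C can be sketched with an in-memory stand-in for a key-value datastore. Nothing below is the App Engine or Objectify API: `FakeDatastore`, `CachedStore`, `user_key` and `save_post` are hypothetical names made up for this example, and the dict-based cache only imitates what memcache would do.

```python
class FakeDatastore:
    """In-memory stand-in for a key-value datastore that counts operations."""

    def __init__(self):
        self.rows = {}
        self.gets = 0
        self.queries = 0

    def put(self, key, entity):
        self.rows[key] = dict(entity)

    def get(self, key):
        # Direct lookup by key: the cheap operation (hint A).
        self.gets += 1
        entity = self.rows.get(key)
        return dict(entity) if entity is not None else None

    def query(self, **filters):
        # Scan with filters: the expensive operation we try to avoid.
        self.queries += 1
        return [dict(e) for e in self.rows.values()
                if all(e.get(k) == v for k, v in filters.items())]


def user_key(email):
    # Natural key built from data the caller already has,
    # so no query is needed to find the entity.
    return "User:" + email.strip().lower()


class CachedStore:
    """Cache-aside wrapper (hint B): read the cache first, then the datastore."""

    def __init__(self, datastore):
        self.datastore = datastore
        self.cache = {}

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        entity = self.datastore.get(key)
        if entity is not None:
            self.cache[key] = entity
        return entity

    def put(self, key, entity):
        self.datastore.put(key, entity)
        self.cache.pop(key, None)  # drop the stale cached copy


def save_post(store, post_id, author_email, title):
    """Denormalized write (hint C): copy the author's name into the post."""
    author = store.get(user_key(author_email))
    if author is None:
        raise KeyError(author_email)
    store.put("Post:" + post_id, {
        "title": title,
        "author_email": author_email,
        "author_name": author["name"],  # redundant copy, saves a read later
    })


if __name__ == "__main__":
    ds = FakeDatastore()
    store = CachedStore(ds)
    store.put(user_key("Ann@Example.com"), {"name": "Ann"})
    save_post(store, "1", "ann@example.com", "Hello")
    print(store.get("Post:1"), ds.gets, ds.queries)
```

The trade-off with the denormalized copy is that a renamed author has to be updated in every post, which is the price of rendering a post with a single read.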

Wirer answered 25/3, 2012 at 8:35 Comment(4)
Let's say I go for Elastic Load Balancing. How do I scale the database layer from which each instance gets its data, or will RDS do exactly that for me?Ratliff
No, running your own database on EC2 is VERY hard: 1. Ephemeral storage is volatile. 2. Block storage is slow. 3. You have to set up the database in a distributed configuration on your own.Wirer
Shorter: Amazon will not scale your database automatically. You need to use one of their database services to achieve this.Wirer
That explains it perfectly, so goodbye relational databases ... Guess Google App Engine is the better option to develop on, even though it appears to be more expensive and locks you into their platform. Until CloudFoundry goes out of beta, working via an API seems to be the only option. Thanks Peter!Ratliff
