Amazon EC2 On-Demand Workers for Short Tasks

Asked 10/6, 2012 at 13:39 Answered 3/8, 2015 at 8:51

Solved r amazon-ec2 amazon-emr amazon-swf

I am looking to build a web application which needs to run resource-intensive MCMC (Markov chain Monte Carlo) calculations on-demand in R to generate some probability graphs for the user.

Constraints:

Obviously I don't want to run the resource-intensive calculations on the same server as the web app front-end, so these tasks need to be handed off to a worker instance.
These calculations take a good amount of CPU to run and I'd like to keep latency as low as possible (hopefully seconds, not minutes), so I would prefer to run the calculations on beefier hardware.
I cannot afford to run a beefy EC2 instance at ~66¢/hr x 24hrs/day, so on-demand or spot request instances are probably necessary.

Here are the options I've come up with:

1. Run a cheap, affordable worker instance 24 hours a day that takes one task at a time, managed by Amazon SWF (or SQS).

Cons:
- high latency - Cheaper hardware, longer wait times.

2. Spawn a beefier worker instance per task (spun up whenever a job is added to the queue) and terminate the instance upon completion.

Cons:
- expensive/wasteful - I'd be paying for a full hour on the server each time and using only seconds of it for my calculation.
- startup overhead - Would spinning up a new EC2 instance on demand introduce non-negligible latency (defeating the whole purpose of using beefier hardware)?

3. Like #2 but with low-bid EC2 spot requests.

Cons:
- startup overhead - See #2.
- inconsistency? - I've never worked with spot requests before, so I have no idea how volatile or hands-on such a solution would be. Do I have to continually adjust my bids to make sure I can still get tasks done at peak hours? I suppose I'd also have to monitor my processes closely to make sure they aren't interrupted mid-calculation.

4. Some kind of hybrid solution where I actively monitor beefy-hardware worker instances and their loads, and intelligently spin up and terminate instances on the hour to maintain an optimal balance of cost and availability.

Cons:
- complicated and costly setup - Unless there's a good managed service out there to handle stuff like this, I'd have to set all of that infrastructure up myself...
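
To make the "terminate on the hour" idea in the hybrid option concrete, here is a minimal Python sketch of the decision rule only. It assumes whole-hour billing as described above; the function name `should_terminate` and the 5-minute `window` are hypothetical choices for illustration, and the actual monitoring and EC2 calls are left out.

```python
from datetime import datetime, timedelta

# Assumption: instances are billed in whole hours from launch time.
BILLING_PERIOD = timedelta(hours=1)


def should_terminate(launched_at, now, is_idle, window=timedelta(minutes=5)):
    """Return True if an idle worker is close to the end of its paid hour.

    A busy worker is never terminated. An idle worker is kept until only
    `window` of the already-paid hour remains, since stopping it earlier
    saves nothing and costs availability.
    """
    if not is_idle:
        return False
    into_period = (now - launched_at) % BILLING_PERIOD
    remaining = BILLING_PERIOD - into_period
    return remaining <= window


launched = datetime(2012, 6, 10, 12, 0)
print(should_terminate(launched, datetime(2012, 6, 10, 12, 20), is_idle=True))
print(should_terminate(launched, datetime(2012, 6, 10, 12, 57), is_idle=True))
```

A scheduler would call this for each running worker every minute or so and terminate the ones for which it returns True.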

I wish there were some service where I could pay for highly available, on-demand hardware on a minute-to-minute basis rather than hourly.

So my questions are the following:

How would you recommend solving this problem?
Is there a good EC2 instance managing solution that could sit on top of Amazon SWF and help me load balance and terminate idle workers?
Would spot-request bids solve my problem or are they more suited to tasks which don't necessarily need to be completed right away?

Loosejointed answered 10/6, 2012 at 13:39 Comment(6)

If it is possible to implement your MCMC in Python, you could use Google App Engine. App Engine billing is mostly based on resources your application actually uses, instead of EC2's instance renting. – Lazor 10/6, 2012 at 17:7

Unfortunately, I'm pretty stuck on R... it handles most of the heavy lifting and the guys I'm working with are math people, not programmers, so it would fall to me to maintain it. – Loosejointed 10/6, 2012 at 18:43

There is also the renjin project, which aims to be a JVM R interpreter. Eventually, you should be able to use it to run R on a Java App Engine, but I don't know how stable it is at the moment. – Lazor 10/6, 2012 at 18:57

Is it possible for you to prepare them in advance? Rather than on demand? (ie, scheduled release times of a battery of options)? – Combustor 11/6, 2012 at 3:38

Unfortunately, no. The way the application works is the user will enter all of their data for their latest calculation in the Markov chain, submit a form, and expect a result. I could always give them a message saying check back in 15 minutes, but I was hoping for something more immediate. I suppose I could analyze times during the day when the service is utilized heavily and schedule uptime for the worker instance around that, but its usage is fairly unpredictable and sporadic. – Loosejointed 11/6, 2012 at 13:16

If going with #2, you might be able to ameliorate the second drawback by starting up an instance when the user visits the site. However, this makes the solution a bit more wasteful (if the user decides not to submit anything). – Gaiseric 18/6, 2012 at 18:22

There's another option that you may not be aware of. I actually just stumbled upon it: http://multyvac.com

I have no experience using it (so I can't vouch for it), but it looks like the first solution I've seen that actually offers true "utility computing". It began with just Python but now supports any language.

Sourdough answered 21/6, 2012 at 3:17 Comment(3)

This looks perfect! Computing tasks prorated to the millisecond... I'm gonna play around with it and see if it'll run my application. – Loosejointed 21/6, 2012 at 8:26

Excellent. I'd be interested in hearing a follow-up to see if it's any good. – Sourdough 29/6, 2012 at 18:20

I've edited this answer - it looks like PiCloud is shut down. I think I found the right replacement: docs.multyvac.com/introduction.html#relation-to-picloud – Highpriced 3/8, 2015 at 8:59

I wish there were some service where I could pay for highly available, on-demand hardware on a minute-to-minute basis rather than hourly.

That service is AWS Lambda, which wasn't available when you asked the question:

Lambda runs your code on high-availability compute infrastructure and performs all the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling

Pricing:

You are charged based on the number of requests for your functions and the time your code executes

Duration is calculated from the time your code begins executing until it returns or otherwise terminates, rounded up to the nearest 100ms.

The Lambda free tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month.
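
To see how those billing rules combine, here is a rough Python sketch of a monthly estimate. Only the 100ms rounding and the free-tier figures come from the quotes above. The two price arguments are placeholders you would fill in from the current pricing page, and the function names are made up for illustration.

```python
import math


def billed_ms(duration_ms, increment_ms=100):
    """Round a run time up to the next billing increment."""
    return math.ceil(duration_ms / increment_ms) * increment_ms


def monthly_cost(invocations, duration_ms, memory_mb,
                 price_per_gb_second, price_per_million_requests,
                 free_gb_seconds=400_000, free_requests=1_000_000):
    """Estimate a monthly bill after subtracting the free tier.

    The price arguments are NOT real AWS rates; pass in whatever the
    pricing page lists when you run this.
    """
    gb_seconds = invocations * (billed_ms(duration_ms) / 1000) * (memory_mb / 1024)
    billable_gb_seconds = max(0.0, gb_seconds - free_gb_seconds)
    billable_requests = max(0, invocations - free_requests)
    return (billable_gb_seconds * price_per_gb_second
            + billable_requests / 1_000_000 * price_per_million_requests)


# A 2.34 s run is billed as 2.4 s.
print(billed_ms(2340))
```

For sporadic usage of a few thousand short runs a month, the compute and request counts would likely stay inside the free tier, which is the point of per-100ms billing compared with paying for whole instance hours.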

You can also wrap a Lambda function with an HTTP endpoint, possibly removing this layer from your application:

You can invoke a Lambda function over HTTPS by defining a custom RESTful API using Amazon API Gateway. This gives you an endpoint for your function which can respond to REST calls like GET, PUT and POST. Read more about using AWS Lambda with Amazon API Gateway.

Caveat: Lambda currently supports only JavaScript, Java, and Python, so I'm not sure how you would get R to work. You may need to host R in one of these runtimes.
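
One way "hosting R in one of these runtimes" might look is a thin Python handler that shells out to `Rscript`. This is an untested sketch under several assumptions: that an R installation can be packaged with the function at all, that a script named `run_mcmc.R` (hypothetical) reads JSON on stdin and writes JSON to stdout, and that the run fits in the timeout. The `run` parameter exists only so the handler can be exercised without R installed.

```python
import json
import subprocess

RSCRIPT = "Rscript"    # assumption: an R binary is bundled with the function
SCRIPT = "run_mcmc.R"  # hypothetical script: JSON in on stdin, JSON out on stdout


def build_command(rscript=RSCRIPT, script=SCRIPT):
    """Command line used to start the R script."""
    return [rscript, "--vanilla", script]


def handler(event, context, run=subprocess.run):
    """Pass the request to R and return an HTTP-style response."""
    result = run(build_command(), input=json.dumps(event),
                 capture_output=True, text=True, timeout=60)
    if result.returncode != 0:
        # Surface R's error output instead of failing silently.
        return {"statusCode": 500,
                "body": json.dumps({"error": result.stderr.strip()})}
    return {"statusCode": 200, "body": result.stdout}
```

Whether R's startup time and package size are acceptable inside a function like this is something you would have to measure.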

Highpriced answered 3/8, 2015 at 8:51 Comment(0)
