I am looking to build a web application which needs to run resource-intensive MCMC (Markov chain Monte Carlo) calculations on-demand in R to generate some probability graphs for the user.
Constraints:
Obviously I don't want to run the resource-intensive calculations on the same server as the web app front-end, so these tasks need to be handed off to a worker instance.
These calculations take a good amount of CPU to run and I'd like to keep latency as low as possible (hopefully seconds, not minutes), so I would prefer to run the calculations on beefier hardware.
I cannot afford to run a beefy EC2 instance at ~66¢/hr x 24hrs/day, so on-demand or spot request instances are probably necessary.
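To put a rough number on that last constraint, here is a small back-of-the-envelope sketch. The ~$0.66/hour rate is the figure I quoted above; the 30-day month and the function name `always_on_cost` are just assumptions for illustration.

```python
# Rough cost of keeping one large instance running around the clock.
HOURLY_RATE_USD = 0.66   # the ~66 cents/hour figure quoted above
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30      # assumption: a 30-day month


def always_on_cost(hourly_rate, days):
    """Cost in USD of running one instance 24 hours a day for `days` days."""
    return hourly_rate * HOURS_PER_DAY * days


daily = always_on_cost(HOURLY_RATE_USD, 1)
monthly = always_on_cost(HOURLY_RATE_USD, DAYS_PER_MONTH)
print(f"per day:   ${daily:.2f}")
print(f"per month: ${monthly:.2f}")
```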
Here are the options I've come up with:
1. Run a cheap worker instance 24 hours a day that takes one task at a time, managed by Amazon SWF (or SQS).
   Cons:
   - high latency - cheaper hardware, longer wait times.
2. Spawn a beefier worker instance per task (spun up whenever a job is added to the queue) and terminate the instance upon completion.
   Cons:
   - expensive/wasteful - I'd be paying for an hour on the server
     each time and only using seconds for my calculation.
   - startup overhead - Would spinning up a new EC2 instance
     on demand introduce non-negligible latency (offsetting the
     whole purpose of using beefier hardware)?
3. Like #2, but with low-bid EC2 spot requests.
   Cons:
   - startup overhead - See #2.
   - inconsistency? - I've never worked with spot requests before,
     so I have no idea how volatile or hands-on such a solution
     would be... Do I have to continually adjust my bids to make
     sure I can still get tasks done at peak hours? Also, I suppose
     I'd have to monitor my processes closely to make sure they aren't
     interrupted mid-calculation.
4. Some kind of hybrid solution where I actively monitor beefy-hardware worker instances and their loads, and intelligently spin up and terminate instances on the hour to maintain an optimal balance of cost and availability.
   Cons:
   - complicated and costly setup - Unless there's a good managed
     service out there to handle stuff like this, I'd have to set
     all of that infrastructure up myself...
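For what it's worth, the "terminate on the hour" part of the hybrid option might look roughly like the sketch below. It assumes instances are billed per started hour, as described above. The names `should_terminate`, `SHUTDOWN_MARGIN`, and `queue_depth` are hypothetical and of my own invention, not part of any AWS API, and the actual launching and terminating of instances is left out entirely.

```python
from datetime import datetime, timedelta

BILLING_PERIOD = timedelta(hours=1)     # assumption: billed per started hour
SHUTDOWN_MARGIN = timedelta(minutes=5)  # hypothetical safety margin


def time_left_in_billed_hour(launched_at, now):
    """Time remaining before the instance rolls into its next billed hour."""
    elapsed = now - launched_at
    into_current_hour = elapsed % BILLING_PERIOD
    return BILLING_PERIOD - into_current_hour


def should_terminate(launched_at, now, is_idle, queue_depth):
    """Terminate only an idle worker, with no queued work, near the end of
    the hour that has already been paid for."""
    if not is_idle or queue_depth > 0:
        return False
    return time_left_in_billed_hour(launched_at, now) <= SHUTDOWN_MARGIN


# Example with made-up timestamps: a worker launched at 12:00.
launched = datetime(2024, 1, 1, 12, 0)
print(should_terminate(launched, datetime(2024, 1, 1, 12, 20), True, 0))
print(should_terminate(launched, datetime(2024, 1, 1, 12, 57), True, 0))
```

The idea is that an idle worker 20 minutes into its hour is kept around, since the rest of the hour is already paid for, while one with only a few minutes left is shut down before the next hour starts.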
I wish there were some service where I could pay for highly available, on-demand hardware by the minute rather than by the hour.
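To illustrate why the billing increment matters so much here, this is a quick sketch comparing the cost of a single short task under hourly and per-minute rounding. The 30-second runtime is a made-up figure, `billed_cost` is a hypothetical helper, and the per-minute pricing is purely the wished-for scenario, not something I know any provider to offer at this rate.

```python
import math

HOURLY_RATE_USD = 0.66  # the rate quoted earlier in this question


def billed_cost(runtime_seconds, increment_seconds, hourly_rate=HOURLY_RATE_USD):
    """Cost in USD of one task when usage is rounded up to the billing increment."""
    increments = math.ceil(runtime_seconds / increment_seconds)
    return increments * increment_seconds / 3600 * hourly_rate


task_seconds = 30  # hypothetical runtime of one MCMC job
hourly = billed_cost(task_seconds, 3600)    # rounded up to a full hour
per_minute = billed_cost(task_seconds, 60)  # rounded up to a full minute
print(f"hourly billing:     ${hourly:.4f} per task")
print(f"per-minute billing: ${per_minute:.4f} per task")
```

Under these assumptions, a task that fits inside one minute costs one sixtieth as much with per-minute rounding as it does when a whole hour is charged.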
So my questions are the following:
How would you recommend solving this problem?
Is there a good EC2 instance-management solution that could sit on top of Amazon SWF and help me load-balance and terminate idle workers?
Would spot-request bids solve my problem or are they more suited to tasks which don't necessarily need to be completed right away?