Scheduling A Job on AWS EC2
C

7

15

I have a website running on AWS EC2. I need to create a nightly job that generates a sitemap file and uploads the files to the various browsers. I'm looking for a utility on AWS that allows this functionality. I've considered the following:

1) Generate a request to the web server that triggers it to do this task

  • I don't like this approach because it ties up a server thread and uses CPU cycles on the host

2) Create a cron job on the machine the web server is running on to execute this task

  • Again, I don't like this approach because it takes CPU cycles away from the web server

3) Create another EC2 instance and set up a cron job to run the task

  • This solves the web server resource issues, but why pay for an additional EC2 instance to run a job for <5 minutes? Waste of money!

Are there any other options? Is this a job for ElasticMapReduce?

Challis answered 10/1, 2012 at 23:21 Comment(2)
It looks like a function of your app, not a server solution.Shelburne
Right, which is why I ruled out items 1 & 2Challis
C
9

Amazon has just released[1] new features for Elastic Beanstalk. You can now create a worker environment containing a cron.yaml file that defines periodic tasks, each of which calls a URL on a schedule written in cron syntax: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-periodictasks

[1] http://aws.amazon.com/about-aws/whats-new/2015/02/17/aws-elastic-beanstalk-supports-environment-cloning-periodic-tasks-and-1-click-iam-role-creation/
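
To illustrate what the periodic task ends up calling: the worker tier delivers each scheduled task as an HTTP POST to the configured URL on the application. Below is a minimal sketch of such an endpoint using only Python's standard library. The path /tasks/sitemap and the generate_sitemap function are hypothetical placeholders, not names defined by Elastic Beanstalk; a Java app would do the same thing with a servlet or controller.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical path; it would have to match the url configured in cron.yaml.
TASK_PATH = "/tasks/sitemap"


def generate_sitemap():
    """Placeholder for the real nightly job (hypothetical)."""
    return "sitemap generated"


class TaskHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the configured task path triggers the job.
        if self.path != TASK_PATH:
            self.send_error(404)
            return
        # Read and discard any request body before replying.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = generate_sitemap().encode("utf-8")
        # A 200 response signals that the task was handled.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Silence the default per-request logging to stderr.
        pass


def make_server(port=0):
    """Create (but do not start) a server bound to localhost; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), TaskHandler)
```

Calling make_server(8080).serve_forever() would run it locally for experimentation.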

Crompton answered 24/2, 2015 at 10:54 Comment(1)
Finally I can select a right answer for this! The other answers were good and would work, but I was really looking for a service from AWS to solve the problem.Challis
L
16

If I were in your shoes, I'd probably start by trying to run the cron job on the web server each night at low tide and monitor the resource usage to make sure it doesn't interfere with the web server.

If you find that it doesn't play nicely, or you have high standards for the elegance of your architecture (I can admire that), then you'll probably need to run a separate instance.

I agree that it seems like a waste to run an instance 24 hours a day for a job you only need to run once a night.

Here's one approach: The cron job on your primary machine (currently a web server) could fire up a new instance to run the task. It could pass in a user-data script that gets run when the instance starts, and the instance could shut itself down when it completes the task (with instance-initiated-shutdown-behavior set to "terminate", so that shutting down also terminates the instance).

Unfortunately, this misses your desire to enforce separation of concerns, it gets complicated when you start scaling to multiple web servers, and it requires your web server to be alive in order for the job to run.

A couple months ago, I came up with a different approach to run an instance on a cron schedule, relying entirely on existing AWS features and with no requirement to have other servers running.

The basic idea is to use Amazon's Auto Scaling with a recurring action that scales the group from "0" to "1" at a specific time each night. The instance can terminate itself when the job is done, and the Auto Scaling can clean up much later to make sure it's terminated.

I've provided more details and a working example in this article:

Running EC2 Instances on a Recurring Schedule with Auto Scaling
http://alestic.com/2011/11/ec2-schedule-instance
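
For readers who prefer code to prose, here is a rough sketch of the two recurring actions described above, expressed as keyword arguments for boto3's put_scheduled_update_group_action call. This is not the method used in the linked article, only an illustration; the group name, action names, and times are assumptions, and the Auto Scaling group with its launch configuration is assumed to exist already.

```python
def build_nightly_actions(group_name, start_cron="0 3 * * *", stop_cron="0 4 * * *"):
    """Return kwargs for two scheduled actions: scale 0 -> 1, then back to 0.

    The action names and default times are illustrative assumptions.
    """
    start = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "nightly-job-start",
        "Recurrence": start_cron,  # cron syntax, UTC unless a time zone is given
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
    }
    stop = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "nightly-job-stop",
        "Recurrence": stop_cron,  # safety net in case the instance did not terminate itself
        "MinSize": 0,
        "MaxSize": 0,
        "DesiredCapacity": 0,
    }
    return [start, stop]


def apply_actions(actions):
    """Register the actions with AWS; requires boto3 and valid credentials."""
    import boto3  # third-party AWS SDK, imported lazily so the builder works without it

    client = boto3.client("autoscaling")
    for kwargs in actions:
        client.put_scheduled_update_group_action(**kwargs)
```

Something like apply_actions(build_nightly_actions("sitemap-job")) would register both actions, where "sitemap-job" is a hypothetical group name.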

Lavina answered 10/1, 2012 at 23:57 Comment(4)
Well, this seems like a lot of jumping through hoops just for a simple job to run. It seems to me that there is a need that amazon isn't addressing: a cron-like service that runs an arbitrary command line job on any machine. They could charge based on the CPU + memory resources used. Thanks for your answer.Challis
Thanks for the very helpful edit. This seems like a pretty decent approach. I'm still somewhat mystified as to why AWS doesn't have something to support one off jobs out of the box. I'm imagining an interface where I can identify a custom program/script to run on a set schedule. Seems so basic!Challis
+1 for "Amazon should support this kind of basic operation out of box"Otes
AWS Lambda has this feature on its roadmap - https://mcmap.net/q/142533/-aws-lambda-scheduled-tasksNairn
F
2

Assuming you are running on a *nix version of EC2, I would suggest that you run it in cron using the nice command.

nice changes the scheduling priority of the job. You can make it a much lower priority, so if your web server is busy, the cron job will have to wait for the CPU.

The higher the nice number, the lower the priority. Nicenesses range from -20 (most favorable scheduling) to 19 (least favorable).
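
In a crontab this usually just means prefixing the command with something like nice -n 19. If the job is launched from a script instead, the same effect can be sketched in Python with os.nice, as below. The helper name run_niced and the default increment are assumptions made for illustration, and this only works on Unix-like systems.

```python
import os
import subprocess


def run_niced(cmd, increment=10):
    """Run cmd in a child process whose niceness is raised by `increment`.

    A higher niceness means a lower scheduling priority, so a busy
    web server on the same machine gets the CPU first.
    """
    return subprocess.run(
        cmd,
        # Executed in the child just before exec; raises only the child's niceness.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, run_niced(["/path/to/generate_sitemap.sh"], increment=19) would run a (hypothetical) job script at the least favorable priority.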

Felipafelipe answered 27/1, 2013 at 22:22 Comment(0)
V
2

AWS DataPipeline

You can use AWS Data Pipeline to schedule a task that runs at a fixed period. The action can be any shell command if you configure your pipeline with a ShellCommandActivity.

You can even use your existing EC2 instance to run the command: set up Task Runner on your EC2 instance and set the workerGroup field on the ShellCommandActivity in your pipeline definition:

{
 "pipelineId": "df-0937003356ZJEXAMPLE",
 "pipelineObjects": [
    {
      "id": "Schedule",
      "name": "Schedule",
      "fields": [
        { "key": "startDateTime", "stringValue": "2012-12-12T00:00:00" }, 
        { "key": "type", "stringValue": "Schedule" }, 
        { "key": "period", "stringValue": "1 hour" }, 
        { "key": "endDateTime", "stringValue": "2012-12-21T18:00:00"  }
       ]
     }, {
      "id": "DoSomething",
      "name": "DoSomething",
      "fields": [
        { "key": "type", "stringValue": "ShellCommandActivity" },
        { "key": "command", "stringValue": "echo hello" },
        { "key": "schedule", "refValue": "Schedule" },
        { "key": "workerGroup", "stringValue": "yourWorkerGroup" }
      ]
    }
  ]
}

Limits: Minimum scheduling interval is 15 minutes.
Pricing: About $1.00 per month.

Volkan answered 21/8, 2015 at 14:22 Comment(0)
W
1

You should consider CloudWatch Events and Lambda (http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html). You pay only for the actual runs. I assume the workers maintained by Elastic Beanstalk still cost money even when they are idle.

Update: found this nice article (http://brianstempin.com/2016/02/29/replacing-the-cron-in-aws/)
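
As a rough idea of what the Lambda side could look like, here is a minimal Python handler sketch. Scheduled CloudWatch Events arrive with "source" set to "aws.events"; the run_nightly_job function is a hypothetical stand-in for the actual sitemap work, and the shape of the return value is an arbitrary choice for this example.

```python
def run_nightly_job():
    """Placeholder for the real work, e.g. building the sitemap (hypothetical)."""
    return "sitemap generated"


def lambda_handler(event, context):
    """Entry point invoked by Lambda; runs the job only for scheduled events."""
    if event.get("source") != "aws.events":
        # Ignore invocations that did not come from a schedule rule.
        return {"ran": False, "reason": "not a scheduled event"}
    result = run_nightly_job()
    # "time" is the timestamp of the schedule trigger, when present.
    return {"ran": True, "result": result, "time": event.get("time")}
```

The handler can be exercised locally by calling it with a dict shaped like a scheduled event and None as the context.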

Whitman answered 25/9, 2017 at 14:22 Comment(0)
C
0

If this task can be accomplished with one machine, I recommend booting up an instance programmatically using the fog gem, which is written in Ruby.

After you start an instance, you can run a command over SSH. Once it has completed, you can shut the instance down with fog as well.

Amazon EMR is also a good solution if your task can be written in a MapReduce style. EMR will take care of starting and stopping instances. The elastic-mapreduce-ruby CLI tool can help you automate it.

Cunctation answered 10/1, 2012 at 23:28 Comment(2)
I guess I should have mentioned that my app is written in Java?Challis
This is an acceptable solution for those running Rails, but not for other languages.Decumbent
S
0

You can use AWS OpsWorks to set up cron jobs for your application. For more information, read the AWS OpsWorks user guide. I found a page explaining how to set up cron jobs: http://docs.aws.amazon.com/opsworks/latest/userguide/workingcookbook-extend-cron.html

Stringency answered 26/2, 2014 at 16:52 Comment(0)
