AWS Elastic Beanstalk, running a cronjob

I would like to know if there is a way to set up a cron job/task that executes every minute. Currently, any of my instances should be able to run this task.

This is what I have tried to do in the config files without success:

container_commands:
  01cronjobs:
    command: echo "*/1 * * * * root php /etc/httpd/myscript.php"

I'm not really sure if this is the correct way to do it.

Any ideas?

Colorless answered 28/12, 2012 at 23:30 Comment(2)
Is the command right? I mean, it could be: command: echo "*/1 * * * * root php /etc/httpd/myscript.php" > /etc/cron.d/something. Either way, I'd suggest you use the leader_only flag, otherwise all machines will fire up this cron job at once. – Dumpish
Yes! Definitely using the leader_only flag. I'll try changing the command. – Colorless
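Putting the suggestions from the comments together, a corrected version of the question's snippet might look like the following. This is only a sketch: the cron.d filename is arbitrary, and the script path is simply the one from the question.

```yaml
container_commands:
  01cronjobs:
    command: echo "*/1 * * * * root php /etc/httpd/myscript.php" > /etc/cron.d/myscript
    leader_only: true
```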

This is how I added a cron job to Elastic Beanstalk:

Create a folder at the root of your application called .ebextensions if it doesn't exist already. Then create a config file inside the .ebextensions folder. I'll use example.config for illustration purposes. Then add this to example.config:

container_commands:
  01_some_cron_job:
    command: "cat .ebextensions/some_cron_job.txt > /etc/cron.d/some_cron_job && chmod 644 /etc/cron.d/some_cron_job"
    leader_only: true

This is a YAML configuration file for Elastic Beanstalk. Make sure when you copy this into your text editor that your text editor uses spaces instead of tabs. Otherwise you'll get a YAML error when you push this to EB.

So what this does is create a command called 01_some_cron_job. Commands are run in alphabetical order so the 01 makes sure it's run as the first command.

The command then takes the contents of a file called some_cron_job.txt and adds it to a file called some_cron_job in /etc/cron.d.

The command then changes the permissions on the /etc/cron.d/some_cron_job file.

The leader_only key ensures the command is only run on the EC2 instance that is considered the leader, rather than on every EC2 instance you may have running.

Then create a file called some_cron_job.txt inside the .ebextensions folder. You will place your cron jobs in this file.

So for example:

# The newline at the end of this file is extremely important.  Cron won't run without it.
* * * * * root /usr/bin/php some-php-script-here > /dev/null

So this cron job will run every minute of every hour of every day as the root user and discard the output to /dev/null. /usr/bin/php is the path to php. Then replace some-php-script-here with the path to your php file. This is obviously assuming your cron job needs to run a PHP file.

Also, make sure the some_cron_job.txt file has a newline at the end of the file just like the comment says. Otherwise cron won't run.

Update: There is an issue with this solution when Elastic Beanstalk scales your instances. For example, let's say you have one instance with the cron job running. You get an increase in traffic, so Elastic Beanstalk scales you up to two instances. The leader_only setting will ensure you only have one cron job running between the two instances. Your traffic decreases, and Elastic Beanstalk scales you down to one instance. But instead of terminating the second instance, Elastic Beanstalk terminates the first instance, which was the leader. You now don't have any cron jobs running, since they were only running on the first instance, which was terminated. See the comments below.

Update 2: Just making this clear from the comments below: AWS now has protection against automatic instance termination. Just enable it on your leader instance and you're good to go. – Nicolás Arévalo, Oct 28 '16 at 9:23

Gemina answered 5/3, 2013 at 20:50 Comment(11)
I've been using your suggestion for some time, and recently ran into an issue where somehow the leader switched, resulting in multiple instances running the cron. To solve that issue, I changed 01_some_cron_job to 02_some_cron_job and added 01_remove_cron_jobs with the following: command: "rm /etc/cron.d/cron_jobs || exit 0". That way, after every deployment only the leader will have the cron_jobs file. If leaders change, you can just redeploy and the crons will be fixed to run just once again. – Nightfall
I would suggest against relying on the leader_only property. It is only used during deployment, and if you scale down or your "leader" instance fails, you are bound to have issues (reference). – Robbierobbin
I agree; I have yet to come up with a good automated solution to that problem. Currently, I have my cron jobs email me on a regular basis so I can be sure they are still running. Hardly ideal. The best solution is to realize that cron jobs are a means to an end, not a goal in and of themselves. Reworking your application logic to use something designed for scalable systems (such as SWF) is the best long-term plan. – Nightfall
Wouldn't you still need something like a cron just to trigger a call to SWF? How do you get a worker to poll SWF for tasks to run without user interaction? I'm either confused by AWS' offerings or they still just don't have a good way of running scheduled scripts without resorting to cron like this. – Pitchblende
@James I still use crons to spawn the workers, but once they are running, SWF takes care of making sure each task is only performed once. I have my cron set up to spawn a couple of each type of worker every minute, and have them poll for a minute - this makes it easy to shut them down (if they are short tasks anyway) and then start them back up. Other solutions, depending on your programming language and other resources, work as well. I simply like the cron technique for my workers as it makes sure they keep running in case of instance replacement. – Nightfall
Don't do this. It's too unreliable. The only way I got this to work is by running up a micro instance and running cron jobs from there using cURL. This guarantees that only one instance runs it, and that the leader that has crons installed isn't terminated. – Daynadays
Regarding the problem of multiple instances: you should really only use scalable cron jobs on your instances (like sending log files etc. from your instances). If you have a non-scalable job, i.e. one that should run once per application, then you may be better off setting up a separate worker application with the cron jobs. – Highams
I tried to fix this with a small Ruby script; you can find it here: github.com/SocialbitGmbH/AWSBeanstalkLeaderManager – Hobnob
Answer should be switched to: https://mcmap.net/q/217134/-aws-elastic-beanstalk-running-a-cronjob – Eructate
It seems a little OTT to set up a completely separate worker server to run simple cron jobs. The leader solution seems to work for scaling up; scaling back down is the real issue. Is this solved by adding termination protection to your first server? Will it always remain the leader? – Trattoria
AWS now has protection against automatic instance termination. Just enable it on your leader instance and you're good to go. – Shelby

This is the official way to do it now (2015+). Please try this first; it's by far the easiest and most reliable method currently available.

According to current docs, one is able to run periodic tasks on their so-called worker tier.

Citing the documentation:

AWS Elastic Beanstalk supports periodic tasks for worker environment tiers in environments running a predefined configuration with a solution stack that contains "v1.2.0" in the container name. You must create a new environment.

Also interesting is the part about cron.yaml:

To invoke periodic tasks, your application source bundle must include a cron.yaml file at the root level. The file must contain information about the periodic tasks you want to schedule. Specify this information using standard crontab syntax.

Update: We were able to get this work. Here are some important gotchas from our experience (Node.js platform):

  • When using the cron.yaml file, make sure you have the latest awsebcli, because older versions will not work properly.
  • It is also vital to create a new environment (at least in our case it was), not just clone the old one.
  • If you want to make sure CRON is supported on your EC2 Worker Tier instance, ssh into it (eb ssh) and run cat /var/log/aws-sqsd/default.log. It should report as aws-sqsd 2.0 (2015-02-18). If you don't have the 2.0 version, something went wrong when creating your environment and you need to create a new one as stated above.
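For reference, a minimal cron.yaml along the lines the documentation describes might look like this (the task name and URL below are placeholders for whatever endpoint your worker application exposes):

```yaml
version: 1
cron:
 - name: "every-minute-task"
   url: "/periodic/every-minute"
   schedule: "* * * * *"
```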
Inset answered 25/2, 2015 at 12:48 Comment(22)
About cron.yaml, there is an awesome blog post: "Running cron jobs on Amazon Web Services (AWS) Elastic Beanstalk" on Medium. – Phoney
I added some comments which might be useful when trying it. – Inset
Thanks for this - rookie question - I need my cron to check my web app's database twice an hour for upcoming calendar events, and send a reminder email when it does. What's the best setup here: should I have the cron.yaml URL point to a route on my web app, or should I give my worker env app access to the database? So little out there on this! – Vast
@Vast The way we do it, we have the same app running in two different environments (thus no special config needed) - a worker one and the common web server one. The worker environment has some special routes enabled by setting an ENV variable which our app looks for. This way, you can set special worker-only routes in your cron.yaml while having the luxury of a shared codebase with the normal app. Your worker app can easily access the same resources as the web server one: database, models, etc. – Inset
What happens when scaling up? Will multiple instances run the same task? – Yulma
@Yulma No, since it's based on SQS, the first instance to pick up the message is the one that will run it. – Inset
@Inset I thought cron.yaml was totally independent from SQS, is it not? – Yulma
@Yulma According to aws.amazon.com/blogs/aws/category/aws-elastic-beanstalk it uses SQS: "You can now configure Elastic Beanstalk to send messages to a queue periodically. The message is delivered as an HTTP POST to a configurable URL on the local host; the HTTP header will contain the name of the periodic task". – Rural
Hey, I've been struggling for days to make it work without success. When creating the new environment, what does "environments running a predefined configuration with a solution stack that contains "v1.2.0" in the container name" mean? Where do I have to fill in this information? I'm using Django and Python. Cron: version: 1 cron: - name: "delete_expired_files" url: "myapp/management/commands/delete_expired_files" schedule: "* * * * *" What might be wrong? Do I have to put the entire path in the url? Please help me with this... thank you. – Escritoire
@JaquelinePassos v1.2.0 is the solution stack version. It should let you choose which version of the solution stack you want when creating a new environment. Anything newer than v1.2.0 should do. Regarding the URL, it should be the URL your application listens on, not a file path. It is not possible to run Django management commands; it only does HTTP requests. – Inset
I could get the cron to load - it's being scheduled by AWS - however it's giving me a 500 error... maybe that's the reason... but how can I do it, then? – Escritoire
@JaquelinePassos Well, that really could be the reason why it doesn't work :) You need to set up the HTTP endpoints in your application as I described in the previous post, and then debug what's going on; try to call these endpoints when running your app locally. But this is kind of out of scope for this SO question. You might create a new one if you believe it would help someone else too. – Inset
The new topic is here: #34830928. I'll try to log something inside my function to find out what's going on. Thank you. – Escritoire
@JaquelinePassos Great. Please upvote this answer if it pointed you in the right direction. I have submitted my suggestion on how to resolve your problems to the new topic. – Inset
One thing that's not clear to me is whether there is a way to avoid having to allocate an extra EC2 machine just to run the cron jobs via cron.yaml. Ideally it would run on the same machine as the one that is servicing HTTP requests (i.e. the web tier). – Mallina
@WenzelJakob As far as I know, that is not possible. AWS cron is specifically designed for worker instances, which on the other hand cannot serve normal HTTP requests as they do not face the internet directly. – Inset
But if this has to be a publicly available endpoint ... doesn't that raise questions about security? Or are the workers not available from the internet? (But the web servers hosting the same codebase will be, so...) – Feed
@Feed Workers do not face the internet directly. For the web tier with a shared codebase, we use a simple env variable switch to indicate when to enable worker routes (on worker tier instances) and when not to (on web tier instances). – Inset
@Inset Somewhat off topic, but for long-running jobs how do you generally handle the POST without TCP connection timeouts? I'm running Django, and I don't really want to go into the asynchronous rabbit hole that is Celery for a simple cron job. – Tagalog
@Tagalog Sorry, can't help with that; we are already in the asynchronous rabbit hole :) – Inset
I'm just a little confused about how to manage the worker tier once it's been built. Does that imply that when we deploy our app we now have to deploy twice - once for our regular app and once for our worker tier? – Ormandy
It's now 2019. I'd like to refer you to aws.amazon.com/premiumsupport/knowledge-center/…. It seems AWS has deprecated the cron.yaml approach for some reason. – Aspa

Regarding jamieb's response, and as alrdinleal mentions, you can use the 'leader_only' property to ensure that only one EC2 instance runs the cron job.

Quote taken from http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html:

you can use leader_only. One instance is chosen to be the leader in an Auto Scaling group. If the leader_only value is set to true, the command runs only on the instance that is marked as the leader.

I'm trying to achieve a similar thing on my EB, so I will update my post if I solve it.

UPDATE:

Ok, I now have working cronjobs using the following eb config:

files:
  "/tmp/cronjob" :
    mode: "000777"
    owner: ec2-user
    group: ec2-user
    content: |
      # clear expired baskets
      */10 * * * * /usr/bin/wget -o /dev/null http://blah.elasticbeanstalk.com/basket/purge > $HOME/basket_purge.log 2>&1
      # clean up files created by above cronjob
      30 23 * * * rm $HOME/purge*
    encoding: plain 
container_commands:
  purge_basket: 
    command: crontab /tmp/cronjob
    leader_only: true
  zz_delete_cronjob_file:
    command: rm /tmp/cronjob

Essentially, I create a temp file with the cronjobs and then set the crontab to read from the temp file, then delete the temp file afterwards. Hope this helps.

Geisler answered 1/1, 2013 at 13:44 Comment(5)
How would you ensure that the instance running this crontab does not get terminated by auto scaling? By default, it terminates the oldest instance. – Krugersdorp
That's an issue I haven't yet been able to solve. It strikes me as a flaw in Amazon's functionality that leader_only commands are not applied to a new leader when the current one is terminated by EB. If you come up with something, please do share! – Geisler
So I (finally) discovered how to prevent the leader from being terminated by auto-scaling: custom auto-scaling termination policies. See docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/… – Geisler
@Nate You've probably figured this out by now, but based on my reading of the order that these run in, "commands" run before "container_commands", so you would create the file, then delete it, then try to run the crontab. – Pliable
@Krugersdorp In order to keep the oldest instance, here is what I do: 1 - change the termination protection of the instance to ENABLED. 2 - Go to the Auto Scaling Group, find your EBS Environment ID, click EDIT and change the Termination Policies to "NewestInstance". – Campy

I spoke to an AWS support agent, and this is how we got it to work. 2015 solution:

Create a file in your .ebextensions directory with your_file_name.config. In the config file input:

files:
  "/etc/cron.d/cron_example":
    mode: "000644"
    owner: root
    group: root
    content: |
      * * * * * root /usr/local/bin/cron_example.sh

  "/usr/local/bin/cron_example.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/bash

      /usr/local/bin/test_cron.sh || exit
      echo "Cron running at " `date` >> /tmp/cron_example.log
      # Now do tasks that should only run on 1 instance ...

  "/usr/local/bin/test_cron.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/bash

      METADATA=/opt/aws/bin/ec2-metadata
      INSTANCE_ID=`$METADATA -i | awk '{print $2}'`
      REGION=`$METADATA -z | awk '{print substr($2, 1, length($2)-1)}'`

      # Find our Auto Scaling Group name.
      ASG=`aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" \
        --region $REGION --output text | awk '/aws:autoscaling:groupName/ {print $5}'`

      # Find the first instance in the Group
      FIRST=`aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names $ASG \
        --region $REGION --output text | awk '/InService$/ {print $4}' | sort | head -1`

      # Test if they're the same.
      [ "$FIRST" = "$INSTANCE_ID" ]

commands:
  rm_old_cron:
    command: "rm *.bak"
    cwd: "/etc/cron.d"
    ignoreErrors: true

This solution has 2 drawbacks:

  1. On subsequent deployments, Beanstalk renames the existing cron script to .bak, but cron will still run it. Your cron now executes twice on the same machine.
  2. If your environment scales up, you get several instances, all running your cron script. This means your mail shots are repeated, or your database archives are duplicated.

Workaround:

  1. Ensure any .ebextensions script which creates a cron also removes the .bak files on subsequent deployments.
  2. Have a helper script which does the following:
     1. Gets the current Instance ID from the metadata.
     2. Gets the current Auto Scaling Group name from the EC2 tags.
     3. Gets the list of EC2 instances in that Group, sorted alphabetically.
     4. Takes the first instance from that list.
     5. Compares the Instance ID from step 1 with the first Instance ID from step 4.
     Your cron scripts can then use this helper script to determine if they should execute.

Caveat:

  • The IAM Role used for the Beanstalk instances needs ec2:DescribeTags and autoscaling:DescribeAutoScalingGroups permissions
  • The instances chosen from are those shown as InService by Auto Scaling. This does not necessarily mean they are fully booted up and ready to run your cron.

You would not have to set the IAM Roles if you are using the default beanstalk role.

Cistercian answered 26/9, 2015 at 8:23 Comment(0)

As mentioned above, the fundamental flaw with any crontab configuration is that it is only applied at deployment. As the cluster gets auto-scaled up and then back down, the instance holding the crontab is favored to be the first server turned off. In addition, there would be no fail-over, which for me was critical.

I did some research, then talked with our AWS account specialist to bounce ideas around and validate the solution I came up with. You can accomplish this with OpsWorks, although it's a bit like using a house to kill a fly. It is also possible to use Data Pipeline with Task Runner, but this has limited ability in the scripts that it can execute, and I needed to be able to run PHP scripts with access to the whole code base. You could also dedicate an EC2 instance outside of the Elastic Beanstalk cluster, but then you have no fail-over again.

So here is what I came up with, which apparently is unconventional (as the AWS rep commented) and may be considered a hack, but it works and is solid with fail-over. I chose a coding solution using the SDK, which I'll show in PHP, although you could do the same method in any language you prefer.

// contains the values for variables used (key, secret, env)
require_once('cron_config.inc'); 

// Load the AWS PHP SDK to connection to ElasticBeanstalk
use Aws\ElasticBeanstalk\ElasticBeanstalkClient;

$client = ElasticBeanstalkClient::factory(array(
    'key' => AWS_KEY,
    'secret' => AWS_SECRET,
    'profile' => 'your_profile',
    'region'  => 'us-east-1'
));

$result = $client->describeEnvironmentResources(array(
    'EnvironmentName' => AWS_ENV
));

// Use the instance ID from the EC2 metadata service
// (php_uname('n') would return the hostname, not the instance ID)
$instanceId = trim(file_get_contents('http://169.254.169.254/latest/meta-data/instance-id'));

if ($instanceId != $result['EnvironmentResources']['Instances'][0]['Id']) {
    die("Not the primary EC2 instance\n");
}

So, walking through this and how it operates: you call the scripts from crontab as you normally would on every EC2 instance. Each script includes this at the beginning (or includes a single file for each, as I use it), which establishes an Elastic Beanstalk client and retrieves a list of all instances. It uses only the first server in the list and checks whether it matches itself; if it does, it continues, otherwise it dies and closes out. I've checked, and the list returned seems to be consistent - technically it only needs to be consistent for a minute or so, as each instance executes the scheduled cron. If it does change, it wouldn't matter, since again it is only relevant for that small window.
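The first-instance comparison at the heart of this approach can be sketched in shell as well. This is only an illustration, not the answer's actual code: the instance IDs below are placeholders, the real list would come from the SDK or AWS CLI, and (as a commenter suggests) the IDs are sorted before taking the first one.

```shell
#!/bin/bash
# Decide whether this instance should run the cron task: pick the
# lexicographically first instance ID and compare it with our own.
# On a real instance, the current ID could come from the metadata service:
#   MY_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

is_primary() {
  local my_id="$1"; shift                          # our own instance ID
  local first
  first=$(printf '%s\n' "$@" | sort | head -n 1)   # first ID after sorting
  [ "$first" = "$my_id" ]
}

# Placeholder IDs for illustration only:
if is_primary "i-0aaa111" "i-0bbb222" "i-0aaa111"; then
  echo "primary instance: running the scheduled task"
else
  echo "not primary: exiting"
fi
```

The same guard can sit at the top of each cron-invoked script, mirroring the die() call in the PHP version.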

This isn't elegant by any means, but suited our specific needs - which was not to increase cost with an additional service or have to have a dedicated EC2 instance, and would have fail-over in case of any failure. Our cron scripts run maintenance scripts which get placed into SQS and each server in the cluster helps execute. At least this may give you an alternate option if it fits your needs.

-Davey

Telefilm answered 17/7, 2014 at 18:8 Comment(3)
I found that php_uname('n') returns the private DNS name (e.g. ip-172.24.55.66), which is not the instance ID that you're looking for. Instead of using php_uname(), I ended up using this: $instanceId = file_get_contents("http://instance-data/latest/meta-data/instance-id"); Then just use that $instanceId var to do the comparison. – Underlinen
Is there any guarantee that the Instances array presents the same ordering on each Describe call? I would suggest extracting the ['Id'] field of each entry into an array and sorting them in PHP before you check if the first sorted entry is your current instance ID. – Tyburn
Based on this answer I made this solution: #14077595 - it's very similar but has NO chance of double execution. – C

If you're using Rails, you can use the whenever-elasticbeanstalk gem. It allows you to run cron jobs on either all instances or just one. It checks every minute to ensure that there is only one "leader" instance, and will automatically promote one server to "leader" if there are none. This is needed since Elastic Beanstalk only has the concept of leader during deployment and may shut down any instance at any time while scaling.

UPDATE I switched to using AWS OpsWorks and am no longer maintaining this gem. If you need more functionality than is available in the basics of Elastic Beanstalk, I highly recommend switching to OpsWorks.

Gastrocnemius answered 24/6, 2013 at 21:58 Comment(3)
Would you mind telling us how you solved it using OpsWorks? Are you running custom layers that do the cron jobs? – Belgium
Yeah, I have an admin/cron layer that only runs on one server. I set up a custom cookbook that holds all of my cron jobs. AWS has a guide at docs.aws.amazon.com/opsworks/latest/userguide/…. – Gastrocnemius
@Gastrocnemius If you can assign one server to run cron jobs using OpsWorks, the same thing works with Elastic Beanstalk: I can use an environment with one server to run cron jobs - even with a Load Balancer, with max and min instances set to one, so at least one server instance is always preserved. – Levana

You really don't want to be running cron jobs on Elastic Beanstalk. Since you'll have multiple application instances, this can cause race conditions and other odd problems. I actually recently blogged about this (4th or 5th tip down the page). The short version: Depending on the application, use a job queue like SQS or a third-party solution like iron.io.

Kirman answered 31/12, 2012 at 19:6 Comment(4)
SQS does not guarantee the code will only be run once. I like the iron.io site; I'm going to check it out. – Rasure
Also, in your blog post you recommend using InnoDB on RDS. I use a table on RDS to store my tasks and use the InnoDB "SELECT...FOR UPDATE" feature to make sure only one server runs those tasks. How does your app contact SQS without a cron job or user interaction? – Pitchblende
@JamesAlday This SO question is pretty old. Since I wrote the above comment, AWS introduced an elegant way to handle cron jobs on Elastic Beanstalk by electing one of the running servers as a master. Having said that, it sounds like you're misusing cron + MySQL as a job queue. I would need to know a lot about your app before I could offer concrete recommendations, though. – Kirman
I have a script that runs via cron which checks a table for jobs to be run. Using transactions prevents multiple servers from running the same job. I've looked into SQS, but you need a master server that runs all scripts instead of distributing them, and you still need to write logic to ensure you don't run the same script multiple times. But I'm still confused about how you get tasks to run without user interaction or cron - what triggers your app to run the tasks in the queue? – Pitchblende

2017: If you are using Laravel5+

You just need 2 minutes to configure it:

  • create a Worker Tier
  • install laravel-aws-worker

    composer require dusterio/laravel-aws-worker

  • add a cron.yaml to the root folder:

Add cron.yaml to the root folder of your application (this can be a part of your repo or you could add this file right before deploying to EB - the important thing is that this file is present at the time of deployment):

version: 1
cron:
 - name: "schedule"
   url: "/worker/schedule"
   schedule: "* * * * *"

That's it!

All your tasks in App\Console\Kernel will now be executed.

Detailed instructions and explanations: https://github.com/dusterio/laravel-aws-worker

How to write tasks inside of Laravel: https://laravel.com/docs/5.4/scheduling

Rhody answered 24/2, 2017 at 11:25 Comment(2)
I think with schedule: "* * * * *" you will kill your machine. – Goda
@jasson-rojas It's equivalent to once per minute; check the docs in the link. Worker servers are made for it. – Rhody

A more readable solution using files instead of container_commands:

files:
  "/etc/cron.d/my_cron":
    mode: "000644"
    owner: root
    group: root
    content: |
      # override default email address
      MAILTO="[email protected]"
      # run a Symfony command every ten minutes (as ec2-user)
      */10 * * * * ec2-user /usr/bin/php /var/app/current/app/console do:something
    encoding: plain
commands:
  # delete backup file created by Elastic Beanstalk
  clear_cron_backup:
    command: rm -f /etc/cron.d/my_cron.bak

Note the format differs from the usual crontab format in that it specifies the user to run the command as.
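To make the difference concrete, here is the same job in both formats (paths copied from the example above; only the user field differs):

```
# /etc/cron.d entry: minute hour day month weekday USER command
*/10 * * * * ec2-user /usr/bin/php /var/app/current/app/console do:something

# per-user crontab entry (crontab -e): no user field
*/10 * * * * /usr/bin/php /var/app/current/app/console do:something
```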

Pooler answered 27/3, 2014 at 13:11 Comment(1)
One issue here is that Elastic Beanstalk EC2 instances don't have SMTP services set up by default, so the MAILTO option here might not work. – Globose

My 1 cent of contribution for 2018

Here is the right way to do it (using django/python and django_crontab app):

Inside the .ebextensions folder, create a file like this, 98_cron.config:

files:
  "/tmp/98_create_cron.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/sh
      cd /
      sudo /opt/python/run/venv/bin/python /opt/python/current/app/manage.py crontab remove > /home/ec2-user/remove11.txt
      sudo /opt/python/run/venv/bin/python /opt/python/current/app/manage.py crontab add > /home/ec2-user/add11.txt 

container_commands:
    98crontab:
        command: "mv /tmp/98_create_cron.sh /opt/elasticbeanstalk/hooks/appdeploy/post && chmod 774 /opt/elasticbeanstalk/hooks/appdeploy/post/98_create_cron.sh"
        leader_only: true

It needs to be container_commands instead of commands.

Campy answered 16/3, 2018 at 16:40 Comment(4)
Hi Ronaldo, one question: should the /home/ec2-user/remove11.txt and /home/ec2-user/add11.txt parts of the code be replaced by anything, like my EC2 authorised user and/or any other kind of .txt file? Also, does it duplicate the cron when another instance is added and the leader one is deprecated? – Lashkar
Hi @PolFrances, it should be replaced, that's correct. It doesn't duplicate because it only runs on the leader instance. But I no longer use this method to run async tasks. I've moved to Celery, which is much more reliable. – Campy
Oh cool, do you have any tutorial or first steps to make Celery work on Elastic Beanstalk? – Lashkar
@PolFrances This is a good starting point: #41162191 – Campy

The latest example from Amazon is the easiest and most efficient (periodic tasks):

https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html

where you create a separate worker tier to execute any of your cron jobs. Create the cron.yaml file and place it in your root folder. One issue I had (after cron did not seem to be executing) was finding that my CodePipeline did not have authority to perform a DynamoDB modification. Based on that, after adding full DynamoDB access under IAM -> Roles -> yourpipeline and redeploying the Elastic Beanstalk application, it worked perfectly.

Shalne answered 29/12, 2019 at 14:2 Comment(1)
Also see the cron.yaml in the AWS EB Python sample application found here: python.zip – Aristaeus

Someone was wondering about the leader_only auto scaling problems when new leaders arise. I can't seem to figure out how to reply to their comments, but see this link: http://blog.paulopoiati.com/2013/08/25/running-cron-in-elastic-beanstalk-auto-scaling-environment/

Eructate answered 20/10, 2013 at 5:5 Comment(0)

So we've been struggling with this for a while and after some discussion with an AWS rep I've finally come up with what I think is the best solution.

Using a worker tier with cron.yaml is definitely the easiest fix. However, what the documentation doesn't make clear is that this will put the job at the end of the SQS queue you're using to actually run your jobs. If your cron jobs are time-sensitive (as many are), this isn't acceptable, since when a job actually runs would depend on the size of the queue. One option is to use a completely separate environment just to run cron jobs, but I think that's overkill.

Some of the other options, like checking to see if you're the first instance in the list, aren't ideal either. What if the current first instance is in the process of shutting down?

Instance protection can also come with issues - what if that instance gets locked up / frozen?

What's important to understand is how AWS itself manages the cron.yaml functionality. There is an SQS daemon which uses a Dynamo table to handle "leader election". It writes to this table frequently, and if the current leader hasn't written in a short while, the next instance will take over as leader. This is how the daemon decides which instance to fire the job into the SQS queue.

We can repurpose the existing functionality rather than trying to rewrite our own. You can see the full solution here: https://gist.github.com/dorner/4517fe2b8c79ccb3971084ec28267f27

That's in Ruby, but you can easily adapt it to any other language that has the AWS SDK. Essentially, it checks the current leader, then checks the state to make sure it's in a good state. It'll loop until there is a current leader in a good state, and if the current instance is the leader, execute the job.

Dierdre answered 26/5, 2017 at 14:45 Comment(0)

The best way to do this is to use an Elastic Beanstalk Worker Environment (see "Option 1" below). However, this will add to your server costs. If you don't want to do this, see "Option 2" below for how to configure cron itself.

Option 1: Use Elastic Beanstalk Worker environments

Amazon has support for Elastic Beanstalk Worker Environments. These are Elastic Beanstalk managed environments that come with an SQS queue onto which you can enqueue tasks. You can also give them a cron config that automatically enqueues tasks on a recurring schedule. Rather than receiving requests from a load balancer, each server in a worker environment runs a daemon (managed by Elastic Beanstalk) that polls the queue and calls the appropriate web endpoint whenever it receives a message. Worker environments have several benefits over running cron yourself:

  1. Performance. Your tasks are now running on dedicated servers instead of competing for CPU and memory with web requests. You can also have different specs for the worker servers (ex. you can have more memory on just the worker servers).
  2. Scalability. You can also scale up your number of worker servers to more than 1 in order to handle large task loads.
  3. Ad-hoc Tasks. Your code can enqueue ad-hoc tasks as well as scheduled ones.
  4. Standardization. You write tasks as web endpoints rather than needing to configure your own task framework, which lets you standardize your code and tooling.

If you just want a cron replacement, all you need to do is make a file called cron.yaml at the top level of your project, with config like the following:

cron.yaml

version: 1
cron:
 - name: "hourly"
   url: "/tasks/hourly"
   schedule: "0 */1 * * *"

This will call the url /tasks/hourly once an hour.

If you are deploying the same codebase to web and worker environments, you should have the task URLs require an environment variable that you set on worker environments and not web environments. This way, your task endpoints are not exposed to the world (task servers by default do not accept incoming HTTP requests, as the only thing making calls to them is the on-server daemon).
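One way to gate the task endpoints is a simple environment-variable check. A minimal sketch; the variable name `WORKER_ENV` is an arbitrary example you would set yourself, not an Elastic Beanstalk convention:

```python
import os

def tasks_enabled():
    # Only worker environments set this variable; web environments leave
    # it unset, so task endpoints refuse to run there.
    return os.environ.get("WORKER_ENV") == "1"

os.environ.pop("WORKER_ENV", None)
print(tasks_enabled())  # False: looks like a web environment
os.environ["WORKER_ENV"] = "1"
print(tasks_enabled())  # True: worker environment
```

Your task endpoint would then return an error (e.g. 403) whenever `tasks_enabled()` is false.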

The full docs are here: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html

Option 2: Configure Cron

If you want to run cron, you can do it, but it will run on every server, rather than only once (i.e. if you have three servers with an hourly cron job, your job will run 3 times on the hour). For this to work, you need to make sure your environment has only one server, or that your cronjobs are idempotent and safe from race conditions. Note that the leader_only flag in .ebextensions config isn't sufficient for setting up cron on a single server because environments aren't guaranteed to have a leader at all times (so using leader_only can cause you to have no servers with cron configured). Here is an example .ebextensions config file that installs cron:

.ebextensions/cron.config

container_commands:
    01_remove_cron_jobs:
        command: "rm /etc/cron.d/cronjobs || exit 0"
    02_set_up_cron:
        command: "cat .ebextensions/cronjobs.txt > /etc/cron.d/cronjobs && chmod 644 /etc/cron.d/cronjobs"
        leader_only: true

This config file assumes the existence of a file .ebextensions/cronjobs.txt. This file contains your actual cron config. Note that in order to have environment variables loaded and your code in scope, you need to have code that does this baked into each command. The following is an example cron config that works on an Amazon Linux 2 based Python environment:

.ebextensions/cronjobs.txt

SHELL=/bin/bash
PROJECT_PATH=/var/app/current
ENV_PATH=/opt/elasticbeanstalk/deployment/env

# m h dom mon dow user command
0 * * * * ec2-user set -a; source <(sudo cat $ENV_PATH) && cd $PROJECT_PATH && python HOURLY_COMMAND > /dev/null
# Cron requires a newline at the end of the file
Essayistic answered 15/2, 2023 at 22:7 Comment(0)
H
1

Here is a full explanation of the solution:

http://blog.paulopoiati.com/2013/08/25/running-cron-in-elastic-beanstalk-auto-scaling-environment/

Hinton answered 25/8, 2013 at 23:49 Comment(1)
don't use leader_only to create a unique instance within an ASG. An ASG never guarantees that you will retain a specific instance; it only guarantees the number of instances in service. The leader instance may be terminated due to a failed EB health check.Godroon
A
0

To control whether Auto Scaling can terminate a particular instance when scaling in, use instance protection. You can enable the instance protection setting on an Auto Scaling group or an individual Auto Scaling instance. When Auto Scaling launches an instance, the instance inherits the instance protection setting of the Auto Scaling group. You can change the instance protection setting for an Auto Scaling group or an Auto Scaling instance at any time.

http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html#instance-protection
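For example, instance protection can be enabled from the AWS CLI (group and instance IDs below are placeholders):

```
# Protect one instance in the group from scale-in termination
aws autoscaling set-instance-protection \
    --auto-scaling-group-name my-asg \
    --instance-ids i-0123456789abcdef0 \
    --protected-from-scale-in
```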

Amann answered 10/8, 2016 at 20:23 Comment(0)
C
0

Another solution, if you need to run a PHP file through cron and you already have a NAT instance set up: put the cron job on the NAT instance and run the PHP file through wget.
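For example, the crontab entry on the NAT instance could look like the following (URL and schedule are placeholders):

```
*/5 * * * * wget -q -O /dev/null "http://internal-app.example.com/myscript.php"
```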

Coupling answered 22/8, 2016 at 9:25 Comment(0)
C
0

Here is a fix in case you want to do this in PHP. You just need a cronjob.config in your .ebextensions folder that works like this.

files:
  "/etc/cron.d/my_cron":
    mode: "000644"
    owner: root
    group: root
    content: |
        empty stuff
    encoding: plain
commands:
  01_clear_cron_backup:
    command: "rm -f /etc/cron.d/*.bak"
  02_remove_content:
    command: "sudo sed -i 's/empty stuff//g' /etc/cron.d/my_cron"
container_commands:
  adding_cron:
    command: "echo '* * * * * ec2-user . /opt/elasticbeanstalk/support/envvars && /usr/bin/php /var/app/current/index.php cron sendemail > /tmp/sendemail.log 2>&1' > /etc/cron.d/my_cron"
    leader_only: true

The envvars file loads the environment variables. You can debug the output in /tmp/sendemail.log as above.

Hope this helps someone as it surely helped us!

Cocklebur answered 15/5, 2017 at 3:12 Comment(0)
C
0

Based on the principles of the answer from user1599237, where you let the cron jobs run on all instances but have each job determine at the start whether it should be allowed to run, I have made another solution.

Instead of looking at the running instances (and having to store your AWS key and secret) I'm using the MySQL database that I'm already connecting to from all instances.

It has no downsides, only positives:

  • no extra instance or expenses
  • rock solid solution - no chance of double execution
  • scalable - automatically works as your instances are scaled up and down
  • failover - automatically works in case an instance has a failure

Alternatively, you could also use a commonly shared filesystem (like AWS EFS via the NFS protocol) instead of a database.

The following solution is created within the PHP framework Yii but you can easily adapt it for another framework and language. Also the exception handler Yii::$app->system is a module of my own. Replace it with whatever you are using.

/**
 * Obtain an exclusive lock to ensure only one instance or worker executes a job
 *
 * Examples:
 *
 * `php /var/app/current/yii process/lock 60 empty-trash php /var/app/current/yii maintenance/empty-trash`
 * `php /var/app/current/yii process/lock 60 empty-trash php /var/app/current/yii maintenance/empty-trash StdOUT./test.log`
 * `php /var/app/current/yii process/lock 60 "empty trash" php /var/app/current/yii maintenance/empty-trash StdOUT./test.log StdERR.ditto`
 * `php /var/app/current/yii process/lock 60 "empty trash" php /var/app/current/yii maintenance/empty-trash StdOUT./output.log StdERR./error.log`
 *
 * Arguments are understood as follows:
 * - First: Duration of the lock in minutes
 * - Second: Job name (surround with quotes if it contains spaces)
 * - The rest: Command to execute. Instead of writing `>` and `2>` for redirecting output you need to write `StdOUT` and `StdERR` respectively. To redirect stderr to stdout write `StdERR.ditto`.
 *
 * Command will be executed in the background. If determined that it should not be executed the script will terminate silently.
 */
public function actionLock() {
    $argsAll = $args = func_get_args();
    if (!is_numeric($args[0])) {
        \Yii::$app->system->error('Duration for obtaining process lock is not numeric.', ['Args' => $argsAll]);
    }
    if (!$args[1]) {
        \Yii::$app->system->error('Job name for obtaining process lock is missing.', ['Args' => $argsAll]);
    }

    $durationMins = $args[0];
    $jobName = $args[1];
    $instanceID = null;
    unset($args[0], $args[1]);

    $command = trim(implode(' ', $args));
    if (!$command) {
        \Yii::$app->system->error('Command to execute after obtaining process lock is missing.', ['Args' => $argsAll]);
    }

    // If using AWS Elastic Beanstalk retrieve the instance ID
    if (file_exists('/etc/elasticbeanstalk/.aws-eb-system-initialized')) {
        if ($awsEb = file_get_contents('/etc/elasticbeanstalk/.aws-eb-system-initialized')) {
            $awsEb = json_decode($awsEb);
            if (is_object($awsEb) && $awsEb->instance_id) {
                $instanceID = $awsEb->instance_id;
            }
        }
    }

    // Obtain lock
    $updateColumns = false;  //do nothing if record already exists
    $affectedRows = \Yii::$app->db->createCommand()->upsert('system_job_locks', [
        'job_name' => $jobName,
        'locked' => gmdate('Y-m-d H:i:s'),
        'duration' => $durationMins,
        'source' => $instanceID,
    ], $updateColumns)->execute();
    // The SQL generated: INSERT INTO system_job_locks (job_name, locked, duration, source) VALUES ('some-name', '2019-04-22 17:24:39', 60, 'i-HmkDAZ9S5G5G') ON DUPLICATE KEY UPDATE job_name = job_name

    if ($affectedRows == 0) {
        // record already exists, check if lock has expired
        $affectedRows = \Yii::$app->db->createCommand()->update('system_job_locks', [
                'locked' => gmdate('Y-m-d H:i:s'),
                'duration' => $durationMins,
                'source' => $instanceID,
            ],
            'job_name = :jobName AND DATE_ADD(locked, INTERVAL duration MINUTE) < NOW()', ['jobName' => $jobName]
        )->execute();
        // The SQL generated: UPDATE system_job_locks SET locked = '2019-04-22 17:24:39', duration = 60, source = 'i-HmkDAZ9S5G5G' WHERE job_name = 'clean-trash' AND DATE_ADD(locked, INTERVAL duration MINUTE) < NOW()

        if ($affectedRows == 0) {
            // We could not obtain a lock (since another process already has it) so do not execute the command
            exit;
        }
    }

    // Handle redirection of stdout and stderr
    $command = str_replace('StdOUT', '>', $command);
    $command = str_replace('StdERR.ditto', '2>&1', $command);
    $command = str_replace('StdERR', '2>', $command);

    // Execute the command as a background process so we can exit the current process
    $command .= ' &';

    $output = []; $exitcode = null;
    exec($command, $output, $exitcode);
    exit($exitcode);
}

This is the database schema I'm using:

CREATE TABLE `system_job_locks` (
    `job_name` VARCHAR(50) NOT NULL,
    `locked` DATETIME NOT NULL COMMENT 'UTC',
    `duration` SMALLINT(5) UNSIGNED NOT NULL COMMENT 'Minutes',
    `source` VARCHAR(255) NULL DEFAULT NULL,
    PRIMARY KEY (`job_name`)
)
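For comparison, the same lock pattern can be sketched self-containedly in Python against SQLite (INSERT OR IGNORE stands in for MySQL's ON DUPLICATE KEY UPDATE; the table layout mirrors the schema above, and this is an illustration, not production code):

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE system_job_locks (
    job_name TEXT PRIMARY KEY,
    locked   TEXT NOT NULL,      -- UTC timestamp
    duration INTEGER NOT NULL,   -- minutes
    source   TEXT
)""")

def try_lock(conn, job_name, duration_mins, source, now):
    """Return True if this caller obtained the lock for job_name."""
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    # Step 1: try to insert a fresh lock; does nothing if one already exists.
    cur = conn.execute(
        "INSERT OR IGNORE INTO system_job_locks VALUES (?, ?, ?, ?)",
        (job_name, stamp, duration_mins, source))
    if cur.rowcount == 1:
        return True
    # Step 2: a lock already exists; take it over only if it has expired.
    cur = conn.execute(
        """UPDATE system_job_locks
           SET locked = ?, duration = ?, source = ?
           WHERE job_name = ?
             AND datetime(locked, '+' || duration || ' minutes') < ?""",
        (stamp, duration_mins, source, job_name, stamp))
    return cur.rowcount == 1

t0 = datetime(2024, 1, 1, 0, 0, 0)
print(try_lock(conn, "empty-trash", 60, "i-aaa", t0))                          # True
print(try_lock(conn, "empty-trash", 60, "i-bbb", t0 + timedelta(minutes=30)))  # False: lock held
print(try_lock(conn, "empty-trash", 60, "i-bbb", t0 + timedelta(minutes=61)))  # True: lock expired
```

Whichever instance gets `True` runs the job; everyone else exits silently, exactly as in the PHP version.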
C answered 22/4, 2019 at 17:32 Comment(0)
