Issues with celery daemon
We're having issues with our celery daemon being very flaky. We use a fabric deployment script to restart the daemon whenever we push changes, but for some reason this is causing massive issues.

Whenever the deployment script is run, the celery processes are left in a pseudo-dead state. They will (unfortunately) still consume tasks from rabbitmq, but they won't actually do anything. Confusingly, a brief inspection suggests everything is "fine" in this state: celeryctl status shows one node online, and ps aux | grep celery shows two running processes.

However, attempting to run /etc/init.d/celeryd stop manually results in the following error:

start-stop-daemon: warning: failed to kill 30360: No such process

While in this state, running /etc/init.d/celeryd start appears to work correctly but in fact does nothing. The only way to fix the issue is to manually kill the running celery processes and then start them again.
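The "No such process" warning above is consistent with a stale pidfile: start-stop-daemon is reading a PID that no longer matches the live workers, which would also explain why stop fails while the real processes keep running. A minimal Python sketch of the manual workaround, scanning ps output for worker PIDs and signalling them, is below; the helper names and the process-name match "celery" are illustrative assumptions, not the exact commands used here:

```python
import os
import signal
import subprocess

def find_worker_pids(pattern="celery"):
    """Return PIDs of processes whose command line contains `pattern` (Linux ps)."""
    out = subprocess.check_output(["ps", "-eo", "pid,args"], text=True)
    pids = []
    for line in out.splitlines()[1:]:  # skip the PID/COMMAND header row
        pid, _, args = line.strip().partition(" ")
        if pattern in args:
            pids.append(int(pid))
    return pids

def kill_workers(pids, sig=signal.SIGTERM):
    """Send `sig` to each PID, ignoring any that have already exited."""
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass
```

After killing the stale workers, removing the leftover pidfile before running /etc/init.d/celeryd start again avoids the same confusion on the next stop.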

Any ideas what's going on here? We don't have complete confirmation, but we think the problem also develops on its own after a few days with no deployment (this is currently a test server with no activity on it).

Alicyclic answered 1/7, 2011 at 17:15 Comment(3)
We use a deployment script too, but not with fabric; we just execute the shell command celeryd restart from Python, and everything works fine. I know of some issues with the celeryd.sh script on old versions of Ubuntu (earlier than 10.10) because of a bash instruction used to get the running process. Which OS are you running it on? Which version of celery? – Crackup
How exactly does your script restart the daemon? Is it just firing off a kill -9 or similar? – Ornelas
It triggers the init.d script's stop command. This is the included init.d script available from celery's github contrib files. It used to trigger restart instead of stop then start, but I changed that as a shot in the dark. The init.d script calls the start-stop-daemon command. – Alicyclic
I can't say that I know what's ailing your setup, but I've always used supervisord to run celery -- maybe the issue has to do with upstart? Regardless, I've never experienced this with celery running on top of supervisord.

For good measure, here's a sample supervisor config for celery:

[program:celeryd]
directory=/path/to/project/
command=/path/to/project/venv/bin/python manage.py celeryd -l INFO
user=nobody
autostart=true
autorestart=true
startsecs=10
numprocs=1
stdout_logfile=/var/log/sites/foo/celeryd_stdout.log
stderr_logfile=/var/log/sites/foo/celeryd_stderr.log

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

Restarting celeryd in my fab script is then as simple as issuing sudo supervisorctl restart celeryd.
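For reference, the same restart can be issued from any Python deploy script; this is a hedged sketch (the helper name, the dry_run flag, and the use of subprocess are illustrative assumptions, not the answerer's actual fabric task):

```python
import subprocess

def restart_program(program="celeryd", dry_run=True):
    """Build (and optionally run) the supervisorctl restart command for `program`."""
    cmd = ["sudo", "supervisorctl", "restart", program]
    if not dry_run:
        # Requires supervisord running on the host and sudo rights for supervisorctl.
        subprocess.check_call(cmd)
    return cmd
```

Because supervisord itself tracks the child PID and respects stopwaitsecs, this sidesteps the stale-pidfile failure mode entirely: there is no pidfile for the restart to get out of sync with.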

Rollet answered 1/8, 2011 at 6:54 Comment(0)
