I ran into an issue where if the delayed job dies while it still has a job locked, that job will not be freed. I wrote a wrapper script around delayed job that will look at the pid file and free any jobs from the dead worker.
The script is for rubber/capistrano
roles/delayedjob/delayed_job_wrapper:
<% @path = '/etc/monit/monit.d/monit-delayedjob.conf' %>
<% workers = 4 %>
<% workers.times do |i| %>
<% PIDFILE = "/mnt/custora-#{RUBBER_ENV}/shared/pids/delayed_job.#{i}.pid" %>
<%= "check process delayed_job.#{i} with pidfile #{PIDFILE}"%>
group delayed_job-<%= RUBBER_ENV %>
<%= " start program = \"/bin/bash /mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current/script/delayed_job_wrapper #{i} start\"" %>
<%= " stop program = \"/bin/bash /mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current/script/delayed_job_wrapper #{i} stop\"" %>
<% end %>
roles/delayedjob/delayed_job_wrapper
#!/bin/bash
<% @path = "/mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current/script/delayed_job_wrapper" %>
<%= "pid_file=/mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/shared/pids/delayed_job.$1.pid" %>
if [ -e $pid_file ]; then
pid=`cat $pid_file`
if [ $2 == "start" ]; then
ps -e | grep ^$pid
if [ $? -eq 0 ]; then
echo "already running $pid"
exit
fi
rm $pid_file
fi
locked_by="delayed_job.$1 host:`hostname` pid:$pid"
<%=" /usr/bin/mysql -e \"update delayed_jobs set locked_at = null, locked_by = null where locked_by='$locked_by'\" -u#{rubber_env.db_user} -h#{rubber_instances.for_role('db', 'primary' => true).first.full_name} #{rubber_env.db_name} " %>
fi
<%= "cd /mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current" %>
. /etc/profile
<%= "RAILS_ENV=#{RUBBER_ENV} script/delayed_job -i $1 $2"%>