How to monitor resque workers in New Relic when running on Heroku?

Asked 19/9, 2012 at 22:19 Answered 18/10, 2012 at 21:41

We've got an app that runs resque workers on Heroku. We've installed the New Relic add-on, and according to the docs the New Relic Agent should auto-instrument resque workers. However, we're seeing no output on the "Background Jobs" tab on the New Relic dashboard.

Following the same docs, we left the newrelic.yml file untouched. We're not sure what's wrong or how to debug this effectively. What do we need to do?

Heatstroke answered 19/9, 2012 at 22:19 Comment(1)

In the logs for your Resque workers do you see the newrelic agent connecting? – Prerequisite 20/9, 2012 at 22:15

It turned out that our problem was caused by having our own custom Resque.before_fork and Resque.after_fork handlers.

New Relic's RPM gem automatically registers Resque.before_fork and Resque.after_fork hooks to establish a communication channel for the workers. A limitation of Resque is that it runs only the last block/Proc assigned to each of the before_fork and after_fork hooks. So, if you have your own custom before_fork/after_fork hooks, you *must* set up the agent's communication channel by hand, e.g. in a config/initializers/custom_resque.rb file:

Resque.before_fork do |job|
  NewRelic::Agent.register_report_channel(job.object_id)

  # extra custom stuff here
end
  
Resque.after_fork do |job|
  NewRelic::Agent.after_fork(:report_to_channel => job.object_id)

  # extra custom stuff here
end

This code is taken directly from the RPM gem's file gems/newrelic_rpm-3.5.0/lib/new_relic/agent/instrumentation/resque.rb
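To illustrate why custom handlers can silently disable the agent, here is a toy model of the "last assigned block wins" behaviour described above. HookRegistry is a hypothetical stand-in written for this example, not Resque's actual code, and the symbols pushed onto calls only mark which block ran.

```ruby
# Toy stand-in for a hook that keeps only the most recently assigned block.
# HookRegistry is hypothetical; it only mimics the behaviour described above.
class HookRegistry
  # With a block: store it, replacing any earlier one. Without: return it.
  def before_fork(&block)
    block ? (@before_fork = block) : @before_fork
  end
end

registry = HookRegistry.new
calls = []

# First registration, standing in for the agent's own hook.
registry.before_fork { |_job| calls << :agent_channel_registered }

# Second registration, standing in for an app's custom hook.
registry.before_fork { |_job| calls << :custom_hook_ran }

# Only the last block is still stored, so the agent's hook never runs.
registry.before_fork.call(:some_job)
```

In this model the first block is simply discarded, which is why the agent calls have to be repeated inside the custom handlers.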

RPM bug update 12/27/2012: After deploying the technique above, we found that the RPM gem leaks file handles when used in forked mode (e.g. Resque). We observed error messages of the kind ActiveRecord::StatementInvalid: ArgumentError: too large fdsets: SET client_min_messages TO ''. After a lot of digging we found that these are caused when ActiveRecord tries to open a database connection and can't because the number of file descriptors is exhausted. New Relic confirmed that there is a bug in the agent when sampling the explain plan. This occurs when lots of Resque jobs run that connect to the DB.

Bug update 1/28/2013: After much head scratching we found out that this bug was caused by an unsupported interaction with the resque-lonely_job gem, which uses Resque's before_perform hook and may stop a Resque job by raising a Resque::Job::DontPerform exception. The RPM client doesn't clean up properly in this situation and leaks file descriptors. New Relic has been informed and is working on a fix.

Bug update 4/10/2013: This has been fixed. We're using 3.6.0.78 and it handles this case. No more file descriptor leaks! Thank you New Relic.

Heatstroke answered 18/10, 2012 at 21:41 Comment(5)

I work for New Relic, and this is completely correct. We'll be updating the documentation to make this clearer in the future. Thanks for the work on finding this. – Ami 26/10, 2012 at 21:18

We've got a doc describing the before_fork / after_fork hooks up here: newrelic.com/docs/ruby/resque-instrumentation Regarding the file descriptor leak - it's actually unrelated to the explain plan functionality, and only happens under certain conditions, but I think we understand it now and are working on a fix. – Gilbart 9/1, 2013 at 16:50

Just a note about the 1/28/2013 bug update: We've been in contact with New Relic support about this and they say it will take some time until they have a newer gem that fixes the issue. In the meantime, you can work around the issue by monkeypatching any place that raises a Resque::Job::DontPerform to call NewRelic::Agent.shutdown right before the exception is raised. – Knotted 31/1, 2013 at 1:58
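A minimal sketch of that workaround, assuming you control the code that raises the exception: AbortJob and dont_perform! are hypothetical names invented for this example, and the stub definitions exist only so the snippet runs without the newrelic_rpm and resque gems loaded. It applies only to agent versions that still have the leak.

```ruby
# Stubs so the sketch runs on its own; in a real app the newrelic_rpm and
# resque gems already define these constants and the stubs are skipped.
unless defined?(NewRelic::Agent)
  module NewRelic
    module Agent
      def self.shutdown; end
    end
  end
end

unless defined?(Resque::Job::DontPerform)
  module Resque
    class Job
      class DontPerform < StandardError; end
    end
  end
end

# Hypothetical helper: shut the agent down first, then abort the job.
module AbortJob
  def self.dont_perform!(message = "job skipped")
    NewRelic::Agent.shutdown
    raise Resque::Job::DontPerform, message
  end
end
```

A before_perform hook would then call AbortJob.dont_perform! instead of raising Resque::Job::DontPerform directly.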

Further bug update: we've now fixed the issue with resque-lonely_job, as of agent version 3.6.0. – Gilbart 9/4, 2013 at 17:37

I can confirm that; the latest update to the New Relic gem fixes this. Thank you New Relic for being responsive on this! – Heatstroke 10/4, 2013 at 20:12

I was having the same problem because the New Relic agent wasn't starting within my Resque workers. So I updated my resque:setup rake task to start the agent manually:

task "resque:setup" => :environment do
  if ENV['NEW_RELIC_APP_NAME']
    NewRelic::Agent.manual_start :app_name => ENV['NEW_RELIC_APP_NAME']
  end
end

Turpin answered 2/10, 2012 at 3:9 Comment(1)

Yes, if you install the New Relic Heroku add-on, the ENV['NEW_RELIC_APP_NAME'] variable should be automatically set for you. – Turpin 5/10, 2012 at 1:4

I tried @trliner's suggestion, but I kept getting this error:

rake aborted!
undefined local variable or method `establish_connection' for ActiveRecord::Base:Class

There is an easier solution: just add the NEWRELIC_ENABLE env variable to your Heroku app and everything should work:

heroku config:set NEWRELIC_ENABLE=true

Artichoke answered 7/10, 2012 at 12:49 Comment(3)

That's strange that my answer didn't work for you, but I think I like your solution better anyway. Out of curiosity, which Heroku stack are you on? – Turpin 8/10, 2012 at 17:40

I accepted too soon. That didn't do the trick. The New Relic agent also auto-instruments resque. We see events sent when we point the dev machines to same NR credentials as the Heroku instance. However, once the code is deployed, it won't send events... – Heatstroke 17/10, 2012 at 23:1

This solution worked for me, but I didn't have any of the before_fork or after_fork stuff. – Doubletree 31/1, 2013 at 21:49
