Segmentation faults when running rails on ruby 1.9.3

We run a pretty sizable Rails app and have recently gotten around to upgrading it to Rails 3.

Our stack is ruby-1.9.3p484, rails 3.2.16 and passenger 4.0.23 running on top of apache.

After throwing some traffic at a couple of our machines, we started noticing a few really strange errors coming down.

Things like random methods not being defined on objects that would obviously have them, instance variables being nil inside AR associations, and objects just being randomly replaced with 'false'. Just all around strange behavior.

Inspecting apache's logs gave us another bit of info, namely that as these errors were coming in, more often than not, their respective processes would keel over as well, on random bits of the app.

Sometimes it would be just a ruby node getting passed in as null, other times it would be just some random string overflowing, just random stuff getting mangled about.

None of this happened during testing, so the only 'reliable' way of reproducing this thus far has been to just throw traffic at the respective machines and see when / if they start exhibiting this behavior.
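
For anyone trying to reproduce this kind of corruption outside production, one option is to force much more frequent collections around a suspect code path with `GC.stress`. This is only a sketch: `with_gc_stress` is a hypothetical helper name, and nothing here confirms that GC timing is the actual trigger in this app.

```ruby
# Hypothetical helper: runs the block with GC.stress enabled, so the GC runs
# at every opportunity, then restores the previous setting even on error.
# Expect the wrapped code to run very slowly.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous
end

# Usage: wrap a small piece of code suspected of tripping the crash.
with_gc_stress do
  100.times { |i| "string #{i}".dup }
end
```

If an object is being freed while native code still holds a reference to it, running under stress may make the failure show up sooner, but that is an assumption rather than a guarantee.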

Having gone through all of this, here's a list of things we've ruled out up to now:

  • passenger's oob garbage collection
  • rails 3 itself ( apparently we'd been getting these before as well, but they were far enough apart to not set off any alarms )
  • serialization / shoving things in and out of memcached
  • libxml - there were some reports of version 2.5.0 causing memory corruption, but upgrading to 2.7.0 didn't really make a difference
  • turning off prelinking ( this can cause memory corruption, per https://www.ruby-forum.com/topic/205897 )

Returning the GC settings to stock seems to have alleviated the problem, but we don't really have anything conclusive in that regard. It would seem though that more collections result in a lower occurrence rate for the issue.
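
To confirm that a given worker really is running with stock GC settings, it can help to log the GC-related environment at boot. A minimal sketch, assuming the tuning was applied through the `RUBY_*` environment variables that 1.9.3 reads (the post does not say how it was done); `gc_snapshot` and `GC_ENV_VARS` are hypothetical names:

```ruby
# GC tuning variables read by Ruby 1.9.3 (assumption: tuning was done via these).
GC_ENV_VARS = %w[RUBY_HEAP_MIN_SLOTS RUBY_GC_MALLOC_LIMIT RUBY_FREE_MIN]

# Hypothetical helper: returns which tuning variables are set in the given
# environment, plus how many collections this process has run so far.
def gc_snapshot(env = ENV)
  overrides = {}
  GC_ENV_VARS.each { |name| overrides[name] = env[name] if env[name] }
  { :overrides => overrides, :gc_count => GC.count }
end

# Usage: print once per worker, e.g. from an initializer.
puts gc_snapshot.inspect
```

An empty `:overrides` hash suggests the process is on stock settings; comparing `:gc_count` across workers gives a rough view of how often each one collects.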

Any thoughts on what might be causing this, or on what we could use to help us pinpoint the issue?

Olomouc answered 31/1, 2014 at 16:1 Comment(3)
Try with Nginx and Unicorn and see if you still have issues. I had used Passenger for a good while and did not know the source of the issues I was having in production (until I got rid of Passenger). - Suilmann
@Suilmann yeah, seems like that was the fix for us as well. So weird. - Olomouc
Yup.. I've never been happier since I've made the switch :-) - Suilmann

I've had 1.9.3-p484 segfault in at_exit on test runs as well, but I haven't looked into it yet. In my case it seems to be triggered by certain dependencies of the test suite.

We've also had problems with another project on Rails 3; that team ended up abandoning the port and sticking with 2.3 ):

Have you tried running the app outside of Apache / Passenger?

Davisdavison answered 5/2, 2014 at 9:14 Comment(2)
Yeah, was just getting ready to update this. For some oddball reason the fix for us ended up being running this under unicorn. Still not sure how exactly that would affect it, but there you go. - Olomouc
Well, there are a number of reasons (: For example: blog.phusion.nl/2012/05/09/… - Davisdavison
