We run a fairly large Rails app and recently got around to upgrading it to Rails 3.
Our stack is ruby-1.9.3p484, Rails 3.2.16 and Passenger 4.0.23, running on top of Apache.
After throwing some traffic at a couple of our machines, we started noticing a few really strange errors coming in.
These included methods randomly going undefined on objects that obviously have them, instance variables being nil inside ActiveRecord (AR) associations, and objects randomly being replaced with 'false'. In short, all-around strange behavior.
Inspecting Apache's logs gave us another clue: as these errors came in, the processes that raised them would, more often than not, crash as well, in random parts of the app.
Sometimes a Ruby node was passed in as null; other times some random string overflowed. Just random stuff getting mangled.
None of this happened during testing, so the only 'reliable' way we've found to reproduce it so far is to throw traffic at the affected machines and see whether, and when, they start exhibiting this behavior.
Having gone through all of this, here's what we've ruled out so far:
- Passenger's out-of-band (OOB) garbage collection
- Rails 3 itself (apparently we'd been getting these errors before the upgrade as well, but they were far enough apart not to set off any alarms)
- serialization / moving objects in and out of memcached
- libxml: there were some reports of version 2.5.0 causing memory corruption, but upgrading to 2.7.0 made no real difference
- prelinking, which can cause memory corruption (per https://www.ruby-forum.com/topic/205897); turning it off made no difference
Returning the GC settings to stock seems to have alleviated the problem, but we don't have anything conclusive on that front. It does seem, though, that more frequent collections result in a lower occurrence rate for the issue.
Any thoughts on what might be causing this, or on what we could use to help pinpoint the issue?