I have a production ruby sinatra app running on nginx/passenger, and I frequently see requests get inexplicably stalled. I wrote a script to call passenger-status on my cluster of machines every ten seconds and plot the results on a graph. This is what I see:
The blue line shows the global queue waiting spiking constantly to 60. This is an average across 4 machines, so when the blue line hits 60, it means every machine is maxed out. I have the current passenger_max_pool_size set to 20, so it's getting to 3x the max pool size, and then presumably dropping subsequent requests.
My app depends on two key external resources - an Amazon RDS mysql backend and a Redis instance. Perhaps one of these is periodically becoming slow or unresponsive and thereby causing this behavior?
Can anyone advise me on how to get a stack trace to see if the bottleneck here is Amazon RDS, Redis, or something else?
Thanks!