mod_jk losing connection to tomcat

My production server has developed a problem over the last couple of days. I am running Apache httpd 2.2.3 and Tomcat 5.5.20, connected with mod_jk v1.3, with a Spring MVC site hosted on Tomcat. After roughly 12 hours of uptime the web site hangs for our users. The first time this happened I saw several of the following errors in catalina.out:

WARN [org.apache.jk.core.MsgContext] Error sending end packet
java.net.SocketException: Broken pipe

From what I read, this means a user cancelled a request before it completed, so the return path was already closed and the response could not be sent back. It also looked like this could leave the Tomcat thread held open until it reached its timeout. That seemed to fit, because the last entry in catalina.out when Tomcat fell over was:

All threads (200) are currently busy, waiting. Increase maxThreads (200) or check the servlet status

The suggested fix was to add the following to the mod_jk settings in Apache's httpd.conf:

JkOptions +DisableReuse

I made this change after checking that it caused no side effects on our site, and everything ran fine the next day. Yesterday, however, the same symptoms came back and the web site froze. This time there were no errors at all in catalina.out; requests simply stopped reaching Tomcat. The application log shows the last request arriving at 17:31, and mod_jk.log then shows the following:

[Thu Sep 06 17:37:07 2012] [18784:53792] [error] ajp_connection_tcp_get_message::jk_ajp_common.c (947): (worker1) can't receive the response message from tomcat, network problems or tomcat is down (127.0.0.1:8009), err=-104
[Thu Sep 06 17:37:07 2012] [18784:53792] [error] ajp_get_reply::jk_ajp_common.c (1536): (worker1) Tomcat is down or refused connection. No response has been sent to the client (yet)

and then this in my httpd error_log:

[Thu Sep 06 17:38:39 2012] [error] server reached MaxClients setting, consider raising the MaxClients setting

So about 6 minutes passed between the last request reaching the application and the first mod_jk error, and roughly another 1 min 30 s passed before the MaxClients error. Restarting Tomcat also fixed the problem this time.
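The two gaps can be checked directly from the log timestamps. Here is a minimal Python sketch, assuming the timestamps are copied by hand from the excerpts above. The application log time is only given as 17:31, so the seconds value of :00 is an assumption.

```python
from datetime import datetime

# Layout of the timestamps in the mod_jk.log and error_log excerpts,
# e.g. "Thu Sep 06 17:37:07 2012".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(text):
    """Parse a timestamp such as 'Thu Sep 06 17:37:07 2012'."""
    return datetime.strptime(text, STAMP_FORMAT)


# Assumption: seconds are not given for the last request, so :00 is used.
last_request = parse_stamp("Thu Sep 06 17:31:00 2012")
first_jk_error = parse_stamp("Thu Sep 06 17:37:07 2012")
max_clients_error = parse_stamp("Thu Sep 06 17:38:39 2012")

gap_to_jk_error = first_jk_error - last_request
gap_to_max_clients = max_clients_error - first_jk_error

print("last request -> first mod_jk error:", gap_to_jk_error)
print("first mod_jk error -> MaxClients error:", gap_to_max_clients)
```

With these inputs the gaps come out at 6 min 07 s and 1 min 32 s, which matches the rough figures quoted.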

There have been no changes to our Apache, Tomcat or connector configuration apart from the one mentioned above (current config below), but we have changed our site so that each user makes more Ajax requests. What I would like to understand is how best to analyse our system, so that I can work out which settings to change to stop this from happening without overloading the server.

Thanks, Iain

Current Config

httpd.conf

Timeout 300
KeepAlive on
MaxKeepAliveRequests 100
KeepAliveTimeout 15

LoadModule jk_module modules/mod_jk.so
JkLogLevel    error
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
JkOptions     +ForwardKeySize +ForwardURICompat -ForwardDirectories +DisableReuse

workers.properties

# Define 1 real worker using ajp13
worker.list=worker1
# Set properties for worker1 (ajp13)
worker.worker1.type=ajp13
worker.worker1.host=localhost
worker.worker1.port=8009
worker.worker1.lbfactor=50
worker.worker1.cachesize=10
worker.worker1.cache_timeout=600
worker.worker1.socket_keepalive=1
worker.worker1.recycle_timeout=300

httpd-mpm.conf

StartServers          5
MinSpareServers       5
MaxSpareServers      10
MaxClients          150 
MaxRequestsPerChild   0

Tomcat settings are just the standard tomcat settings

Tania answered 7/9, 2012 at 1:27

Comments (2)
Treatment: Did you consider upgrading to Tomcat 6/7?
Tania: I did not set up the tech stack and I was hoping I would not have to upgrade. If it comes down to it then that may be something to try, but I have a feeling this can be fixed with the right configuration.

It turned out the fix was to shorten the keep-alive timeout. I changed KeepAliveTimeout from 15 to 2 and set MaxRequestsPerChild to 5000 (it had been 0), and the problem has not recurred since.
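One plausible reading of why the shorter timeout helps (not something verified on this server): the MPM settings posted in the question look like the prefork MPM, where each open keep-alive connection occupies one httpd worker even while it is idle. By Little's law, the number of busy workers is roughly the connection arrival rate multiplied by the time each connection is held. Below is a rough Python sketch of that estimate. The arrival rate and per-connection work time are hypothetical numbers chosen only for illustration, and every connection is assumed to idle for the full KeepAliveTimeout before closing.

```python
MAX_CLIENTS = 150  # MaxClients from the posted httpd-mpm.conf


def estimated_busy_workers(conn_per_sec, work_seconds, keepalive_timeout):
    """Little's law: busy workers = arrival rate * time each connection is held.

    Assumes the worst case where every connection idles for the whole
    KeepAliveTimeout after its last request.
    """
    return conn_per_sec * (work_seconds + keepalive_timeout)


# Hypothetical load figures, for illustration only (not measured).
CONN_PER_SEC = 12.0
WORK_SECONDS = 0.5

for timeout in (15, 2):
    busy = estimated_busy_workers(CONN_PER_SEC, WORK_SECONDS, timeout)
    status = "exceeds" if busy > MAX_CLIENTS else "fits within"
    print(f"KeepAliveTimeout {timeout}: ~{busy:.0f} workers, "
          f"{status} MaxClients {MAX_CLIENTS}")
```

Under these made-up numbers the 15-second timeout would need about 186 workers, more than the 150 allowed, while the 2-second timeout would need about 30. The real figures depend on measured traffic, so treat this as a way to reason about the setting, not as a sizing rule.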

Tania answered 12/6, 2013 at 0:40
