We're using haproxy in front of a Netty-3.6-based backend. We handle a huge number of connections, some of which are long-lived.
Now the problem is that when haproxy closes a connection for rebalancing purposes, it does so by sending a TCP RST. When the sun.nio.ch classes used by Netty see this, they throw an IOException: "Connection reset by peer".
Trace:
sun.nio.ch.FileDispatcherImpl.read0(Native Method)
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
sun.nio.ch.IOUtil.read(IOUtil.java:193)
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:724)
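To illustrate where this surfaces on our side: in Netty 3, the exception arrives in a handler's exceptionCaught. A minimal sketch of swallowing just this reset (the handler name is illustrative, not part of our actual code) would look roughly like this:

```java
import java.io.IOException;

import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.ExceptionEvent;
import org.jboss.netty.channel.SimpleChannelUpstreamHandler;

// Illustrative sketch: swallow "Connection reset by peer" and close the
// channel quietly instead of letting the exception propagate upstream.
public class QuietResetHandler extends SimpleChannelUpstreamHandler {
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, ExceptionEvent e)
            throws Exception {
        Throwable cause = e.getCause();
        if (cause instanceof IOException
                && "Connection reset by peer".equals(cause.getMessage())) {
            // haproxy sent an RST: drop the channel silently.
            ctx.getChannel().close();
            return;
        }
        super.exceptionCaught(ctx, e);
    }
}
```

Matching on the exception message is obviously a heuristic, which is part of why we'd rather avoid the RSTs in the first place than filter them.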
This causes the following problems, depending on the haproxy option configured:
option http-pretend-keepalive
This is what works best (haproxy then seems to close most connections with a FIN rather than an RST), but it still produces about 3 exceptions per server per second. It also effectively neuters load balancing, because some incoming connections are very long-lived with very high throughput: with pretend-keepalive, haproxy never rebalances them to another server.
option http-keep-alive
Since our backend expects keep-alive connections to really be kept alive (and hence does not close them on its own), this setting means every connection eventually nets one exception, which in turn crashes our servers. We tried adding prefer-last-server, but it doesn't help much.
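One mitigation we've considered (a sketch, not something in our current setup): have the backend itself close idle keep-alive connections with a FIN before haproxy gives up on them, so haproxy never has to RST. In Netty 3 that could look like the following; the 30-second idle timeout is an assumption and would need to stay below haproxy's own timeout:

```java
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.handler.timeout.IdleStateAwareChannelHandler;
import org.jboss.netty.handler.timeout.IdleStateEvent;
import org.jboss.netty.handler.timeout.IdleStateHandler;
import org.jboss.netty.util.HashedWheelTimer;
import org.jboss.netty.util.Timer;

// Sketch: close connections from our side after a period of inactivity,
// so the close is a clean FIN initiated by the backend.
public class IdleCloser extends IdleStateAwareChannelHandler {
    private static final Timer TIMER = new HashedWheelTimer();

    // Pipeline wiring: fire an idle event after 30s of total inactivity
    // (assumed value; must be shorter than haproxy's timeout).
    public static void addTo(ChannelPipeline pipeline) {
        pipeline.addLast("idle", new IdleStateHandler(TIMER, 0, 0, 30));
        pipeline.addLast("idleCloser", new IdleCloser());
    }

    @Override
    public void channelIdle(ChannelHandlerContext ctx, IdleStateEvent e) {
        // Closing here sends a FIN rather than waiting for haproxy's RST.
        ctx.getChannel().close();
    }
}
```

This obviously changes the "really kept alive" contract, which is why we haven't committed to it.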
option http-server-close
This should theoretically give us both proper load balancing and no exceptions. However, it seems that after our backend servers respond, there is a race over which side closes the connection first: haproxy or our registered ChannelFutureListener.CLOSE. In practice, we still get too many exceptions and our servers crash.
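For reference, our response path registers the close roughly like this (a sketch; writeResponse is an illustrative name, not our actual method):

```java
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelFuture;
import org.jboss.netty.channel.ChannelFutureListener;
import org.jboss.netty.handler.codec.http.HttpResponse;

public final class ResponseWriter {
    // Once the response is flushed, CLOSE closes our side of the
    // connection -- and this is where we race haproxy, which under
    // http-server-close may already be tearing the connection down.
    public static void writeResponse(Channel channel, HttpResponse response) {
        ChannelFuture f = channel.write(response);
        f.addListener(ChannelFutureListener.CLOSE);
    }
}
```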
Interestingly, the number of exceptions generally increases with the number of workers we give our channels. I guess more workers speed up reading more than writing.
Anyway, I've been reading up on the various channel and socket options in Netty as well as haproxy for a while now, and I didn't find anything that sounded like a solution (or that worked when I tried it).