Open connections via Spring WebSocket STOMP cause our server to die
So we use Spring WebSocket STOMP + RabbitMQ on the backend, and we are having trouble with open file descriptors. After a certain time we hit the limit on the server, and the server stops accepting any connections, both websockets and API endpoints.

2018-09-14 18:04:13.605  INFO 1288 --- [MessageBroker-1] 
o.s.w.s.c.WebSocketMessageBrokerStats    : WebSocketSession[2 current WS(2)- 
HttpStream(0)-HttpPoll(0), 1159 total, 0 closed abnormally (0 connect 
failure, 0 send limit, 63 transport error)], stompSubProtocol[processed 
CONNECT(1014)-CONNECTED(1004)-DISCONNECT(0)], stompBrokerRelay[9 sessions, 
127.0.0.1:61613 (available), processed CONNECT(1015)-CONNECTED(1005)- 
DISCONNECT(1011)], inboundChannel[pool size = 2, active threads = 2, queued 
tasks = 2, completed tasks = 12287], outboundChannel[pool size = 0, active
threads = 0, queued tasks = 0, completed tasks = 4225], sockJsScheduler[pool 
size = 1, active threads = 1, queued tasks = 3, completed tasks = 683]

And we are getting the exception below:

2018-09-14 18:04:13.761 ERROR 1288 --- [http-nio-127.0.0.1-8443-Acceptor-0] 
org.apache.tomcat.util.net.NioEndpoint   : Socket accept failed

java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:455)
    at java.lang.Thread.run(Thread.java:748)

The default file descriptor limit on Linux is 1024, and even if we increase it to something like 65000, the server will hit the limit eventually no matter what.
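To watch how close the process is getting to that limit, the JVM can report its own open descriptor count. A minimal sketch, assuming a HotSpot/OpenJDK JVM on a Unix-like OS (the `com.sun.management.UnixOperatingSystemMXBean` interface is HotSpot-specific):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdMonitor {

    /** Returns the current open file descriptor count, or -1 if unavailable. */
    public static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unixOs =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            // Logging both values periodically (and alerting when the ratio
            // climbs) gives warning before "Too many open files" kills accepts.
            System.out.println("open fds: " + unixOs.getOpenFileDescriptorCount()
                    + " / max: " + unixOs.getMaxFileDescriptorCount());
            return unixOs.getOpenFileDescriptorCount();
        }
        return -1; // not a Unix HotSpot JVM
    }

    public static void main(String[] args) {
        openFileDescriptors();
    }
}
```

Wiring this into a scheduled task makes the leak visible as a steadily climbing counter rather than a sudden outage.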

We want to solve this problem from the backend side, preferably within Spring and without workarounds. Any ideas?

UPDATE

RabbitMQ and the application reside on different servers; RabbitMQ actually runs on Compose. We can reproduce this issue by not sending DISCONNECT messages from the client.

UPDATE 2

Today I realized that all the file descriptors and Java threads stay around, no matter what happens. I implemented a workaround that sends DISCONNECT messages from Spring and closes the WebSocketSession objects, and nothing changed. I implemented this after checking the links below:
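For reference, the "close the WebSocketSession from the server" workaround is usually built on a handler decorator that tracks live sessions. This is only a sketch of that pattern, not the exact code from the question; it is written against Spring 5's `WebSocketMessageBrokerConfigurer` interface (on Spring 4 / Boot 1.5 the equivalent is extending `AbstractWebSocketMessageBrokerConfigurer`), and the class name is invented:

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.context.annotation.Configuration;
import org.springframework.web.socket.CloseStatus;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.WebSocketTransportRegistration;
import org.springframework.web.socket.handler.WebSocketHandlerDecorator;

@Configuration
@EnableWebSocketMessageBroker
public class SessionTrackingConfig implements WebSocketMessageBrokerConfigurer {

    // Live sessions keyed by STOMP session id.
    private final Map<String, WebSocketSession> sessions = new ConcurrentHashMap<>();

    @Override
    public void configureWebSocketTransport(WebSocketTransportRegistration registration) {
        registration.addDecoratorFactory(handler -> new WebSocketHandlerDecorator(handler) {
            @Override
            public void afterConnectionEstablished(WebSocketSession session) throws Exception {
                sessions.put(session.getId(), session);
                super.afterConnectionEstablished(session);
            }

            @Override
            public void afterConnectionClosed(WebSocketSession session, CloseStatus status) throws Exception {
                sessions.remove(session.getId());
                super.afterConnectionClosed(session, status);
            }
        });
    }

    /** Force-close a session whose client never sent DISCONNECT. */
    public void closeSession(String sessionId) throws IOException {
        WebSocketSession session = sessions.get(sessionId);
        if (session != null && session.isOpen()) {
            session.close(CloseStatus.GOING_AWAY);
        }
    }
}
```

A scheduled sweep over `sessions` (e.g. closing entries idle past a timeout) is the usual companion to this registry.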

And as a side note, the server side sends messages like this: simpMessagingTemplate.convertAndSend("/queue/" + sessionId, payload). This way, we ensure that each client gets only the messages for its own sessionId.
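The per-session send described above could look roughly like the controller below. This is a sketch; the mapping path, method, and payload names are invented for illustration, and only `convertAndSend("/queue/" + sessionId, payload)` is taken from the question:

```java
import org.springframework.messaging.handler.annotation.MessageMapping;
import org.springframework.messaging.simp.SimpMessageHeaderAccessor;
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.stereotype.Controller;

@Controller
public class QueueController {

    private final SimpMessagingTemplate simpMessagingTemplate;

    public QueueController(SimpMessagingTemplate simpMessagingTemplate) {
        this.simpMessagingTemplate = simpMessagingTemplate;
    }

    @MessageMapping("/request")
    public void handle(String payload, SimpMessageHeaderAccessor headerAccessor) {
        // The STOMP session id identifies the connected client, so each
        // client subscribes to /queue/{its own session id} and receives
        // only its own replies.
        String sessionId = headerAccessor.getSessionId();
        simpMessagingTemplate.convertAndSend("/queue/" + sessionId, payload);
    }
}
```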

Is this some sort of bug? Why aren't the file descriptors being closed? Has nobody encountered this issue before?

UPDATE 3

Every time a socket is closed, I see the exception below. It doesn't matter how it is closed, whether by a DISCONNECT message from the client or by webSocketSession.close() on the server.

[reactor-tcp-io-66] o.s.m.s.s.StompBrokerRelayMessageHandler : TCP connection failure in session 45r7i9u3: Transport failure: epoll_ctl(..) failed: No such file or directory
io.netty.channel.unix.Errors$NativeIoException: epoll_ctl(..) failed: No such file or directory
    at io.netty.channel.unix.Errors.newIOException(Errors.java:122)
    at io.netty.channel.epoll.Native.epollCtlMod(Native.java:134)
    at io.netty.channel.epoll.EpollEventLoop.modify(EpollEventLoop.java:186)
    at io.netty.channel.epoll.AbstractEpollChannel.modifyEvents(AbstractEpollChannel.java:272)
    at io.netty.channel.epoll.AbstractEpollChannel.clearFlag(AbstractEpollChannel.java:125)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.clearEpollRdHup(AbstractEpollChannel.java:450)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollRdHupReady(AbstractEpollChannel.java:442)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:310)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
    at java.lang.Thread.run(Thread.java:748)

So I changed the log level to TRACE, and I see that the websockets really are being closed, but these exceptions are thrown immediately afterwards. So at this point, I am really suspicious of this exception. The number of hung Java threads always goes hand-in-hand with the number of websockets, i.e. creating 400 websockets always ends up with ~400 hung threads in the main process. And the memory is never released.

Googling this exception turns up only the four results below (the rest are about other exceptions):

Updating the netty library to the latest version (4.1.29.Final) didn't help either, so I changed the tags of the question accordingly. I am also considering opening an issue against netty. I have tried a lot of things and experimented several times at the application level, but nothing seems to work. I am open to any kind of ideas at this point.

Jaehne answered 19/9, 2018 at 12:50 Comment(7)
What Spring / RabbitMQ versions are you using? Maybe it's some hidden bug in the libraries. Is Rabbit running on the same server as the affected application? You could try analyzing with JVisualVM or a similar tool, especially searching the heap dump for objects holding the open sockets etc. after the IOException occurs – Extrude
Hi @KamilPiwowarski. The RabbitMQ version is 3.7.7 and the Spring Boot version is 1.5.9. This happens when the client side does not send DISCONNECT messages, so I see it as a security hole. When I write JS code that doesn't send disconnects, then after around 1000 messages the server fails. I'm not sure how other people solve this problem, since it looks like a common issue. Rabbit is not on the same server; it's on Compose actually. I will try to look at it with JVisualVM as well. – Jaehne
@Jaehne what is your RabbitMQ client library version, and are you using any wrapper libraries for managing connections, e.g. spring-amqp? – Vermicular
@KarolDowbecki thanks for the answer. My client version is 5.2.0 and the only dependency included is spring-boot-starter-websocket, and I guess that doesn't include spring-amqp. In my case the issue is not Java threads being spawned immediately, but that they are never closed. Maybe I should switch to the latest version and retry. – Jaehne
Did you find a solution to this issue? I'm facing the same issue. Thanks for the help in advance – Rosyrot
@SulimanAlzamel Yes I did, but in a completely separate question. Here is the solution: https://mcmap.net/q/1012917/-spring-boot-ssl-tcpclient-stompbrokerrelaymessagehandler-activemq-undertow – Jaehne
@Jaehne thanks – Rosyrot

If you always use try-with-resources or close your opened files in a finally block, then you may have genuinely exceeded your file descriptor limit and you need another host to accept your requests. For this, you need to scale your application and load-balance it. I also suggest deploying RabbitMQ in a cluster to zero in on this issue.
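As a minimal illustration of the try-with-resources point: any `Closeable` opened in the resource clause releases its file descriptor when the block exits, whether normally or via an exception, so descriptors cannot leak on that path. (The file name below is arbitrary.)

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TryWithResourcesDemo {

    /** Writes a line to a temp file and reads it back, leaking no descriptors. */
    public static String writeAndRead() throws IOException {
        Path tmp = Files.createTempFile("fd-demo", ".txt");
        try {
            // The writer's underlying descriptor is closed automatically
            // when this try block exits, normally or exceptionally.
            try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
                w.write("hello");
            }
            try (BufferedReader r = Files.newBufferedReader(tmp)) {
                return r.readLine();
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndRead());
    }
}
```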

There are also cases where RabbitMQ disregards your file descriptor limits.

[Screenshot: RabbitMQ documentation, system limits section]

Indention answered 22/9, 2018 at 4:17 Comment(4)
Hi. Thanks for the answer. I added more details about the context. We already have a load balancer, and RabbitMQ runs on Compose, on a different machine from the application. We reproduce the issue by disregarding the DISCONNECT message on websockets; if you don't send it, the websockets stay open forever, which sucks. We are looking for a way to solve it from the backend. – Jaehne
@Jaehne My apologies for that. Just a question: are you sending heartbeats from both server and client? And why is your stompBrokerRelay connected to 127.0.0.1 in the log you've shown? Is that for demo purposes only? – Indention
The servers are behind an nginx reverse proxy; that's why they are seen as the localhost IP. I didn't implement anything explicitly for the heartbeat, however I see the HEARTBEAT messages while the client JS is open in the browser. Heartbeats die after the client is closed. No heartbeats on the server side. Would that solve it? – Jaehne
Additional information: queues are created as auto-delete on RabbitMQ. Even after the queues are closed and the browser connection is closed, I can still see Java threads hanging there. My apologies if my words sound dumb for this topic, since I am a newbie at this stuff. – Jaehne
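On the heartbeat question in the comments above: with the STOMP broker relay, the heartbeats on the shared system connection to the broker can be configured explicitly. A sketch, assuming Spring 5's `WebSocketMessageBrokerConfigurer` interface; the relay host, endpoint path, and 10-second intervals are placeholder values, not from the question:

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class BrokerConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        registry.enableStompBrokerRelay("/queue", "/topic")
                .setRelayHost("rabbitmq.example.com")   // hypothetical host
                .setRelayPort(61613)
                // Heartbeats on the "system" TCP connection to the broker;
                // per-client heartbeats are negotiated in the STOMP CONNECT frame.
                .setSystemHeartbeatSendInterval(10_000)
                .setSystemHeartbeatReceiveInterval(10_000);
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws").withSockJS();
    }
}
```

Heartbeats let both sides detect a dead peer and tear the connection down, which is one way half-open sockets from clients that never DISCONNECT can get reclaimed.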

The RabbitMQ Java client library from time to time has issues with managing open file descriptors. It's rarely bad, but there are gotchas, e.g. ChannelManager line 218.

You want to try a few different Java client library versions, as this is a client-side issue. In one version I had thousands of Java threads being spawned due to an error in connection creation (not sure which version was affected; I spotted this by using Flight Recorder and going to the locks section, where all threads were waiting to acquire a RabbitMQ connection(?) class lock).

Vermicular answered 24/9, 2018 at 15:50 Comment(0)
