We use Spring WebSocket STOMP with RabbitMQ on the backend, and we are having trouble with open file descriptors. After a certain time we hit the descriptor limit on the server, and the server stops accepting any connections at all, both WebSockets and API endpoints.
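For context, our setup wires the STOMP broker relay to RabbitMQ roughly like the sketch below. This is a minimal sketch, not our exact code: the endpoint path and destination prefixes are assumptions; the relay port 61613 matches the stats log that follows.

import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // SockJS fallback enabled, which is why HttpStream/HttpPoll and
        // sockJsScheduler show up in the stats below.
        registry.addEndpoint("/ws").withSockJS();
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        // Relay /queue destinations to RabbitMQ's STOMP adapter.
        registry.enableStompBrokerRelay("/queue")
                .setRelayHost("127.0.0.1")
                .setRelayPort(61613);
        registry.setApplicationDestinationPrefixes("/app");
    }
}

The WebSocketMessageBrokerStats output at the time of the failure looks like this: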
2018-09-14 18:04:13.605 INFO 1288 --- [MessageBroker-1] o.s.w.s.c.WebSocketMessageBrokerStats :
WebSocketSession[2 current WS(2)-HttpStream(0)-HttpPoll(0), 1159 total, 0 closed abnormally (0 connect failure, 0 send limit, 63 transport error)],
stompSubProtocol[processed CONNECT(1014)-CONNECTED(1004)-DISCONNECT(0)],
stompBrokerRelay[9 sessions, 127.0.0.1:61613 (available), processed CONNECT(1015)-CONNECTED(1005)-DISCONNECT(1011)],
inboundChannel[pool size = 2, active threads = 2, queued tasks = 2, completed tasks = 12287],
outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 4225],
sockJsScheduler[pool size = 1, active threads = 1, queued tasks = 3, completed tasks = 683]
And we are getting exceptions like the one below:
2018-09-14 18:04:13.761 ERROR 1288 --- [http-nio-127.0.0.1-8443-Acceptor-0]
org.apache.tomcat.util.net.NioEndpoint : Socket accept failed
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:455)
at java.lang.Thread.run(Thread.java:748)
The default file descriptor limit on Linux is 1024, and even if we raise it to something like 65000, the limit is eventually reached again; the increase only delays the failure. We want to solve this problem on the backend side, preferably through Spring itself rather than with workarounds. Any ideas?
UPDATE
RabbitMQ and the application reside on different servers; RabbitMQ is actually hosted on Compose. We can reproduce this issue with clients that never send DISCONNECT frames, as in the sketch below.
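A client along these lines reproduces it reliably (a sketch; the URL, the /ws/websocket raw-transport path, and the session count are assumptions), because the sessions are simply abandoned instead of being disconnected:

import org.springframework.messaging.converter.StringMessageConverter;
import org.springframework.messaging.simp.stomp.StompSessionHandlerAdapter;
import org.springframework.web.socket.client.standard.StandardWebSocketClient;
import org.springframework.web.socket.messaging.WebSocketStompClient;

public class LeakyClient {

    public static void main(String[] args) throws Exception {
        WebSocketStompClient stompClient =
                new WebSocketStompClient(new StandardWebSocketClient());
        stompClient.setMessageConverter(new StringMessageConverter());

        // Open many STOMP sessions and never call StompSession.disconnect();
        // each one should leave a file descriptor behind on the server.
        for (int i = 0; i < 400; i++) {
            stompClient.connect("wss://server.example.com/ws/websocket",
                    new StompSessionHandlerAdapter() {});
        }
        Thread.sleep(60_000); // keep the process alive while the server leaks
    }
}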
UPDATE 2
Today I realized that the file descriptors and Java threads always stay around, no matter what happens. I implemented a workaround that sends DISCONNECT messages from Spring and closes the WebSocketSession objects, and nothing changed. I based the implementation on the links below (a sketch of the approach follows the list):
- Disconnect client session from Spring websocket stomp server
- https://github.com/isaranchuk/spring-websocket-disconnect
- https://github.com/rstoyanchev/spring-websocket-portfolio
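Concretely, the workaround registers a handler decorator that tracks sessions so they can be force-closed from the server side. This is a sketch along the lines of the first link above; the class, map, and method names are my own:

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.context.annotation.Configuration;
import org.springframework.web.socket.CloseStatus;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.WebSocketTransportRegistration;
import org.springframework.web.socket.handler.WebSocketHandlerDecorator;

@Configuration
@EnableWebSocketMessageBroker
public class SessionTrackingConfig implements WebSocketMessageBrokerConfigurer {

    private final Map<String, WebSocketSession> sessions = new ConcurrentHashMap<>();

    @Override
    public void configureWebSocketTransport(WebSocketTransportRegistration registration) {
        registration.addDecoratorFactory(handler -> new WebSocketHandlerDecorator(handler) {
            @Override
            public void afterConnectionEstablished(WebSocketSession session) throws Exception {
                sessions.put(session.getId(), session);
                super.afterConnectionEstablished(session);
            }

            @Override
            public void afterConnectionClosed(WebSocketSession session, CloseStatus status)
                    throws Exception {
                sessions.remove(session.getId());
                super.afterConnectionClosed(session, status);
            }
        });
    }

    // Called from our cleanup logic to force-close a session from the server side.
    public void forceDisconnect(String sessionId) throws IOException {
        WebSocketSession session = sessions.remove(sessionId);
        if (session != null && session.isOpen()) {
            session.close(CloseStatus.GOING_AWAY);
        }
    }
}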
As a side note, the server side sends messages like this: simpMessagingTemplate.convertAndSend("/queue/" + sessionId, payload). This way we ensure that each client gets the messages for its own sessionId; a sketch of how this fits together follows.
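For completeness, the sessionId comes from the headers of the inbound message. Roughly (the mapping name and payload handling are assumptions; the client is expected to subscribe to /queue/{its-own-session-id}):

import org.springframework.messaging.handler.annotation.MessageMapping;
import org.springframework.messaging.simp.SimpMessageHeaderAccessor;
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.stereotype.Controller;

@Controller
public class QueueController {

    private final SimpMessagingTemplate simpMessagingTemplate;

    public QueueController(SimpMessagingTemplate simpMessagingTemplate) {
        this.simpMessagingTemplate = simpMessagingTemplate;
    }

    @MessageMapping("/request")
    public void handle(String payload, SimpMessageHeaderAccessor headers) {
        // Reply only to the session that sent the request.
        String sessionId = headers.getSessionId();
        simpMessagingTemplate.convertAndSend("/queue/" + sessionId, payload);
    }
}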
Is this some sort of bug? Why aren't the file descriptors being closed? Has nobody encountered this issue before?
UPDATE 3
Every time a socket is closed, I see the exception below. It doesn't matter how the socket is closed, whether by a DISCONNECT message from the client or by calling webSocketSession.close() on the server:
[reactor-tcp-io-66] o.s.m.s.s.StompBrokerRelayMessageHandler : TCP connection failure in session 45r7i9u3: Transport failure: epoll_ctl(..) failed: No such file or directory
io.netty.channel.unix.Errors$NativeIoException: epoll_ctl(..) failed: No such file or directory
at io.netty.channel.unix.Errors.newIOException(Errors.java:122)
at io.netty.channel.epoll.Native.epollCtlMod(Native.java:134)
at io.netty.channel.epoll.EpollEventLoop.modify(EpollEventLoop.java:186)
at io.netty.channel.epoll.AbstractEpollChannel.modifyEvents(AbstractEpollChannel.java:272)
at io.netty.channel.epoll.AbstractEpollChannel.clearFlag(AbstractEpollChannel.java:125)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.clearEpollRdHup(AbstractEpollChannel.java:450)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollRdHupReady(AbstractEpollChannel.java:442)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:310)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at java.lang.Thread.run(Thread.java:748)
So I changed the log level to TRACE (via logging.level.org.springframework.messaging=TRACE, for example) and I can see that the WebSockets really are being closed, but these exceptions are thrown immediately afterwards, so at this point I am quite suspicious of this exception. The number of hung Java threads always goes hand in hand with the number of WebSockets: creating 400 WebSockets always ends with roughly 400 hung threads in the main process, and the memory is never released. A diagnostic sketch for watching this directly is below.
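To watch the leak independently of the broker stats, a scheduled job can log the process's descriptor and thread counts. This is a diagnostic sketch; it relies on the Sun-specific UnixOperatingSystemMXBean, so it assumes a HotSpot JVM on Linux:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class FdMonitor {

    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();

    // Requires @EnableScheduling on some @Configuration class.
    @Scheduled(fixedRate = 60_000)
    public void logDescriptorAndThreadCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.printf("open fds: %d / %d, live threads: %d%n",
                    unix.getOpenFileDescriptorCount(),
                    unix.getMaxFileDescriptorCount(),
                    threads.getThreadCount());
        }
    }
}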
Googling this exception turns up only the four results below (the rest are different exceptions):
- https://github.com/netty/netty/issues/2414
- https://github.com/reactor/reactor-ipc/issues/16
- https://github.com/cloudfoundry/cf-java-client/issues/495
- https://github.com/cloudfoundry/cf-java-client/issues/480
Updating the netty library to the latest version (4.1.29.Final) didn't work either, so I changed the tags of the question accordingly. I am also considering filing an issue against netty. I have tried a lot of things and experimented several times at the application level, but nothing seems to work. I am open to any kind of ideas at this point.
COMMENTS

- … spring-amqp. – Vermicular
- … spring-boot-starter-websocket, and I guess that doesn't include spring-amqp. In my case the issue is not that Java threads are spawned immediately, but that they are never closed. Maybe I should switch the version to the latest and retry again. – Jaehne