How can I make SocketIO more performant?

We've used Socket.IO quite extensively in Etherpad (since very early on) and we're really grateful for all of the team's efforts in providing such a useful thingy :)

Etherpad is a Node.js project.

My problem with Socket.IO is probably down to me misconfiguring or misunderstanding something, but after generating quite a lot of test tooling, tweaking memory settings, etc., we still hit a frustratingly low ceiling of roughly 10k messages per second.

Etherpad latest simulated load test

Reading around online, it looks like switching to ws would be more performant, but I fail to see how that could be the case in our scenario: our bottleneck is not connection negotiation (which ends up on websockets anyway) but the number of messages per second the server can handle.

I'm reluctant to try other packages, so I thought I'd come here and ask for insight or things to try to see if we can improve performance by, well, a lot. The usual Node tricks (access to more hardware, RAM/CPU) help a bit, but it still feels like we're getting really small gains and not the huge numbers you see in other modules' benchmarks.

A dream outcome of this question would be for someone to look at the Etherpad code and tell me why I'm an idiot, and hopefully we can get Etherpad into the competitive ~100k changes per second range. But I may also be misty-eyed about other modules, so if anyone has benchmarks that contradict the likes of ws, I'm all ears.

I feel like I should just add: we tested whether internal Etherpad logic is the cause, and it's not. It really is the communication layer that ends up bottlenecking the operational transform algorithm; we're like 99.95% sure...

Throwing more hardware at this problem is not the solution, nor is any kind of reverse proxying or otherwise passing the problem elsewhere.

Ricercar answered 23/10, 2020 at 20:22 Comment(9)
I don't know if you're using node as your backend, but if you are then the net module may help with this.Wombat
@wlgfx how? I don't see the relationship to my query.Ricercar
I found some resources that might help. Check out this comment about how Trello hit a 10K cap like you are. Then this article about socket.io benchmarking. There is also this github issue that has some interesting info. You may have to look at multi-server solution with redis (or another pub/sub backend) to support your needs, but might be too costly for you to reach 100K msg/s.Deaconess
I ran into a problem with socket.io on a 512 MB server (very low) and had to find a different solution. The net module fixed this for me, as the net sockets (not socket.io) are not permanently connected. Requests are made through net and then the connection is closed, causing the server much less stress. You can still grab the id from the browser for authenticity. I'm still working on my project, so I don't have a proof of concept to add as an answer yet. When I do, I will happily do so.Wombat
Hello John, you write that you tried "quite a lot of test tool generation, tweaking of memory settings etc". Can you give us a list of things you have tried to resolve your issue?Monserratemonsieur
@Monserratemonsieur Here is our load test tool: github.com/ether/etherpad-load-test We tried increasing the Node.js heap size, allocating more memory/CPU, etc. We haven't tried changing how socket.io operates, i.e. forking away from upstream.Ricercar
Can you please elaborate a bit more on "it really is the communication layer that ends up bottlenecking an operational transform algorithm"? What part of the "communication layer" causes the bottleneck? My first thought was that the OT algorithm could be the biggest bottleneck. OTs are not very scalable in comparison to CRDTs, although I don't think going from OT to CRDT is an easy task...Hexangular
The communication layer being socket.io. OT is not the biggest bottleneck; we proved this by profiling Node and creating a similar communication stack that doesn't do the OT part, and I can still reproduce the bottleneck.Ricercar
I'd highly suggest trying out uWebsockets as a replacement. You'll find the bottleneck is very likely socket.io itself. I'm currently running 5k clients on a server and we're barely breaking 400MB of RAM and a single core. We were hitting 7GB of RAM and 4+ cores maxed out when using ws, I'm going to guess you're in a similar boat.Unstained

If you are blind to where the "problem" is, you don't have many options. You could be looking for a "misconfiguration" that does not exist, which could waste a lot of time and money, and in the end you will probably still have to switch.

Maturity, one discovers, has everything to do with the acceptance of "not knowing".

Rewrite the pieces of the code that are relevant for the load test, to see whether using e.g. uWebSockets would push the boundary. There are multiple sources stating that the uWebSockets server is a lot faster. I bet it will not take much time, and you will get the important information you need to decide whether switching is worth it. Web technology is moving forward extremely fast, and if you want to be able to make the right choice for the future of the product, you have to be willing to experiment with it. Alex Hultman wrote an article,

how-µwebsockets-achieves-efficient-pub-sub

where he encourages switching and explains why it's worth a try.

Cliff answered 29/10, 2020 at 22:21 Comment(1)
I don't think this is the answer I'm looking for.Ricercar
