Node.js struggling with lots of concurrent connections

I'm working on a somewhat unusual application where 10k clients are precisely timed to all try to submit data at once, every 3 mins or so. This 'ab' command fairly accurately simulates one barrage in the real world:

ab -c 10000 -n 10000 -r "http://example.com/submit?data=foo"

I'm using Node.js on Ubuntu 12.04 on a Rackspace Cloud VPS instance to collect these submissions. However, I'm seeing some very odd behavior from Node, even when I remove all my business logic and turn the HTTP request handler into a no-op.
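
For context, the no-op version of the server is roughly this shape (a minimal sketch, not my exact code; the port and response body are placeholders):

    var http = require('http');

    // No-op handler: accept the request and respond immediately, with no business logic.
    var server = http.createServer(function (req, res) {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('ok');
    });

    server.listen(80);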

When the test gets to about 90% done, it hangs for a long period of time. Strangely, this happens consistently at 90%: for c=n=10k, at 9000; for c=n=5k, at 4500; for c=n=2k, at 1800. The test actually completes eventually, often with no errors. But both the ab output and the node logs show continuous processing up until around 80-90% of the test run, then a long pause before completing.

When node is processing requests normally, CPU usage is typically around 50-70%. During the hang period, CPU sometimes spikes to 100% and sometimes sits near 0. Given the erratic CPU behavior, and the fact that the hang point tracks the percentage of the test completed rather than the actual number of connections, I do not suspect the garbage collector.

I've tried running 'ab' on localhost and on a remote server - same effect.

I suspect something related to the TCP stack, possibly involving closing connections, but none of my configuration changes have helped. My changes:

  • ulimit -n 999999
  • When I listen(), I set the backlog to 10000
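
Concretely, that second change is just the backlog argument to listen() - this replaces the plain listen(80) in the sketch above (the port and host here are placeholders):

    // server.listen(port, [host], [backlog], [callback]) - the backlog caps the queue of
    // pending connections on this socket, but is still limited by net.core.somaxconn.
    server.listen(80, '0.0.0.0', 10000, function () {
      console.log('listening on port 80 with a backlog of 10000');
    });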

Sysctl changes are:

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_max_orphans = 20000
net.ipv4.tcp_max_syn_backlog = 10000
net.core.somaxconn = 10000
net.core.netdev_max_backlog = 10000

I have also noticed that I tend to get this message in the kernel logs:

TCP: Possible SYN flooding on port 80. Sending cookies.  Check SNMP counters.

I'm puzzled by this message, since the TCP backlog queue should be deep enough to never overflow. If I disable SYN cookies, the "Sending cookies" message changes to "Dropping connections".

I speculate that this is some sort of Linux TCP stack tuning problem, and I've read just about everything I could find on the net. Nothing I have tried seems to matter. Any advice?

Update: I tried with tcp_max_syn_backlog, somaxconn, netdev_max_backlog, and the listen() backlog parameter all set to 50k, with no change in behavior. It still produces the SYN flood warning, too.

Venosity answered 18/8, 2012 at 22:49 Comment(3)
On a side note, I am very interested in the final solution to this, as I also need Node.js with a high number of active connections. – Gaberlunzie
For what it's worth, as efficient as Node can be, 10k connections all doing something at once is more load than I would leave to a single VPS. – Referential
If you're using the built-in Node.js cluster module to spawn worker processes, be advised that passing a custom backlog to server.listen() will have no effect, even in the latest Node versions. There is a bug where the backlog you configure is not respected when server.listen() is called from worker processes, and instead the default 512 is used. More info: github.com/nodejs/node/pull/33827 and github.com/Unitech/pm2/issues/1786 – Alcoran

Are you running ab on the same machine that is running node? If not, do you have a 1G or 10G NIC? If you are, then aren't you really asking one machine to handle 20,000 open connections (10k on the client side plus 10k on the server side)?

Also, if you are changing net.core.somaxconn to 10,000, do you have absolutely no other sockets open on that machine? If you do, then 10,000 is not high enough.

Have you tried using the Node.js cluster module to spread the open connections across multiple processes?
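
For example, a minimal sketch of what I mean, assuming one worker per CPU (the request handler here is just a placeholder no-op):

    var cluster = require('cluster');
    var http = require('http');
    var os = require('os');

    if (cluster.isMaster) {
      // Fork one worker per CPU; the workers share the listening socket,
      // so each one handles a slice of the incoming connections.
      for (var i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }
    } else {
      http.createServer(function (req, res) {
        res.end('ok');
      }).listen(80);
    }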

Gaberlunzie answered 18/8, 2012 at 22:59 Comment(4)
I have run ab both on localhost and from another machine. Both cases exhibit the same pause behavior, although the tests run faster on localhost. The pause behavior happens with 8k and 5k tests (and to a lesser extent with 2k tests), but I will re-run the tests with somaxconn set to 50k. I do plan to use a cluster of these machines for redundancy and load balancing, but this test is to figure out how many instances I will need. – Venosity
I've run the tests with those parameters set to 50k and the behavior is unchanged. The most surprising thing to me is the SYN flooding warning; with backlogs set to 50k, I would not expect to overflow the queue. – Venosity
@Venosity Take a look at this answer: serverfault.com/questions/294209/… – Gaberlunzie
If you're using the built-in Node.js cluster module to spawn worker processes, be advised that passing a custom backlog to server.listen() will have no effect, even in the latest Node versions. There is a bug where the backlog you configure is not respected when server.listen() is called from worker processes, and instead the default 512 is used. More info: github.com/nodejs/node/pull/33827 and github.com/Unitech/pm2/issues/1786 – Alcoran

I think you might find this blog post (and the previous ones in the series) useful:

http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/

Advantageous answered 19/8, 2012 at 10:49 Comment(0)
