High Latency with NodeJS

This problem pertains specifically to Nodejitsu, but similar effects seem to happen on other VPSes. I have a real time game using socket.io, and one thing I've noticed is that occasionally the server will wait an inordinate amount of time before responding. If multiple requests are sent during that timeframe, they behave as if they've all been queued up and processed at once. I suspect it's vaguely correlated to the presence of other users on the hardware shared (as is the case with any VPS).

Anyway, to test this out (and make sure that it wasn't due to my game's code), I built a minimal test case:

var express = require('express');
var http = require('http');

var app = express();
var server = http.Server(app);

var io = require('socket.io').listen(server);

io.sockets.on('connection', function(sock){
    sock.on('perf', function(data, cb){
        cb([Date.now()]); // respond with the current server time
    });
});

app.get('/', function(req, res){
    // CORS headers so the test page can be served from another origin
    res.header("Access-Control-Allow-Origin", "*");
    res.header("Access-Control-Allow-Methods", "HEAD,GET,PUT,POST,DELETE");
    res.header("Access-Control-Allow-Headers", "X-Requested-With");

    res.end(JSON.stringify([Date.now().toString()])); // HTTP equivalent of the perf event
});

server.listen(process.env.PORT || 6655, function(){
    console.log('listening now');
});

I had a simple blank HTML page with socket.io which would periodically send a perf event and time how long it took for the callback to fire. And it still shows the same thing:

[Image: graph showing lag spike]

Note that the bar length represents the square root of the amount of time, not the linear quantity.
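The measurement page itself is not shown in the question. A minimal sketch of what such a probe might look like is below; `measureRoundTrip` and `startProbe` are hypothetical names, and `sock` is assumed to be any object with a socket.io-style `emit(event, data, ack)` method.

```javascript
// Hypothetical helper: emit one 'perf' event and report how long the ack took.
// `sock` is assumed to expose emit(event, data, ackCallback), as a socket.io
// client socket does.
function measureRoundTrip(sock, onSample) {
  var sentAt = Date.now();
  sock.emit('perf', {}, function (reply) {
    onSample({
      rtt: Date.now() - sentAt, // round-trip time in milliseconds
      serverTime: reply[0]      // timestamp the server put in its reply
    });
  });
}

// Hypothetical helper: take one sample every `intervalMs` milliseconds.
// Returns the timer so the caller can stop it with clearInterval().
function startProbe(sock, intervalMs, onSample) {
  return setInterval(function () {
    measureRoundTrip(sock, onSample);
  }, intervalMs);
}
```

Because the helpers only depend on `emit`, the same code can be pointed at a browser socket or at a Node client running on the server itself, which makes it easier to compare the two.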

When I use XHR instead of socket.io to make a similar measurement of response time, the result is much the same: many low-latency responses (with a higher baseline than WebSockets, as expected) and occasional spikes that appear to pile up.

The odd thing is that if you open the page in several windows and different browsers, the spikes are correlated across browsers. That, together with the fact that the spikes are absent or much less frequent on some servers, seems to imply a server-side phenomenon. However, some latency spikes show up in some browsers but not others, while the two Chrome windows that share a session look like virtually exact duplicates, which suggests something local (per computer or per browser, networking-wise).

From Left to Right: Chrome Incognito, Chrome (regular), Firefox, Chrome (regular)

[Image: charts on four windows]

Anyway, this has been confusing me for months and I'd really like to understand what is causing it and how to fix it.

Leith answered 22/6, 2013 at 2:56 Comment(5)
I'm curious whether you would see similar spikes if you opened a local connection directly on the server (perhaps with something like phantomjs) and performed the same measurements. I'm also curious what versions of the browsers you are using and whether any are falling back to flash, long polling or iframes. Looks like you're running express without sessions, so it doesn't appear to be session-related GC or something like that, and you're positive that the server isn't restarting or anything (that would likely show spikes for all browsers at the same time, so probably not, but just to ask). - Puissance
I'm guessing you've already been monitoring the server stats during these too? Curious if there are any correlated spikes or drops in memory or CPU at the same time. If you had access to the datacenter, you could plug into a local switch there and eliminate most of the network interference, but that's probably not an option... it would be nice if they offered a socket.io monitoring service from inside the data center. - Puissance
Actually, you could write a local socket.io node client and run it locally on the same server, and measure that too. Sorry to spam you so much; performance problems can be like a needle in a haystack, so I'm just trying to throw out everything I can think of that might help narrow the problem down to some specific area. - Puissance
Have you tried running it on your own local server to see if it's an issue with node? - Electorate
Have you tried a different implementation, possibly using pubnub or a service like that? You would be able to narrow down the scope of the problem. Also, you didn't answer the above comment about trying it locally? - Alleyway

I assume you have already checked for a CPU or RAM issue.

The only thing that can slow down node in a "surprising" way is the garbage collector. Try running node with the --trace* flags (for example --trace-gc) to see what is going on. (See node --v8-options.)

Personally, I suspect you won't learn anything from that, because (and this is just my feeling) the issue is somewhere else.
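Either way, a cheap way to rule the process itself in or out is to schedule a timer and record how late it fires: if the event loop is blocked (by garbage collection or anything else), the timer runs late. This is a rough sketch and not a GC profiler; `watchEventLoopLag` is a hypothetical name and the interval and threshold values are illustrative choices.

```javascript
// Hypothetical helper: fire a timer every `intervalMs` and report how late it ran.
// A timer that fires much later than scheduled means the event loop was blocked.
function watchEventLoopLag(intervalMs, thresholdMs, onLag) {
  var expected = Date.now() + intervalMs;
  return setInterval(function () {
    var now = Date.now();
    var lag = now - expected; // ms past the time we expected to fire
    if (lag > thresholdMs) {
      onLag(lag);
    }
    expected = now + intervalMs; // next expected firing time
  }, intervalMs);
}
```

Running something like watchEventLoopLag(100, 50, function (lag) { console.log('event loop late by ' + lag + ' ms'); }) inside the server and comparing its log timestamps with the client-side spikes would show whether the two line up. If they don't, the delay is more likely on the network path.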

With that perfect delay of a multiple of 500 ms, I assume you have packet loss. You can check with ifconfig whether it is a general issue, then capture the packets with tcpdump and see whether they are retransmitted.
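To check the "multiple of 500 ms" observation against recorded samples rather than by eye, a small helper along these lines could be used. `nearMultipleOf` is a hypothetical name, the 500 ms base and 30 ms tolerance are assumptions for illustration, and the sample values are made up.

```javascript
// Hypothetical helper: return the samples that are at least one full `baseMs`
// and lie within `toleranceMs` of a whole multiple of `baseMs`.
function nearMultipleOf(samplesMs, baseMs, toleranceMs) {
  return samplesMs.filter(function (ms) {
    var nearest = Math.round(ms / baseMs) * baseMs;
    return nearest >= baseMs && Math.abs(ms - nearest) <= toleranceMs;
  });
}

var rtts = [42, 38, 512, 45, 1004, 730, 40]; // made-up round-trip times in ms
console.log(nearMultipleOf(rtts, 500, 30));  // prints [ 512, 1004 ]
```

The normal round-trip time is added on top of any retransmission delay, so the tolerance should be at least as large as the baseline latency. If most spikes land in the returned list, that supports the retransmission theory; spikes scattered at arbitrary values would point elsewhere.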

Surgeonfish answered 29/6, 2013 at 6:41 Comment(1)
While I have only done a bit with node, I have written a fair bit of Java server code. One of the main things we worried about with latency was garbage collection. The problem with many VM-based languages (like Javascript/Node) is that latency is only as predictable as the VM allows it to be. In the case of Java, we would often work hard to minimize garbage collection, not because of average latency, but because of latency spikes. I would definitely take a look at GC just in case. - Brownout

The reason you see this is Nagle's algorithm. It buffers small outgoing writes on a socket and sends them as fewer, larger segments, which saves transmissions. You can read more about it here: http://en.wikipedia.org/wiki/Nagle's_algorithm

To disable Nagle's algorithm (good when you want to send lots of small requests as fast as possible) you can call socket.setNoDelay(true) if you're using net.Socket. In the case of socket.io, I believe Nagle is already disabled by default for WebSockets but not necessarily for other transports. I would recommend running a test with net.Socket from node.js with Nagle disabled and seeing what you get.
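A minimal sketch of such a test, using only the net module, might look like this. `startEchoServer` and `pingOnce` are hypothetical names, and binding to 127.0.0.1 is only for illustration; to reproduce the original problem the server would have to run on the VPS and the client somewhere else.

```javascript
var net = require('net');

// Echo server: writes every chunk it receives straight back to the sender.
function startEchoServer(onListening) {
  var server = net.createServer(function (conn) {
    conn.setNoDelay(true); // disable Nagle on the server side of the connection
    conn.pipe(conn);
  });
  // Port 0 lets the OS choose a free port.
  server.listen(0, '127.0.0.1', function () {
    onListening(server, server.address().port);
  });
}

// Send one small message with Nagle disabled and time the echo.
function pingOnce(port, onResult) {
  var client = net.connect(port, '127.0.0.1', function () {
    client.setNoDelay(true); // disable Nagle on the client side
    var sentAt = Date.now();
    client.once('data', function () {
      onResult(Date.now() - sentAt); // round-trip time in milliseconds
      client.end();
    });
    client.write('ping');
  });
}
```

Calling pingOnce repeatedly and comparing the results with and without the setNoDelay(true) lines would show whether Nagle makes any measurable difference here.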

Rheims answered 29/6, 2013 at 15:47 Comment(1)
Nagle works only on a per-connection basis and concatenates small packets into bigger ones. That can't be the issue here, as you have a new connection for every single call. Nagle can only become an issue when you have real-time streaming. - Surgeonfish

I know this may sound strange, but have you considered that it is not an issue with node but with the OS settings? Have you checked your file handles and the number of connections the OS shows for the socket? Have you also made sure the socket timeout in the OS is low enough? I have run into similar-sounding performance issues with other code, and it turned out to be the OS and not the code. Also check the package and see what it allows for open connections on the socket. I have not looked at the node code, but I ran into a similar issue with the HTTP client library in Java: the application just backed up, and it was purely a configuration issue with the number of connections.
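On the node side, the connection-related settings can be read from the server object and the default HTTP agent. The sketch below only covers what node itself exposes; OS limits such as the open-file limit (ulimit -n) have to be checked separately. `reportConnectionLimits` is a hypothetical name.

```javascript
var http = require('http');

// Hypothetical helper: collect the connection-related limits node exposes for
// a server, plus the number of sockets currently open on it.
function reportConnectionLimits(server, onReport) {
  server.getConnections(function (err, count) {
    if (err) { return onReport(err); }
    onReport(null, {
      openConnections: count,                      // sockets currently connected to this server
      maxConnections: server.maxConnections,       // undefined means no limit was set in node
      agentMaxSockets: http.globalAgent.maxSockets // per-host cap on outgoing HTTP requests
    });
  });
}
```

Logging this report periodically while the spikes occur would show whether the connection count approaches any of these limits.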

Maroc answered 27/6, 2013 at 14:39 Comment(0)
