Hard downsides of long polling?

Asked 10/2, 2014 at 11:45 Answered 31/12, 2022 at 22:30

node.js networking websocket socket.io long-polling

For interactive web apps, things like WebSockets are getting more popular. However, as the client and proxy world is not always fully compliant, one usually uses a complex framework like 'Socket.IO', which hides several different mechanisms to fall back on in case some of them are disabled.

I just wonder what the downsides of a properly implemented long polling are, because with today's servers like node.js it is quite easy to implement, and it relies on old HTTP technology which is well supported (although the long polling behaviour itself may break it).

From a high-level view, long polling (despite some additional overhead, feasible for medium-traffic apps) resembles true push behaviour as WebSockets do, as the server actually sends its answer whenever it likes (apart from some timeout / heartbeat mechanism).

So we have some more overhead due to the additional TCP/IP acknowledgements, I guess, but no constant traffic like frequent polling would cause.

And using an event driven server, we would have no thread overhead to keep the connections blocked.

So is there any other hard downside that forces medium-traffic apps like chats to use WebSockets rather than long polling?

Avelinaaveline answered 10/2, 2014 at 11:45 Comment(8)

Proxy servers aren't always very happy about long polling; some cut the server connection without (immediately) reporting back to the client. On the other hand, most of the same proxies are probably even less happy about websockets :) – Hyperesthesia 10/2, 2014 at 11:48

I suspect a study with success statistics and profilings would be needed here. – Fining 10/2, 2014 at 11:49

I think for me, there would need to be some total incompatibility percentage for long polling before I would go to Socket.IO, with all the headaches of such complex frameworks. – Avelinaaveline 10/2, 2014 at 12:9

There is considerable overhead in HTTP, I think. This is what makes WebSockets attractive. Why use AJAX to poll if you can do it better? – Nemertean 10/2, 2014 at 12:17

If I understand right, WebSockets use additional ports on both endpoints (which may fail in several configurations), and require a modern browser supporting them. So if I want to use only ONE method, long polling seems more reliable. – Avelinaaveline 10/2, 2014 at 12:46

If you must use only one then go with AJAX. Websockets still have to cover some ground. They are not as robust as HTTP, for now. They even use HTTP for initiating connection. But I don't understand how they use additional ports though. – Nemertean 10/2, 2014 at 14:58

@Avelinaaveline I'm afraid you are wrong :) WebSockets use additional protocols (ws:// and wss://), not ports – Smoothtongued 10/2, 2014 at 21:3

So websockets only resemble a usual http connection in the way that they are initiated by the client calling port 80 on the server, assigning a destination port on itself? – Avelinaaveline 12/2, 2014 at 20:50

Overhead

Long polling issues a new request each time (and, without keep-alive, a new connection), so the HTTP headers are sent again and again, including the cookie header, which may be large.

Also, just checking whether there is something new is another connection for nothing. Each connection involves the work of many components: firewalls, load balancers, web servers, etc. Establishing the connection is probably the most time-consuming part once your IT infrastructure has several inspection layers.

If you are using HTTPS, you repeat the most expensive operation, the TLS handshake, again and again. TLS performance is good once the connection is established and symmetric encryption is in use, but establishing the connection, with its key exchange and all that jazz, is not fast.

Also, whenever a connection is made, log entries are written somewhere, counters are incremented somewhere, memory is consumed, objects are created, etc. For example, the reason we have different logging configurations in production and in development is that writing log entries also affects performance.

Presence

When is a long polling user connected or disconnected? If you check for this at a given moment in time, how long can you reliably wait before double-checking to be sure the user is connected or disconnected?

This may be totally irrelevant if your application just broadcasts stuff, but it may be very relevant if your application is a game.
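One common way to approximate presence under long polling is a grace window: a user counts as online as long as the gap since their last poll stays under some threshold. The sketch below is only an illustration of that idea; `PresenceTracker`, `touch`, `sweep` and the grace value are hypothetical names and numbers, not part of any library.

```javascript
// Hypothetical presence tracker: a user counts as online while the gap since
// their last poll is within a grace window. All names here are illustrative.
class PresenceTracker {
  constructor(graceMs, now = Date.now) {
    this.graceMs = graceMs;    // how long a silent client still counts as online
    this.now = now;            // injectable clock, handy for tests
    this.lastSeen = new Map(); // userId -> timestamp of the most recent poll
  }

  // Call this whenever a poll request from the user arrives or completes.
  touch(userId) {
    this.lastSeen.set(userId, this.now());
  }

  isOnline(userId) {
    const seen = this.lastSeen.get(userId);
    return seen !== undefined && this.now() - seen <= this.graceMs;
  }

  // Forget users whose grace window has expired and return their ids.
  sweep() {
    const gone = [];
    for (const [userId, seen] of this.lastSeen) {
      if (this.now() - seen > this.graceMs) {
        this.lastSeen.delete(userId);
        gone.push(userId);
      }
    }
    return gone;
  }
}
```

The grace window has to be longer than the poll timeout plus the reconnect gap, which is exactly why a disconnect is noticed later here than with a persistent connection.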

Not persistent

This is the big deal.

Since a new connection is created each time, if you have load-balanced servers in a round-robin scenario, you cannot know which server the next connection is going to land on.

When a user's server is known, like when using a WebSocket, you can push the events to that server straight away, and the server will relay them to the connection. If the user disconnects, the server can notify straight away that the user is no longer connected, and when the user connects again it can subscribe again.

If the server the user is connected to at the moment an event for him is generated is unknown, you have to wait for the user to connect, so that you can then say "hey, user 123 is here, give me all the news since this timestamp", which makes it a little more cumbersome. Long polling is not really push technology but request-response, so if you plan for an event-driven architecture, at some point you are going to have some level of impedance mismatch to address. For example, you need an event aggregator that can give you all the events since a given timestamp (the last time that user connected to ask for news).
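A minimal sketch of such an event aggregator, assuming an in-memory store on a single process (a real deployment would need storage shared by all servers). `EventLog`, `append` and `since` are hypothetical names chosen for illustration.

```javascript
// Hypothetical per-user event log that can answer
// "give me all the news since this timestamp".
class EventLog {
  constructor(now = Date.now) {
    this.now = now;          // injectable clock, handy for tests
    this.byUser = new Map(); // userId -> array of { at, event }, oldest first
  }

  // Record an event for a user, stamped with the current time.
  append(userId, event) {
    const entry = { at: this.now(), event };
    if (!this.byUser.has(userId)) this.byUser.set(userId, []);
    this.byUser.get(userId).push(entry);
    return entry;
  }

  // Everything recorded for the user strictly after the given timestamp.
  since(userId, timestamp) {
    const entries = this.byUser.get(userId) || [];
    return entries.filter((entry) => entry.at > timestamp);
  }
}
```

Two events recorded in the same millisecond share a timestamp, so a client resuming from that timestamp could miss one of them; a monotonically increasing sequence number is a safer cursor if that matters.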

SignalR (I guess it is the .NET equivalent of socket.io), for example, has a message bus named backplane that relays all messages to all servers, as the key to scaling out. Therefore, when a user connects to another server, "his" pending events are there "as well"(!) It is a "not too bad" approach but, as you can guess, it affects throughput:

Limitations

Using a backplane, the maximum message throughput is lower than it is when clients talk directly to a single server node. That's because the backplane forwards every message to every node, so the backplane can become a bottleneck. Whether this limitation is a problem depends on the application. For example, here are some typical SignalR scenarios:

Server broadcast (e.g., stock ticker): Backplanes work well for this scenario, because the server controls the rate at which messages are sent.

Client-to-client (e.g., chat): In this scenario, the backplane might be a bottleneck if the number of messages scales with the number of clients; that is, if the rate of messages grows proportionally as more clients join.

High-frequency realtime (e.g., real-time games): A backplane is not recommended for this scenario.

For some projects, this may be a showstopper.
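To make the fan-out cost concrete, here is a toy model of the backplane idea, not SignalR's actual implementation: an in-process `EventEmitter` stands in for the message bus, and `Backplane` and `ServerNode` are hypothetical names. Every message published by one node is delivered to every node, whether or not the target user is connected there.

```javascript
const { EventEmitter } = require('node:events');

// In-process stand-in for a backplane; a real one would be an external message bus.
class Backplane extends EventEmitter {
  publish(message) {
    this.emit('message', message);
  }
}

// Hypothetical server node that receives everything the backplane relays.
class ServerNode {
  constructor(name, backplane) {
    this.name = name;
    this.backplane = backplane;
    this.received = []; // every relayed message, including this node's own
    backplane.on('message', (message) => this.received.push(message));
  }

  // Publishing goes through the backplane, never straight to one connection.
  send(message) {
    this.backplane.publish(message);
  }
}
```

With N nodes, each published message causes N deliveries, which is why the quoted limitation bites hardest when the message rate grows with the number of clients.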

Some applications just broadcast general data, but others have connection semantics, like a multiplayer game, where it is important to send the right events to the right connections.

IMHO

Long polling is a good solution for small projects, but becomes a big burden for highly scalable apps that need high-frequency and/or very segmented event sending.

Claman answered 16/4, 2014 at 11:38 Comment(0)

I implemented a Node.js Express server that supported long polling. The biggest mistake I made was not cleaning up the requests, which slowed down the server. If your server doesn't support concurrency or threads, one of the essential tasks is to set appropriate timeouts for the requests/responses to release them from the loop, which you have to do yourself.
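A rough sketch of that kind of cleanup, using only Node's built-in `http` module rather than Express. The `PendingPolls` class, the `/poll` path, and the 25-second default are assumptions made for illustration. Each parked response gets a timer, and the entry is released when an event is published, the timer fires, or the client goes away.

```javascript
const http = require('node:http');

// Hypothetical registry of parked long-poll responses.
class PendingPolls {
  constructor(timeoutMs = 25000) {
    this.timeoutMs = timeoutMs;
    this.waiting = new Set();
  }

  // Park a response until an event arrives, the timeout fires, or the client leaves.
  add(res) {
    const entry = { res, timer: null };
    entry.timer = setTimeout(() => this.finish(entry, 204, ''), this.timeoutMs);
    res.on('close', () => this.drop(entry)); // fires on disconnect and after end()
    this.waiting.add(entry);
    return entry;
  }

  // Release the timer and forget the entry; safe to call more than once.
  drop(entry) {
    clearTimeout(entry.timer);
    this.waiting.delete(entry);
  }

  finish(entry, status, body) {
    this.drop(entry);
    entry.res.statusCode = status;
    if (body) entry.res.setHeader('Content-Type', 'application/json');
    entry.res.end(body);
  }

  // Answer every parked request with the event.
  publish(event) {
    const body = JSON.stringify(event);
    for (const entry of [...this.waiting]) this.finish(entry, 200, body);
  }
}

// Example wiring; nothing listens until the caller invokes server.listen(port).
function createPollServer(polls) {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/poll') {
      polls.add(res);
    } else {
      res.statusCode = 404;
      res.end();
    }
  });
}
```

A timeout answers with 204 so the client can simply poll again, and keeping the timeout below typical proxy idle limits is a judgment call that depends on the deployment.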

Edit: Also, you need to keep in mind that browsers have their own limit on the number of connections (e.g. 6 per hostname for Google Chrome). So if you have too many long polling connections open at the same time, you will probably block yourself.
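One way to stay well under that limit is to run a single poll loop per page, so that only one request is ever in flight from it. The sketch below assumes a server that answers 200 with a JSON body for an event and 204 on timeout; `longPollLoop`, its options, and the retry delay are hypothetical. Several open tabs on the same hostname still add up, so this only bounds the usage per page.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical client loop: awaits each poll before starting the next one,
// so at most one request is in flight at a time.
// fetchFn is injectable; by default it uses the global fetch.
async function longPollLoop(url, onEvent, options = {}) {
  const {
    fetchFn = globalThis.fetch,
    keepGoing = () => true,
    retryDelayMs = 1000,
  } = options;

  while (keepGoing()) {
    try {
      const response = await fetchFn(url);
      if (response.status === 200) {
        onEvent(await response.json());
      } else if (response.status !== 204) {
        // Unexpected status: back off instead of hammering the server.
        await sleep(retryDelayMs);
      }
      // 204 means the server timed out with no news, so poll again right away.
    } catch (err) {
      // Network error: wait a little before retrying.
      await sleep(retryDelayMs);
    }
  }
}
```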

Parsons answered 31/12, 2022 at 22:30 Comment(0)
