TL;DR - Your protocol will be dealing with what are called "thin streams", which are quite well documented if my answer is not enough. The biggest gains should come from `no_delay(true)` and async reads/writes (for normal operation), and from thin-stream dupACK and linear timeouts (for failure recovery). For more details (including static/server-side TCP options) and additional remarks, see below.
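The per-socket part of the TL;DR can be sketched with the plain POSIX socket API; Boost.Asio's `tcp::no_delay(true)` sets the same `TCP_NODELAY` option. This is a minimal sketch assuming Linux: `configure_thin_stream`, `set_int_opt` and `get_int_opt` are hypothetical helper names, and `TCP_THIN_LINEAR_TIMEOUTS` is treated as best-effort because it is Linux-specific. The async reads/writes half of the advice is not shown here.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper: sets an int-valued socket option; true on success.
static bool set_int_opt(int fd, int level, int name, int value) {
    return setsockopt(fd, level, name, &value, sizeof(value)) == 0;
}

// Hypothetical helper: reads an int-valued socket option; -1 on failure.
static int get_int_opt(int fd, int level, int name) {
    int value = -1;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, level, name, &value, &len) != 0) return -1;
    return value;
}

// Hypothetical helper: per-socket thin-stream settings. Returns whether
// TCP_NODELAY (the option behind Asio's tcp::no_delay(true)) was set.
bool configure_thin_stream(int fd) {
    assert(fd >= 0);
    const bool nodelay_ok = set_int_opt(fd, IPPROTO_TCP, TCP_NODELAY, 1);
#ifdef TCP_THIN_LINEAR_TIMEOUTS
    // Linux-only: linear instead of exponential retransmission timeouts for
    // the first few retries while the stream is thin. The result is ignored
    // on purpose, since some kernels may reject the option.
    (void)set_int_opt(fd, IPPROTO_TCP, TCP_THIN_LINEAR_TIMEOUTS, 1);
#endif
    return nodelay_ok;
}
```

Call it on the socket before or right after `connect()`; with Boost.Asio, `socket.native_handle()` gives you the descriptor to pass in.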
In general I would go about choosing these options by considering the following:
- What is my use case? In your case: a long-lasting (how long?) connection over which small messages are sent without buffering, with a small keep-alive footprint. This seems to be the classic "thin streams" example.
- What would be the best transport layer protocol to use? https://en.wikipedia.org/wiki/Transport_layer#Protocols - there are a bunch, each with its own use cases. At this point I assume you really need TCP for reliability and connection orientation; otherwise, UDP-based protocols might be better. An example would be UDP-Lite, which allows partial checksums and leaves the reliability decision to the application layer (or whichever layer you, as a developer, implement).
Having chosen the underlying protocol to build on, investigate the tuning options for that protocol. For TCP, those are:
- Nagle's algorithm - buffers small writes while earlier data is unacknowledged; you correctly turned it OFF.
- Delayed ACK - combines ACKs; useful for telnet-like applications, where it is not necessary to send an ACK for every character transmitted. Use TCP_QUICKACK if you need the opposite (ACKs are sent immediately); this might be useful if you send data very rarely.
- Keepalive probes - I see you use quite short values. I am not sure how you decided on those particular values, but you might consider extending them to keep the keep-alive overhead as small as possible. The Linux defaults (tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes) are 7200 s, 75 s and 9 probes.
- PSH flag - worth understanding, but largely unused / ignored / irrelevant.
- URG flag - forwards urgent data to the application on a separate channel; useful if you plan to receive data out-of-band (some control data, like cancellation). Probably not useful in your case, since there is little room for OOB data in "thin streams".
- TCP windows (RWND/CWND) - not a concern for small, rarely sent messages; the windows should be large enough to accommodate the data.
- Window size after idle (slow-start restart, SSR) -
Not surprisingly, SSR can have a significant impact on performance of long-lived TCP connections that may idle for bursts of time — e.g., due to user inactivity. As a result, it is generally recommended to disable SSR on the server to help improve performance of long-lived HTTP connections.
Taken from here. The (system-wide) option: `sysctl -w net.ipv4.tcp_slow_start_after_idle=0`
- TCP fast retransmit - `tcp_thin_dupack` should be ON. It reduces the time a sender waits before retransmitting a lost segment (for thin streams, retransmission is triggered after a single duplicate ACK). Be careful to read about the precautions and experiment (it can also be specified per socket, see the point immediately below).
- `tcp_thin_linear_timeouts` - allows faster recovery from packet loss by using linear instead of exponential retransmission timeouts for the first few retries while the stream is thin; it can be specified per socket: https://nnc3.com/mags/LJ_1994-2014/LJ/219/11180.html
- TCP Fast Open (TFO, `TCP_FASTOPEN`) - shortens the initial connection establishment. Not very applicable to long-lived connections, but could be considered.
- Compression - not a TCP option (it can be added on top of TCP). From the information I see, it should not be used in your case, since it adds latency, which I believe you are trying to avoid. I am listing it in case that assumption is untrue.
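The keepalive and delayed-ACK points above come down to a few `setsockopt` calls. This is a minimal sketch assuming Linux (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT` and `TCP_QUICKACK` are not portable, hence the guards); `KeepaliveParams`, `enable_keepalive`, `dead_peer_detection_s` and `request_quickack` are hypothetical names. It also shows the arithmetic behind the defaults: with 7200/75/9, a dead peer on an idle connection is detected only after roughly 7200 + 9 * 75 = 7875 s (about 2 h 11 min).

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical bundle mirroring tcp_keepalive_time/_intvl/_probes.
struct KeepaliveParams {
    int idle_s;      // idle time before the first probe
    int interval_s;  // time between unanswered probes
    int probes;      // unanswered probes before the connection is dropped
};

// Linux defaults quoted above: 7200, 75, 9.
constexpr KeepaliveParams kLinuxDefaults{7200, 75, 9};

// Approximate worst-case time to detect a dead peer on an idle connection.
constexpr int dead_peer_detection_s(KeepaliveParams p) {
    return p.idle_s + p.interval_s * p.probes;
}
static_assert(dead_peer_detection_s(kLinuxDefaults) == 7875, "7200 + 75 * 9");

// Enables keepalive and, where the constants exist, sets per-socket timings
// that override the system-wide sysctls for this socket only.
bool enable_keepalive(int fd, KeepaliveParams p) {
    assert(fd >= 0);
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0) return false;
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &p.idle_s, sizeof(p.idle_s)) != 0) return false;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &p.interval_s, sizeof(p.interval_s)) != 0) return false;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &p.probes, sizeof(p.probes)) != 0) return false;
#endif
    return true;
}

// Asks for immediate ACKs. Per tcp(7) this flag is not permanent: the kernel
// may switch back to delayed ACKs, so it may need to be set again (e.g. after reads).
bool request_quickack(int fd) {
#ifdef TCP_QUICKACK
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on)) == 0;
#else
    (void)fd;  // not available on this platform
    return false;
#endif
}
```

Because the per-socket values only affect this socket, you can leave the system defaults alone and pick whatever keepalive timings suit this protocol.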
Some infrastructure details that the application should handle, or that the protocol documentation could specify:
- For long-lasting connections, the TIME_WAIT state will be important if they are terminated by the server side. The TIME_WAIT penalty is incurred by the side that initiates connection termination, so, depending on how your application/protocol handles termination, this might be a consideration.
- Ephemeral ports - increasing the ephemeral port count to accommodate those long-lasting connections might be useful; I am not sure. This is a possible documentation bullet point for your protocol.
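To check how large the ephemeral port range currently is, you can read the range Linux exposes in `/proc/sys/net/ipv4/ip_local_port_range` (the same setting as `sysctl net.ipv4.ip_local_port_range`). This is a minimal sketch assuming Linux procfs; `read_proc_ints`, `port_count` and `ephemeral_port_count` are hypothetical helper names.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads all whitespace-separated integers from a
// procfs/sysctl file; empty if the file cannot be opened (e.g. not on Linux).
std::vector<long> read_proc_ints(const std::string& path) {
    std::vector<long> values;
    std::ifstream in(path);
    long v = 0;
    while (in >> v) values.push_back(v);
    return values;
}

// Number of ports in the inclusive range [low, high]; 0 if the range is empty.
constexpr long port_count(long low, long high) {
    return high >= low ? high - low + 1 : 0;
}

// Size of the configured ephemeral port range; -1 if it could not be read.
long ephemeral_port_count() {
    const auto r = read_proc_ints("/proc/sys/net/ipv4/ip_local_port_range");
    if (r.size() != 2) return -1;
    return port_count(r[0], r[1]);
}
```

For example, a range of 32768 to 60999 gives `port_count(32768, 60999) == 28232`. The same reader also works for single-value settings such as `/proc/sys/net/ipv4/tcp_slow_start_after_idle`.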
If your protocol is tuned for telnet-like communication, have a look at this telnet implementation; it is basically full of async writes and reads:
https://lists.boost.org/boost-users/att-40895/telnet.cpp
Some nice reads:
https://www.extrahop.com/company/blog/2016/tcp-nodelay-nagle-quickack-best-practices/
https://sourceforge.net/p/asio/mailman/asio-users/?page=257 - for additional help.