How do SO_REUSEADDR and SO_REUSEPORT differ?

The man pages and programmer documentations for the socket options SO_REUSEADDR and SO_REUSEPORT are different for different operating systems and often highly confusing. Some operating systems don't even have the option SO_REUSEPORT. The WWW is full of contradicting information regarding this subject and often you can find information that is only true for one socket implementation of a specific operating system, which may not even be explicitly mentioned in the text.

So how exactly is SO_REUSEADDR different than SO_REUSEPORT?

Are systems without SO_REUSEPORT more limited?

And what exactly is the expected behavior if I use either one on different operating systems?

Thurmond answered 17/1, 2013 at 21:45 Comment(0)

Welcome to the wonderful world of portability... or rather the lack of it. Before we start analyzing these two options in detail and take a deeper look at how different operating systems handle them, it should be noted that the BSD socket implementation is the mother of all socket implementations. Basically all other systems copied the BSD socket implementation at some point in time (or at least its interfaces) and then started evolving it on their own. Of course the BSD socket implementation evolved as well at the same time, and thus systems that copied it later got features that were lacking in systems that copied it earlier. Understanding the BSD socket implementation is the key to understanding all other socket implementations, so you should read about it even if you don't care to ever write code for a BSD system.

There are a couple of basics you should know before we look at these two options. A TCP/UDP connection is identified by a tuple of five values:

{<protocol>, <src addr>, <src port>, <dest addr>, <dest port>}

Any unique combination of these values identifies a connection. As a result, no two connections can have the same five values, otherwise the system would not be able to distinguish these connections any longer.

The protocol of a socket is set when a socket is created with the socket() function. The source address and port are set with the bind() function. The destination address and port are set with the connect() function. Since UDP is a connectionless protocol, UDP sockets can be used without connecting them. Yet it is allowed to connect them, and in some cases this is very advantageous for your code and general application design. In connectionless mode, UDP sockets that were not explicitly bound when data is sent over them for the first time are usually automatically bound by the system, as an unbound UDP socket cannot receive any (reply) data. The same is true for an unbound TCP socket: it is automatically bound before it is connected.
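To make the implicit bind visible, here is a minimal sketch in Python, whose socket module is a thin wrapper around the same socket(), bind() and connect() calls. The helper name autobind_demo and the loopback receiver are made up for this illustration. It assumes a system that reports port 0 for an unbound socket, which is not guaranteed everywhere.

```python
import socket

def autobind_demo():
    """Return (port before first send, port after first send) for a UDP socket."""
    # A receiver on loopback, so the datagram has a valid destination.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))           # port 0: let the system pick
    dest = receiver.getsockname()

    # The protocol (UDP) is fixed here, at creation time.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        before = sender.getsockname()[1]      # 0 while unbound on many systems
    except OSError:
        before = 0                            # some systems refuse the query instead
    sender.sendto(b"ping", dest)              # first send triggers the implicit bind
    after = sender.getsockname()[1]           # now a port chosen by the system

    sender.close()
    receiver.close()
    return before, after

if __name__ == "__main__":
    print("source port before/after first send:", autobind_demo())
```

The second value is whatever source port the system picked, so it will differ from run to run.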

If you explicitly bind a socket, it is possible to bind it to port 0, which means "any port". Since a socket cannot really be bound to all existing ports, the system will have to choose a specific port itself in that case (usually from a predefined, OS specific range of source ports). A similar wildcard exists for the source address, which can be "any address" (0.0.0.0 in case of IPv4 and :: in case of IPv6). Unlike in case of ports, a socket can really be bound to "any address" which means "all source IP addresses of all local interfaces". If the socket is connected later on, the system has to choose a specific source IP address, since a socket cannot be connected and at the same time be bound to any local IP address. Depending on the destination address and the content of the routing table, the system will pick an appropriate source address and replace the "any" binding with a binding to the chosen source IP address.
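A short sketch of that replacement, again in Python and using only the loopback interface. The function name tuple_after_connect is hypothetical, and the exact addresses printed depend on your system. The idea is simply to read the local address before and after connect().

```python
import socket

def tuple_after_connect():
    """Bind to "any address, any port", connect, and return what the system chose."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    dest = listener.getsockname()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.bind(("0.0.0.0", 0))               # wildcard address, port 0
    bound = client.getsockname()              # still wildcard, but a concrete port
    client.connect(dest)                      # the system must now pick a source address
    src = client.getsockname()
    peer = client.getpeername()

    client.close()
    listener.close()
    # The five values that identify the connection.
    return bound, ("tcp", src[0], src[1], peer[0], peer[1])

if __name__ == "__main__":
    bound, five_tuple = tuple_after_connect()
    print("after bind:   ", bound)
    print("after connect:", five_tuple)
```

Since the destination is on loopback, the routing decision should end up with a loopback source address; with a remote destination it would be the address of the outgoing interface.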

By default, no two sockets can be bound to the same combination of source address and source port. As long as the source port is different, the source address is actually irrelevant. Binding socketA to ipA:portA and socketB to ipB:portB is always possible if ipA != ipB holds true, even when portA == portB. E.g. if socketA belongs to an FTP server program and is bound to 192.168.0.1:21 and socketB belongs to another FTP server program and is bound to 10.0.0.1:21, both bindings will succeed. Keep in mind, though, that a socket may be locally bound to "any address". If a socket is bound to 0.0.0.0:21, it is bound to all existing local addresses at the same time and in that case no other socket can be bound to port 21, regardless of which specific IP address it tries to bind to, as 0.0.0.0 conflicts with all existing local IP addresses.
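The default rule is easy to check yourself. The sketch below is only an illustration, with helper names (bind_errno, default_conflicts) invented here. It binds one TCP socket to a loopback port without any reuse option and then tries the same address and the wildcard address on that port.

```python
import errno
import socket

def bind_errno(addr, port):
    """Bind a fresh TCP socket without any reuse option; return 0 or the errno."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((addr, port))
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        s.close()

def default_conflicts():
    """Return the bind results for (same address, wildcard address) on a taken port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))              # the system picks a free port
    port = first.getsockname()[1]
    same = bind_errno("127.0.0.1", port)      # exactly the same address and port
    wildcard = bind_errno("0.0.0.0", port)    # wildcard overlaps with 127.0.0.1
    first.close()
    return same, wildcard

if __name__ == "__main__":
    same, wildcard = default_conflicts()
    print("same address:", errno.errorcode.get(same, same))
    print("wildcard:    ", errno.errorcode.get(wildcard, wildcard))
```

Both attempts are expected to be refused with EADDRINUSE, since no reuse option is involved at all.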

Anything said so far is pretty much equal for all major operating systems. Things start to get OS specific when address reuse comes into play. We start with BSD, since as I said above, it is the mother of all socket implementations.

BSD

SO_REUSEADDR

If SO_REUSEADDR is enabled on a socket prior to binding it, the socket can be successfully bound unless there is a conflict with another socket bound to exactly the same combination of source address and port. Now you may wonder how that is any different from before. The keyword is "exactly". SO_REUSEADDR mainly changes the way wildcard addresses ("any IP address") are treated when searching for conflicts.

Without SO_REUSEADDR, binding socketA to 0.0.0.0:21 and then binding socketB to 192.168.0.1:21 will fail (with error EADDRINUSE), since 0.0.0.0 means "any local IP address", thus all local IP addresses are considered in use by this socket and this includes 192.168.0.1, too. With SO_REUSEADDR it will succeed, since 0.0.0.0 and 192.168.0.1 are not exactly the same address; one is a wildcard for all local addresses and the other one is a very specific local address. Note that the statement above is true regardless of the order in which socketA and socketB are bound; without SO_REUSEADDR it will always fail, with SO_REUSEADDR it will always succeed.

To give you a better overview, let's make a table here and list all possible combinations:

SO_REUSEADDR       socketA        socketB       Result
---------------------------------------------------------------------
  ON/OFF       192.168.0.1:21   192.168.0.1:21    Error (EADDRINUSE)
  ON/OFF       192.168.0.1:21      10.0.0.1:21    OK
  ON/OFF          10.0.0.1:21   192.168.0.1:21    OK
   OFF             0.0.0.0:21   192.168.0.1:21    Error (EADDRINUSE)
   OFF         192.168.0.1:21       0.0.0.0:21    Error (EADDRINUSE)
   ON              0.0.0.0:21   192.168.0.1:21    OK
   ON          192.168.0.1:21       0.0.0.0:21    OK
  ON/OFF           0.0.0.0:21       0.0.0.0:21    Error (EADDRINUSE)

The table above assumes that socketA has already been successfully bound to the address given for socketA, then socketB is created, either gets SO_REUSEADDR set or not, and finally is bound to the address given for socketB. Result is the result of the bind operation for socketB. If the first column says ON/OFF, the value of SO_REUSEADDR is irrelevant to the result.
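If you want to see how your own system fills in such a table, something along these lines may help. It is a rough Python sketch, not the test program mentioned at the end of this answer. The probe function and the choice of 127.0.0.1 as the "specific" address are assumptions of the sketch. The ON rows in particular can come out differently on non-BSD systems.

```python
import socket

ANY, LOOPBACK = "0.0.0.0", "127.0.0.1"

def probe(addr_a, addr_b, reuse_on_b):
    """Bind socketA (no options), then socketB on the same port; True if B binds."""
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        a.bind((addr_a, 0))                   # socketA picks the port
        port = a.getsockname()[1]
        if reuse_on_b:
            b.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            b.bind((addr_b, port))
            return True
        except OSError:
            return False
    finally:
        a.close()
        b.close()

if __name__ == "__main__":
    pairs = [(LOOPBACK, LOOPBACK), (ANY, LOOPBACK), (LOOPBACK, ANY), (ANY, ANY)]
    print("SO_REUSEADDR  socketA      socketB      Result")
    for reuse in (False, True):
        for addr_a, addr_b in pairs:
            ok = probe(addr_a, addr_b, reuse)
            print(f"{'ON' if reuse else 'OFF':<13} {addr_a:<12} {addr_b:<12} "
                  f"{'OK' if ok else 'Error'}")
```

Only socketB gets the option here, matching how the table above is defined; socketA is always bound without it.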

Okay, SO_REUSEADDR has an effect on wildcard addresses, good to know. Yet that isn't the only effect it has. There is another well known effect, which is also the reason why most people use SO_REUSEADDR in server programs in the first place. For the other important use of this option we have to take a deeper look at how the TCP protocol works.

If a TCP socket is being closed, normally a four-way handshake is performed (FIN, ACK, FIN, ACK; the middle two segments are often combined, which makes it look like a three-way exchange). The problem here is that the last ACK of that sequence may have arrived on the other side or it may not have arrived, and only if it has does the other side also consider the socket as being fully closed. To prevent re-using an address+port combination that may still be considered open by some remote peer, the system will not immediately consider a socket as dead after sending the last ACK but instead put the socket into a state commonly referred to as TIME_WAIT. It can be in that state for minutes (system dependent setting). On most systems you can get around that state by enabling lingering and setting a linger time of zero, but there is no guarantee that this is always possible or that the system will always honor this request, and even if the system honors it, this causes the socket to be closed by a reset (RST), which is not always a great idea. To learn more about linger time, have a look at my answer about this topic.

The question is, how does the system treat a socket in state TIME_WAIT? If SO_REUSEADDR is not set, a socket in state TIME_WAIT is considered to still be bound to the source address and port and any attempt to bind a new socket to the same address and port will fail until the socket has really been closed. So don't expect that you can rebind the source address of a socket immediately after closing it. In most cases this will fail. However, if SO_REUSEADDR is set for the socket you are trying to bind, another socket bound to the same address and port in state TIME_WAIT is simply ignored, after all it's already "half dead", and your socket can bind to exactly the same address without any problem. In that case it plays no role that the other socket may have exactly the same address and port. Note that binding a socket to exactly the same address and port as a dying socket in TIME_WAIT state can have unexpected, and usually undesired, side effects in case the other socket is still "at work", but that is beyond the scope of this answer and fortunately those side effects are rather rare in practice.

There is one final thing you should know about SO_REUSEADDR. Everything written above will work as long as the socket you want to bind has address reuse enabled. It is not necessary that the other socket, the one which is already bound or is in a TIME_WAIT state, also had this flag set when it was bound. The code that decides if the bind will succeed or fail only inspects the SO_REUSEADDR flag of the socket fed into the bind() call; for all other sockets inspected, this flag is not even looked at.
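The typical "restart a server right away" situation can be sketched as follows. This is an illustration under assumptions: the helper names are made up, the server side closes first so that its end of the connection is the one that lingers, and the first listener also sets SO_REUSEADDR (as server programs typically do) so that the outcome does not hinge on how a given system treats the older socket's flag.

```python
import socket
import time

def make_listener(port, reuse):
    """Create a listening TCP socket on loopback, optionally with SO_REUSEADDR."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", port))
        s.listen(1)
    except OSError:
        s.close()
        raise
    return s

def rebind_after_close(reuse_on_new):
    """Leave a dying connection behind on a port, then try to listen on it again."""
    old = make_listener(0, reuse=True)
    port = old.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = old.accept()
    conn.close()                  # server side closes first, so its end lingers
    client.recv(1)                # returns b"" once the server's FIN has arrived
    client.close()                # the client finishes the close sequence
    old.close()
    time.sleep(0.1)               # give the kernel a moment to settle
    try:
        new = make_listener(port, reuse=reuse_on_new)
    except OSError:
        return False
    new.close()
    return True

if __name__ == "__main__":
    print("rebind without SO_REUSEADDR:", rebind_after_close(False))
    print("rebind with SO_REUSEADDR:   ", rebind_after_close(True))
```

The first line of output is the system dependent one; it tells you whether your system counts the dying connection as a conflict when the option is missing.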

SO_REUSEPORT

SO_REUSEPORT is what most people would expect SO_REUSEADDR to be. Basically, SO_REUSEPORT allows you to bind an arbitrary number of sockets to exactly the same source address and port as long as all prior bound sockets also had SO_REUSEPORT set before they were bound. If the first socket that is bound to an address and port does not have SO_REUSEPORT set, no other socket can be bound to exactly the same address and port, regardless of whether this other socket has SO_REUSEPORT set or not, until the first socket releases its binding again. Unlike in case of SO_REUSEADDR, the code handling SO_REUSEPORT will not only verify that the socket currently being bound has SO_REUSEPORT set but it will also verify that the socket with a conflicting address and port had SO_REUSEPORT set when it was bound.

SO_REUSEPORT does not imply SO_REUSEADDR. This means if a socket did not have SO_REUSEPORT set when it was bound and another socket has SO_REUSEPORT set when it is bound to exactly the same address and port, the bind fails, which is expected, but it also fails if the other socket is already dying and is in TIME_WAIT state. Binding a socket to the same address and port as another socket in TIME_WAIT state requires either that SO_REUSEADDR is set on that socket or that SO_REUSEPORT was set on both sockets prior to binding them. Of course it is allowed to set both, SO_REUSEPORT and SO_REUSEADDR, on a socket.

There is not much more to say about SO_REUSEPORT other than that it was added later than SO_REUSEADDR, that's why you will not find it in many socket implementations of other systems, which "forked" the BSD code before this option was added, and that there was no way to bind two sockets to exactly the same socket address in BSD prior to this option.
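The "all sockets must have set it" rule can be tried out with a few lines of Python. This is a hedged sketch: bind_pair is a name invented here, and the function returns None when the running system (or the Python build) does not offer SO_REUSEPORT at all.

```python
import errno
import socket

def bind_pair(first_sets_reuseport):
    """Bind two TCP sockets to the same address and port.

    The second socket always sets SO_REUSEPORT; the first only if requested.
    Returns True/False for the second bind, or None if the option is unavailable.
    """
    opt = getattr(socket, "SO_REUSEPORT", None)
    if opt is None:
        return None
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        try:
            if first_sets_reuseport:
                a.setsockopt(socket.SOL_SOCKET, opt, 1)
            b.setsockopt(socket.SOL_SOCKET, opt, 1)
        except OSError as exc:
            if exc.errno == errno.ENOPROTOOPT:
                return None                   # constant defined, kernel says no
            raise
        a.bind(("127.0.0.1", 0))
        try:
            b.bind(a.getsockname())           # exactly the same address and port
            return True
        except OSError:
            return False
    finally:
        a.close()
        b.close()

if __name__ == "__main__":
    print("first socket without SO_REUSEPORT:", bind_pair(False))
    print("both sockets with SO_REUSEPORT:   ", bind_pair(True))
```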

Connect() Returning EADDRINUSE?

Most people know that bind() may fail with the error EADDRINUSE, however, when you start playing around with address reuse, you may run into the strange situation that connect() fails with that error as well. How can this be? How can a remote address, after all that's what connect adds to a socket, be already in use? Connecting multiple sockets to exactly the same remote address has never been a problem before, so what's going wrong here?

As I said at the very top of my reply, a connection is defined by a tuple of five values, remember? And I also said that these five values must be unique, otherwise the system cannot distinguish two connections any longer, right? Well, with address reuse, you can bind two sockets of the same protocol to the same source address and port. That means three of those five values are already the same for these two sockets. If you now try to connect both of these sockets also to the same destination address and port, you would create two connected sockets whose tuples are absolutely identical. This cannot work, at least not for TCP connections (UDP connections are no real connections anyway). If data arrived for either one of the two connections, the system could not tell which connection the data belongs to. At least the destination address or destination port must be different for either connection, so that the system has no problem identifying which connection incoming data belongs to.

So if you bind two sockets of the same protocol to the same source address and port and try to connect them both to the same destination address and port, connect() will actually fail with the error EADDRINUSE for the second socket you try to connect, which means that a socket with an identical tuple of five values is already connected.
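A small sketch of that situation, using only loopback. The names set_reuse and second_connect_errno are hypothetical. The sketch sets both reuse options where available so that the two binds succeed on as many systems as possible, and it merely reports the error code of the second connect(), because not every system uses EADDRINUSE for it.

```python
import errno
import socket

def set_reuse(sock):
    """Enable SO_REUSEADDR and, where the system offers it, SO_REUSEPORT."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    opt = getattr(socket, "SO_REUSEPORT", None)
    if opt is not None:
        try:
            sock.setsockopt(socket.SOL_SOCKET, opt, 1)
        except OSError:
            pass                              # constant defined, kernel lacks it

def second_connect_errno():
    """Connect two sockets with identical source to one destination.

    Returns the errno of the second connect(), or None if it succeeded.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(2)
        dest = listener.getsockname()

        set_reuse(first)
        first.bind(("127.0.0.1", 0))
        set_reuse(second)
        second.bind(first.getsockname())      # same protocol, source address, port

        first.connect(dest)                   # this five-tuple is still unique
        try:
            second.connect(dest)              # this one would be an exact duplicate
            return None
        except OSError as exc:
            return exc.errno
    finally:
        for s in (first, second, listener):
            s.close()

if __name__ == "__main__":
    code = second_connect_errno()
    print("second connect():", errno.errorcode.get(code, code))
```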

Multicast Addresses

Most people ignore the fact that multicast addresses exist, but they do exist. While unicast addresses are used for one-to-one communication, multicast addresses are used for one-to-many communication. Most people became aware of multicast addresses when they learned about IPv6, but multicast addresses also existed in IPv4, even though this feature was never widely used on the public Internet.

The meaning of SO_REUSEADDR changes for multicast addresses as it allows multiple sockets to be bound to exactly the same combination of source multicast address and port. In other words, for multicast addresses SO_REUSEADDR behaves exactly as SO_REUSEPORT for unicast addresses. Actually, the code treats SO_REUSEADDR and SO_REUSEPORT identically for multicast addresses, that means you could say that SO_REUSEADDR implies SO_REUSEPORT for all multicast addresses and the other way round.
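To check the multicast case on a given system, a sketch like the following may be enough. The function name bind_multicast_pair is invented for this example, the group 224.1.2.3 is simply an arbitrary multicast address, and the function returns None if the system refuses to bind to a multicast address in the first place (some do).

```python
import socket

GROUP = "224.1.2.3"   # arbitrary IPv4 multicast address, chosen for illustration

def bind_multicast_pair(option_name):
    """Bind two UDP sockets to the same multicast address and port.

    option_name is "SO_REUSEADDR" or "SO_REUSEPORT" and is set on both sockets.
    Returns True/False for the second bind, or None if the test cannot be run.
    """
    opt = getattr(socket, option_name, None)
    if opt is None:
        return None
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        try:
            a.setsockopt(socket.SOL_SOCKET, opt, 1)
            b.setsockopt(socket.SOL_SOCKET, opt, 1)
            a.bind((GROUP, 0))
        except OSError:
            return None                       # option or multicast bind unsupported
        try:
            b.bind((GROUP, a.getsockname()[1]))
            return True
        except OSError:
            return False
    finally:
        a.close()
        b.close()

if __name__ == "__main__":
    for name in ("SO_REUSEADDR", "SO_REUSEPORT"):
        print(name, "->", bind_multicast_pair(name))
```

Binding alone does not join the group; receiving multicast traffic would additionally need IP_ADD_MEMBERSHIP, which is outside what this sketch checks.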


FreeBSD/OpenBSD/NetBSD

All these are rather late forks of the original BSD code, that's why all three of them offer the same options as BSD and they also behave the same way as in BSD.


macOS (MacOS X)

At its core, macOS is simply a BSD-style UNIX named "Darwin", based on a rather late fork of the BSD code (BSD 4.3), which was then later on even re-synchronized with the (at that time current) FreeBSD 5 code base for the Mac OS 10.3 release, so that Apple could gain full POSIX compliance (macOS is POSIX certified). Despite having a microkernel at its core ("Mach"), the rest of the kernel ("XNU") is basically just a BSD kernel, and that's why macOS offers the same options as BSD and they also behave the same way as in BSD.

iOS / watchOS / tvOS

iOS is just a macOS fork with a slightly modified and trimmed kernel, somewhat stripped down user space toolset and a slightly different default framework set. watchOS and tvOS are iOS forks, that are stripped down even further (especially watchOS). To my best knowledge they all behave exactly as macOS does.


Linux

Linux < 3.9

Prior to Linux 3.9, only the option SO_REUSEADDR existed. This option behaves generally the same as in BSD with two important exceptions:

  1. As long as a listening (server) TCP socket is bound to a specific port, the SO_REUSEADDR option is entirely ignored for all sockets targeting that port. Binding a second socket to the same port is only possible if it was also possible in BSD without having SO_REUSEADDR set. E.g. you cannot bind to a wildcard address and then to a more specific one or the other way round; both are possible in BSD if you set SO_REUSEADDR. What you can do is bind to the same port and two different non-wildcard addresses, as that's always allowed. In this aspect Linux is more restrictive than BSD.

  2. The second exception is that for client sockets, this option behaves exactly like SO_REUSEPORT in BSD, as long as both had this flag set before they were bound. The reason for allowing that was simply that it is important to be able to bind multiple sockets to exactly the same UDP socket address for various protocols, and as there used to be no SO_REUSEPORT prior to 3.9, the behavior of SO_REUSEADDR was altered accordingly to fill that gap. In that aspect Linux is less restrictive than BSD.

Linux >= 3.9

Linux 3.9 added the option SO_REUSEPORT to Linux as well. This option behaves exactly like the option in BSD and allows binding to exactly the same address and port number as long as all sockets have this option set prior to binding them.

Yet, there are still two differences to SO_REUSEPORT on other systems:

  1. To prevent "port hijacking", there is one special limitation: All sockets that want to share the same address and port combination must belong to processes that share the same effective user ID! So one user cannot "steal" ports of another user. This is some special magic to somewhat compensate for the missing SO_EXCLBIND/SO_EXCLUSIVEADDRUSE flags.

  2. Additionally the kernel performs some "special magic" for SO_REUSEPORT sockets that isn't found in other operating systems: For UDP sockets, it tries to distribute datagrams evenly, for TCP listening sockets, it tries to distribute incoming connect requests (those accepted by calling accept()) evenly across all the sockets that share the same address and port combination. Thus an application can easily open the same port in multiple child processes and then use SO_REUSEPORT to get a very inexpensive load balancing.
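The distribution described in the second point can be observed with a sketch like the one below. It is only an illustration: accept_distribution is a made-up name, everything runs in one process on loopback instead of in several child processes, and how evenly the connections are spread is up to the kernel. On systems without SO_REUSEPORT the function returns None.

```python
import select
import socket

def accept_distribution(n_listeners=4, n_clients=40):
    """Open several SO_REUSEPORT listeners on one port and count accepts per listener."""
    opt = getattr(socket, "SO_REUSEPORT", None)
    if opt is None:
        return None
    listeners, port = [], 0
    try:
        for _ in range(n_listeners):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            listeners.append(s)
            s.setsockopt(socket.SOL_SOCKET, opt, 1)
            s.bind(("127.0.0.1", port))       # first bind picks the port, rest share it
            port = s.getsockname()[1]
            s.listen(n_clients)
            s.setblocking(False)
    except OSError:
        for s in listeners:
            s.close()
        return None                           # sharing the port is not possible here

    clients = [socket.create_connection(("127.0.0.1", port))
               for _ in range(n_clients)]
    counts = [0] * n_listeners
    while sum(counts) < n_clients:
        ready, _, _ = select.select(listeners, [], [], 1.0)
        if not ready:
            break                             # nothing left to accept
        for s in ready:
            try:
                conn, _ = s.accept()
            except BlockingIOError:
                continue
            conn.close()
            counts[listeners.index(s)] += 1
    for s in clients + listeners:
        s.close()
    return counts

if __name__ == "__main__":
    print("accepted connections per listener:", accept_distribution())
```

On a system that only allows the shared bind but does no balancing, you would expect most connections to pile up on a single listener.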


Android

Even though the whole Android system is somewhat different from most Linux distributions, at its core is a slightly modified Linux kernel, thus everything that applies to Linux should apply to Android as well.


Windows

Windows only knows the SO_REUSEADDR option, there is no SO_REUSEPORT. Setting SO_REUSEADDR on a socket in Windows behaves like setting SO_REUSEPORT and SO_REUSEADDR on a socket in BSD, with one exception:

Prior to Windows 2003, a socket with SO_REUSEADDR could always be bound to exactly the same source address and port as an already bound socket, even if the other socket did not have this option set when it was bound. This behavior allowed an application "to steal" the connected port of another application. Needless to say that this has major security implications!

Microsoft realized that and added another important socket option: SO_EXCLUSIVEADDRUSE. Setting SO_EXCLUSIVEADDRUSE on a socket makes sure that if the binding succeeds, the combination of source address and port is owned exclusively by this socket and no other socket can bind to them, not even if it has SO_REUSEADDR set.

This default behavior was first changed in Windows 2003; Microsoft calls that "Enhanced Socket Security" (funny name for a behavior that is default on all other major operating systems). For more details just visit this page. There are three tables: The first one shows the classic behavior (still in use when using compatibility modes!), the second one shows the behavior of Windows 2003 and up when the bind() calls are made by the same user, and the third one when the bind() calls are made by different users.


Solaris

Solaris is the successor of SunOS. SunOS was originally based on a fork of BSD, SunOS 5 and later was based on a fork of SVR4, however SVR4 is a merge of BSD, System V, and Xenix, so up to some degree Solaris is also a BSD fork, and a rather early one. As a result Solaris only knows SO_REUSEADDR, there is no SO_REUSEPORT. The SO_REUSEADDR behaves pretty much the same as it does in BSD. As far as I know there is no way to get the same behavior as SO_REUSEPORT in Solaris, that means it is not possible to bind two sockets to exactly the same address and port.

Similar to Windows, Solaris has an option to give a socket an exclusive binding. This option is named SO_EXCLBIND. If this option is set on a socket prior to binding it, setting SO_REUSEADDR on another socket has no effect if the two sockets are tested for an address conflict. E.g. if socketA is bound to a wildcard address and socketB has SO_REUSEADDR enabled and is bound to a non-wildcard address and the same port as socketA, this bind will normally succeed, unless socketA had SO_EXCLBIND enabled, in which case it will fail regardless of the SO_REUSEADDR flag of socketB.


Other Systems

In case your system is not listed above, I wrote a little test program that you can use to find out how your system handles these two options. Also if you think my results are wrong, please first run that program before posting any comments and possibly making false claims.

All that the code requires to build is a bit of POSIX API (for the network parts) and a C99 compiler (actually most non-C99 compilers will work as well as long as they offer inttypes.h and stdbool.h; e.g. gcc supported both long before offering full C99 support).

All that the program needs to run is that at least one interface in your system (other than the local interface) has an IP address assigned and that a default route is set which uses that interface. The program will gather that IP address and use it as the second "specific address".

It tests all possible combinations you can think of:

  • TCP and UDP protocol
  • Normal sockets, listen (server) sockets, multicast sockets
  • SO_REUSEADDR set on socket1, socket2, or both sockets
  • SO_REUSEPORT set on socket1, socket2, or both sockets
  • All address combinations you can make out of 0.0.0.0 (wildcard), 127.0.0.1 (specific address), and the second specific address found at your primary interface (for multicast it's just 224.1.2.3 in all tests)

and prints the results in a nice table. It will also work on systems that don't know SO_REUSEPORT, in which case this option is simply not tested.

What the program cannot easily test is how SO_REUSEADDR acts on sockets in TIME_WAIT state, as it's very tricky to force and keep a socket in that state. Fortunately most operating systems seem to simply behave like BSD here and most of the time programmers can simply ignore the existence of that state.

Here's the code (I cannot include it here, answers have a size limit and the code would push this reply over the limit).

Thurmond answered 17/1, 2013 at 21:45 Comment(86)
There's a TON of good information here. Sadly some of the details are wrong.Clink
For example, "source address" really should be "local address", the next three fields likewise. Binding with INADDR_ANY doesn't bind existing local addresses, but all future ones as well. listen certainly creates sockets with the same exact protocol, local address, and local port, even though you said that isn't possible.Clink
@Ben Source and Destination are the official terms used for IP addressing (to which I primary refer). Local and Remote would make no sense, since the Remote address can in fact be a "Local" address and the opposite of Destination is Source and not Local. I don't know what your issue is with INADDR_ANY, I never said it would not bind to future addresses. And listen does not create any sockets at all, which makes your whole sentence a little bit strange.Thurmond
@Mecki: It's actually the listen/accept pair that works together to create sockets. As far as source and destination, sockets are bidirectional. Identifying the socket using the terminology from only half the packets is bad. A packet has a source and destination address, but a connection (and therefore also a connected socket) has a local and a peer address.Clink
And where you said about INADDR_ANY -- "it is bound to all existing local addresses" should be changed to "it is bound to all local addresses, existing and future"Clink
@Ben When a new address is added to the system, it is also an "existing local address", it just started to exist. I did not say "to all currently existing local addresses". Actually I even say that the socket is in fact really bound to the wildcard, which means the socket is bound to whatever matches this wildcard, now, tomorrow and in hundred years. Similar for source and destination, you are just nitpicking here. Do you have any real technical contribution to make?Thurmond
@Mecki: You really think that the word existing includes things that do not exist now but will in the future? Source and destination is not a nitpick. When incoming packets are matched to a socket, you're saying that the destination address in the packet will be matched against a "source" address of the socket? That's wrong and you know it, you already said that source and destination are opposites. The local address on the socket is matched against the destination address of incoming packets, and placed in the source address on outgoing packets.Clink
And you still haven't dealt with listen/accept, which creates a huge exception to your claims concerning uniqueness of the protocol/source address/source port set (more properly protocol/local address/local port)Clink
@Ben If existing is confusing you, and I think you are the only one it is confusing, go and change it; I won't. And the dest addr of an incoming packet IS matched against the src addr of the socket, that's not wrong, since the src addr of a socket in my answer IS exactly what you call the local addr. A socket just has an address it is bound to and whether you name it src addr or local addr doesn't change the fact what it is, how it works, or what it is used for.Thurmond
@Mecki's answer makes sense @BenVoigt. You're right Ben: the source on one machine is the destination on another and vice versa, but irrelevant in this case. Mecki is talking about SO_REUSEPORT and SO_REUSEADDR on one machine, not on two machines; from the perspective of a single machine (after all, in most cases you only control one machine (your server) which talks to other machines you don't control (clients)).Trampoline
@trusktr: No, it's not "the source on one machine is the destination on the other". From the perspective of a single machine, "the source on incoming packets is the destination on outgoing packets, and all of them pass through the same socket". So you can't generalize to saying "the source address for this socket". What you have is a local address, and a peer address, and that perspective only changes when you move to the other end of the connection.Clink
@Ben You are talking about packets, but I've never been talking about packets, I was talking about sockets. The source address of the socket is the source address of outgoing packets and the destination address of incoming packets, but it is always the source address for the socket, because in one case the socket is the sender and in the other case it is the recipient and depending on its role, the terms source and destination have different meanings.Thurmond
@Mecki: That makes so much more sense if you say "The local address of the socket is the source address of outgoing packets and the destination address of incoming packets". Packets have source and destination addresses. Hosts, and sockets on hosts, don't. For datagram sockets both peers are equal. For TCP sockets, because of the three-way handshake, there's an originator (client) and a responder (server), but that still doesn't mean the connection or connected sockets have a source and destination either, because traffic flows both ways.Clink
'Note that binding a socket to exactly the same address and port as dying socket in TIME_WAIT state can have unexpected, and usually undesired, side effects in case the other socket is still "at work"' Can somebody make me an example of what could happen? Or where can I find more info?Deception
@antox E.g. if the socket in TIME_WAIT state receives another packet (which may still happen, it isn't dead yet), this packet may be consumed by the other socket bound to the same address, which means the TIME_WAIT socket never sees it (even though it was meant for it) and the other socket may incorrectly reacts to this packet (e.g. act according to the TCP header flags, goes into an error state, closes itself, etc.).Thurmond
Apparently SO_REUSEPORT now exists in Solaris. It may have been added in Solaris 11.Sisterhood
Windows ignores TIME_WAIT connections by default, so you don't need to use the SO_REUSEADDR option on a server program before bind to an address and port.Gunslinger
@Gunslinger Do you have an official source for that (e.g. is it in the docs somewhere) or have you just found out about that fact by using the API?Thurmond
@Thurmond It's just a simple fact, and you can test it by yourself easily. I have been write many network service for windows, and I even know many windows only programers, who do not even know about this option. If you only live in the msw world, you do not need to know this option almostly. :-)Gunslinger
@Gunslinger you still need to use SO_REUSEADDR if you hope to bind two processes to the same addr/port on Windows.Orleans
@Orleans that's right, I just said: "Windows ignores TIME_WAIT connections by default..." :-)Gunslinger
@Mecki, Where did you get the term "Linger Time" from?Weksler
@Weksler The option is named "Linger" and it is a time value, hence the "Linger Time". Search for "linger time" and you will get plenty of hits all over the Internet (including hits on the IETF mailing list)Thurmond
minor nit: TIME_WAIT isn't for the send buffer - its for the recv buffer. Since TCP packets for a maximum of up to 2 minutes acc' to the RFC there may still be packets in flight for the same connections. If application A opens a socket and starts talking on cxtn C and then application B opens the same socket it may get data destined for A. The Send buffer is irrelevant here.Lynching
@Mecki, Just a couple of clarifications: Q1) On BSD, if socket-A has already bound to 192.168.1.0:21 using SO_REUSEADDR, can socket-B come along and try to bind to 0.0.0.0:21 without using SO_REUSEADDR? Q2) On BSD, if socket-A has already bound to 192.168.1.0:21 using both SO_REUSEPORT and SO_REUSEADDR, can socket-B with SO_REUSEPORT but without SO_REUSEADDR come along and try to bind to 0.0.0.0:21?Weksler
Q3) Is it that on BSD there's absolutely no way to get 2 connections of all-5-tuple equal even with SO_REUSEPORT? And this is true of all OS too? Q4) You wrote that if a socket is connected to 0.0.0.0, the system will pick "an appropriate source address". Is it right to say that it's always either 127.0.0.1 or ::1? Q5) You wrote that to prevent port hijacking, Linux only allows SO_REUSEPORT for processes that share the same effective user ID. What about BSD's defenses against port hijacking?Weksler
What linux application can I use to test the REUSEPORT? Neither netcat nor socat allow doing that. Thank youOverexert
@Thurmond For LINUX Two sockets can be bound to the same local/source ip address (wildcard address too) and port if they both have SO_REUSEADDR option set, except when one of them is already listening for connections before binding to the second port. If you bind a socket to a specific ip address and port then listen on that socket and then try to bind another socket to the same specific ip address and port, the second call to bind will fail. BUT you can first bind the two sockets to the same ip address and port and then can listen on any one of them.Pasteboard
@Thurmond And in the contrary to what you said in your answer for LINUX, it is not possible to first bind a listening TCP socket to a specific IP address and port combination and later on bind another one to a wildcard IP address and the same port. I tried that and the second call to bind failed. So you cannot bind a socket to the same address and port as another socket or wildcard address and port, if the other socket is already listening for connections even if both the sockets have SO_REUSEADDR option set. For this, you'll need SO_REUSEPORT option set for both the sockets.Pasteboard
Awesome answer. A minor point from my practical experience: > you could say that SO_REUSEADDR implies SO_REUSEPORT for all multicast addresses and the other way round. This is not right and you have to set SO_REUSEPORT explicitly. Try on Mac OS :) #define SO_REUSEADDR 0x0004 /* allow local address reuse */ #define SO_REUSEPORT 0x0200 /* allow local address & port reuse */Haemocyte
@Pasteboard Post owner is always informed of comments, no need for @Mecki. That listening blocks all reuse has always been in my reply (check history, that was my first exception). How could you overlook that? And that ADDRREUSE works like PORTREUSE for some cases was there as well (but I admit, not covering all cases). I made the text now more foolproof but I didn’t really revise any statements, just added more details. I don't see how your comment ever contradicted anything I've written, basically you just pointed out how Linux is different and that's the same I've done already.Thurmond
@Pasteboard E.g. And in the contrary to what you said in your answer for LINUX, it is not possible to first bind a listening TCP socket to a specific IP address and port combination and later on bind another one to a wildcard IP address and the same port. - where in the world did I ever claim so? Please provide the exact quote (all edit history is available). I said this is possible for BSD and it is, I never said this is possible for Linux, did I? I said the opposite, I said once a socket is listening, you cannot bind with just REUSEADDR, contrary to BSD (you can with REUSEPORT!)Thurmond
@Haemocyte Like I said in your other answer (I just don't want to leave your comment uncommented here), I did test it and it works exactly like I said. Here's the code (still WIP, I try to make it even more cross platform): codepad.org/PJbOp5gW and here's the output for the code multicast loop codepad.org/mTZeJajB Everyone with a Mac is free to verify that result. W/O either option set, it always fails but it always succeeds with one option set and it doesn't matter which one.Thurmond
@Mecki: Here's the code for my subscriber/listener: codepad.org/GY8HSF7X And the code for my publisher: codepad.org/pkS0M2v5 When using SO_REUSEADDR , I get bind: Address already in use. When using SO_REUSEPORT, the problem goes away. I'm on MacOS Sierra.Haemocyte
@Haemocyte You are binding the socket to INADDR_ANY and not to a multicast address. Binding two sockets to "any address" is not possible with just SO_REUSEADDR, but that's also what my answer says. What you do is to bind a socket to any address and then joining the multicast group, which is allowed but it is not the same as binding a socket to a multicast address. When you bind that socket, how would the system know that this is going to be a multicast socket?Thurmond
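The difference between the two cases in the comment above can be put side by side. This is a rough sketch only: the group address and port used in the demo are arbitrary examples, the helper names are made up, and binding directly to a multicast address is assumed to be allowed (it is on Linux, macOS and the BSDs, but not on Windows).

```python
import socket


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure: group address + local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def bind_wildcard_then_join(group: str, port: int) -> socket.socket:
    """Bind to INADDR_ANY, then join the group.

    At bind() time the system only sees a wildcard address, so the
    wildcard rules for SO_REUSEADDR / SO_REUSEPORT apply.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", port))
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                 membership_request(group))
    return s


def bind_to_group_address(group: str, port: int) -> socket.socket:
    """Bind directly to the multicast address.

    Here bind() itself sees a multicast address, so the multicast rules
    apply. Receiving traffic would still require joining the group.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((group, port))
    return s


if __name__ == "__main__":
    # Example values, not taken from the thread.
    print(membership_request("239.1.2.3").hex())
```

Setting either reuse option before each `bind()` call in these helpers is what would let you compare the two cases on your own system.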
@Haemocyte I didn't say your code is wrong, I just said your code binds to INADDR_ANY and thus what I wrote about multicast addresses will not apply as INADDR_ANY is no multicast address, it is a wildcard address and thus the rules for wildcard addresses apply. What I said about multicast addresses was for the case when you bind a port to a multicast address and then both options behave the same. So please stop that stupid arguing here. What I say is correct, Period! Comments are no place for arguments, take your questions to an own question if you have one.Thurmond
@Haemocyte Binding to a multicast address is not the same as joining a multicast group. If you don't know what the difference is, you miss fundamental knowledge about multicast. When joining a multicast group, none of these flags even ever play any role. These flags only play a role when binding a socket and binding and joining a group are two entirely different things. So I really don't understand what you are intending with those comments, other than showing the world your lack of understanding multicast.Thurmond
Great answer! Really! Bounty awarded (in 24h .. :) ). So, may we say, that setting SO_REUSEADDR on Linux will behave the same way as Windows without this option only in respect to "port reuse" (ignoring the wildcard cases)?Matelda
@KirilKirov See this page: msdn.microsoft.com/en-us/library/windows/desktop/… If you scroll down a bit, there is a table showing you all combinations between two bind calls, either one having either flag set or not and even distinguishing wildcard and specific address. I'll add that link to my reply. According to table your assumption is wrong: Without any flag Windows allows no reuse whatsoever, Linux will allow certain kind of reuse.Thurmond
@KirilKirov Oh, but when I scroll down further, it shows the behavior has changed with Windows 2003, so starting with that version, the behavior is more similar to Linux, actually even more permissive but only as long as it's the same user performing both bind calls (if it's not the same user, the behavior is similar to Linux w/o any flags). So all in all this makes Windows among the complicated systems, as the behavior depends on OS version as well as on which user makes which call.Thurmond
When SO_REUSEADDR is on and Socket A and Socket B are bound as in the row "ON 0.0.0.0:21 192.168.1.0:21 OK" of the answer above, and a remote client connects to 192.168.1.0:21, which local socket receives the connection? Socket A or Socket B?Screwball
@Screwball Just FYI: If your network is 192.168.1.0/24, then 192.168.1.0 is no valid host address, it is the address of the network itself and must not be used by any host. Otherwise, if two sockets both bind for the same port, one with a wildcard and one with a specific address and someone from outside connects to this port and the specific address, then the socket with the specific address usually wins over the socket with the wildcard address (in networking, the most specific match usually wins), but in the end, it's not specified and thus can be OS specific.Thurmond
@BenVoigt is right on about this answer. There is a lot of good stuff here, but some of it is misleading and poorly stated. Using any REUSE options starts getting into OS-specific behavior, but generally allowed binding works something like this: TCP-listener sockets must have a unique localaddr/port combo, TCP-connected sockets must have a unique localaddr/lport/remoteaddr/rport combo, and UDP sockets must have a unique localaddr/port combo. You can of course override those rules as you start using REUSE flags, but don't expect the same behavior from different operating systems with REUSE!Orleans
I replaced the part about TIME-WAIT, which, as @GoodPerson states above, was totally incorrect. It doesn't have anything to do with outbound data, and the citation that was provided doesn't state otherwise. That citation was actually a section of RFC 793, which is what should have been cited (and understood) in the first place. It doesn't have anything to do with SO_LINGER or linger timeouts either.Cystoid
@Cystoid My explanation for Linger time may not be right, but I can prove that a socket with Linger time set will hang in Time Wait whereas a socket with no Linger time never will. So you could have fixed just that one statement about Linger time but throwing out the whole paragraph destroys an otherwise perfect answer.Thurmond
You didn't 'restore the answer to its correct state'. You put your errors back. Linger time has nothing to do with TIME-WAIT time, and neither does pending outgoing data. Read the RFC section you cited, which doesn't mention 'linger' at all. Removed my ancient UV and placed DV for this egregious mistake, and for using a citation that doesn't support your claims. Last time I try to help you too.Cystoid
@Cystoid I fixed the wrong sentence about pending outgoing data, instead it now says "data in flight" which is correct, as it applies to any data. And sorry that systems don't work like the RFC wishes that, but I can verify on my FreeBSD system as well as on my macOS system (macOS is POSIX approved!) that the longer I set the Linger time, the longer the socket stays in time wait and when I set Linger time to zero, it never stays in time wait at all. I'm not going to discuss provable facts with you here.Thurmond
A socket that is in TIME-WAIT state is there because of the state diagram in RFC 793 #3.3.3. This is your own citation and it supports what I wrote, not what you wrote. And it is certainly untrue that a socket without a linger timeout never goes into TIME-WAIT: it would be a violation of that section.Cystoid
This answer is pretty old and no longer correct. According to the text above, this command that should succeed, doesn't: python -c 'import socket; s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0); s.bind(("127.0.0.1", 9999)); s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0); s2.setsockopt(socket.SOL_SOCKET, 2, 1); s2.bind(("127.0.0.1", 9999));'Haik
@Haik Your code is wrong and contradicts what is written in the answer. You didn't set SO_REUSEPORT on s1 prior to binding it but that is required. Quote: "Basically, SO_REUSEPORT allows you to bind an arbitrary number of sockets to exactly the same source address and port as long as all prior bound sockets also had SO_REUSEPORT set before they were bound." This is also repeated twice in the Linux sections.Thurmond
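The quoted rule is easy to try out. The following is a minimal sketch, assuming a system whose Python build exposes `socket.SO_REUSEPORT` and assuming both binds are made by the same user; the helper name `second_bind_succeeds` is hypothetical. Each option is set (or left unset) before the respective `bind()` call, since that ordering is what the rule is about.

```python
import socket

# Not every platform defines SO_REUSEPORT, so look it up defensively.
REUSEPORT = getattr(socket, "SO_REUSEPORT", None)


def second_bind_succeeds(first_opt, second_opt) -> bool:
    """Bind two TCP sockets to the same loopback address and port.

    first_opt / second_opt are socket option constants (or None) that are
    enabled on the first / second socket before it is bound.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if first_opt is not None:
            a.setsockopt(socket.SOL_SOCKET, first_opt, 1)
        if second_opt is not None:
            b.setsockopt(socket.SOL_SOCKET, second_opt, 1)
        a.bind(("127.0.0.1", 0))          # kernel picks a free port
        try:
            b.bind(a.getsockname())       # same address, same port
        except OSError:
            return False
        return True
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("no options:        ", second_bind_succeeds(None, None))
    if REUSEPORT is not None:
        print("only second socket:", second_bind_succeeds(None, REUSEPORT))
        print("both sockets:      ", second_bind_succeeds(REUSEPORT, REUSEPORT))
```

Passing `socket.SO_REUSEADDR` instead lets you run the same comparison for the other option.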
@Thurmond I was not using SO_REUSEPORT, I was using SO_REUSEADDR. My example had a typo, oops, it should be this: python -c 'import socket; s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0); s.bind(("127.0.0.1", 9999)); s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0); s2.setsockopt(socket.SOL_SOCKET, 2, 1); s2.bind(("0.0.0.0", 9999));' According to the text above, this should work. In practice, it does not work. Compare with second-to-last row of SO_REUSEADDR table above.Haik
Amazing answer. I was surprised to read: For UDP sockets, it tries to distribute datagrams evenly. I am wondering if this applies in the case of two sockets bound to the same endpoint, but one is also connected to a remote endpoint. If a datagram is received from that endpoint, will it be "load balanced" or will it take the most specific route and end up at the bound and connected socket? I am asking specifically for linux but am also curious about windows.Hayden
@MatthewGoulart As far as I understood the documentation, it applies only to two (or more) UDP sockets locally bound to the same endpoint but neither one being connected. I think if one is connected to a remote endpoint, then all datagrams originating from that remote endpoint will end up at the connected socket. The unconnected one would only catch datagrams from other remote endpoints. AFAIK this only applies to Linux, I've not yet heard that other OSes have implemented a similar concept, yet some datagram distribution concept must exist in every OS for that case, even if it is "random".Thurmond
Fantastic answer, thanks for all the details. I've had trouble finding the default for the Linger Time - how/where can I find that, specifically on Linux?Ouster
@JurajMartinka The default values for most socket options are not documented, they are hard-coded in the kernel and often they can be overridden for running systems (sysctl,/sys, /proc/sys). Also no rule forbids them to be dynamic and change automatically at system runtime (as some systems do for socket buffers). The default is usually no linger timeout at all but that only means the system itself decides about the linger time. It's complicated. You probably want to read this detailed description notes.shichao.io/unp/ch7/#so_linger-socket-optionThurmond
Interesting and thanks for the link. But isn't the default "linger timeout" 2 * MSL (as stated in "Unix Network Programming" in section "2.7. TIME_WAIT State" on p. 43)? On Linux this seems to be 60 seconds (net.ipv4.tcp_fin_timeout), on Mac OS X 30 seconds (2 * net.inet.tcp.msl, that is set to 15 seconds). Or am I mixing up different things?Ouster
@JurajMartinka If Linger is on, the set Linger timeout is the amount of time before a closing socket is closed the hard way if still open. If on and time is zero, sockets are instantly closed the hard way with data in flight getting lost. Yet if Linger is off, the socket is not closed instantly but in the background with no upper time limit. Of course, there must still be a time limit for closing steps performed, like the net.ipv4.tcp_fin_timeout. These time limits are also in effect if you set a Linger timeout, yet with a Linger timeout there is an upper bound for the sum of all operations.Thurmond
So If I understand what you're saying it means that with "linger" on and set to non-zero there's an upper bound on the whole "close socket" operation (4-way handshake initiated by an active close) while the "tcp_fin_timeout" only limits the time spent in the TIME_WAIT state. It was a bit confusing but after I've finally wholly read what you linked (which I realized is actually the Unix Network Programming book) I think I now understand this stuff better.Ouster
@JurajMartinka Correct. The Linger timeout you configure is the upper bound for the entire close operation (from calling close() to the point where the socket is completely gone in your local system). You use it to tell the system "when I call close, I expect that socket to be gone and the port number to be reusable after that many seconds". If you set no Linger, you just don't care how long the close operation will take but timeouts for traffic in flight or the close handshake are still in use and these are very system specific and can be overridden by system admins.Thurmond
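For anyone who wants to see what is currently configured on a socket, the option can be set and read back like this. It is a small sketch assuming the POSIX layout of `struct linger` (two C ints: on/off flag and timeout); the helper names are invented, and the unit of the timeout as reported back may differ between systems.

```python
import socket
import struct

LINGER_FMT = "ii"  # assumed POSIX layout: int l_onoff, int l_linger


def set_linger(sock: socket.socket, enabled: bool, timeout: int = 0) -> None:
    """Turn SO_LINGER on or off; timeout is the upper bound for close()."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FMT, int(enabled), timeout))


def get_linger(sock: socket.socket):
    """Return (enabled, timeout) as currently stored for the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FMT))
    onoff, timeout = struct.unpack(LINGER_FMT, raw)
    return bool(onoff), timeout


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("default:", get_linger(s))   # linger is normally off by default
    set_linger(s, True, 5)             # bound the whole close operation
    print("after set:", get_linger(s))
    s.close()
```

With linger on and a timeout of zero, `close()` drops the connection the hard way, as described above, so that combination should be used with care.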
Actually, binding a socket to the same local address and port as another socket (before calling connect) will not cause the error "address in use" but the error "cannot assign requested address", because the kernel is unable to assign a local address and port that are identical to those of an existing socket. But the meaning is the same as in your postGeriatrics
@Geriatrics See 0cn.de/ih4f It clearly defines that if the address is already in use, the error will be EADDRINUSE. The error EADDRNOTAVAIL must only be used if that is a UNIX domain socket (AF_UNIX). If your implementation may throw EADDRNOTAVAIL on an IP socket, then it violates its own documentation, that's a bug which should be reported. It also violates the POSIX standard which allows EADDRNOTAVAIL on all sockets but there it has a completely different meaning, see 0cn.de/vlbe (it means the address is generally unavailable and not because it is already in use).Thurmond
@Thurmond for Windows you write in bold that SO_REUSEADDR allows binding the second socket to the same source and port as the first socket, even if the first did not set SO_REUSEADDR. MS 2nd table that you linked contradicts that and says specifically it's no longer the case since Server 2003. I suggest clarifying your bolded statement.Trisa
@Trisa Yes, but it was the case prior to Windows 2003 and it is still possible in compatibility modes. So it would be fatal to rely on it; you shall not rely on changed default behavior if you can specify desired behavior. This is exactly the kind of mistake that leads to horrible security problems, that's why I documented this horrible past default behavior in bold. As SO_EXCLUSIVEADDRUSE still works as of today, there is no reason not to use it and if you want to share ports, you better know those tables in detail, that's why I added the link. I will mention it, though.Thurmond
@Mecki, So can I bind the exact same address with a wildcard and the same port when the older socket is in TIME_WAIT state and the new socket is set with SO_REUSEADDR in Linux? I tried to define SO_REUSEADDR and linger to 0 but bind() keeps failing with EADDRINUSE. (I also posted #71826338)Antacid
@Thurmond your post contains SO_REUESADDR — with a typo. I couldn't edit it because SO wants minimum 6 chars to be changed. Maybe you could fix it yourself :)Escribe
@webknjaz Thanks for pointing it out. I've fixed it.Thurmond
Welcome to the wonderful passion, where people like @Thurmond equipped with StackOverflow, make it even better.Andrej
I agree with @BenVoigt that "local" and "remote" are more accurate and less subjective than "source" and "destination" when talking about socket addresses — see man 7 socket: "getsockname(2) returns the local socket address and getpeername(2) returns the remote socket address."Cistern
@Cistern A socket connection always has a source, the socket that calls connect(), and a destination, the socket it connects to. These roles are clear and always there. But I can have two sockets inside a single application that talk to each other, now which of those two would then be "local" and which one would be "remote"?Thurmond
@Cistern Also you cannot simply make changes like replacing "all source" with "any" as "all source" is a fixed term that has been used for ages and is found in all standard reference books about BSD networking. "Any IP address" means any one, "all source" means any of the addresses available as an IP source address, which is not the same thing, as not every address may be available as a source, addresses on an interface can also only be available as a destination. And source/destination are also what the IP standard uses when talking about addresses.Thurmond
@Cistern And finally you ignore the server socket case. What if you have the server socket? Then this is still a local socket, but now the remote (client) socket is the one connecting to you. Local and remote would still be used the same way but in fact the roles are now swapped, now your local socket is the connection destination and the remote socket is the source. With source/destination, my answer is still correct for this case, with local/remote it isn't anymore as what used to be true for "local" is now true for "remote" and the other way round.Thurmond
@Mecki: "I can have two sockets inside a single application that talk to each other, now which of those two would then be local and which one would be remote?" You make an error in thinking "local" and "remote" apply to sockets. They apply to addresses, relative to a particular socket. The address you or the OS binds to the socket is local. The address you pass to connect() or sendto() is remote. For that socket whose descriptor you passed to connect() or sendto(). The remote end of the connection might or might not even be using BSD sockets API, there are others such as "lwip raw".Clink
@BenVoigt Local and remote addresses only apply to sockets and my answer is not about sockets in particular and you yourself just said "might not even be using BSD sockets "; exactly! My answer is about addresses and ports on TCP/IP protocol level and how socket options map to them and on this protocol level there is no local/remote, a TCP connection has no local or remote end. It has a source and a destination, hence the ports are also called source and destination ports; same goes for IP packets.Thurmond
@Thurmond "on this protocol level there is no local/remote" Baloney. A connection is identified by a tuple of (local IP address, local port, remote IP address, remote port, transport protocol). A TCP connection is bidirectional, there are outgoing packets (send()) with source=local, dest=remote and incoming (recv()) with source=remote, dest=local. Unfortunately some RFCs use imprecise wording. (And no, this isn't a case where I am trying to use non-standard terminology. I use "source" and "destination" as they are defined in the Internet Protocol RFC, which is perfectly standard.)Clink
@BenVoigt Sorry but I will not discuss that with you anymore. Stop the spamming and go write your own better reply if you can and there you can use whatever names you want. I will not change the answer; period. I'm happy with it the way it is and so are 2035 other SO users. Stop wasting everyone's time.Thurmond
@Mecki: You are the one who revived the old topic with 3 comments in a rowClink
@BenVoigt I replied to claymation who just posted a comment 18 hours ago, I did not revive anything, I posted to a new comment that is not even a day old.Thurmond
@Mecki, your insistence on using the subjective "source" and "destination" is frustrating because it contradicts canonical references. Since you ignored the socket(7) reference, here's Richard Stevens in TCP/IP Illustrated: Volume 2: "The combination of the local IP address and the local port number is often called a socket, as is the combination of the foreign IP address and the foreign port number." It also defies common sense. If I create a datagram socket and only ever intend to call recvfrom(2) on it, in what universe should the address and port assigned to it be considered the "source"?Cistern
And yes, 2036 people have found your answer useful enough to upvote, myself included. But it's not as clear as it could be. I often find myself wondering what confusing comments like this mean: "source_port: Port where we are receiving traffic." Why would we be receiving traffic on a source port? It makes no sense. The word "source" is just too subjective. Here, local_port or listen_port would have been a much clearer choice. Who knows, maybe the author used "source" after reading your answer—they also set reuseaddr and reuseport on the socket. I edited your answer to avoid this confusion.Cistern
If the word "remote" is what's throwing you, feel free to substitute "peer", as in getpeername(2), or "partner". The salient point is that either or both of the two sockets involved in any socket-based communication may be a source or a destination of packets at any given time, and so to avoid confusion we should pick unambiguous names for this (local) socket and that (remote, or peer) socket. The fact that the two sockets may be on different hosts or on the same host is abstracted away and should not preclude us from saying that the peer socket is "remote" — in general, it is.Cistern
You are my network God! Can you explain in such way this thread question? https://mcmap.net/q/54860/-loopback-in-multicastHolston
I understand that two applications/processes can listen on the same combination. But then when a request arrives how does the OS bifurcate between which process should handle the request?Trieste
@Thurmond Brilliant answer, thank you for all the detail. Do you know/think the ports in the ephemeral range are treated differently when it comes to duration in TIME_WAIT state or how SO_REUSE* flags affect them? I couldn't find anything online, but I noticed that when I used (unconventionally) an eph port for a listener, and closed it, it took longer to close, than when I used a non-eph port. This allowed a client (from a diff thread) to make a short-lived connection (which terminated and was left in TIME_WAIT), preventing server from binding again. This didn't happen with non-eph port.Semiconscious
@Semiconscious TIME_WAIT is a beast of its own and how it behaves can in fact vary a lot, depending on all kind of factors. You may also want to read my reply about SO_LINGER https://mcmap.net/q/54858/-what-really-is-the-quot-linger-time-quot-that-can-be-set-with-so_linger-on-sockets It contains a link to a blog post which is not available anymore (the link goes to archive.org) and is a follow up to another blog post. Unfortunately archive.org never captured the first part. The first part was about blocking sockets (which is the default) and was also way more detailed, providing more background information about the topic.Thurmond
@Thurmond Excellent read, thank you. But I didn't find anything specific to do with ephemeral port. I am guessing what I observed may be uncaused correlation (i.e nothing to do with the port being eph or not). Will keep digging.Semiconscious
@Thurmond Found it in another answer of yours at https://mcmap.net/q/54859/-bind-fails-after-using-so_reuseaddr-option-for-wildcard-socket-in-state-time_wait. Relevant parts: "SO_REUSEADDR does not necessarily allow reuse across different processes for sockets in TIME_WAIT. But process Y may not be able to bind to a socket to the same address and port as socketA, while socketA is still in TIME_WAIT state for security reasons. It may also depend on the port number you are using. Sometimes this limitation only applies to ports below 1024 (most people testing behavior forget to test for both, ports above and below 1024)" Thank you :)Semiconscious
W
29

Mecki's answer is absolutely perfect, but it's worth adding that FreeBSD also supports SO_REUSEPORT_LB, which mimics Linux' SO_REUSEPORT behaviour - it balances the load; see setsockopt(2)

Walrus answered 10/6, 2020 at 22:34 Comment(0)
