Does sin_addr.s_addr = INADDR_ANY; need htonl at all?
A

7

37

I came across two threads:

Socket with recv-timeout: What is wrong with this code?

Reading / Writing to a socket using a FILE stream in c

One uses htonl(INADDR_ANY) and the other uses plain INADDR_ANY.

Which is right?

Apices answered 21/5, 2011 at 12:58 Comment(8)
+1 for trying to shed light on a confusing area: the top hit on Google is not even sure: tech-archive.net/Archive/Development/… - the linked message says (pretty convincingly) that htonl is not required for these constants, then the follow-up message retracts the statement, saying htonl is required!Injun
@John: This isn't a confusing area. If you know the numeric value of INADDR_ANY, it's clear that htonl() doesn't do anything to the result -- zero in results in zero out; similarly, for INADDR_ALL, 0xFFFFFFFF in to htonl() results in 0xFFFFFFFF out. However, INADDR_LOOPBACK is different -- it is specified in network byte order as 0x7F000001. For this constant, use of htonl() is required.Medorra
@Heath: You may read (and downvote!) my answer below then. I wrote it before @Mat updated his answer to say that htonl should be used after all. Just because you say this area is not confusing does not make it so (and the evidence is that a number of people find it confusing).Injun
@John: It is confusing to the extent that people haven't established their basic knowledge in the area. If the programmer can't read the output from ifconfig, netstat, tcpdump, then they're going to be nothing but confused. If they have basic knowledge of this area, they won't be confused.Medorra
The point is INADDR_ANY/INADDR_LOOPBACK don't have a standard specified value. It's clear on your machine how it works, but that might not be universally true (or it might - which is what the question is about)Islas
@nos: My machine is probably wired to your machine, or to your wireless hub. At some level, they do work the same, the way specified by IETF RFCs. It's no coincidence that the Berkeley sockets API treats the constants this way -- but advocates of encapsulation and abstraction would suggest that we pretend it is a coincidence.Medorra
The RFCs do not specify that INADDR_LOOPBACK should be 0x7F000001 or e.g. 0x100007F.Islas
Unless you can think of any case where INADDR_LOOPBACK would not be 127.0.0.1, I will say that it will always be 0x7f000001.Raleigh
I
36

Since other constants like INADDR_LOOPBACK are in host byte order, I submit that all the constants in this family should have htonl applied to them, including INADDR_ANY.

(Note: I wrote this answer while @Mat was editing; his answer now also says it's better to be consistent and always use htonl.)

Rationale

It is a hazard to future maintainers of your code if you write it like this:

if (some_condition)
    sa.s_addr = htonl(INADDR_LOOPBACK);
else
    sa.s_addr = INADDR_ANY;

If I were reviewing this code, I would immediately question why one of the constants has htonl applied and the other does not. And I would report it as a bug, whether or not I happened to have the "inside knowledge" that INADDR_ANY is always 0 so converting it is a no-op.

The code you write is not only about having the correct runtime behavior; it should also be as obvious as possible and easy to believe correct. For this reason you should not strip out the htonl around INADDR_ANY. The three reasons for not using htonl that I can see are:

  1. It may offend experienced socket programmers to use htonl because they will know it does nothing (since they know the value of the constant by heart).
  2. It requires less typing to omit it.
  3. A bogus "performance" optimization (clearly it won't matter).
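To make the consistency argument concrete, here is a minimal sketch of the conditional above written so that both constants pass through htonl(). It fills a whole struct sockaddr_in rather than a bare struct in_addr; the helper name make_local_addr and its parameters are hypothetical, not part of any API.

```c
#include <arpa/inet.h>   /* htonl(), htons() */
#include <netinet/in.h>  /* struct sockaddr_in, in_port_t, INADDR_* */
#include <stdbool.h>
#include <string.h>      /* memset() */

/* Hypothetical helper: fill *sa with either 127.0.0.1 or 0.0.0.0 and the
   given port. Both INADDR_* constants go through htonl(), so a reader does
   not need to know that the conversion is a no-op for INADDR_ANY. */
static void make_local_addr(struct sockaddr_in *sa, bool use_loopback,
                            in_port_t port)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    if (use_loopback)
        sa->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    else
        sa->sin_addr.s_addr = htonl(INADDR_ANY);
}
```

Either branch leaves s_addr in network byte order, which is what bind() and connect() expect.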
Injun answered 21/5, 2011 at 13:50 Comment(5)
Nobody suggested performance optimization, that's your straw man argument. The reason not to do this is that it looks stupid to experienced socket programmers and indicates a lack of insight into what the code does. In fact, there is no performance benefit either way.Medorra
I completely agree that "experienced socket programmers" such as yourself will not have any issues reading the code without htonl. I submit that people who know C but do not have expertise with its socket APIs will find it easier to maintain code if it is consistent. I will agree to disagree with your assertion that it is better to write code which is not friendly to newbies.Injun
I will edit my answer to reflect that the third reason for not using htonl is so that it does not offend greybeards.Injun
You said "you clearly haven't actually done any sockets interface programming" earlier, now you complain when I make a satirical jab in your general direction. It may surprise you to learn that I have done quite a bit of work with sockets in C (yet still found the original question here an interesting one!).Injun
Re #3, on GCC at least the compiler is capable of constant-folding through htonl and friends, so there will be zero performance impact.Meeker
T
19

INADDR_ANY is the "any address" in IPv4. That address is 0.0.0.0 in dotted notation, so 0x00000000 in hex on any endianness. Passing it through htonl has no effect.
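A quick way to convince yourself of this, sketched for a POSIX system: the check below looks only at the raw bytes of the value, so it gives the same answer whether or not htonl() was applied. The function name is_wildcard is hypothetical.

```c
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* in_addr_t, INADDR_ANY */
#include <stdbool.h>
#include <string.h>      /* memcmp() */

/* True when all four bytes of addr are zero, i.e. the value is 0.0.0.0.
   Since every byte is zero, the result cannot depend on host byte order. */
static bool is_wildcard(in_addr_t addr)
{
    static const unsigned char zero[sizeof(in_addr_t)] = { 0 };
    return memcmp(&addr, zero, sizeof zero) == 0;
}
```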

Now if you want to wonder about other macro constants, look at INADDR_LOOPBACK if it's defined on your platform. Chances are it will be a macro like this:

#define INADDR_LOOPBACK     0x7f000001  /* 127.0.0.1   */

(from linux/in.h, equivalent definition in winsock.h).

So for INADDR_LOOPBACK, an htonl is necessary.
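To see what goes wrong without the conversion, the hypothetical helper below renders an s_addr value with inet_ntop(), which reads the four bytes in memory (network) order. On a little-endian host such as x86, an unconverted INADDR_LOOPBACK comes out as 1.0.0.127 rather than 127.0.0.1; on a big-endian host the mistake is invisible.

```c
#include <arpa/inet.h>   /* inet_ntop(), htonl() */
#include <netinet/in.h>  /* struct in_addr, INET_ADDRSTRLEN */
#include <stdbool.h>
#include <string.h>      /* strcmp() */

/* Render addr (taken as-is, i.e. assumed to be in network byte order)
   as dotted-decimal text and compare it with the expected string. */
static bool renders_as(in_addr_t addr, const char *expected)
{
    char buf[INET_ADDRSTRLEN];
    struct in_addr a;

    a.s_addr = addr;
    if (inet_ntop(AF_INET, &a, buf, sizeof buf) == NULL)
        return false;
    return strcmp(buf, expected) == 0;
}
```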

For consistency, it could thus be better to use htonl in all cases.

Trauner answered 21/5, 2011 at 13:2 Comment(6)
@It won't be. but feel free to use htonl anyway.Trauner
@Mat: You correctly state that s_addr is in network byte order, but you gloss over which byte order INADDR_ANY is supposed to be in. This is the crux of it.Injun
@John: It is both orders. It's zero. All the bits are zero. It isn't going to change from zero, either -- it's defined in the IP RFC.Medorra
Agreed. What would be your advice for the other (non-zero) constants? Certainly whatever is the correct answer for them should be applied to INADDR_ANY, because to not do so introduces a code maintenance headache down the road (most people won't know INADDR_ANY is zero, so may think it is a bug if it turns out ntohl is used for other constants but not this one).Injun
@John - As you can see above, the constants are in network-byte order. Your suggestion that not using htonl() introduces a code maintenance bug is naive -- you clearly haven't actually done any sockets interface programming. There is no occasion when maintenance would require changing an INADDR_ANY to an INADDR_LOOPBACK, and there is no such person as a qualified maintainer of sockets code who doesn't know that INADDR_ANY stands for zero. I would say better not to dumb it down and invite an unqualified maintainer into the source -- they cost more than they benefit.Medorra
+1 for technical correct, -1 for bogus consistency suggestion = +/-0.Medorra
D
8

Neither is right, in the sense that both INADDR_ANY and htonl are deprecated, and lead to complex, ugly code that only works with IPv4. Switch to using getaddrinfo for all of your socket address creation needs:

struct addrinfo *ai, hints = { .ai_flags = AI_PASSIVE|AI_ADDRCONFIG };
getaddrinfo(0, "1234", &hints, &ai);

Replace "1234" with your port number or service name.
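A fuller sketch of the same idea, assuming a POSIX system: it walks the getaddrinfo() results and keeps the first address it can bind() and listen() on. AI_ADDRCONFIG from the snippet above is left out here because, per RFC 3493, it does not count loopback as a configured address, so on a host with only loopback configured it could filter out every result. The function name listen_on and the backlog of 16 are arbitrary choices.

```c
#include <netdb.h>       /* getaddrinfo(), freeaddrinfo(), struct addrinfo */
#include <stddef.h>      /* NULL */
#include <sys/socket.h>  /* socket(), bind(), listen() */
#include <unistd.h>      /* close() */

/* Create a listening TCP socket on the wildcard address for `port`, given
   as a decimal string such as "1234". Returns the descriptor, or -1.
   AI_PASSIVE with a NULL node asks getaddrinfo() for the wildcard address,
   so neither INADDR_ANY nor htonl() appears in this code. */
static int listen_on(const char *port)
{
    struct addrinfo hints = { 0 }, *ai, *p;
    int fd = -1;

    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6, whichever is usable */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;

    if (getaddrinfo(NULL, port, &hints, &ai) != 0)
        return -1;

    for (p = ai; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd == -1)
            continue;
        if (bind(fd, p->ai_addr, p->ai_addrlen) == 0 && listen(fd, 16) == 0)
            break;                   /* keep the first socket that works */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(ai);
    return fd;
}
```

Passing "0" as the port lets the kernel pick a free port, which is convenient for quick experiments.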

Downhearted answered 21/5, 2011 at 15:11 Comment(10)
Brilliant, side stepped the controversy and posted something more modern (yet still C)!Injun
You might want AI_NUMERICSERV if you know you are supplying a number, I won't comment directly on the NULL, and of course this is a "little" slower with the call to atoi than using htons, but otherwise, OK.Medorra
AI_NUMERICSERV is only needed to inhibit string-based service lookup. If your service string is a number anyway, it should be a no-op. But it couldn't hurt to include it.Downhearted
Does it first look up "1234" in /etc/services or first try atoi on the parameter?Medorra
I suppose it's up to the implementation, but I can't imagine an implementation being stupid enough to read /etc/services when the argument is numeric. Then again glibc never ceases to amaze me, so you might want to use strace and check that... ;-)Downhearted
@R: I did the experiment. /etc/services was not opened when the service was a number (as in your answer). It was opened when the service was a name (like "asp"). This was using glibc 2.13 (the current stable).Injun
In that case, I believe it rarely makes sense to use AI_NUMERICSERV. The only use would be explicitly rejecting non-numeric service names, but you could just as easily have rejected them earlier yourself. Thanks for checking.Downhearted
for some reason we were getting EAI_AGAIN on getaddrinfo when called with "127.0.0.1" which was annoyingBundesrat
There seem to be some suggestions around that getaddrinfo is not a panacea, and that the much simpler inet_pton should be used if possible - blog.powerdns.com/2014/05/21/…Adulterine
@JosephH: Well these suggestions are wrong. inet_pton cannot work, for example, with link-local addresses requiring a scope id, unless you add address-family-specific logic on top of it. The blog post you linked is about a stupid glibc bug which has hopefully been reported and fixed. If not somebody should do that. In the post-Drepper era glibc is much better about actually fixing bugs rather than closing them as WONTFIX.Downhearted
S
3

Stevens uses htonl(INADDR_ANY) consistently in the book UNIX Network Programming (my copy is from 1990).

The current release version of FreeBSD defines 12 INADDR_ constants in netinet/in.h; 9 of the 12 require htonl() for proper functionality. (The 9 are INADDR_LOOPBACK and 8 other multicast group addresses such as INADDR_ALLHOSTS_GROUP and INADDR_ALLMDNS_GROUP.)
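One way to check this on your own system, assuming its <netinet/in.h> defines INADDR_ALLHOSTS_GROUP (FreeBSD and glibc both do): parse the dotted-decimal address with inet_pton(), which yields network byte order, and compare it with the htonl()-converted macro. The helper name constant_is is hypothetical.

```c
#include <arpa/inet.h>   /* inet_pton(), htonl() */
#include <netinet/in.h>  /* struct in_addr, INADDR_* */
#include <stdbool.h>

/* Does the host-order constant, once passed through htonl(), equal the
   address parsed from its dotted-decimal text? */
static bool constant_is(in_addr_t host_order, const char *text)
{
    struct in_addr parsed;

    if (inet_pton(AF_INET, text, &parsed) != 1)
        return false;
    return parsed.s_addr == htonl(host_order);
}
```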

In practice, it makes no difference whether you use INADDR_ANY or htonl(INADDR_ANY), other than the possible performance hit from htonl(). And even that possible performance hit may not exist -- with my 64-bit gcc 4.2.1, turning on any level of optimization at all seems to activate compile-time htonl() conversion of constants.

In theory it would be possible for some implementer to redefine INADDR_ANY to a value where htonl() actually does something, but such a change would break tens of thousands of existing pieces of code out there and wouldn't survive in the "real world"... Too much code exists which depends explicitly or implicitly on INADDR_ANY being defined as some sort of zero-valued integer. Stevens likely didn't intend for anyone to assume that INADDR_ANY is always zero when he wrote:

cli_addr.sin_addr.s_addr = htonl(INADDR_ANY);
cli_addr.sin_port        = htons(0);

In assigning a local address for the client using bind, we set the Internet address to INADDR_ANY and the 16-bit Internet port to zero.

Stull answered 15/6, 2011 at 21:6 Comment(0)
S
3

Was going to add this as a comment, but it got a little long-winded ...

I think it's clear from the answers and the commentary here that htonl() needs to be used on these constants (albeit that calling it on INADDR_ANY and INADDR_NONE is tantamount to a no-op). The confusion, as I see it, arises because this is not explicitly called out in the documentation. Someone please correct me if I simply missed it, but I have not seen anything in the man pages or in the include header that explicitly states that the INADDR_* defines are in host order. Again, not a big deal for INADDR_ANY, INADDR_NONE, and INADDR_BROADCAST, but it is significant for INADDR_LOOPBACK.

Now, I've done quite a bit of low-level socket work in C, but the loopback address rarely, if ever, gets used in my code. Although this topic is over a year old, this very problem just jumped up to bite me in the behind today, and it was because I went on the mistaken assumption that the addresses defined in the include header are in network order. Not sure why I had that idea - probably because the in_addr structure needs to have the address in network order, inet_aton and inet_addr return their values in network order, and so my logical assumption was that these constants would be usable as-is. Throwing together a quick 5-liner to test that theory showed me otherwise. If any of the powers-that-be happen to see this, I would make the suggestion to explicitly call out that the values are, in fact, in host order, not network order, and that htonl() should be applied to them. For consistency's sake, I would also suggest, as others have done so already here, that htonl() be used for all of the INADDR_* values, even if it does nothing to the value.
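A test along those lines might look like the sketch below (the function names are hypothetical). Since inet_addr() returns network byte order, comparing its result with the raw macro and with htonl() of the macro shows which order the header uses.

```c
#include <arpa/inet.h>   /* inet_addr(), htonl() */
#include <netinet/in.h>  /* INADDR_LOOPBACK */
#include <stdbool.h>

/* inet_addr() returns its result in network byte order; compare it with
   the macro exactly as the header defines it. */
static bool loopback_matches_raw_macro(void)
{
    return inet_addr("127.0.0.1") == INADDR_LOOPBACK;
}

/* The same comparison after converting the macro with htonl(). */
static bool loopback_matches_htonl_macro(void)
{
    return inet_addr("127.0.0.1") == htonl(INADDR_LOOPBACK);
}
```

On a little-endian machine the first function returns false and the second true; on a big-endian machine both return true, which is why this mistake can go unnoticed there.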

Sumac answered 30/5, 2012 at 15:51 Comment(0)
T
2

Let's summarize a bit, as none of the previous answers seems to be up to date and I may not be the last person to land on this question. There have been opinions for using htonl around the INADDR_ANY constant, against it, and for avoiding the constant entirely.

Nowadays (and it's been nowadays for quite some time) system libraries are mostly IPv6-ready, so we use IPv6 as well as IPv4. The situation with IPv6 is much easier, as its data structures and constants don't suffer from byte-order issues. One would use 'in6addr_any' and 'in6addr_loopback' (both of type struct in6_addr), and both are constant objects already in network byte order.

See why IPv6 doesn't suffer from the same problem (if IPv4 addresses were defined as four byte arrays they wouldn't suffer either):

struct in_addr {
    uint32_t       s_addr;     /* address in network byte order */
};

struct in6_addr {
    unsigned char   s6_addr[16];   /* IPv6 address */
};

For IPv4, it would be nice to also have 'inaddr_any' and 'inaddr_loopback' as 'struct in_addr' constants (so that they can also be compared with memcmp or copied with memcpy). Indeed it might be a good idea to create them in your program as they aren't provided by glibc and other libraries:

const struct in_addr inaddr_loopback = { htonl(INADDR_LOOPBACK) };

With glibc, this only works for me inside a function (I can't make it static): in contrast with what was claimed in other answers, glibc doesn't provide htonl as a macro yielding a constant expression but rather as a function, so it can't appear in a static initializer. Therefore you would have to:

#include <endian.h>      /* BYTE_ORDER, BIG_ENDIAN, LITTLE_ENDIAN (glibc) */
#include <netinet/in.h>  /* struct in_addr */

static const struct in_addr inaddr_any = { 0 };
#if BYTE_ORDER == BIG_ENDIAN
static const struct in_addr inaddr_loopback = { 0x7f000001 };
#elif BYTE_ORDER == LITTLE_ENDIAN
static const struct in_addr inaddr_loopback = { 0x0100007f };
#else
#error Neither big endian nor little endian
#endif

That would be a really nice addition to the headers and then you could work with IPv4 constants as easily as you can with IPv6.
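Since htonl() cannot appear in a static initializer with glibc, one hedged alternative to the #if block is to build the struct in_addr values at run time. The helper names below are hypothetical; nothing like them ships with glibc.

```c
#include <arpa/inet.h>   /* htonl(), inet_pton() */
#include <netinet/in.h>  /* struct in_addr, INADDR_* */
#include <stdbool.h>
#include <string.h>      /* memcmp() */

/* Return INADDR_ANY wrapped in a struct in_addr, in network byte order. */
static struct in_addr inaddr_any_value(void)
{
    struct in_addr a = { .s_addr = htonl(INADDR_ANY) };
    return a;
}

/* Return INADDR_LOOPBACK wrapped in a struct in_addr, in network byte order. */
static struct in_addr inaddr_loopback_value(void)
{
    struct in_addr a = { .s_addr = htonl(INADDR_LOOPBACK) };
    return a;
}

/* Byte-wise comparison, mirroring how memcmp() is used with in6addr_*. */
static bool in_addr_equal(const struct in_addr *x, const struct in_addr *y)
{
    return memcmp(x, y, sizeof *x) == 0;
}
```

The trade-off is a function call instead of a true constant object, but the values can be copied and compared just like in6addr_loopback.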

But to implement that, I had to initialize the structures with literal values rather than the INADDR_* constants: when I know the respective bytes exactly, I don't need any constants. Just as some people claim that htonl() is redundant for a constant that evaluates to zero, someone else could claim that the constant itself is redundant as well. And they would be right.

In code, I prefer explicit to implicit. Therefore, if those constants (like INADDR_ANY, INADDR_ALL, INADDR_LOOPBACK) are all consistently in host byte order, the only correct approach is to treat them that way. See for example (when not using the above constant):

struct in_addr address4 = { htonl(use_loopback ? INADDR_LOOPBACK : INADDR_ANY) };

Of course you could say that you don't need to call htonl for INADDR_ANY and therefore you could:

struct in_addr address4 = { use_loopback ? htonl(INADDR_LOOPBACK) : INADDR_ANY };

But if you ignore the byte order of the constant because it's zero anyway, I don't see much logic in using the constant at all. The same applies to INADDR_ALL, as it's just as easy to type 0xffffffff.

Another way to get around it is to avoid setting those values directly altogether:

struct in_addr address4;

inet_pton(AF_INET, "127.0.0.1", &address4);

This adds a little useless processing, but it has no byte-order problems and it is virtually the same for IPv4 and IPv6 (you change the address family, the target structure and the address string).
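Here is a minimal sketch of that symmetry, assuming a POSIX system: the same inet_pton() call parses both loopback addresses, and only the IPv4 comparison against a header constant needs htonl(), because in6addr_loopback is already a byte array. The function name is hypothetical.

```c
#include <arpa/inet.h>   /* inet_pton(), htonl() */
#include <netinet/in.h>  /* struct in_addr, struct in6_addr, in6addr_loopback */
#include <stdbool.h>
#include <string.h>      /* memcmp() */

/* Parse "127.0.0.1" and "::1" and check them against the header constants. */
static bool loopback_texts_match(void)
{
    struct in_addr v4;
    struct in6_addr v6;

    if (inet_pton(AF_INET, "127.0.0.1", &v4) != 1)
        return false;
    if (inet_pton(AF_INET6, "::1", &v6) != 1)
        return false;

    return v4.s_addr == htonl(INADDR_LOOPBACK)                /* host-order macro */
        && memcmp(&v6, &in6addr_loopback, sizeof v6) == 0;   /* byte array */
}
```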

But why do that at all? If you want to connect() to IPv4 localhost (or sometimes IPv6 localhost, or just any hostname), getaddrinfo() (mentioned in one of the answers) is much better suited, as:

  1. It is a function for translating any hostname/service/family/socktype/protocol combination to a list of matching struct addrinfo records.

  2. Each struct addrinfo includes a polymorphic pointer to struct sockaddr that you can directly use with connect(). Therefore you don't need to care about the construction of struct sockaddr_in, typecasting (via a pointer) to struct sockaddr, etc.

    struct addrinfo *ai, hints = { .ai_family = AF_INET };
    getaddrinfo(0, "1234", &hints, &ai);

So, the conclusion is:

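1) The standard API fails to provide directly usable struct in_addr constants (instead it provides rather useless unsigned integer constants in host order).

2) You can sidestep those constants entirely by letting getaddrinfo() build the addresses, for example: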

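/* Inside a function; needs <netdb.h>, <stdio.h>, <sys/socket.h>, <unistd.h>. */
struct addrinfo *ai, *item, hints = { .ai_family = AF_INET, .ai_protocol = IPPROTO_TCP };
int sock = -1;
int error;

error = getaddrinfo(NULL, "80", &hints, &ai);   /* NULL node: loopback address */
if (error) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(error));
    return -1;                                   /* or other error handling */
}

for (item = ai; item; item = item->ai_next) {
    sock = socket(item->ai_family, item->ai_socktype, item->ai_protocol);

    if (sock == -1)
        continue;

    if (connect(sock, item->ai_addr, item->ai_addrlen) != -1) {
        fprintf(stderr, "Connected successfully.\n");
        break;
    }

    close(sock);
    sock = -1;
}

freeaddrinfo(ai);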

When you are sure your query is selective enough that it only returns one result, you could do (omitting error handling for brevity) the following:

struct addrinfo *result, hints = { .ai_family = AF_INET, .ai_protocol = IPPROTO_TCP };
getaddrinfo(NULL, "80", &hints, &result);
int sock = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
connect(sock, result->ai_addr, result->ai_addrlen);
freeaddrinfo(result);

If you're afraid getaddrinfo() might be significantly slower than using the constants, the system library is the best place to fix that. A good implementation would just return the requested loopback address when the node (host name) argument is NULL and hints.ai_family is set.

Timofei answered 13/10, 2013 at 16:50 Comment(0)
M
0

I don't usually like to answer when there is already a "decent" answer. In this case, I am going to make an exception because information I added to these answers is being misconstrued.

INADDR_ANY is defined as an all-zero-bits IPv4 address, 0.0.0.0 or 0x00000000. Calling htonl() on this value will result in the same value, zero. Therefore, calling htonl() on this constant value is not technically necessary.

INADDR_ALL is defined as an all-one-bits IPv4 address, 255.255.255.255 or 0xFFFFFFFF. Calling htonl() with INADDR_ALL will return INADDR_ALL. Again, calling htonl() is not technically necessary.

Another constant defined in the header files is INADDR_LOOPBACK, defined as 127.0.0.1, or 0x7F000001. This value is written in host byte order, and cannot be passed to the sockets interface without htonl(). You must use htonl() with this constant.

Some would suggest that consistency and code readability demand that programmers use htonl() for any constant named INADDR_* -- because it is required for some of them. These posters are wrong.

An example given in this thread is:

if (some_condition)
    sa.s_addr = htonl(INADDR_LOOPBACK);
else
    sa.s_addr = INADDR_ANY;

Quoting from "John Zwinck":

"If I were reviewing this code, I would immediately question why one of the constants has htonl applied and the other does not. And I report it as a bug, whether or not I happened to have the "inside knowledge" that INADDR_ANY is always 0 so converting it is a no-op. And I think (and hope) many other maintainers would do the same."

If I were receiving such a bug report, I would immediately throw it away. This process would save me a lot of time, fielding bug reports from people who don't have the "basic minimum knowledge" that INADDR_ANY is always 0. (Suggesting that knowing the values of INADDR_ANY et al. somehow violates encapsulation or whatever is another non-starter -- the same numbers are used in the netcat output and inside the kernel. Programmers need to know the actual numerical values. People who don't know aren't lacking inside knowledge, they are lacking basic knowledge of the area.)

Really, if you have a programmer maintaining sockets code, and that programmer doesn't know the bit patterns of INADDR_ANY and INADDR_ALL, you are already in trouble. Wrapping 0 in a macro which returns 0 is the kind of mentality that is a slave to meaningless consistency and doesn't respect domain knowledge.

Maintaining sockets code is about more than understanding C. If you don't understand the difference between INADDR_LOOPBACK and INADDR_ANY at a level compatible with netstat output, then you are dangerous in that code and shouldn't be changing it.

Straw-man arguments proposed by Zwinck regarding the needless use of htonl():

  1. It may offend experienced socket programmers to use htonl because they will know it does nothing (since they know the value of the constant by heart).

This is a straw-man argument because it portrays experienced socket programmers as knowing the value of INADDR_ANY by heart. This is like writing that only an experienced C programmer knows the value of NULL by heart. Writing "by heart" gives the impression that the number is slightly difficult to memorize, perhaps a few digits, such as 127.0.0.1. But no, we are hyperbolically discussing the difficulty of memorizing the patterns named "all zero bits" and "all one bits."

Considering that these numerical values appear in the output of, e.g., netstat and other system utilities, and also considering that some of these values appear in IP headers, there is no such thing as a competent sockets programmer who does not know these values, whether by heart or by brain. In fact, attempting sockets programming without knowing these basics can be dangerous to the network availability.

  2. It requires less typing to omit it.

This argument is intended to be absurd and dismissive, so it doesn't need much refuting.

  3. A bogus "performance" optimization (clearly it won't matter).

It's hard to know where this argument came from. It could be an attempt to supply stupid-seeming arguments to the opposition. In any case, not using the htonl() macro makes no difference to performance when you provide a constant and use a typical C compiler -- the constant expressions are reduced to a constant in either case.


A reason not to use htonl() with INADDR_ANY is that most experienced sockets programmers know that it is not needed. What's more, those programmers who do not know need to learn. There is no extra "cost" with use of htonl(); the trouble is the cost of establishing a coding standard which fosters ignorance of such critically important values.

By definition, encapsulation fosters ignorance. That very ignorance is the usual benefit of using an encapsulated interface -- knowledge is expensive and finite, therefore encapsulation is usually good. The question becomes: which efforts of programming are best enhanced via encapsulation? Are there programming tasks which are disserved by encapsulation?

It is not technically incorrect to use htonl(), because it has no effect on this value. However, arguments that you should use it may be misleading.

There are those who would argue that a better situation would be one in which the developer did not need to know that INADDR_ANY is all zeroes and so on. This land of ignorance is worse, not better. Consider that these "magic values" are used throughout various interfaces with TCP/IP. For example, when configuring Apache, if you would like to listen only to IPv4 (and not IPv6), you must specify:

Listen 0.0.0.0:80
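For comparison, the socket-level counterpart of Listen 0.0.0.0:80 is an IPv4 socket bound to INADDR_ANY. A minimal sketch, assuming a POSIX system (the function name listen_ipv4_any and the backlog of 16 are arbitrary); as discussed in this thread, writing htonl(INADDR_ANY) or plain INADDR_ANY produces the same bits.

```c
#include <arpa/inet.h>   /* htonl(), htons() */
#include <netinet/in.h>  /* struct sockaddr_in, in_port_t, INADDR_ANY */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* socket(), bind(), listen() */
#include <unistd.h>      /* close() */

/* IPv4-only listener on 0.0.0.0:<port>. Returns the descriptor, or -1. */
static int listen_ipv4_any(in_port_t port)
{
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);  /* 0.0.0.0; htonl() is a no-op here */
    sa.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) == -1 || listen(fd, 16) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Passing port 0 lets the kernel choose a free port; getsockname() on the result then reports 0.0.0.0 as the local address.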

I have run into programmers who mistakenly supplied the local IP address instead of INADDR_ANY (0.0.0.0) above. These programmers don't know what INADDR_ANY is, and they probably wrap it in htonl() while they are at it. This is the land of abstraction-thinking and encapsulating.

The ideas of "encapsulation" and "abstraction" have been widely accepted and too-widely applied, but they do not always apply. In the domain of IPv4 addressing, it's not appropriate to treat these constant values as "abstract" -- they are converted directly into bits on the wire.


My point is this: there is no "correct" usage of INADDR_ANY with htonl(); both are equivalent. I would not recommend adopting a requirement that the value be used any particular way, because the INADDR_X family of constants has only four members, and only one of them, INADDR_LOOPBACK, has a value which differs depending on byte ordering. It is better to just know this fact than to establish a standard for using the values which turns a "blind eye" to their bit patterns.

In many other APIs, it is valuable for programmers to proceed without knowing the numeric value or bit patterns of constants used by the APIs. In the case of the sockets API, these bit patterns and values are used as input and displayed pervasively. It is better to know these values numerically than to spend time thinking about using htonl() on them.

When programming in C, especially, most "use" of the sockets API involves grabbing some other person's source code, and adapting it. This is another reason it is so important to know what INADDR_ANY is before touching a line which uses it.

Medorra answered 21/5, 2011 at 14:2 Comment(4)
I used Google Code Search to compare results for " = INADDR_ANY" (15K hits) vs " = htonl(INADDR_ANY)" (9K hits). I thought this was interesting.Injun
Although your search result supports my point, I would never consider that kind of result informative. Rather than consult the democratic misunderstanding of the masses, why not look at Stevens' source code or pick another individual who is clearly an expert in TCP/IP and the sockets API?Medorra
I would lean toward using htonl(INADDR_ANY), simply because it's better to write code into which it is more difficult to introduce a bug later. Suppose one uses a plain INADDR_ANY, and later changes it to some non-zero value. Sure, it's easy to remember to add the htonl() if you've had enough coffee, but what if you haven't? The benefit of using htonl() at the outset is small, but the cost is even smaller.Lenette
@Bill: I would usually go that path, too. The thing is, there are only 3 (or 4, for pedants) constants in the family INADDR_X. All but one of them are unchanged via htonl while the distinct member, INADDR_LOOPBACK means something very different from the rest. Enabling sleepy "not enough coffee" changes is a non-goal in that terrain. Thinking beyond the INADDR_X family, you can't make assumptions about another family of defined constants, they might already include the htonl().Medorra
