Safe reinterpret_cast with sockaddr?
Asked Answered
P

5

7

Assume I'm doing some socket programming:

struct sockaddr_in sa;
inet_pton(AF_INET, "127.0.0.1", &(sa.sin_addr));
auto *resa = reinterpret_cast<struct sockaddr*>(&sa);
bind(sfd, resa, sizeof(sa));

Now the question is: we do reinterpret_cast (or C-style (struct sockaddr *) cast like in tutorials or man) yet the standard does not guarantee that to work, right? On the other hand there does not seem to be a way to do that differently, bind() requires struct sockaddr* (and it has to access the underlying struct to determine what it received).

So is it safe to do reinterpret_cast between different types in this case? If yes then why?

Poling answered 27/5, 2017 at 20:37 Comment(26)
Hmm, the sockaddr structs are designed in an opaque manner. IMO it's OK.Pfeffer
Yes, it's safe; reinterpret_cast is pretty much the same thing as a C-style cast (except safer because it complains about certain things you might want to know about).Holdall
@πάνταῥεῖ What does it mean that they are designed in this way? How can we design different types to work with reinterpret_cast? Is there a specific part of the standard saying something like "if structs/classes X and Y begin with same fields then I can cast one to another and use common fields safely"?Poling
Strictly undefined behaviour, probably will work, memcpy is the only around this issue.Calcic
@RichardCritten Is it safe to memcpy to a different type?Poling
@Poling - was just thinking about that .... not sure and if it's not then how to solve the issue in the post. Just found "Where strict aliasing prohibits examining the same memory as values of two different types, std::memcpy may be used to convert the values." source: en.cppreference.com/w/cpp/string/byte/memcpyCalcic
must be reinterpret_cast<struct sockaddr*>(&sa); not reinterpret_cast<struct sockaddr*>(sa); you forget & or this is typo ? and use bind(sfd, reinterpret_cast<struct sockaddr*>(&sa), sizeof(sa)); absolute safe and correctAut
@Aut Yeah, that was a typo.Poling
@RichardCritten "Strictly undefined behaviour" why?Foreshorten
@Foreshorten The full list of valid casts is here: en.cppreference.com/w/cpp/language/reinterpret_cast . Note that casting between pointers to different types is not in the list. That said, there is so much code out there that does this for networking that disallowing compiler extensions for this sort of thing would probably be a bad idea.Calcic
@RichardCritten Your reference is inexact. "Unlike static_cast, but like const_cast, the reinterpret_cast expression does not compile to any CPU instructions." is mostly true, but not always true. Some casts require actual code. Never mind.Foreshorten
@RichardCritten "Note that casting between pointers to different types is not in the list" Are you really claiming what you just claimed? That's a grotesque claim. You are saying that you can't turn an int* into char* a with reinterpret_cast. It's so preposterous I won't bother refuting it.Foreshorten
@Foreshorten The standard explicitely allows reinterpret_cast to char*: AliasedType is char, unsigned char, or std::byte: this permits examination of the object representation of any object as an array of bytes., see "type aliasing" in the link. It's a special case. Anyway I don't think that the case I'm talking about falls into any category mentioned in the article about reinterpret_cast. Correct me if I'm wrong.Poling
@Foreshorten was just saying you need to read the entire article before making any claims. The cast(s) you mention are explicitly listed as valid exceptions. The comments section of SO is no good for long articles which is why I summarised and linked the original to provide full and complete information.Calcic
@Poling "The standard explicitely allows reinterpret_cast to char*" Yes but not relevant. Your claim was "Note that casting between pointers to different types is not in the list" Casting between pointers to different types is allowed: "Any pointer to object of type T1 can be converted to pointer to object of another type cv T2. This is exactly equivalent to static_cast<cv T2*>(static_cast<cv void*>(expression))"Foreshorten
@Foreshorten First of all that wasn't my claim. Anyway, you are nitpicking now. Conversions are allowed but the value is unspecified (unlike cast to char*). And accessing the structure is UB (unlike cast to char*). Thus you can't achieve safely what I am talking about. Unless I'm wrong. Let's stick to the context of the question and not drift away.Poling
Why do you think that reinterpret_cast is not safe in this case? The C++ code you written doesn't read/write from/to resa, so it should be safe (yes, bind will read from resa, but it is a presumable C lib, C++ rules doesn't apply).Ingeingeberg
@Ingeingeberg sure, C++ rules don't apply but I have no choice but to use bind (somewhere under the hood) for networking. So it would be great to have some guarantee that it won't randomly fail. Not that I think it's a big issue.Poling
I don't think that it will ever fail, if bind is compiled to proper object code. I mean, if bind is compiled as just lto, it could fail in theory. But I still don't understand: you reinterpret_cast (which is fine), then you give the resulting pointer to bind. bind can be implemented in any language, so it should work no matter how you call it. The only requirement is that the pointer should point to properly formatted bytes, which is true in your case. It would be ridiculous if bind cannot be called from C++ code.Ingeingeberg
@Ingeingeberg it may fail because reinterpret_cast is not guaranteed to preserve structure that bind requires. In theory it can point to anything or it can do something to the underlying memory.Poling
Hmm, I may not follow you :) reinterpret_cast is just a cast, it doesn't do anything to the underlying memory. What do you mean by "preserve structure"? What structure? sa? As I understand, resa should point exactly to the same byte as &sa points.Ingeingeberg
@Ingeingeberg that's the whole point: the standard doesn't guarantee that. This is expected but not guaranteed in case of unrelated (by inheritance or some special cases) types.Poling
Are you sure? en.cppreference.com/w/cpp/language/static_cast: "10) Otherwise the pointer value is unchanged."Ingeingeberg
@Ingeingeberg I'm not sure how to interpret this. But have a look at reinterpret_cast notes (en.cppreference.com/w/cpp/language/reinterpret_cast ). Accessing members of unrelated cast is explicitely stated as ub. Which again: these are C++ rules, not C. Which is again confusing. But it seems that C++ is at least allowed to do something to the structure such that accessing it may fail.Poling
Yes, it is confusing for me too. Maybe it's worth a SO question :) Anyways, I don't think that we need to consider accessing. bind does the access, which is not covered by the C++ standard.Ingeingeberg
You don't have to worry about UB caused by accessing or modifying members through pointer resa, since you aren't accessing or modifying any members through that pointer. [expr.reinterpret.cast] has me wondering, though: "Note: The mapping performed by reinterpret_­cast might, or might not, produce a representation different from the original value. — end note". That indeed implies, that the object representation of the pointer can change. I'm at a loss, what implications that has, but it makes me feel uncomfortable about the code.Illuminative
F
4

The ability to support many kinds of pointer manipulation is regarded by the Standard as a Quality of Implementation issue. The Standard does not mandate that all implementations be suitable for low-level or systems programming, but quality implementations suitable for such purpose on e.g. Unix should support the kinds of semantics typically used by systems code on such platform. An implementation could be incapable of handling code that treats shared parts of structures in type-agnostic fashion and yet still be a high-quality implementation for some specialized purposes that don't involve any low-level or systems programming. On the other hand, a quality implementation suitable for low-level programming should have no trouble handling such code. Any implementation that can't handle such code should be viewed as being a low-quality implementation and/or unsuitable for low-level programming, and a low-level program's inability to work with such implementations is not a defect.

Finochio answered 7/7, 2018 at 20:4 Comment(0)
C
5

So is it safe to do reinterpret_cast between different types in this case?

No, not really. You use a pointer to a sockaddr in order to point to an object of type sockaddr_in. These are unrelated types and so it implies something that isn't true: two unrelated objects are pointed to, but only one object is allocated.

If you work on the most constrained of systems then yes, you might be happy with this and as @supercat says, your implementor might have your back. But your code will not be portable.

there does not seem to be a way to do that differently

The prescribed solution is to allocate memory for both objects and to use std::memcpy to make them equal:

sockaddr sa2;
std::memcpy(&sa2, &sa, sizeof(sa));
bind(sfd, &sa2, sizeof(sa));

From cppreference.com:

Where strict aliasing prohibits examining the same memory as values of two different types, std::memcpy may be used to convert the values.

It is important that the two objects (sockaddr_in and sockaddr) are the same size. You can assert that this is the case:

static_assert(sizeof(sa2) == sizeof(sa));

The call to std::memcpy is not always free but often it is. (example)

Cordalia answered 18/9, 2019 at 17:46 Comment(5)
Jason Turner explains this pretty nicely in C++ Weekly - Ep 185.Embrocate
Unfortunately, this memcpy()-based approach "cuts against the grain" of the BSD sockets API. It happens to work for the specific type sockaddr_in (an IPv4 address) mentioned in the question's example code. But this approach will cause a buffer overflow if you use it on type sockaddr_in6 (an IPv6 address). Because struct sockaddr_in6 happens to be larger than struct sockaddr. (Admittedly, the BSD sockets API design might be a bit quirky/old. But that API is a fact of life, and we'll have to use C++ in a way that "cuts with the grain" of the API.)Bingo
It's unfortunate that in other circumstances the same API could encourage code with non-portable behaviour. The above solution is not a licence to cause a buffer overrun under those circumstances. I've updated the answer with a static_assert which prevents this. Thanks.Cordalia
True, it's an unfortunate mismatch of design philosophies. In the BSD sockets API's defense, "it was here first": it goes all the way back to the early 1980ies. The era where it was not abnormal to use C as a portable assembler. Long before the C/C++ strict aliasing rules became an issue. So, it's up to the C++ world to support this preexisting pointer-cast-oriented use case. You rightfully note that the C++ standard does not formally support it. But as noted by @supercat, most practical C++ implementations must guarantee that reinterpret_cast<>() behaves like C compilers from the 1980ies.Bingo
(Side note: I won't defend the BSD sockets API on other aspects; it does some other things that were dirty even back in its day. For example, it assumes that the bit encoding of a positive number in a signed short is identical to the encoding in a unsigned short. That's the "All the world's a VAX" mindset, which is warned against by the Ten Commandments for C Programmers. And what really grinds my gears: there was no pressing reason to have that unportable mismash of integer types.)Bingo
F
4

The ability to support many kinds of pointer manipulation is regarded by the Standard as a Quality of Implementation issue. The Standard does not mandate that all implementations be suitable for low-level or systems programming, but quality implementations suitable for such purpose on e.g. Unix should support the kinds of semantics typically used by systems code on such platform. An implementation could be incapable of handling code that treats shared parts of structures in type-agnostic fashion and yet still be a high-quality implementation for some specialized purposes that don't involve any low-level or systems programming. On the other hand, a quality implementation suitable for low-level programming should have no trouble handling such code. Any implementation that can't handle such code should be viewed as being a low-quality implementation and/or unsuitable for low-level programming, and a low-level program's inability to work with such implementations is not a defect.

Finochio answered 7/7, 2018 at 20:4 Comment(0)
T
4

Contrary to the top answer, I would say that bind CAN and MUST be written in a way such that reinterpret_cast is safe to use here. For instance, bind could be implemented as:

int bind(SOCKET s, const sockaddr* addr, int addrlen) {
    std::uint16_t address_family;
    std::memcpy(&address_family, addr, sizeof(address_family));
    if (address_family == AF_INET) {
        const sockaddr_in* sin = reinterpret_cast<const sockaddr_in*>(addr);
        // Accessing sin->sin_addr is safe here
        ...
    } else if (address_family == AF_INET6) {
        const sockaddr_in6* sin6 = reinterpret_cast<const sockaddr_in6*>(addr);
        ...
    }
}

The key point is that the reinterpret_cast itself is not UB, it's trying to access the data that's UB (see Type Aliasing):

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined[...]

In the code above, we never try to read the contents of addr through a pointer of type sockaddr*. We examine the value representation (raw bytes) to get the address family, which tells us the exact type of struct to use. Then we can safely cast back to the original type. Casting to a pointer of a different type then back to the original type is allowed by the standard.

Taking it one step further, I would say that with sockaddr_in6 the implementation must handle reinterpret_cast correctly. Since sizeof(sockaddr_in6) > sizeof(sockaddr), the memcpy trick no longer works. The API is specifically asking for a pointer to an object of the wrong type, so the onus is on the API implementers to use the pointer correctly.

Tetherball answered 18/7, 2021 at 7:39 Comment(1)
That's actually a very good point.Adkins
D
-1

Considering that casting to struct sockaddr* is seen in example code and that you are passing it off to a API function (for a well-documented and well-tested library), it is only unsafe if you violate the preconditions of the function.

Casting is necessary because there are different types of sockaddr (like sockaddr_in and sockaddr_un) and only one type of bind function.

The C++ compiler will choose reinterpret_cast when doing a C-style cast anyway, but it is preferable to be more explicit for readability.

Danziger answered 27/5, 2017 at 20:55 Comment(7)
The problem is that socket api is C api. And so all tutorials are C tutorials. Perhaps these kind of casts are safe in C, I don't know about that. But it definitely is not guaranteed to work under C++. I know that in practice this works. On the other hand in practice I've never used funky platforms. So perhaps on some platform it will fail? If not then why? So as you can see your answer doesn't really answer my question.Poling
@Poling The answer is correct. C++ reinterpret_cast does merely nothing beyond a plain C-style cast (except the const correctness maybe).Pfeffer
@πάνταῥεῖ Fair enough, then what you said is that this is not safe under C as well, right?Poling
@Poling Safety is relative here. That's why the basic structure has the discriminator type field. Well, that's how polymorphism is done in C.Pfeffer
@πάνταῥεῖ I'm very skeptic about that. On one hand, you are right - this is C style polymorphism. On the other hand, this kind of polymorphism is clearly UB in C++ standard (more precisely: accessing structure after reinterpret_cast is UB). Again, I do not know the C standard. Also I'm not really interested in C, so lets focus on C++. What I'm actually interested in is some quote from C++ standard that justifies this. Or perhaps a simple "yeah, this is UB but we do that anyway". This also brings another question: why would C++ standard state that this is UB if everyone does this anyway?Poling
@Poling Well, UB just means that it's beyond the scope of the C++ standard. You are using a C API. Within that API it's well defined what is necessary to do. Given the fact that reinterpret_cast does the same as a C-style cast, and you can rely that the structs are well designed (regarding alignment and aliasing) it's just fine to do so, yes. I've done a lot of network programming BTW, and be assured it worked well on all systems and compilers I met. If you want a C++ abstraction level use something like e.g. boost::asio.Pfeffer
This answer does not address the issue. If you invoke UB the statement "it is only unsafe if you violate the preconditions of the function" is wrong. This answer does not tell us if UB is invoked or not.Matriarchate
A
-2

yes,this is absolute safe and correct.

Winsock functions using sockaddr are not strictly interpreted to be pointers to a sockaddr structure. The structure is interpreted differently in the context of different address families. The only requirements are that the first u_short is the address family and the total size of the memory buffer in bytes is namelen.

so really you need pass pointer to some location which begin from u_short sin_family; . the sockaddr_in conform this condition. reinterpret_cast not change your pointer to sockaddr_in sa; and not produce any binary code: reinterpret_cast<sockaddr*>(&sa); or (sockaddr*)&sa is the same as &sa. in other words binary pointer &sa == reinterpret_cast<sockaddr*>(&sa)

so you can and must use

 bind(sfd, reinterpret_cast<sockaddr*>(&sa), sizeof(sa));

use auto *resa here senseless, for what ?

alternatively for example we can use next code:

union {
    sockaddr sa;
    sockaddr_in sa_in;
};

sa_in.sin_family = AF_INET;
sa_in.sin_port = *;
sa_in.sin_addr.S_un.S_addr = *;
bind(0, &sa, sizeof(sa_in));

and main - try put yourself on place who wrote bind function. how it must work with sockaddr ? obviously it first look to it sa_family and then based on it sa_family value reinterpret_cast it to more specific struct

Aut answered 27/5, 2017 at 22:15 Comment(1)
Comments are not for extended discussion; this conversation has been moved to chat.Catalepsy

© 2022 - 2024 — McMap. All rights reserved.