Does the 'offsetof' macro from <stddef.h> invoke undefined behaviour?
L

6

18

Example from MSVC's implementation:

#define offsetof(s,m) \
    (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
//                                                   ^^^^^^^^^^^

As can be seen, it dereferences a null pointer, which normally invokes undefined behaviour. Is this an exception to the rule or what is going on?

Lament answered 21/6, 2011 at 23:48 Comment(4)
Note that it's hard to talk about "standard compliance" in the header files supplied with an implementation. E.g. Microsoft, having control and knowledge about the internals of their compiler, can do whatever they want there, as long as their compiler/header files/libraries are standard compliant regarding user-supplied code.Tomi
@Lyke: Well, "working as expected" is one of the many flavours of "undefined behaviour". Often, it is also the most dangerous. :PLament
@Tomi Your comment should be an answer: the compiler provides an offsetof macro that works as expected. The compiler is also free to implement memmove() with a comparison that would produce undefined behavior if it was in user code. All do.Pauiie
It would be undefined to write and use such a macro in your own code; that's why your implementation is required to provide you with an offsetof macro that you can use instead.Eulogistic
Y
26

Where the language standard says "undefined behavior", any given compiler can define the behavior. Implementation code in the standard library typically relies on that. So there are two questions:

(1) Is the code UB with respect to the C++ standard?

That's a really hard question, because it's a well-known almost-defect that the C++98/03 standard never says right out in normative text that in general it's UB to dereference a null pointer. It is implied by the exception for typeid, where it's not UB.

What you can say decidedly is that it's UB to use offsetof with a non-POD type.

(2) Is the code UB with respect to the compiler that it's written for?

No, of course not.

A compiler vendor's code for a given compiler can use any feature of that compiler.
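
To make the non-POD point under (1) concrete: since C++11 the requirement is phrased in terms of standard-layout types, and from C++17 use with other types is conditionally-supported rather than undefined. A minimal sketch of guarding a use of offsetof follows; the type Packet and the constant names are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout

// Hypothetical example type; any standard-layout struct would do.
struct Packet {
    char   tag;
    int    length;
    double payload;
};

// offsetof is only guaranteed to work for standard-layout types, so check first.
static_assert(std::is_standard_layout<Packet>::value,
              "offsetof requires a standard-layout type");

// The result is an integral constant expression of type std::size_t.
constexpr std::size_t kTagOffset    = offsetof(Packet, tag);
constexpr std::size_t kLengthOffset = offsetof(Packet, length);

// The first member of a standard-layout struct sits at offset 0.
static_assert(kTagOffset == 0, "first member must be at offset 0");
```

The static_assert turns a silent misuse into a compile error if someone later adds a virtual function or a non-standard-layout base to the type.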

Cheers & hth.,

Yehudit answered 22/6, 2011 at 0:54 Comment(6)
Can you check whether the note I found in [dcl.ref] (see my answer) also exists in C++03?Navada
@Ben: it still exists in N3290 (i.e. C++0x, §8.3.2/5). But notes and examples are not normative in ISO standards. For example, the C++98/C++03 $5/4 examples that say "unspecified" behavior are incorrect, and are contradicted by the immediately preceding normative text.Yehudit
I know it exists in N3290, that's where I found it. But I don't have a copy of the official C++98 or C++03.Navada
@Ben: it's there in the original C++98 standard, but one paragraph earlier, §8.3.2/4. But as I said it's not normative. This lack of a normative text description is all the more baffling because the standard claims to have such a description: §1.9/4 "Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer)." -- but no reference to where that alleged description resides... :-) Cheers,Yehudit
"never says right out in normative text that in general it's UB to dereference a nullpointer" anyway, it follows from the fact that null pointer dereference behaviour is never defined, that it is UB.Sowers
@AlfP.Steinbach That's just standard irony.Sowers
A
17

The notion of "undefined behavior" is not applicable to the implementation of the Standard Library, regardless of whether it is a macro, a function or anything else.

In the general case, the Standard Library should not be seen as implemented in the C++ (or C) language. That applies to standard header files as well. The Standard Library should conform to its external specification, but everything else is an implementation detail, exempt from all and any other requirements of the language. The Standard Library should always be thought of as implemented in some "internal" language, which might closely resemble C++ or C, but still is not C++ or C.

In other words, the macro you quoted does not produce undefined behavior, as long as it is specifically the offsetof macro defined in the Standard Library. But if you do exactly the same thing in your code (like define your own macro in the very same way), it will indeed result in undefined behavior. "Quod licet Jovi, non licet bovi".

Aldine answered 22/6, 2011 at 1:6 Comment(13)
Exactly. It is common practice to use different compiler extensions, intrinsics, inline assembler, etc (not allowed in conforming application) in standard library implementations. But for some reason nobody complains about this. :)Reviewer
This is a wrong answer, because it negates possibility of Boost. You are free to exploit undefined behavior if you have enough expertise as, for example, Boost writers have.Tremolite
@0kcats: Your comment makes no sense in the context of the question. The question is about whether offsetof produces UB or not. Whether one should "exploit undefined behavior" is a completely different matter. And the answer is "no", one should not, regardless of how much "expertise" one has. Believing that Boost authors "exploit UB" would only mean that you misinterpreted/misunderstood something.Aldine
@AnT Say, you didn't have the offsetof operation in the language, but needed it. You could definitely use the same approach as in the question in Visual Studio. Or do you believe it will not work? This is stated at the bottom of your answer, "But if you do exactly the same thing in your code (like define your own macro in the very same way), it will indeed result in undefined behavior" which is simply wrong.Tremolite
@0kcats: A compiler is free to say "if this expression is being evaluated as the result of macro expansion of offsetof as defined by header stddef.h in a year between 1980 and 3047, then the result is a size_t representing the offset of the member within the struct, otherwise reformat the harddrive". This is what undefined behavior means: the compiler writers get to choose (explicitly or implicitly) what the behavior is. So on this example compiler, no, you couldn't try it yourself. Real compilers are free to document these extensions, if they want, but certainly don't want to.Quadrant
@0kcats: No. Modern compilers do a lot of things to take advantage of the optimization freedoms provided by UB. Firstly, the effects of that are most of the time overwhelmingly surprising to the proponents of the "undefined is really implementation-defined" philosophy. SO is chock-full of questions related to that topic.Aldine
Secondly, such compilers often do contain non-standard internal features that make sure that any kind of useful "definedness" of UB is available to standard library code, but not [immediately] available to user code. My statement of the same code being UB in user code (even if it works in standard header) is perfectly true from that standpoint.Aldine
While these arguments might seem logical they are not real. Headers are compiled the same (give an opposite example). Yes, Undefined Behavior in the language standard is sometimes, but far from always, Undefined Behaviors of a particular compiler on a particular platform. Don't mix this up. This is always exploited. As well as bugs in compilers have been worked around in different weird ways. Oh, and if you know that a compiler has a bug, you should stop using it completely, because it is UB of any code that you write? Go check bug list in your favorite compiler and stop writing the code.Tremolite
@0kcats: Yes, these arguments are real. Lots of GCC headers, for one example, contain various setup directives intended to suppress warnings or tailor other compiler behavior to the implementation-specific language features used in these standard headers. These adjustments are reversed before the end of the header.Aldine
And even if implementation-specific features define something undefined by the language, this is nothing more than a compiler extension. It has nothing to do with the language itself. In the realm of C++ (or C) programming, exploiting such extensions without an extremely good and thoroughly documented reason is a sign of incompetence.Aldine
It's very simple, again: user source code has to be conformant in its entirety. The standard library has to be conformant functionally and at the interface level only. Claiming that these requirements are somehow equivalent or even comparable is beyond ridiculous. Implementations are free to implement the standard library in Fortran, if they so desire.Aldine
Exactly same things that you find in the gcc headers you can apply in your 'private' library headers just ifdef them for this only compiler. If the code is old enough and was targeting a single platform it will be riddled with language UBs that are not UBs for the compiler. Same is true if you are targeting several different architectures - in that case you are going to have a set of non-compliant headers of your own with different ifdefs that are working around compiler bugs, and provide missing functionality for different compilers.Tremolite
User code is not treated differently than other headers.Tremolite
S
4

When the C Standard specifies that certain actions invoke Undefined Behavior, that has not generally meant that such actions were forbidden, but rather that implementations were free to specify the consequent behaviors or not as they see fit. Consequently, implementations would be free to perform such actions in cases where the Standard requires defined behavior, if and only if the implementations can guarantee that the behaviors for those actions will be consistent with what the Standard requires. Consider, for example, the following implementation of strcpy:

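char *strcpy(char *dest, char const *src)
{
  /* dest-src is undefined per the Standard unless both point into one object */
  ptrdiff_t diff = dest-src-1;
  char ch;
  do {
    ch = *src++;
    /* src was just advanced, so src+diff is dest plus the index just read */
    ((char *)src)[diff] = ch;
  } while (ch != 0);  /* the terminating null is copied as well */
  return dest;
}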

If src and dest are unrelated pointers, the computation of dest-src would yield Undefined Behavior. On some platforms, however, the relation between char* and ptrdiff_t is such that given any char* p1, p2, the computation p1 + (p2-p1) will always equal p2. On platforms which make that guarantee, the above implementation of strcpy would be legitimate (and on some such platforms might be faster than any plausible alternative). On some other platforms, however, such a function might always fail except when both strings are part of the same allocated object.
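
For contrast, here is a small sketch of the case the Standard does define: both pointers refer to the same array, so the subtraction and the round trip are well-defined on every platform. The function names round_trips and same_buffer_demo are hypothetical, and the code deliberately does not try the unrelated-pointer case.

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::ptrdiff_t

// Returns true if p1 + (p2 - p1) lands back on p2.
// Precondition (assumed, not checkable in portable code): p1 and p2 point
// into, or one past the end of, the same array.
bool round_trips(const char* p1, const char* p2) {
    std::ptrdiff_t diff = p2 - p1;  // defined only under the precondition
    return p1 + diff == p2;
}

// Hypothetical helper: exercises the defined case with a single buffer.
bool same_buffer_demo() {
    char buffer[32] = "source text";
    const char* src = buffer;       // start of the buffer
    char* dest = buffer + 16;       // later position in the same buffer
    assert(dest - src == 16);       // plain element difference
    return round_trips(src, dest);
}
```

Whether the same identity holds for pointers into two separate objects is exactly the platform-specific guarantee described above; portable code cannot rely on it.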

The same principle applies to the offsetof macro. There is no requirement that compilers offer any way to get behavior equivalent to offsetof (other than by actually using that macro). If a compiler's model for pointer arithmetic makes it possible to get the required offsetof behavior by using the -> operator on a null pointer, then its offsetof macro can do that. If a compiler wouldn't support any efforts to use -> on something other than a legitimate pointer to an instance of the type, then it may need to define an intrinsic which can compute a field offset and define the offsetof macro to use that. What's important is not that the Standard define the behaviors of actions performed using standard-library macros and functions, but rather that the implementation ensures that behaviors of such macros and functions match requirements.
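
As an illustration of the intrinsic route, GCC and Clang provide __builtin_offsetof, and their offsetof macro expands to it. The sketch below wraps it in a macro named MY_OFFSETOF, which is a made-up name used only here, and falls back to the shipped offsetof on other compilers. Record is a hypothetical type.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Hypothetical record type used only for illustration.
struct Record {
    int  id;
    char name[12];
    long balance;
};

#if defined(__GNUC__) || defined(__clang__)
// GCC and Clang expose a builtin that never forms a null-based lvalue.
#define MY_OFFSETOF(type, member) __builtin_offsetof(type, member)
#else
// Elsewhere, defer to whatever the implementation's own macro does.
#define MY_OFFSETOF(type, member) offsetof(type, member)
#endif

// Both forms are constant expressions and must agree with each other.
static_assert(MY_OFFSETOF(Record, id) == 0, "first member at offset 0");
static_assert(MY_OFFSETOF(Record, balance) == offsetof(Record, balance),
              "builtin and library macro agree");
```

User code has no reason to prefer the builtin over offsetof itself; the point is only that the library macro need not be written in terms of a null pointer at all.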

Shingly answered 9/8, 2015 at 19:55 Comment(0)
N
1

This is basically equivalent to asking whether this is UB:

s* p = 0;
volatile auto& r = p->m;

Clearly no memory access is generated to the target of r, because it's volatile and the compiler is prohibited from generating spurious accesses to volatile variables. But *p is not volatile, so the compiler could possibly generate an access to it. Neither the address-of operator nor casting to reference type creates an unevaluated context according to the standard.

So, I don't see any reason for the volatile, and I agree with the others that this is undefined behavior according to the standard. Of course, any compiler is permitted to define behavior where the standard leaves it implementation-specified or undefined.

Finally, a note in section [dcl.ref] says

in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.
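
If you need a member offset in your own code and cannot use offsetof, one way to avoid the null pointer entirely is to measure against a real object. This is only a sketch: Sample and count_offset_from_object are hypothetical names, and it assumes byte-wise pointer subtraction inside one object behaves as on mainstream implementations, which is widely relied upon even though the standard's exact wording for it has been debated.

```cpp
#include <cassert>  // assert
#include <cstddef>  // offsetof, std::size_t

// Hypothetical type used only for illustration.
struct Sample {
    char   flag;
    int    count;
    double ratio;
};

// Byte distance from the start of a real Sample object to its count member.
std::size_t count_offset_from_object() {
    Sample obj{};  // a live object, so no null pointer is involved
    const char* base   = reinterpret_cast<const char*>(&obj);
    const char* member = reinterpret_cast<const char*>(&obj.count);
    assert(member >= base);  // the member lies inside the object
    return static_cast<std::size_t>(member - base);
}
```

Unlike offsetof, this yields a run-time value rather than a constant expression, which is one more reason to prefer the library macro when it is applicable.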

Navada answered 22/6, 2011 at 2:3 Comment(1)
Any pointer rvalue which is used for any purpose must always be null, a pointer within a valid object, or a pointer to the space immediately following a valid object. Any action which would create or use a pointer rvalue which is not one of the above will invoke Undefined Behavior, whether or not the pointer is ever dereferenced.Shingly
P
0

It is NOT undefined behavior in C++ if m is at offset 0 within the structure s, as well as in certain other cases. According to Issue 232 (emphasis mine):

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points, if any. If the pointer is a null pointer value (7.11 [conv.ptr]) or points one past the last element of an array object (8.7 [expr.add]), the result is an empty lvalue and does not refer to any object or function. An empty lvalue is not modifiable.

Therefore, the &((s *)0)->m is undefined behavior only if m is neither at offset 0, nor at an offset corresponding to an address which is one past the last element of an array object. Note that adding a 0 offset to null is allowed in C++ but not in C.

As others have noted, the compiler is allowed (and extremely likely) to not ever create the undefined behavior, and may be packaged with libraries that make use of the specific compiler's enhanced specifications.

Prut answered 1/3, 2019 at 6:40 Comment(0)
A
-4

No, this is NOT undefined behaviour. The expression is resolved at runtime.

Note that it is taking the address of the member m from a null pointer. It is NOT dereferencing the null pointer.

Aloysia answered 21/6, 2011 at 23:55 Comment(14)
((s *)0)->m obviously does dereference a null pointer. -> is dereference too.Lament
"The expression is compiled at runtime"?Sting
Aren't things compiled at compile time?Cazzie
@Richard Schneider: If interpreted as plain C/C++ code, it is dereferencing the null pointer, no question about it.Aldine
@Xeo, @Andrey: The C99 standard explicitly states that &*NULL is equivalent to NULL (I don't know if C++ says something equivalent). I haven't yet found anything that covers this specific case, though.Sting
@OliCharlesworth "I don't know if C++ says something equivalent)." I hope C++ never imports this C nonsense!Sowers
"The expression is resolved at runtime" that doesn't mean anything.Sowers
@OliverCharlesworth I'm pretty sure the C99 standard does not state that, neither explicitly nor implicitly. Correct me if I'm wrong, though.Rotation
@Sowers To be pedantic, "The expression is resolved at runtime" does mean something, namely that the expression is resolved at runtime; however, resolving an expression at runtime in this context means that the expression is evaluated (i.e. calculated) at runtime. This statement is not necessarily true though, as compilers generally define ((s *)0)->m to return the offset of member m in the struct or union s, which can and frequently will be calculated at compile time.Rotation
@YoYoYonnY - see footnote 87.Sting
@OliverCharlesworth Interesting! I always thought gcc warned about void *a = 0; &a[0]; because it was non-standard, but I guess it's just because gcc sees a[0] and expects that it dereferences a void pointer.Rotation
@OliverCharlesworth Footnotes aren't normative. Does the C standard specify that dereferencing a null pointer is sometimes allowed?Sowers
@Sowers - Are you saying that the footnote is not to be trusted?Sting
@OliverCharlesworth Actually, nothing in the C std is trustworthy!Sowers
