Do C and C++ standards imply that a special value in the address space must exist solely to represent the value of null pointers?

Following discussion from this question about null pointers in C and C++, I'd like to have the ending question separated here.

If it can be inferred from the C and C++ standards (answers can target both standards) that dereferencing a pointer variable whose value is equal to the nullptr (or (void *)0) value is undefined behavior, does it imply that these languages require that a special value in the address space is dead, meaning that it's unusable except for the role of representing nullptr? What if the system has a really useful function or data structure at the same address that's equal to nullptr? Should this never happen because it's the compiler writer's responsibility to figure out a non-conflicting null pointer value for each system the compiler targets? Or should a programmer who needs to access such a function or data structure be content with programming in "undefined behavior mode" to achieve that?

This looks like blurring the lines of the roles of a compiler and a computer system. I would ask whether it's right to do so, but I guess there's no room for this here.

This blog post digs into tackling this problem.

Tammietammuz answered 18/2, 2015 at 0:52 Comment(18)
Can we not pick a single language, please? That's the second time you've lumped C and C++ together on this topic, yet they are distinct languages and you were asked before to choose one.Mercantile
I agree with separating the languages because they are both distinct languages, but the NULL in C++ (with the exception of nullptr) and NULL in C is the same thing so why separate them? Just ask one question to conquer both languages? I don't feel this question is specific to C or C++ but rather applies to both, no?Dissentious
@Brandon: That's the purview of an answer, not a question. For one thing, nullptr doesn't even exist in C, and the definition of NULL differs between the two languages. Furthermore, there is a ton of wording difference between the two around this topic. If the answers happen to be the same then great, but lumping together two distinct questions about two distinct languages just on the offchance of that is wrong.Mercantile
@Brandon: The definitions of a null pointer are very similar in C and in C++. The definition of NULL has a somewhat significant difference. In C, ((void*)0) is a valid definition for NULL; in C++, it's not. That's not relevant to what you're asking, but it illustrates that you shouldn't assume C and C++ are identical in some way.Exorable
I don't know what the C and C++ fuss is about. The post clearly asks two questions. One about C and one about C++. No assumption about similarity between C and C++ was made or stated. Do C and C++ [blah blah]? The answer can be C [blah blah] but C++ doesn't [blah blah], or it can be both C and C++ [blah blah]. I don't think there's anything wrong with asking 2 questions in the same post.Verrocchio
@LightnessRacesinOrbit exactly what thang said. This is clear from the question, and as I've already explained before where you've probably already read.Tammietammuz
@thang: You've hit the nail on the head then tripped up at the last hurdle: two questions in one "post" is self-evidently wrong. On Stack Overflow, a post is a question. And we have one question per question, please.Mercantile
@pepper_chico: And I've already explained it, too. Please do read carefully.Mercantile
@LightnessRacesinOrbit There are many, many questions on SO that contain lots of subquestions, and this is indeed another one. If you feel it's really that bad for the platform, feel free to vote to close it.Tammietammuz
I personally find questions with lots of subquestions annoying and counter productive. and the consensus agrees. As an example, your previous post was even worse, containing multiple questions about multiple languages, and the result was a poor question and answers that failed to answer the problem. I'm not voting to close at this point, I am simply downvoting.Fifty
@Yakk ok no problem, I'll make sure next time I ask any question that concerns both C and C++, I'll make two almost identical questions! I guess they will differ solely about the lack of ++. But you know what, I don't doubt someone comes along to tag one of the questions as a duplicate.Tammietammuz
@pepper_chico: Ah yes, the old "if everybody else does it, that means it's okay" argument. Just because we have failed to close all of them doesn't mean they are alright that way.Mercantile
@pepper_chico: It's not "solely about the lack of ++" and you full well know it. The differences between the two languages go far beyond two ASCII characters.Mercantile
@LightnessRacesinOrbit I was talking about the question's content solely.Tammietammuz
@pepper_chico: My point stands!Mercantile
@LightnessRacesinOrbit Not when comparing the text of one question to the other. That is the only sense in which I'm saying they'd be almost identical.Tammietammuz
@pepper_chico: My point still stands. Let's just leave it there.Mercantile
@LightnessRacesinOrbit I'm thankful for that.Tammietammuz
F
3

does it imply that these languages require that a special value in the address space is dead, meaning that it's unusable except for the role of representing nullptr?

No.

The compiler needs a special value to represent a null pointer, and must take care that it does not place any object or function at that address, because all pointers to objects and functions are required to compare unequal to the null pointer. The standard library must take similar precautions in its implementation of malloc and friends.
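The guarantee described above can be sketched in C. This is a minimal illustration, not taken from either standard; `some_object`, `some_function` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int some_object;                  /* hypothetical object with static storage */
static void some_function(void) {}       /* hypothetical function */

/* Helpers keep the comparisons out of line, so compilers do not warn
   that "the address of X will never be NULL". */
static int is_null_object(const void *p)     { return p == NULL; }
static int is_null_function(void (*f)(void)) { return f == NULL; }

/* Returns 1 when no pointer to an object or function formed here is null. */
int addresses_are_never_null(void)
{
    int local = 0;
    int ok = !is_null_object(&some_object)
          && !is_null_object(&local)
          && !is_null_function(some_function);

    /* malloc may return a null pointer on failure; a non-null result must
       point at usable storage, so it can never be confused with null. */
    int *heap = malloc(sizeof *heap);
    if (heap != NULL) {
        *heap = 42;
        ok = ok && (*heap == 42);
        free(heap);
    }

    assert(ok);  /* a failure would indicate a non-conforming implementation */
    return ok;
}
```

Whatever bit pattern the implementation picks for the null pointer, it has to keep these comparisons true, which is why neither the compiler nor the allocator may place anything at that address.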

However, if there is something at that address already, something that no strictly conforming program can access, then an implementation is allowed to support dereferencing the null pointer to access it. Dereferencing the null pointer is undefined in standard C, so an implementation can make it do anything it likes, including the obvious.

Both the C and the C++ standards understand the concept of the as-if rule, which basically means that if an implementation is indistinguishable, for valid input, from one that conforms to the standard, then it does conform to the standard. The C standard uses a trivial example:

5.1.2.3 Program execution

10 EXAMPLE 2 In executing the fragment

char c1, c2;
/* ... */
c1 = c1 + c2;

the "integer promotions" require that the abstract machine promote the value of each variable to int size and then add the two ints and truncate the sum. Provided the addition of two chars can be done without overflow, or with overflow wrapping silently to produce the correct result, the actual execution need only produce the same result, possibly omitting the promotions.

Now, if c1 and c2's values come from registers, and it's possible to force values outside of char's range into those registers (e.g. by inline assembly), then the fact that the implementation optimises away the integer promotions might be observable. However, since the only way to observe it is through undefined behaviour or implementation extensions, there is no way for any standard code to be affected by this, and an implementation is allowed to do it.
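The equivalence that the standard's example relies on can be checked exhaustively. The sketch below is an illustration under stated assumptions: it uses unsigned char instead of plain char so that the truncation is fully defined, it assumes 8-bit bytes, and the 8-bit adder is only emulated in C (all function names are hypothetical).

```c
#include <assert.h>
#include <limits.h>

#if CHAR_BIT != 8
#error "this sketch assumes 8-bit bytes"
#endif

/* What the abstract machine prescribes: promote to int, add, truncate. */
unsigned char add_with_promotions(unsigned char a, unsigned char b)
{
    int wide = (int)a + (int)b;      /* cannot overflow: at most 510 */
    return (unsigned char)wide;      /* conversion to unsigned wraps modulo 256 */
}

/* What hardware might do instead: an emulated 8-bit ripple-carry adder
   whose carry out of the top bit is silently dropped. */
unsigned char add_8bit_only(unsigned char a, unsigned char b)
{
    unsigned sum = 0, carry = 0;
    for (int bit = 0; bit < 8; ++bit) {
        unsigned x = (a >> bit) & 1u;
        unsigned y = (b >> bit) & 1u;
        sum |= (x ^ y ^ carry) << bit;
        carry = (x & y) | (carry & (x ^ y));
    }
    return (unsigned char)sum;
}

/* Returns 1 if the two versions agree on every possible pair of inputs. */
int results_agree(void)
{
    for (int a = 0; a <= UCHAR_MAX; ++a)
        for (int b = 0; b <= UCHAR_MAX; ++b)
            if (add_with_promotions((unsigned char)a, (unsigned char)b)
                != add_8bit_only((unsigned char)a, (unsigned char)b))
                return 0;
    return 1;
}

/* Example use: a caller can verify the equivalence once at start-up. */
void check_as_if_equivalence(void) { assert(results_agree()); }
```

Because no valid input can tell the two functions apart, an implementation is free to emit the second where the standard describes the first.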

This is the same logic that applies to getting useful results when dereferencing null pointers. There are only two ways to see, from code, that there is something meaningful at that particular address: by getting a null pointer from an evaluation that is guaranteed to produce a pointer to an object, or by just trying it. The former is what I mentioned the compiler and standard library must take care of. The latter is not something that can affect a valid standard program.


A well-known example is the interrupt vector table on DOS implementations, which resides at address zero. It is typically accessed simply by dereferencing a null pointer. The C and C++ standards don't, shouldn't and cannot cover access to the interrupt vector table. They do not define such behaviour, but they do not restrict access to it either. Implementations should be and are allowed to provide extensions to access it.

Favrot answered 18/2, 2015 at 10:40 Comment(17)
The other obvious example besides the interrupt table is main() itself. You can't legally form a pointer to that, so &main == nullptr may evaluate to true on a hypothetical implementation.Fustigate
@Fustigate Only in C++, though. C does allow you to take a pointer to that. And C++ requires binary compatibility with a conforming C99 implementation in its extern "C" support, so any representation of null pointers that doesn't work for C also can't work for C++. But, I suppose a C++ compiled program could have a different address for main than a C compiled program would.Favrot
IIRC, that binary compatibility is limited, and in particular main would still need to be compiled as C++ because C99's startup code cannot be expected to call C++ constructors before main is entered. Which BTW is closely related to the reason C++ doesn't allow main to be recursively called, on reasonable implementations that would rerun global ctors.Fustigate
@Fustigate Yep, and that's something that really does happen on Cygwin, which is easy to see if C++ code defines main, and C code contains an external declaration of main to call it. Valid code cannot do this, of course.Favrot
I've removed yours as correct answer because #28574569Tammietammuz
@pepper_chico You're free to pick whichever answer you feel is best, but the reasoning you give seems flawed to me. Yes, a null pointer is guaranteed not to point to any function or object, but if no strictly conforming program can tell, then even if such data structures as the IVT do qualify as objects, the as-if rule still applies: an implementation that does not place anything useful at that address can clearly conform. To strictly conforming programs, an implementation that does place something useful there is indistinguishable from one that doesn't. Therefore, it's allowed.Favrot
@hvd I just want to pick the correct one, but this is hard, huh?! I just came back to this topic out of nowhere on IRC a week back... I'll try to grasp your added reasoning; you could add it to the answer too, if you wish.Tammietammuz
@hvd My point in the referred comment is that you can't write a conforming program that dereferences the null pointer, because the standard makes that guarantee. If you're able to do it, you're breaking that law, even if a strictly conforming program can't tell whether a pointer is null when the dereference is applied. If it holds the null value the implementation has chosen, the implementation could still enforce a null pointer exception at runtime, just because the program is breaking the law, regardless of the platform having data there and of it being a bad implementation choice.Tammietammuz
@pepper_chico The C standard has an example of the as-if rule, I included it in my answer. I hope it answers the concerns in your comments, I'll re-read them as well to check if I missed something. Note that Keith Thompson's answer that you now accepted is right as well. It merely focuses on a different aspect of your question, one that I at the time did not think was the main focus, but I could be wrong on that.Favrot
@pepper_chico BTW, "If it's the null value the implementation has chosen for at runtime, the implementation could still enforce a null pointer exception at runtime" -- Sure, but that wasn't what we were talking about. An implementation is certainly allowed to reject attempts to dereference a null pointer, regardless of what the underlying hardware would do. If anything I wrote in my answer says otherwise, I severely misspoke, but I don't think anything did.Favrot
@hvd thanks anyway, I see the practical aspect of yours and others too, but from what I can tell, and even reading the question again (plus the blog post I provided), it's not even a question of working in UB mode, but simply getting out of the standard to get the job done.Tammietammuz
@hvd reading your extended answer I guess you're trying to point out source conforming vs behaviour conforming right? And given that it's impossible at the dereference of a pointer to infer in all possible cases, from source, whether it's null or not, then behaviour can act behind the scenes and "break the law"?Tammietammuz
@pepper_chico "working in UB mode" does not make sense to me. A program's behaviour may be defined by the standard, defined by the implementation as an extension, or defined by neither. If it's the first, the standard's requirements apply. If it's either the second or third, the standard's requirements do not apply, at all, but there's still a huge difference between the two of those. Lumping them together under "UB mode" is not fair. Know that every time you define _POSIX_C_SOURCE as a macro, for example, that's UB for the C standard. Even though it's required by a different formal standardFavrot
@pepper_chico "And given that it's impossible at the dereference of a pointer to infer in all possible cases, from source, whether it's null or not, then behaviour can act behind the scenes and "break the law"?" -- Perhaps a more apt analogy would be to say you're already in another jurisdiction. :)Favrot
@hvd, summing it up: it's "no" because, from strictly conforming source code, one is unable to tell in all situations whether a pointer dereference happens on a null pointer, hence the null pointer dereference can still happen through that source-level inability to tell whether a pointer is null. OK, I guess yours is right on that, then. This is hard. I've seen the as-if rule in copy constructor aliasing, but I'm not even going to dive into that again for this.Tammietammuz
@hvd at last I guess it's better explained now (how to be a smooth criminal).Tammietammuz
s/copy constructor aliasing/copy constructor eliding/Tammietammuz
E
6

That depends on what is meant by the phrase "address space". The C standard uses the phrase informally, but doesn't define what it means.

For each pointer type, there must be a value (the null pointer) that compares unequal to a pointer to any object or function. That means, for example, that if a pointer type is 32 bits wide, then there can be at most 2^32 - 1 valid non-null values of that type. There could be fewer than that if some addresses have more than one representation, or if not all representations correspond to valid addresses.

So if you define the "address space" to cover 2^N distinct addresses, where N is the width in bits of a pointer, then yes, one of those values must be reserved as the null pointer value.

On the other hand, if the "address space" is narrower than that (for example, typical 64-bit systems can't actually access 2^64 distinct memory locations), then the value reserved as the null pointer can easily be outside the "address space".
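The counting argument is simple enough to write down. In this sketch, `max_non_null_values` is a hypothetical helper that gives the upper bound on distinct non-null values for a pointer of a given width, assuming every bit pattern other than the null one is a distinct valid address.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on distinct non-null values of a pointer that is
   width_bits wide: all 2^width_bits patterns minus the one reserved
   for the null pointer. Supports widths from 1 to 64 bits. */
uint64_t max_non_null_values(unsigned width_bits)
{
    assert(width_bits >= 1 && width_bits <= 64);
    if (width_bits == 64)
        return UINT64_MAX;                     /* 2^64 - 1; the shift below would be undefined */
    return (UINT64_C(1) << width_bits) - 1;
}
```

For a 32-bit pointer this gives 4294967295. An implementation whose usable addresses cover fewer patterns than that loses nothing by reserving one of the unused patterns for null.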

Some things to note:

  • The representation of a null pointer may or may not be all-bits-zero.
  • Not all pointer types are necessarily the same size.
  • Not all pointer types necessarily use the same representation for a null pointer.

On most modern implementations, all pointer types are the same size, and all represent a null pointer as all-bits-zero, but there are valid reasons to, for example, make function pointers wider than object pointers, or make void* wider than int*, or use a representation other than all-bits-zero for the null pointer.
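The practical consequence for portable code can be sketched as follows. The `struct node` type and the function names are hypothetical; the point is that an initialiser or an assignment of NULL always produces a null pointer, while the bit pattern and the pointer sizes are things a program can only observe, not assume.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {                 /* hypothetical list node */
    struct node *next;
    int value;
};

/* Portable: the initialiser stores a null pointer in next, whatever
   bit pattern the implementation uses for it. Zeroing the struct with
   memset would only be equivalent where null is all-bits-zero. */
struct node make_empty_node(void)
{
    struct node n = { NULL, 0 };
    assert(n.next == NULL);   /* holds on every conforming implementation */
    return n;
}

/* Reports (1 or 0) whether this implementation stores a null int* as all-bits-zero. */
int null_is_all_bits_zero(void)
{
    int *p = NULL;
    unsigned char zeros[sizeof p] = { 0 };
    return memcmp(&p, zeros, sizeof p) == 0;
}

/* Reports (1 or 0) whether object and function pointers have the same size here. */
int pointer_sizes_match(void)
{
    return sizeof(void *) == sizeof(void (*)(void));
}
```

On most current desktop and server platforms both reporting functions are likely to return 1, but nothing in the C standard promises that, so well-written code does not depend on either result.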

This answer is based on the C standard. Most of it also applies to C++. (One difference is that C++ has pointer-to-member types, which are typically wider than ordinary pointers.)

Exorable answered 18/2, 2015 at 1:35 Comment(17)
I come to this question from experience not only with big OSes. What about some microcontrollers, for example, or other fairly widely used systems, where the OEM does put a function or data at address 0 (or any other address), while the available C compiler for that platform may make a mess of a pointer with that value because it's the value of a null pointer?Tammietammuz
@pepper_chico: The same answer applies. Some address value must be reserved as a null pointer. That value needn't necessarily be 0x00000000, but it often is.Exorable
From the point of view of the C and C++ compiler implementers, that means taking care over each and every system the compiler targets. To me this is quite weird and makes the language less portable, but anyway, if that's the way it is, ¯\_(ツ)_/¯Tammietammuz
@pepper_chico: It doesn't harm portability as long as you write portable code. Well-written code doesn't care how a null pointer is represented.Exorable
Think of how a compiler writer has to take care to keep the compiler source code agnostic to the actual value of nullptr on a given platform. Also think of how such code, to be portable to any number of platforms, must take care to pick an acceptable value for nullptr depending on each platform it's being compiled for. To me, this looks like an additional problem to be dealt with, irrespective of the quality of the compiler source code, and it comes purely from language requirements.Tammietammuz
@pepper_chico: I was talking about C source code, not code that implements a C compiler. Since the representation of a null pointer is determined by the compiler, there's no reason to assume the compiler itself should be unaware of it.Exorable
As I've pointed out, I was talking from the point of view of the compiler implementer. I see that one can write portable code irrespective of the actual value of the null pointer. It's because of this, the compiler implementation, that I said it makes the language less portable.Tammietammuz
"All pointers are the same size" is usually true for C, but usually false for C++. The reason is that C++ has a pointer-to-member type with unusual semantics. These semantics usually won't fit in sizeof(void*) bytes.Fustigate
@Fustigate Pointer-to-member types are not a subset of the pointer types, though. [basic.compound] defines "pointers to void or objects or functions (including static members of classes) of a given type", and separately, "pointers to non-static class members, which identify members of a given type within objects of a given class". "All pointers are the same size" is still usually true for C++, because pointer-to-member types don't count.Favrot
@pepper_chico, I don't get your point about compiler implementation details making the language less portable. Of course the compiler isn't portable--that's its job, to convert portable programs into non-portable machine-specific assembly. The compiler should know every little quirk of every different machine. Its job is to know that so the application programmer doesn't have to.Florinda
@LeeDanielCrocker As can be seen from many references (committee discussions, inconsistent wording, etc.), this is controversial. Besides, it imposes an extra difficulty on compliant implementations or, if the implementation is not compliant, leaves the burden of programming in undefined behavior mode to the programmer. This small language detail, which has implications across compilers, system architecture and the programmer's perspective, doesn't make things easier. Hence, technically, in terms of difficulty of implementation and usage, it's less portable than a language in whose design this would be a non-issue.Tammietammuz
@pepper_chico: Can you cite some of those references? The language's definition of a null pointer is unambiguous, and it happens to leave the representation of a null pointer up to each implementation. Choosing to use a non-zero representation would impose some slight overhead on an implementation (for example, a static pointer object could not simply be zeroed), but that's frankly trivial. What change to the language do you think would make the language more "portable"?Exorable
@pepper_chico: I don't see how that supports your point -- perhaps because I don't entirely understand what point you're making. Nothing in the cited document mentions the representation of a null pointer, which is what I thought we were talking about. What problem do you see, and what change would you make to the language (C and/or C++) to address it?Exorable
@pepper_chico: 232 is quite old. I know, I was on the CWG back then. The issue there is more related to code reordering - how soon can a compiler actually execute a dereference? A reasonable idea is that *p is sufficient justification for the compiler to actually read the contents, even if that value is needed only a few instructions later.Fustigate
@KeithThompson I got that reference from my former question; hvd provided it. In the context of the two questions you may understand it. When I said it's controversial, I was talking about both subjects as a whole. By the way, I'm not so into the discussion anymore. What I accept now is that, implementation-wise, this won't be settled; there will be issues because of how the language historically evolved regarding this subject. I'm just more aware of what I may find.Tammietammuz
Correct answer is yes. I accepted "no" before, but I came to realise that it's guaranteed that null must not point to any object or function, which comes before its dereference being UB. The previously accepted answer only covered UB, which could possibly be defined by the platform. Regardless of whether that's defined on the platform, the language asserts and guarantees that null does not point to an object. So my view now is that being able to do that is just outside of what the standard is able to cover.Tammietammuz
Scratch that, other answer's author changed my mind again...Tammietammuz
M
2

Yes, that's precisely what it means.

[C++11: 4.10/1]: [..] A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type and is distinguishable from every other value of object pointer or function pointer type. [..]

The null pointer value doesn't need to be 0x00000000, but it does need to be unique; there's no other way to make this rule work.
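The quote above is from C++11, but C has corresponding rules, so the idea can be sketched in C: every null pointer constant converts to the null pointer value of the target type, all null pointers compare equal, and none of them compares equal to the address of an object. The variable and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static int *zero_initialised;    /* static storage duration: starts out as a null pointer */

/* Returns 1 when all the null pointers below compare equal to each
   other and unequal to the address of an actual object. */
int null_pointer_values_agree(void)
{
    int object = 0;
    int *from_literal = 0;       /* 0 is a null pointer constant */
    int *from_macro = NULL;      /* so is whatever NULL expands to */
    char *other_type = NULL;     /* each pointer type has its own null pointer value */
    int *to_object = &object;

    int ok = from_literal == from_macro
          && from_macro == zero_initialised
          && (void *)other_type == (void *)from_macro
          && to_object != from_macro;
    assert(ok);                  /* a failure would indicate a non-conforming implementation */
    return ok;
}
```

None of these comparisons says anything about the underlying bit pattern; they only require that the chosen pattern never coincides with the address of an object or function.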

It's certainly not the only rule of the abstract machine that implicitly emplaces strict limitations upon practical implementations.

What if the OS puts a really useful function or data structure at the same address that's equal to nullptr?

The OS won't do that but it can be exploited.

Mercantile answered 18/2, 2015 at 0:54 Comment(36)
Please notice from the previous discussion that it's quite important whether a paragraph targets solely the constant or not.Tammietammuz
@pepper_chico: It doesn't matter. A pointer with the null pointer value can always trace its state — through assignments, copies or moves — to a null pointer constant. There's no other way to obtain it.Mercantile
... except through zero-initialisation, which also produces the null pointer value (per footnote 103).Mercantile
@LightnessRacesinOrbit Wait, so while(ptr)ptr--; will never yield a nullptr?Whizbang
@BWG: Not in a well-defined fashion, no. You are not [compliantly] permitted to decrement a pointer past the beginning of the object to which it points. It is a myth that simply won't die that you can just assign anything you like to a pointer! Unless you're wondering about within an array, pointer arithmetic is mostly UB. Of course in practice you can probably get your compiler to make it happen. Again, though, remember that neither nullptr nor (void*)0 needs to be actually 0x00000000.Mercantile
But the tracing aspect says nothing about it being UB at the point it's not the constant anymore (or a cast of it to other pointer type). Following strictly the standard, when it's defining some behavior about usage of the constant, I'm accepting it's just talking about it.Tammietammuz
@pepper_chico: It still doesn't matter. Simply shift your question to the null pointer constant which, as I've shown, has the same value and is thus precisely as valid a candidate for such a discussion.Mercantile
It is conceivable for the compiler to use a different value to denote the null pointer. It could very well be 0xffffffff (for 32-bit), and this is not that far-fetched.Verrocchio
@LightnessRacesinOrbit If it's UB, could a compiler decide to crash the program if a pointer reaches out of its bounds? That would actually be really neat for debugging. And by reaches, i mean if the value even exists, not just if you attempt to access the memory?Whizbang
@BWG: Yes, absolutely. There are widely-used tools that do precisely that! [edit: You changed your question! I don't know whether that exists currently, but theoretically a compiler could generate code to do that in many cases, yes. I'm not sure whether it can feasibly do so in the general case.]Mercantile
I'd like to point out that there are limitations in the electric-fence type of approach (en.wikipedia.org/wiki/Memory_debugger). For example, suppose you have two objects a and b, and pointers to them. If you overrun a into b without hitting any other addresses, there is no way for the compiler/library to pick this up.Verrocchio
I worked on a machine where FFFFFFFF was null. It was a shock to me to see 0 as null. 0 seems a poor choice, since every machine (real or virtual) has an address 0 but most (or many) don't have an address FFFFFFFF, so making FFFFFFFF 'reserved' is less invasive.Yaelyager
@pm100: An interesting observation.Mercantile
Why shift the question to refer to the constant instead? Your assumption is that it doesn't matter, mine is that it does. If you have a quote that states it doesn't, then your assumption would be fine. In both questions I've clearly distinguished the cases; in the previous one I got quotes that addressed specifically what I'm asking, unlike your quote.Tammietammuz
@pepper_chico: I'm not "assuming" anything. At this stage I'm not even sure what you're arguing about..? The meaning and intent of the standard passage are both quite clear. I don't comprehend how else you expect that a null pointer value could possibly work.Mercantile
The standard may be clear that *(int *)(void *)0 and variations are UB, but it's not so clear about *ptr, whether ptr is equal to (void *)0 or not. Previously I got quotations addressing this specifically, which means this difference of context in expressions is meaningful in some sense.Tammietammuz
@pepper_chico: You've lost me.Mercantile
@LightnessRacesinOrbit: "A pointer with the null pointer value can always trace its state — through assignments, copies or moves — to a null pointer constant. There's no other way to obtain it." That's not quite true. For example, a failing malloc() or fopen() call returns a null pointer, and there's no requirement that either of them must be implemented in C (or C++).Exorable
@Keith: Don't ruin this with facts!Mercantile
@LightnessRacesinOrbit Trivia time. What if I have a valid T* named t that I increase by a large amount, and causes the pointer to overflow into a valid range? Is that UB?Whizbang
@BWG: Yes. Pointers aren't integers. You're not supposed to use them to skit around your process's memory space like that.Mercantile
I don't understand.. If null could be defined as 0xFFFFFFFF or w/e else.. This means that using a pointer in a conditional statement like: while (!p) or if (!p) would be undefined? and instead you'd have to do: if (p != NULL)? That would destroy so many implementations of the standard library, no?Dissentious
@Brandon: No. The ! doesn't compare against a verbatim 0x00000000 in memory; pointers aren't integers. They are pointers! In fact, the standard says explicitly that converting a pointer to bool results in false iff the pointer holds the null pointer value. It doesn't really matter what that value is. (This, along with the special case that integer 0 is a null pointer constant regardless of how it'll be represented as a pointer in memory, means that this apparent oddity is rendered largely transparent to you in every way that matters.)Mercantile
@Brandon: You are confusing the value with the actual bit pattern of the address underlying it. In C, the null pointer constant is always defined as either 0 or (void *) 0, and there are no exceptions. See c-faq.com/null/machnon0.html for more explanation.Handicraft
@Brandon: while (!p) is equivalent to while (p == 0), which by definition compares p to a null pointer. 0 is a null pointer constant, which is converted to a null pointer value. If a null pointer's representation is not all-bits-zero, the conversion is non-trivial.Exorable
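The point made in the last few comments can be sketched in C roughly as follows (all function names are hypothetical, chosen for this illustration). The first three helpers test for null at the language level and are equivalent by definition; the last one looks at the stored bytes instead, which is an implementation-specific property that the standard does not pin down.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Three language-level tests for a null pointer. Each compares against
   the null pointer value, not against a literal zero bit pattern. */
int is_null_by_negation(const void *p) { return !p; }
int is_null_by_zero(const void *p)     { return p == 0; }
int is_null_by_macro(const void *p)    { return p == NULL; }

/* Implementation-specific check: reports whether the object
   representation of a null pointer is all-bits-zero. This is true on
   mainstream platforms but is not guaranteed by the standard. */
int null_representation_is_all_zero(void) {
    void *p = NULL;
    unsigned char zeros[sizeof p];
    memset(zeros, 0, sizeof zeros);
    return memcmp(&p, zeros, sizeof p) == 0;
}
```

On an implementation where null is stored as something other than all-bits-zero, the first three helpers would still agree with one another; only the last one would report a different result.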
Seems acceptable anyway... What I meant, which lost you :-), is that the value of nullptr or (void *) 0 is a compile-time value, while ptr, which may hold a nullptr value, is a runtime value, and I know that there are means to have ptr become nullptr in untraceable ways. In truth it doesn't matter; how can an implementation make *ptr illegal, not because ptr is not pointing to an object/function in memory, but just because its value is nullptr? It may indeed be implemented, but it's nonsense to me.Tammietammuz
This committee discussion is relevant to that: open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232Tammietammuz
I didn't lose myself; you lost me! And I'm sorry but I still have absolutely no idea what you're asking. Could you phrase it in a clearer way?Mercantile
@GrzegorzSzpetkowski, you are mistaken that null pointer values are "defined" as 0. The spec says that the integer value zero, or an expression of that value, cast to a pointer or not, must compare unequal to any valid pointer. That's a restriction on how == and != are to behave, not on how the underlying implementation might work.Florinda
@LeeDanielCrocker: The C11 spec says in 6.3.2.3/p3: An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. Now, I guess you are talking about the C++ Standard, and if I am right, that is the exact reason why both languages shouldn't be merged into one question.Handicraft
@GrzegorzSzpetkowski: Plot twist: the wording's basically the same in C++. I don't know what Lee's talking about. He seems to be confusing underlying representation with a few things.Mercantile
No, I was talking about C--I wouldn't touch C++ with a ten-foot-pole. But my point is that very paragraph you partially quote is quite clear that the behavior of this thing called "null pointer" is that it must compare unequal to any valid pointer. That's it. It has no other restrictions on how it might be implemented or what the bit pattern is. If you compare a valid pointer to the integer 0, it must fail--even if that pointer happens to be a valid pointer to address 0, which it might well be.Florinda
@LeeDanielCrocker: We all know that. The point is that you can often use logical deduction to come up with unavoidable constraints on the implementation. As far as I can tell, this is one of those times.Mercantile
The OP asks whether one of those constraints is that there's some part of the address space that a C program cannot address ("dead" in his terms). The answer to that is quite clearly NO. Any C implementation can allow pointers to every single bit of address space, period, if it wants to. All it has to do for that to work is for pointers to have at least one more possible value than processor addresses, perhaps by being one or more bits wider, or perhaps by other means. I'm sure we agree here, we're just expressing it differently.Florinda
"for pointers to have at least one more possible value than processor addresses" Right, which is generally impossible when everything, including the process address space, is fundamentally based on 32-bit (or 64-bit) addressing. You have to set aside one address, because fundamentally you can't expose more than the OS does. The OS already maxes it out.Mercantile
@pepper_chico: Two years later, I still don't know what you're talking about. Let's allow this thread to rest now.Mercantile
O
2

does it imply that these languages require that a special value in the address space is dead, meaning that it's unusable except for the role of representing nullptr?

Yes.

C places requirements on the null pointer that distinguish it from pointers to objects:

(C11, 6.3.2.3p3) "[...] If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function."

What if the system has a really useful function or data structure at the same address that's equal to nullptr? Should this never happen because it's a compiler writer responsibility to figure out a non-conflicting null pointer value for each system the compiler compiles to?

The New C Standard by Derek M. Jones provides the following commentary on implementations:

All bits zero is a convenient execution-time representation of the null pointer constant for many implementations because it is invariably the lowest address in storage. (The INMOS Transputer[632] had a signed address space, which placed zero in the middle.) Although there may be program bootstrap information at this location, it is unlikely that any objects or functions will be placed here. Many operating systems leave this storage location unused because experience has shown that program faults sometimes cause values to be written into the location specified by the null pointer constant (the more developer-oriented environments try to raise an exception when that location is accessed).

Another implementation technique, when the host environment does not include address zero as part of a processes address space, is to create an object (sometimes called __null) as part of the standard library. All references to the null pointer constant refer to this object, whose address will compare unequal to any other object or function.
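The reserved-object technique described in that commentary can be imitated in user code. The following is only a sketch under stated assumptions: reserved_null_object, SKETCH_NULL and the two helper functions are hypothetical names invented here, and a real implementation would wire this into the compiler and library rather than into a macro.

```c
#include <assert.h>

/* Hypothetical reserved object. Nothing else ever uses it, so its
   address can stand in for "points at nothing" in this sketch. */
static char reserved_null_object;

/* Hypothetical stand-in for the null pointer in this sketch. */
#define SKETCH_NULL ((void *)&reserved_null_object)

/* Returns 1 if p is the sketch's null value, 0 otherwise. */
int sketch_is_null(const void *p) {
    return p == SKETCH_NULL;
}

/* Returns 1 if the address of an ordinary object differs from the
   sketch's null value, as distinct objects have distinct addresses. */
int distinct_from_object(void) {
    int object = 0;
    return !sketch_is_null(&object);
}
```

Because the reserved object occupies storage of its own, no other object or function can share its address, which is what gives the stand-in the required "compares unequal to everything else" property without setting aside address zero.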

Overrun answered 18/2, 2015 at 9:53 Comment(3)
As pointed out correctly by KT, there is no requirement whatsoever that pointer values represent the address space at all, so to say that they require a "dead spot in the address space" is mistaken.Florinda
@LeeDanielCrocker My answer covers the general case but of course if your pointer width is larger than the available address space, no "dead spot" is required because the "dead spot" can anyway be set at an address outside the available address space.Overrun
It's interesting to learn about implementation techniques for this, thanks.Tammietammuz
