Is a pointer with the right address and type still always a valid pointer since C++17?

(In reference to this question and answer.)

Before the C++17 standard, the following sentence was included in [basic.compound]/3:

If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

But since C++17, this sentence has been removed.

For example I believe that this sentence made this example code defined, and that since C++17 this is undefined behavior:

 alignas(int) unsigned char buffer[2*sizeof(int)];
 auto p1=new(buffer) int{};
 auto p2=new(p1+1) int{};
 *(p1+1)=10;

Before C++17, p1+1 holds the address of *p2 and has the right type, so p1+1 is a pointer to *p2. In C++17, p1+1 is a pointer past the end, so it is not a pointer to an object, and I believe it is not dereferenceable.

Is this interpretation of this modification of the standard right, or are there other rules that compensate for the deletion of the cited sentence?

Obryan answered 2/1, 2018 at 14:0 Comment(7)
Note: there are new/updated rules about pointer provenance in [basic.stc.dynamic.safety] and [util.dynamic.safety]Supertax
@Supertax That matters only on implementations with strict pointer safety, which is an empty set (to within experimental error).Maculation
The quoted statement has never been actually true in practice. Given int a, b = 0;, you can't do *(&a + 1) = 1; even if you checked &a + 1 == &b. If you can obtain a valid pointer to an object by just guessing its address, then even storing local variables in registers becomes problematic.Maculation
@Maculation 1) Which compiler puts a var in reg after you have taken its address? 2) How do you guess an address correctly without measuring it?Colorcast
@Colorcast Exactly that is why simply casting a number obtained by other means (e.g., guessing) to the address where an object happens to be is problematic: It aliases that object but the compiler is not aware of it. If you, by contrast, take the object's address it's as you say: the compiler is warned and syncs accordingly.Staggard
@PeterA.Schneider Trying to guess is one thing, but how are you going to verify your guess, for a complete object of scalar type, if its address is never taken?Colorcast
"regardless of how the value was obtained" doesn't mean "using invalid operations" like violating [expr.add] or obtaining pointer value from /dev/random. At least it was not intended to be read like this.Ryun

Is this interpretation of this modification of the standard right or are there other rules that compensate the deletion of this cited sentence?

Yes, this interpretation is correct. A pointer past the end isn't simply convertible to another pointer value that happens to point to that address.

The new [basic.compound]/3 says:

Every value of pointer type is one of the following:
(3.1) a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) a pointer past the end of an object ([expr.add]), or
(3.3) the null pointer value for that type, or
(3.4) an invalid pointer value.

Those are mutually exclusive. p1+1 is a pointer past the end, not a pointer to an object. p1+1 points to a hypothetical x[1] of a size-1 array at p1, not to p2. Those two objects are not pointer-interconvertible.
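As a minimal sketch of the distinction (the helper name write_second_int is made up for this illustration, it is not from the question), the second int can be reached through the pointer that placement new returns, which does point to that object, instead of through p1 + 1:

```cpp
#include <cassert>
#include <new>

// Hypothetical helper: creates two ints in one byte buffer and writes to
// each through the pointer that placement new returned for it.
inline int write_second_int() {
    alignas(int) unsigned char buffer[2 * sizeof(int)];

    // Arithmetic on `buffer` is ordinary array arithmetic on unsigned char.
    int* p1 = new (buffer) int{};                // int object at offset 0
    int* p2 = new (buffer + sizeof(int)) int{};  // int object right after it

    *p1 = 1;
    *p2 = 10;  // p2 is a pointer to the second int object

    // *(p1 + 1) = 10;  // the access under discussion: p1 + 1 is a pointer
    //                  // past the end of *p1, not a pointer to *p2

    return *p1 + *p2;
}
```

Keeping the pointer returned by each placement new avoids the question of which category p1 + 1 falls into.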

We also have the non-normative note:

[ Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address. [...]

which clarifies the intent.


As T.C. points out in numerous comments (notably this one), this is really a special case of the problem that comes with trying to implement std::vector: [v.data(), v.data() + v.size()) needs to be a valid range, and yet vector doesn't create an array object. The only defined pointer arithmetic would therefore be going from any given object in the vector to past-the-end of its hypothetical one-element array. For more resources, see CWG 2182, this std discussion, and two revisions of a paper on the subject: P0593R0 and P0593R1 (section 1.3 specifically).
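To make the problem concrete, here is a rough sketch of vector-like storage. The names tiny_buffer and sum_range are invented for this illustration and this is not how any particular standard library is written. Elements are constructed one at a time in allocated storage, and the caller then walks [data(), data() + size()). Whether the marked pointer arithmetic is formally defined is exactly the point under discussion; the sketch only shows where it occurs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical, minimal vector-like buffer (fixed capacity, no growth).
template <class T>
class tiny_buffer {
public:
    explicit tiny_buffer(std::size_t capacity)
        : data_(alloc_.allocate(capacity)), capacity_(capacity) {}
    tiny_buffer(const tiny_buffer&) = delete;
    tiny_buffer& operator=(const tiny_buffer&) = delete;

    ~tiny_buffer() {
        while (size_ > 0) data_[--size_].~T();  // destroy in reverse order
        alloc_.deallocate(data_, capacity_);
    }

    // Precondition: size() < capacity.
    template <class... Args>
    T& emplace_back(Args&&... args) {
        assert(size_ < capacity_);
        // data_ + size_ is the pointer arithmetic at issue: elements are
        // created individually, not as part of an explicitly created T[N].
        T* p = ::new (static_cast<void*>(data_ + size_))
            T(std::forward<Args>(args)...);
        ++size_;
        return *p;
    }

    T* data() { return data_; }
    std::size_t size() const { return size_; }

private:
    std::allocator<T> alloc_;
    T* data_;
    std::size_t capacity_;
    std::size_t size_ = 0;
};

// Walks the range [data(), data() + size()) the way vector users do.
inline int sum_range() {
    tiny_buffer<int> v(3);
    v.emplace_back(1);
    v.emplace_back(2);
    v.emplace_back(3);
    int sum = 0;
    for (int* p = v.data(); p != v.data() + v.size(); ++p) sum += *p;
    return sum;
}
```

In practice this works on mainstream compilers; the discussion is about whether the wording of the standard guarantees it.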

Dorty answered 2/1, 2018 at 14:14 Comment(12)
This example is basically a special case of the known "vector implementability problem". +1.Maculation
@Maculation Does C++17 add a new "vector implementability problem" ? Or this one was already there?Obryan
@Obryan The general case has existed since C++03. The root cause is pointer arithmetic not working as expected because you don't have an array object.Maculation
@Maculation I believed the only problem came from the restriction on pointer arithmetic. Doesn't the deletion of this sentence add a new problem? Is the code example also UB in pre-C++17?Obryan
@Obryan If the pointer arithmetic is fixed, then your p1+1 would no longer produce a past-the-end pointer and the whole discussion about past-the-end pointers is moot. Your particular two-element special case may not be UB pre-17, but it's also not very interesting.Maculation
@Maculation I admit it is not very useful; my point was only to know whether C++17 introduced more access limitation rules. You have answered this question, thanks :) (The other answer says the opposite, so I am still not 100% sure.)Obryan
@RJFalconer Thanks for the edit. I modified it slightly because I didn't want to change the OP quote.Dorty
I have asked a related questionObryan
@Maculation Can you point me somewhere that I can read up on this "vector implementability problem"?Musil
@Musil What Barry said and [vec.data(), vec.data() + vec.size()) is a valid rangeHighflown
@Musil See this.Maculation
We also have the non-normative note which clarifies the intent. Not really, because it does not specify what it means by "unrelated". Someone in a trolly mood could claim that the objects created in the same buffer are not unrelated.Ryun

In your example, *(p1 + 1) = 10; should be UB, because p1 + 1 is one past the end of the array of size 1. But we are in a very special case here, because the array was dynamically constructed in a larger char array.

Dynamic object creation is described in 4.5 The C++ object model [intro.object], §3 of the n4659 draft of the C++ standard:

3 If a complete object is created (8.3.4) in storage associated with another object e of type “array of N unsigned char” or of type “array of N std::byte” (21.2.1), that array provides storage for the created object if:
(3.1) — the lifetime of e has begun and not ended, and
(3.2) — the storage for the new object fits entirely within e, and
(3.3) — there is no smaller array object that satisfies these constraints.

Item 3.3 seems rather unclear, but the example below makes the intent clearer:

struct A { unsigned char a[32]; };
struct B { unsigned char b[16]; };
A a;
B *b = new (a.a + 8) B; // a.a provides storage for *b
int *p = new (b->b + 4) int; // b->b provides storage for *p
// a.a does not provide storage for *p (directly),
// but *p is nested within a (see below)

So in the example, the buffer array provides storage for both *p1 and *p2.

The following paragraphs prove that the complete object for both *p1 and *p2 is buffer:

4 An object a is nested within another object b if:
(4.1) — a is a subobject of b, or
(4.2) — b provides storage for a, or
(4.3) — there exists an object c where a is nested within c, and c is nested within b.

5 For every object x, there is some object called the complete object of x, determined as follows:
(5.1) — If x is a complete object, then the complete object of x is itself.
(5.2) — Otherwise, the complete object of x is the complete object of the (unique) object that contains x.

Once this is established, the other relevant part of draft n4659 for C++17 is [basic.compound] §3 (emphasis mine):

3 ... Every value of pointer type is one of the following:
(3.1) — a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) — a pointer past the end of an object (8.7), or
(3.3) — the null pointer value (7.11) for that type, or
(3.4) — an invalid pointer value.

A value of a pointer type that is a pointer to or past the end of an object represents the address of the first byte in memory (4.4) occupied by the object or the first byte in memory after the end of the storage occupied by the object, respectively. [ Note: A pointer past the end of an object (8.7) is not considered to point to an unrelated object of the object’s type that might be located at that address. A pointer value becomes invalid when the storage it denotes reaches the end of its storage duration; see 6.7. —end note ] For purposes of pointer arithmetic (8.7) and comparison (8.9, 8.10), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical element x[n]. The value representation of pointer types is implementation-defined. Pointers to layout-compatible types shall have the same value representation and alignment requirements (6.11)...
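The categories quoted above can be illustrated with a small sketch. The names pointer_demo and classify are made up for this example, and it only uses operations that are defined for each category:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for the illustration.
struct pointer_demo {
    std::ptrdiff_t distance;  // past_end - to_object
    bool null_is_null;        // comparison against nullptr
    int first;                // value read through the pointer to an object
};

inline pointer_demo classify() {
    int arr[3] = {1, 2, 3};

    int* to_object = &arr[0];  // (3.1) points to the object arr[0]
    int* past_end = arr + 3;   // (3.2) past the end of arr: may be compared
                               //       and subtracted, but not dereferenced
    int* null = nullptr;       // (3.3) the null pointer value
    // (3.4) An invalid pointer value would be, for example, a pointer whose
    //       storage has already been released; none is produced here.

    return {past_end - to_object, null == nullptr, *to_object};
}
```

Both to_object and past_end come from the same array object, so subtracting them is covered by the pointer arithmetic rules.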

The note "A pointer past the end..." does not apply here because the objects pointed to by p1 and p2 are not unrelated: they are nested within the same complete object, so pointer arithmetic makes sense inside the object that provides storage. p2 - p1 is defined and is (&buffer[sizeof(int)] - buffer) / sizeof(int), that is, 1.

So p1 + 1 is a pointer to *p2, and *(p1 + 1) = 10; has defined behaviour and sets the value of *p2.


I have also read Annex C.4 on the compatibility between C++14 and the current (C++17) standard. Removing the possibility of using pointer arithmetic between objects dynamically created in a single character array would be an important change that IMHO should be cited there, because it is a commonly used feature. As nothing about it exists in the compatibility pages, I think this confirms that the standard did not intend to forbid it.

In particular, it would defeat the common idiom of dynamically constructing an array of objects of a class with no default constructor:

class T {
    ...
public:
    T(U initialization) {
        ...
    }
};
...
unsigned char *mem = new unsigned char[N * sizeof(T)];
T *arr = reinterpret_cast<T*>(mem); // View the buffer as an array of N T
for (std::size_t i = 0; i < N; i++) {
    U u(...);
    new (arr + i) T(u); // Construct the i-th element in place
}

arr can then be used as a pointer to the first element of an array...
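For comparison, here is a sketch of the same goal expressed with std::vector, which leaves the storage management to the library. The types U and T below are hypothetical stand-ins (the originals are elided above), and make_items is an invented helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the elided U and T.
struct U {
    int seed;
};

class T {
public:
    explicit T(U initialization) : value_(initialization.seed * 2) {}
    int value() const { return value_; }

private:
    int value_;  // T has no default constructor
};

// Builds n contiguous T objects without ever default-constructing one.
inline std::vector<T> make_items(std::size_t n) {
    std::vector<T> items;
    items.reserve(n);  // one allocation up front
    for (std::size_t i = 0; i < n; ++i) {
        items.emplace_back(U{static_cast<int>(i)});  // construct in place
    }
    return items;
}
```

items.data() then gives a pointer usable over [data(), data() + size()), which the library guarantees to be a valid range.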

Huskamp answered 2/1, 2018 at 17:40 Comment(16)
Aha, so the world has not gone crazy. +1His
@StoryTeller: I hope too. In addition not a word about it in the compatibility section. But it looks like the opposite opinion has more reputation here...Huskamp
You are seizing a single word, "unrelated", in a non-normative note, and giving it a meaning it cannot bear, in contradiction to the normative rules in [expr.add] governing pointer arithmetic. There is nothing in Annex C because the general-case pointer arithmetic has never worked in any standard. There is nothing to break.Maculation
@T.C.: I'm afraid that you are right... So you think that my example was already broken in C++11 or 14, despite the old [basic.compound] §3? But what could be the intent of placement new?Huskamp
Yes, that example's the very vector implementability problem I was alluding to in my comments under Barry's answer. It's always been broken.Maculation
Well then, I will leave my answer. Your comments and Barry's answer are enough to explain its limits. Because I really think that when the standard does not offer the necessary features to build a conformant standard library, the implementations have to fill the gap. But I must admit this is philosophy and no longer language-lawyer...Huskamp
But still your answer is interesting. The question was not about pointer arithmetic. This is why I took care to add 1 to the pointer and not 2 or a larger number. The question is about the pointer value categories listed in [basic.compound]/3 (eel.is/c++draft/basic.compound#3). I have difficulty interpreting what the actual value category of a pointer is.Obryan
@SergeBallesta I have long thought this way, but actually there is a possibility to do the pointer arithmetic you are looking for. The following expression has implementation-defined behavior [expr.reinterpret.cast]: reinterpret_cast<int*>(reinterpret_cast<intptr_t>(p)+sizeof(int)). The category of the value of the pointer is implementation-defined; this works on all compilers.Obryan
@SergeBallesta: It's ironic that one of the distinguishing features of C and C++ used to be that on most implementations could "self-host" the vast majority of the Standard library using code written in the language itself, but the Standard never recognized anything special about implementations that could do so.Enclosure
@T.C.: Google is very unhelpful in locating any information on this "vector implementability problem", could you help?Warble
@Maculation I agree with Mathieu and others: I've never heard something about this "vector implementability problem" and google doesn't help. Can you provide a link or some keywords to help the search? Thanks!Campball
@MatthieuM. See core issue 2182, this std-discussion thread, P0593R0, and P0593R1 (particularly section 1.3). The basic problem is that vector doesn't (and can't) create an array object, but has an interface that allows the user to obtain a pointer that supports pointer arithmetic (which is only defined for pointers into array objects).Maculation
@T.C.: <rant>But why does the standard not allow addresses to be processed the natural way? An array int [2][3] should be aliasable to an int[6], VLAs should be allowed as parameters to functions (how can a function receive multi-dimensional arrays of unknown dimensions?) and a byte array of size N*sizeof(T) should be aliasable to a T[N]. But maybe things would be too simple for programmers...</rant>Huskamp
@SergeBallesta: The problem is that in the 1980s, the authors of the Standard didn't think it necessary to fully categorize all of the behaviors that compilers naturally supported on platforms where they made sense, since they had no reason to expect that compiler writers would treat everything that wasn't explicitly mandated as undefined even in cases where there would be obvious useful behaviors.Enclosure
SergeBallesta and @T.C.: Does "vector implementability problem" have a solution?Tonytonya
"The basic problem is that vector doesn't (and can't) create an array object" But you don't need to. The so called lifetime rules are a failed attempt and obviously incorrect (and hilariously wrong with the use of the future tense to describe the runtime behavior - think about it, future!) Objects MUST exist at all time at all addresses. The std tried and failed to make up a different model. The only C++ model that is well defined is the one where any aligned address as all objects that fit at that address. There are so many holes in the C++ std, it isn't even funny.Colorcast

To expand on the answers given, here is an example of what I believe the revised wording excludes:

Warning: Undefined Behaviour

#include <iostream>
int main() {
    int A[1]{7};
    int B[1]{10};
    bool same{(B)==(A+1)};

    std::cout<<B<< ' '<< A <<' '<<sizeof(*A)<<'\n';
    std::cout<<(same?"same":"not same")<<'\n';
    std::cout<<*(A+1)<<'\n';//!!!!!  
    return 0;
}

For entirely implementation-dependent (and fragile) reasons, a possible output of this program is:

0x7fff1e4f2a64 0x7fff1e4f2a60 4
same
10

That output shows that the two arrays (in that case) happen to be stored in memory such that 'one past the end' of A happens to hold the value of the address of the first element of B.

The revised specification ensures that, regardless of layout, A+1 is never a valid pointer to B. The old phrase 'regardless of how the value was obtained' says that if 'A+1' happens to point to 'B[0]' then it is a valid pointer to 'B[0]'. That can't be good and was surely never the intention.
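By contrast, a sketch of the well-defined counterpart puts both values in one array object, so that A + 1 stays inside the array it was derived from (read_neighbour is a name invented for this illustration):

```cpp
#include <cassert>

// Hypothetical helper: both values live in a single array object.
inline int read_neighbour() {
    int AB[2]{7, 10};
    int* A = AB;      // points to AB[0]
    return *(A + 1);  // A + 1 points to AB[1], an element of the same array
}
```

Here the result does not depend on how the compiler happens to lay out separate variables.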

Ageless answered 3/1, 2018 at 11:23 Comment(8)
Does this also effectively proscribe use of an empty array at the end of a struct such that a derived class or custom allocator new can specify a custom-sized array? Perhaps the new issue is with the "regardless of how" - there are some ways that are valid, and some ways that are dangerous?Dipeptide
@Ageless So the value of a pointer object is determined by the bytes of the objects, and nothing else. So two objects with the same state point to the same object. If one is valid, the other is too. So on common architectures, where a pointer value is represented as a number, two pointers with equal values point to the same objects and one of the end the same other objects.Colorcast
@Ageless Also, trivial type means that you can enumerate the possible values of a type. Essentially, any modern compiler in any optimizing mode (even -O0 on some compilers) doesn't consider pointers as trivial types. Compilers don't treat the requirements of the std seriously, and neither do the people who write the std, who dream of a different language and make all kinds of inventions that directly contradict basic principles. Obviously users are confused and sometimes treated badly when they complain about compiler bugs.Colorcast
The non-normative note in the question wants us to think of 'one-past-the-end' as not pointing to anything. We both know that in practice it may well be pointing to something, and in practice it may be possible to dereference it. But that (according to the standard) is not a valid program. We can imagine an implementation that knows a pointer was obtained by arithmetic-to-past-the-end and raises an exception if it is dereferenced. While I know of no platform that does that, I think the standard doesn't want to exclude it.Ageless
@Colorcast Also, I'm not sure what you mean by enumerating the possible values. That's not a required feature of a trivial type as defined by C++.Ageless
@Colorcast I'm not sure what you mean. You can't just add one to a pointer and be guaranteed to pointing to something. Ultimately it is binary but C++ is very clear that it's not a valid way to try and scan the whole memory.Ageless
@Ageless "But that (according to the standard) is not a valid program" It is a valid program in common architectures. (But not where pointers have bounds in their representation) For a trivial type, all values (like integers) can be enumerated by starting at 0, because a valid bitwise representation is a valid object. When you find a match, you have a valid pointer. You might even try rand() until you get a valid pointer. The only problem is how to determine when a representation is valid for a pointer. That's another direct effect of making a type trivial.Colorcast
Let us continue this discussion in chat.Colorcast
