Dereferencing a 50% out of bound pointer (array of array)

This is a new question in my "I don't understand pointers in C and C++" collection.

If I combine the bits of two pointers that compare equal (they hold the same memory address) and that happen to have exactly the same bit representation, where one is dereferenceable and the other is one past the end of an array, what does the standard say should happen?

#include <stdio.h>
#include <string.h>
#include <assert.h>

// required: a == b
// returns dest, built from the first half of the bytes of a
// and the remaining bytes of b (a and b must be identical)
int *copy2to1 (int *a, int *b) {
    // check input: 
    // not only the pointers must be equal
    assert (a == b);
    // also the representation must match exactly
    int *dest;
    size_t s = sizeof(dest);
    assert(memcmp(&a, &b, s) == 0); 

    // copy a and b into dest:
    // on "exotic" architectures, size doesn't have to be divisible by 2
    size_t half = s/2; // = floor(s/2), 
    char *pa = (char*)&a, *pb = (char*)&b, *pd = (char*)&dest;

    // copy half of a into dest:
    memcpy (pd, pa, half);
    // copy half of b into dest:
    memcpy (pd+half, pb+half, s-half); // s-half = ceil(s/2)

    //printf ("a:%p b:%p dest:%p \n", (void*)a, (void*)b, (void*)dest);

    // check result
    assert(memcmp(&dest, &a, s) == 0);
    assert(memcmp(&dest, &b, s) == 0);

    return dest;
}

#define S 1 // size of inner array

int main(void) {
    int a[2][S] = {{1},{2}};
    int *past = a[0] + S, // one past the end of inner array a[0]
        *val = &a[1][0], // valid dereferenceable pointer
        *mix = copy2to1 (past, val);
    #define PRINT(x) printf ("%s=%p, *%s=%d\n",#x,(void*)(x),#x,*(x))
    PRINT(past);
    PRINT(mix);
    PRINT(val);
    return 0;
}

What I really want to understand is: what does "p points to object x" mean?

SEE ALSO

This question is a better version of my previous questions about array of arrays:

and other related questions about pointer validity:

Uniseptate answered 19/8, 2015 at 15:49 Comment(45)
What the... Why would the standard guys even think about such a monstrosity ?Unwarranted
@Unwarranted That's kind of the point. They did not, they should not have to, and this shows there is problem with C and C++. The standards are written for particular examples, and they do not create general rules, and the committees have zero idea what the semantics of fundamental constructs is or should be (see unions, see type punning, see strict aliasing, see lifetime of non polymorphic objects in C++).Uniseptate
This is making my head hurt - past and val should evaluate to the same value. a[0] + S is outside the bounds of a[0], but still within the bounds of a, so it should be a valid pointer. But honestly, masking two pointer values together to get another pointer is a use case I've never considered.Trueblood
@JohnBode "This is making my head hurt" I hope so. My head was about to explode when I thought of that, now I want everybody to have the same feelings.Uniseptate
Why do you think the standard does not consider your case? Why do you think unions, type punning, strict aliasing are problematic? If you think you understand something better than the standards committee, please provide a demonstration.Qualitative
@n.m. I understand that "declared type" is problematic and unlikely to help compiler writers. I understand that decades after publishing the first languages descriptions, people are still fighting over fundamental issues of lifetimes, valid operations on pointers, allowed casts, allowed type punning... For a language hyped as "portable assembly", it sucks.Uniseptate
Could you add the reasoning that led you to believe that the standard committee should define what happens when you mash two pointers together ? Or are you just here to hate on C and C++ to no end ?Unwarranted
@Uniseptate yes it is. But you deliberately went outside of what the langage defines. That's getting a bit tautologic.Unwarranted
@Uniseptate it's not a standard-compliant C program, because it relies on at least implementation-defined behaviour.Unwarranted
@n.m. f.ex. DR #236 The interpretation of type based aliasing rule when applied to union objects or allocated objects. (2006-05-08) "The current situation requires more consideration, but general consensus seems to be" "The committee does not think that the suggested wording is acceptable" etc. That's committee members not sure about the validity of very simple code.Uniseptate
No demonstration so far, only vague references to some problems that may or may not exist or be serious enough to worry about them.Qualitative
".. there is problem with C and C++. ... hyped as ..." There's no problem. That is ongoing development of two (distinct btw) languages. They aren't hyped as portable assembly, they are as portable as you can currently get without sacrificing the high level of abstraction both provide. Heck, at least they are standardised, to my knowledge this doesn't apply to Java, C#, Python, Ruby, Go, Rust, JavaScript (though there's ECMA), BASIC dialects, ...Dharma
The C standard has defects, like many man-made complex constructs. The committee is well aware of (hopefully) most of them. These defects don't preclude us from building useful programs. Most programmers won't ever come across any of them in their life. I.e. there's no fundamental problem you are trying to make out of them.Qualitative
@n.m. Bull. Did you read the DR? Are you claiming that type punning, reading of not the active member of unions, don't happen?Uniseptate
See WG14 N980? How can such features from the original (pre-ANSI) C specification not be cast in stone?Uniseptate
@DanielJour "They aren't hyped as portable assembly" I believed they were. Both C and C++ are often used for the backend of compilers of high level languages. "Heck, at least they are standardised, to my knowledge this doesn't apply to Java" are you saying there is no Java specification?Uniseptate
@n.m. and the current state of mind of C++ lawyers is absolutely worse: Questions on N4430 (Core Issue 1776: Replacement of class objects containing reference members) They are essentially killing C/C++.Uniseptate
@DanielJour: C was invented to serve as a form of semi-portable assembly, with a stronger focus on the portability of the language itself than of code written in the language. On many platforms, it is still the best such language. Unfortunately, some people have lost sight of the fact that one of the reasons various actions invoke Undefined Behavior was that different platforms had different contradictory but sometimes useful behaviors. If an application needed to print the arithmetic value of x+y when it was representable as int and never output an incorrect value, but...Cuisse
...having the program terminate when given invalid input would be acceptable, and if the program was running on a platform which traps integer overflow, printf("%d",x+y); would satisfy the requirements. If a different program running on a platform where overflow yields two's-complement reduction had a requirement that overflow must not disrupt program behavior, but could output any arbitrary value, the same code would be useable there. Mandating any particular overflow behavior would have required one of those programs to be longer and more complicated.Cuisse
@Cuisse C should have two's-complement types or operators.Uniseptate
@curiousguy: For most applications, I would posit that the most optimizer-friendly normative specification for overflow would be to have it yield a "partially indeterminate value", with the rule that every rvalue conversion of a partially-indeterminate value may (independently) yield any number which is congruent to arithmetical value of the overflowing expression, mod the number of values of the type, but with an additional rule that an explicit cast to int must perform a two's-complement reduction, and possibly with an additional rule which would permit for loops to early-exit...Cuisse
Yes I did read it. Why should I care one way or another? This is an awful code that has no place in a real-life program. Let language lawyers discuss its validity, meanwhile we mere mortals shall steer clear of such constructs regardless. If you think issues like this are killing C and C++, you need to think harder.Qualitative
@n.m. What is an awful code?Uniseptate
...in case of overflow (I'd limit the early-exit to for loops only, so that code can indicate whether or not early-exit is acceptable by its choice of loop type). Such rules would not only allow the vast majority of the useful optimizations would be made possible by having int overflow be undefined, but more importantly would allow programmers to safely write code that could be optimized in such fashion.Cuisse
@n.m.: The fundamental question which is embodied by the code is the extent to which C will continue to abide by one of its fundamental principles--that every object is represented by a sequence of unsigned character values. While useful optimizations may be facilitated by allowing objects to be invisibly tagged with other information that compilers may use to make inferences regarding their usage, such tagging may in some cases force programmers to write code that is less efficient than they could have done otherwise, and it may not always be possible for optimizers to remove such redundancy.Cuisse
@Cuisse An implementation is allowed to tag bytes however it wishes for whatever purposes, always was, no problem with that. Are you saying the strict aliasing rules violate this principle? I fail to see how. If strict aliasing rules inhibit some potential optimisations or coding idioms, I have no problem with that, cost/benefit-wise. I just can't see what it has to do with the fundamental principle in question.Qualitative
@n.m.: Strict aliasing went against the fundamental principle of C; the performance advantages may be seen as more important than that principle, but that doesn't make the principle worthless. For any sequence of bytes and type, there is at most one object that can be identified by any pointer of that type whose representation matches that sequence. I would suggest that if a sequence of unrelated bytes are assembled into a pointer, and if any pointer could exist whose representation matches that sequence and which could be used in a particular way, then the pointer made from the sequence...Cuisse
...should be likewise usable. Such cases are rare, and the cost of lost optimization opportunities in them would pale in comparison to the cost of errant program behavior caused by "optimizations" that make a program violate its requirements.Cuisse
@Cuisse Again I don't see where any fundamental principles are violated by strict aliasing. Could you please point me to a reliable source that names such a principle and describes why it is broken by strict aliases? I couldn't find any. You can assemble a valid pointer out of bytes, no problem with that; how does it relate to strict aliasing?Qualitative
@supercat: C was not invented to be a semi-portable assembly; it was invented to be a high-level language for low-level (systems) programming. There's a difference.Trueblood
@n.m. Everybody has a different interpretation of strict aliasing.Uniseptate
That's an interesting observation. Is the standard so ambiguous on this? Can you point out the passage you draw your interpretation from?Qualitative
@JohnBode: There are different shades of meaning, but I'd posit that a key aspect of C's success for systems programming as well as many other purposes stems from the fact that operations which could be performed by some but not all hardware platforms (e.g. relational comparisons of unrelated pointers) would be supported by compilers for platforms where they were supported.Cuisse
Downvoting because you ask a question that is very specific to what the standard says but you don't specify if you mean C or C++. There is likely different behaviour between these two languages in this particular case. When do you understand that you should tell us what language your question is about?Thermostat
@FUZxxl The tags tell the answer.Uniseptate
@FUZxxl Next time I'll ask TWO word for word identical questions, one for C, one for C++ (not).Uniseptate
@Unwarranted I hate broken tools. To no end.Uniseptate
@Uniseptate Yeah, please do so. Then I can answer one for C and the other for C++.Thermostat
@FUZxxl Really? It's the officially recommended way?Uniseptate
@Uniseptate Yes. You ask the same question for two different languages. The point of this rule is that many people tag-spam C and C++ to attract more viewers but then aren't interested in the C answer because they program in C++ or whatever. This is why I'm pretty hostile towards questions that are tagged both C and C++. Only questions asking for the differences or the interaction between C and C++ should be tagged with both.Thermostat
"Why would the standard guys even think about such a monstrosity ?" That cases that nobody thought about are not covered by the specification proves that it is not a specification. The std is a glorified tutorial.Uniseptate
@Unwarranted "Or are you just here to hate on C and C++ to no end ?" I hate that C/C++ pretend to be ultra low level and high level at the same time, because it's a lie; you cannot do both and the deck of cards falls down as I have proved again and again. High level as in Java implies that many things are opaque for the user, esp. pointers. Low level means that the translation is predictable and not optimized except for trivial simplifications. Even the C++ committee "hates" the C semantics and its union visibility rules. They think it's BS, so it isn't just me.Uniseptate
As a rule, you cannot specify a high level semantic on a type and then expose all its internal components. It doesn't go together, as any optimisation would break when the internals are poked by the user. But the C/C++ pointers aren't high level, as they are just numbers: you can convert them to and from numbers with casts, with printf/scanf, with iostream, with volatile pointers. If it's just a number it CANNOT have the high level semantics and optimisations are limited. I understand that what I say is very hurtful for the people who used to think that the optimisations of pointers are sound.Uniseptate
"(...), pointer types, (...) are collectively called scalar types () Cv-unqualified scalar types, (...) are collectively called trivially copyable types." "For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values." [basic.types]Uniseptate

In [basic.compound]:

If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

past and val have the same address, so they point to the same object. It doesn't matter that one is "one past the end" of the first row and the second is the first element of the second row. There is a valid object at that address, so everything here is perfectly reasonable.


In C++17, as of P0137, this changes a lot. Now, [basic.compound] defines pointers as:

Every value of pointer type is one of the following:
- a pointer to an object or function (the pointer is said to point to the object or function), or
- a pointer past the end of an object (5.7), or
- the null pointer value (4.11) for that type, or
- an invalid pointer value.

So now, past is a value of the 2nd kind (a pointer past the end of an object), but val is a value of the 1st kind (a pointer to an object). Those are different categories of values, and the standard does not treat them as interchangeable even when they represent the same address:

A value of a pointer type that is a pointer to or past the end of an object represents the address of the first byte in memory (1.7) occupied by the object or the first byte in memory after the end of the storage occupied by the object, respectively. [ Note: A pointer past the end of an object (5.7) is not considered to point to an unrelated object of the object’s type that might be located at that address. A pointer value becomes invalid when the storage it denotes reaches the end of its storage duration; see 3.7. —end note ]

past doesn't point to something, so viewing its contents as if it were the same as val is no longer meaningful.

Annunciator answered 19/8, 2015 at 16:14 Comment(10)
So can I use a[0][size] to refer to a[1][0]?Uniseptate
@Uniseptate Yes. &a[0][size] is the address a + size * sizeof(T). &a[1][0] is the address a + 1 * (size * sizeof(T)) + 0. Those are the same address, so when you dereference it you get the same value.Annunciator
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated (C99 6.5.6 Additive operators). C and C++ seem to diverge here.Qualitative
@n.m. But it's not one past the end of the array object. There is one object: a. One past the end would be &a[1][size] (or &a[2][0])Annunciator
@Annunciator a[0] and a[1] are array objects as well.Qualitative
@Barry: The declaration int a[3][5] creates an object a which contains three smaller objects, each of which is an array of five integers; it also contains, superimposed on that, an array of 15 integers. Using a directly as an rvalue yields a pointer to the first of three five-element inner arrays, and using a[0] as an rvalue will yield a pointer to the first int within that five-element inner array. Casting a to int* or int[] will yield a pointer to the first element of the superimposed 15-element array. Compilers are not required to keep track of how pointers are derived...Cuisse
...but they are allowed to do so. The pointers a[0]+5 and a[1] will compare equal, but the former is a one-past pointer for a[0] and the second is a pointer to the first element of a[1]. It would be helpful if the Standard were to mandate that certain actions on a pointer must cause the compiler to "forget about" any restrictions it might have had on its behavior, but it's rather vague on such issues.Cuisse
@T.C. Is the issue there that now the fact that the array and its first element aren't interconvertible? I'm still not sure I understand that change well enough to explain it.Annunciator
Basically, this sentence is not correct anymore. A pointer value is now defined abstractly to be one of 1) invalid, 2) null, 3) "pointer to X", 4) "pointer past the end of X". Even if a type-3 pointer and type-4 pointer represent the same address, the latter is still not considered to point to the object pointed to by the former.Novak
I don't think the old wording in [basic.compound] was free of defects or meant to be taken literally. At one point an object was defined as a region of storage, but that has never really been true.Perri

What I really want to understand is: what does "p points to object x" mean?

The object p contains a value that corresponds to the location of the object x in memory.

That's it. That's all it means. You seem determined to make this more complicated than it needs to be.

Pointer types are not arithmetic types, and aren't meant to be arbitrarily munged like that. Valid pointer values are obtained by using the unary & operator on an lvalue, using an array expression that isn't the operand of the sizeof or unary & operator, or calling a library function that returns a pointer value.

Everything beyond that (size, representation, physical vs. virtual, etc.) is an implementation detail, and implementations vary widely when it comes to representing addresses. That's why the standards don't say anything about what to expect when you play Dr. Frankenstein with pointer values.

If you are intimately familiar with your platform's addressing conventions (both virtual and physical), and you know how your implementation lays out items in memory and how it represents pointer types, and you have a valid use case for hacking your pointer values this way, then hack away to your heart's content - neither language standard has anything to say on the subject.

Trueblood answered 19/8, 2015 at 16:27 Comment(3)
"Valid pointer values are obtained (...)" Are you saying that I can't use memcpy on a pointer in a std conforming program?Uniseptate
@curiousguy: For copying the contents of a valid pointer value to another object of the same type, sure, although p = q; is just as easy. If the types are not compatible and you try to use the copied-to pointer, then the behavior will be undefined (meaning, it may work just fine, it may crash outright, it may cause a runtime error later on, etc.). Pointers to different types don't have to have the same size or representation.Trueblood
Yes: only compatible types in my program (only int*).Uniseptate
