Is it well-defined to use a pointer pointing to one-past-malloc?
O

4

47

In C, it is perfectly fine to create a pointer that points to one past the last element of an array and use it in pointer arithmetic, as long as you don't dereference it:

int a[5], *p = a+5, diff = p-a; // Well-defined

However, these are UB:

p = a+6;
int b = *(a+5), diff = p-a; // Dereferencing and pointer arithmetic

Now I have a question: does this also apply to dynamically allocated memory? Assume that malloc() succeeds and that I only use the one-past-the-last pointer in pointer arithmetic, without dereferencing it.

int *a = malloc(5 * sizeof(*a));
assert(a != NULL && "Memory allocation failed"); // assert takes a single expression
// Question:
int *p = a+5;
int diff = p-a; // Use in pointer arithmetic?
Oldtime answered 20/12, 2017 at 7:3 Comment(16)
Good question. It's made more interesting by the fact that the memory doesn't have an effective type before you write to it.Melleta
@StoryTeller This is interesting because in C++ when you use new then it's perfectly fine.Oldtime
Well, you tagged C. And C++'s new is a different beast. It's more than mere memory allocation. Plus, C++ language lawyers would say that just writing to the memory returned by malloc doesn't create an object there, let alone make the memory have an effective type.Melleta
You can actually have a pointer to anywhere, as long as you don't dereference it. You can even use it for comparison with other pointers, even though it might make no sense.Overview
@Someprogrammerdude That's too wild. Isn't that UB?Oldtime
@Someprogrammerdude - But I don't think you can obtain that pointer to anywhere in every way. For instance, you can't do pointer arithmetic like iBug pointed out. That's UB by itself. You may cast an integral constant to a pointer, but there's no guarantee it would be the same address as a + 6 for instance.Melleta
Too late to update my comment, but it should be added that you can't dereference it or use it for pointer arithmetic. Doing e.g. int *some_variable = (int *) 0x1234 is perfectly valid, and often used on small embedded systems for memory-mapped registers. Having a pointer to anywhere is not a problem as long as you don't attempt to do anything with it. It's using the pointer that can lead to UB, if it doesn't point anywhere valid.Overview
And to answer the question, "Is it well-defined to point to one-past-malloc?", then yes it is valid and works just the same as for one-past an array.Overview
Guess that's not what I meant. Now I explicitly ask for pointer arithmetic.Oldtime
ISO/IEC 9899:2011 §7.22.3 Memory management functions ¶1 The order and contiguity of storage allocated by successive calls to the aligned_alloc, calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). It says "an array of such objects"—it's OK for arrays; therefore it's OK here.Nat
@JonathanLeffler Sounds like a valid answer. What about posting it as an answer?Oldtime
I'm hoping one of the answers quoting the relevant parts of the language definition (§6.x.y) will pick up on it, maybe including links to N1570 which is an online copy of a late draft of the C11 standard. I'm mildly puzzled by the reference to N4296 in one answer; the latest mailing of the WG14 (C standard committee) references document numbers in the n21xx range.Nat
The only thing that makes malloc a special case is that the allocated data has no effective type. The type of the allocated data is determined upon access, as specified in 6.5/6. When writing int* ptr = malloc(n*sizeof *ptr); ...ptr[0] = x; you actually never get an array type, but each chunk of data accessed gets the effective type int. Not an array of int, but a whole bunch of individual int. The C standard doesn't make much sense here.Dorcy
One more thing to note: int diff might be too small for the difference between the first and last element of an array. And ptrdiff_t as well - if that happens, behaviour is undefined.Bowerbird
@JonathanLeffler: N4296 is probably a mixup of filenames, N4296 is post-C++ 14, but the section numbers appear to be from some C draft.Wrestling
Note that pointer subtraction like p-a yields a value of type ptrdiff_t, whose range can exceed that of int.Squirmy
H
23

Is it well-defined to use a pointer pointing to one-past-malloc?

It is well defined if p is pointing to one past the allocated memory and it is not dereferenced.

n1570 - §6.5.6 (p8):

[...] If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Subtracting two pointers is valid only when both point to elements of the same array object or one past the last element of that array object; otherwise the behavior is undefined.

(p9):

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object [...]

The above quotes apply equally to dynamically and statically allocated memory.

int a[5];
ptrdiff_t diff = &a[5] - &a[0]; // Well-defined

int *d = malloc(5 * sizeof(*d));
assert(d != NULL && "Memory allocation failed"); // assert takes a single expression
diff = &d[5] - &d[0];        // Well-defined

Another reason this is valid for dynamically allocated memory, as pointed out by Jonathan Leffler in a comment, is:

§7.22.3 (p1):

The order and contiguity of storage allocated by successive calls to the aligned_alloc, calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).

The pointer returned by malloc in the above snippet is assigned to d and the memory allocated is an array of 5 int objects.
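
As an illustration of the rule quoted above, here is a minimal sketch in which a one-past-the-end pointer into a malloc'ed block is used only as a loop bound. The helper name sum_one_to_n and its error convention (returning -1 on allocation failure) are assumptions made for this example, not part of the original answer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: fills a malloc'ed buffer with 1..n and sums it,
   using a one-past-the-end pointer only for comparison. */
long sum_one_to_n(size_t n)
{
    if (n == 0)
        return 0;            /* sidestep the implementation-defined malloc(0) case */

    int *d = malloc(n * sizeof *d);
    if (d == NULL)
        return -1;           /* allocation failed */

    int *end = d + n;        /* one past the last element: may be formed and compared */
    int value = 1;
    for (int *it = d; it != end; ++it)
        *it = value++;       /* only d[0] .. d[n-1] are written */

    long sum = 0;
    for (const int *it = d; it != end; ++it)
        sum += *it;          /* end itself is never dereferenced */

    free(d);
    return sum;
}
```

Calling sum_one_to_n(5) walks the five elements and stops when the iterator compares equal to end; at no point is *end evaluated.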

Honeysucker answered 20/12, 2017 at 8:5 Comment(8)
Formally, how does the data pointed at by d end up as an array? According to the C standard, the effective type of the malloc:ed data is that used for lvalue access. Which is int, not int[5].Dorcy
@Lundin; No, it doesn't. d is a pointer that points to the first block of the memory chunk allocated by malloc.Honeysucker
The cited text only shows that allocated storage can be used to store arrays, not how the data becomes an array. Suppose I do int(*ptr)[5] = malloc_chunk; memcpy(something, ptr, 5*sizeof(int); Then I make the effective type an array type. But without such code, the "chunk" is not formally an array type. I don't think there is any text in the standard that makes sense to cite here, the rules about effective type (and strict aliasing) are simply poor.Dorcy
The word "until" is ambiguous (or even wrong) here: It is well defined until the pointer pointing to one past the allocated memory. According to your answer it is still true when the pointer points to one past, but "until" means "when it happens it's no longer true", so you'd better find a better wording.Oldtime
@iBug: What use of "until" is ambiguous or wrong? In the Standard, it applies to the clause "the space is explicitly deallocated". Once the space is freed, pointers to it cease to be valid.Wolof
@Lundin: Under C89's concept of "object", one could regard the result from malloc as returning a pointer to a union containing all possible combinations of types that could fit in the indicated storage. There would be no need for that storage to ever "become" anything else. C99's Effective Type rule is an abomination that requires completely changing the notion of what an "object" is in ways that can't be consistent with usages of the term elsewhere in the Standard, and invents an unnecessary new concept of runtime state. A simpler and better rule would have simply said...Wolof
...that a compiler may regard two uses of an lvalue will be unsequenced relative to anything between them absent certain evidence of outside access, and that a compiler may hoist or defer accesses to the beginning/end of a function or loop if there is no evidence of outside access in the code it's moved across. Given void test(int *ip, float *fp, int mode) { *ip=1; *fp=2; if (mode) *ip=1;}; if ip and fp alias, the Effective Type rule would require that the Effective Type of the storage be left as either int or float, depending upon mode, but there's no evidence that should matter.Wolof
@Honeysucker Good correction. My note on the word "until" is somewhat a concern about English, not the C standard. It looks certainly better now.Oldtime
P
26

The draft n4296 for C11 is explicit that pointing one past an array is perfectly defined: 6.5.6 Language / Expressions / Additive operators:

§ 8 When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. ... Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object... If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

As the subclause never specifies the kind of memory involved, it applies to any kind of memory, including allocated memory.

That clearly means that after:

int *a = malloc(5 * sizeof(*a));
assert(a != NULL && "Memory allocation failed"); // assert takes a single expression

both

int *p = a+5;
int diff = p-a;

are perfectly defined and, as the usual pointer arithmetic rules apply, diff shall receive the value 5.
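
Since the result of subtracting two pointers has type ptrdiff_t (declared in <stddef.h>), a hedged variant of the snippet above can store the difference in that type instead of int. The function name distance_to_end and the -1 failure value are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: element distance from the start of a malloc'ed
   block of n ints to its one-past-the-end pointer, or -1 on failure. */
ptrdiff_t distance_to_end(size_t n)
{
    if (n == 0)
        return 0;              /* avoid relying on what malloc(0) returns */

    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;             /* allocation failed */

    int *p = a + n;            /* one past the last element */
    ptrdiff_t diff = p - a;    /* counted in elements, not bytes */

    free(a);
    return diff;
}
```

For n == 5 the function returns 5. When printing such a value with printf, the matching conversion is %td.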

Prole answered 20/12, 2017 at 7:18 Comment(6)
If I write p = a+6 then I can't expect p - a == 6 according to the standard, right?Oldtime
@Oldtime Yes, you cannot expect it to work. " If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined"Litha
@Oldtime the standard mandates defined behaviour only up to one past the last element of the array object. If you go further (two past the last element), nothing is specified by the standard, which is enough to make it Undefined Behaviour.Prole
@Oldtime A particular concern that your example raises is that overflows in pointer arithmetic are undefined behavior in C++. Thus the rules basically state that malloc will never allocate the last byte of memory unless that compiler also simultaneously defines overflow in a way that makes these overflow issues invisible.Overstep
The published WG14 paper with the highest N-number is currently N2184. Where did you get N4296 from?Philips
@Philips N4296 sounds like an early draft for C++17.Oldtime
C
7

Yes, the same rules apply to variables with dynamic and automatic storage duration. It even applies to a malloc request for a single element (a scalar is equivalent to a one-element array in this respect).

Pointer arithmetic is only valid within arrays, including one past the end of an array.

On dereferencing, it's important to note one consideration: with respect to the initialisation int a[5] = {0};, the compiler must not attempt to dereference a[5] in the expression int* p = &a[5]; it must compile this as int* p = a + 5; Again, the same thing applies to dynamic storage.
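
The two points above can be sketched as follows. The function names are hypothetical; the comments cite C11 6.5.3.2p3 (for &a[5]) and 6.5.6p7 (a non-array object behaves like an array of length one for pointer arithmetic).

```c
#include <stdbool.h>
#include <stddef.h>

/* &a[5] means &*(a + 5); per C11 6.5.3.2p3 neither the & nor the implied *
   is evaluated, so the expression is the same as a + 5. */
bool address_of_end_matches(void)
{
    int a[5] = {0};
    int *p = &a[5];     /* no dereference takes place */
    int *q = a + 5;     /* one past the last element */
    return p == q && p - a == 5;
}

/* A scalar is treated as a one-element array (C11 6.5.6p7), so &x + 1
   is a valid one-past-the-end pointer that must not be dereferenced. */
bool scalar_has_one_past(void)
{
    int x = 0;
    int *end = &x + 1;
    return end - &x == 1 && end != &x;
}
```

Both functions return true; neither one reads or writes through the one-past-the-end pointer.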

Crafty answered 20/12, 2017 at 8:10 Comment(2)
In int* p = &a[5]; a[5] is not dereferenced. It is equivalent to int* p = a + 5; or maybe I am reading that paragraph wrong.Honeysucker
I'm trying to say that there is no UB with the expression &a[5] since the compiler must treat it as a + 5. Does it not read well? I have a cold following a weekend of implementing this: meta.stackexchange.com/questions/303920/…Crafty
S
7

Is it well-defined to use a pointer pointing to one-past-malloc?

Yes, yet a corner case exists where this is not well defined:

void foo(size_t n) {
  int *a = malloc(n * sizeof *a);
  assert((a != NULL || n == 0) && "Memory allocation failed");
  int *p = a+n;
  ptrdiff_t diff = p-a;
  ...
}

Memory management functions ... If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object. C11dr §7.22.3 1

foo(0) --> malloc(0) may return a null or a non-null pointer. On an implementation that returns a null pointer, that return value does not indicate a "Memory allocation failure". The code then evaluates int *p = a+n; as int *p = NULL + 0;, which is not covered by the guarantees about pointer math, or at least brings such code into question.

Portable code benefits by avoiding 0 size allocations.

void bar(size_t n) {
  ptrdiff_t diff;
  int *a;
  int *p;
  if (n > 0) {
    a = malloc(n * sizeof *a);
    assert(a != NULL && "Memory allocation failed");
    p = a+n;
    diff = p-a;
  } else {
    a = p = NULL;
    diff = 0;
  }
  ...
}
Squirmy answered 20/12, 2017 at 13:20 Comment(2)
I really wonder why the standard does not necessitate the return of NULL pointer in case 0 was passed to malloc(). Why the standard goes through the trouble of stating: "either a null pointer is returned, or the behavior is as if the size were some nonzero value".?Ellerey
@Ellerey - I'd guess that two alternative implementations already existed by the time the (first) standard was written.Doggy
