When exactly is a pointer difference defined?

I have a question about differences of pointers and the resulting type, ptrdiff_t.

C99 §6.5.6 (9) says:

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i−j provided the value fits in an object of type ptrdiff_t.

§7.18.3 (2) requires ptrdiff_t to have a range of at least [−65535, +65535].

What I am interested in is the undefined behaviour when the result is too big. I couldn't find anything in the standard guaranteeing at least the same range as the signed version of size_t or anything similar. So, here is my question: could a conforming implementation make ptrdiff_t a signed 16-bit type but size_t 64 bits? [edit: as Guntram Blohm pointed out, 16 bit signed only reaches 32767, so my example is obviously not conforming] As far as I can see, I cannot do any pointer subtraction on arrays with more than 65535 elements in strictly conforming code, even if the implementation supports objects much larger than that. Furthermore, the program may even crash.

The Rationale (V5.10) § 6.5.6 says:

It is important that this type [ptrdiff_t] be signed in order to obtain proper algebraic ordering when dealing with pointers within the same array. However, the magnitude of a pointer difference can be as large as the size of the largest object that can be declared; and since that is an unsigned type, the difference between two pointers can cause an overflow on some implementations.

which may explain why not every difference of pointers (to elements of the same array) is required to be defined, but it does not explain why there is no requirement that PTRDIFF_MAX be at least SIZE_MAX/2 (with integer division).

To illustrate, suppose T is any object type and n an object of type size_t whose value is not known at compile time. I want to allocate space for n objects of T and I want to do pointer subtraction with addresses in the allocated range.

size_t half = sizeof(T)>1 ? 1 : 2; // (*)
if( SIZE_MAX/half/sizeof(T)<n ) /* do some error handling */;
size_t size = n * sizeof(T);
T *foo = malloc(size);
if(!foo) /* ... */;

would not be strictly conforming; I would have to do

if( SIZE_MAX/sizeof(T) < n || PTRDIFF_MAX < n )

instead. Is that really the case? And if so, does anyone know a reason for it (i.e. for not requiring PTRDIFF_MAX >= SIZE_MAX/2 [edit: changed > to >=] or something similar)?
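
For concreteness, here is the strictly conforming variant spelled out as a complete sketch (the element type is fixed to a concrete type and the helper name is made up for illustration; SIZE_MAX and PTRDIFF_MAX come from <stdint.h>):

#include <stdint.h>
#include <stdlib.h>

typedef int T;   /* stand-in element type, just for illustration */

/* Allocate n objects of type T, rejecting element counts for which
 * n * sizeof(T) would overflow or for which pointer differences
 * within the array could exceed PTRDIFF_MAX. */
T *alloc_checked(size_t n)
{
    if (SIZE_MAX / sizeof(T) < n || PTRDIFF_MAX < n)
        return NULL; /* do some error handling */
    return malloc(n * sizeof(T));
}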

(*) The half business in the first version is something I only noticed while writing this text; I had

if( SIZE_MAX/2/sizeof(T) < n )

first, halving SIZE_MAX to avoid the problem mentioned in the Rationale; but then I realized we only need to halve SIZE_MAX if sizeof(T) is 1. Given this code, the second version (the one that is surely strictly conforming) doesn't seem so bad after all. But I am still interested in whether I'm right.

C11 keeps the wording of §6.5.6 (9); C++-related answers to this topic are also welcome.

Inspired answered 10/12, 2013 at 12:34 Comment(1)
The numbers in the standard seem to fit a 16-bit system with segmented memory perfectly: each segment is 64 KB, and arrays have to fit into one segment, so the maximal difference is +/-64K. However, you need 32 bits for a far pointer (segment selector plus offset).Hanahanae

To answer the question in the title: pointer subtraction by itself cannot, in general, be used to determine the distance between two pointers without possibly running into undefined behavior. As you notice, this would be a severe restriction on systems where PTRDIFF_MAX is much smaller than the possible size of an object. But such systems are rare (I don't know of any), so if you have code that depends on being able to take the difference within large objects, you can always put something like

#if PTRDIFF_MAX < SIZE_MAX/2
# error "we need an architecture with sufficiently wide ptrdiff_t"
#endif

But even in such a case (a too-narrow ptrdiff_t) you will always be able to compute the difference between two pointers into the same larger object:

  1. Determine which of the two (p or q) is smaller. This is always well defined.
  2. Say p is the smaller one, then test all p + i for size_t i starting at 1 until you reach q or i is SIZE_MAX.
  3. If the final i is SIZE_MAX and you didn't reach q, the difference is not representable. Otherwise, that i together with the appropriate sign is the information you are looking for (a sketch of this linear walk follows below).
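
A minimal sketch of that linear walk, assuming p and q point into the same object (the function name and interface are mine, just for illustration):

#include <stddef.h>
#include <stdint.h>

/* Walks from the smaller pointer towards the larger one, one byte at a
 * time, never forming a pointer outside [lo, hi], so no UB is invoked. */
size_t ptr_distance_linear(char const *p, char const *q, int *negative)
{
    char const *lo = (p < q) ? p : q;   /* step 1: the comparison is well defined */
    char const *hi = (p < q) ? q : p;
    *negative = (q < p);

    size_t i = 0;                       /* step 2: walk lo towards hi */
    while (i != SIZE_MAX && lo + i != hi)
        ++i;
    /* step 3: i == SIZE_MAX without reaching hi means the distance is
     * not representable in size_t */
    return i;
}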

This is not really satisfactory, but I was not able to figure out how to improve that linear algorithm to something logarithmic: to avoid UB we wouldn't be allowed to go beyond q with the comparison.

And, as I said, you'd only need that on some really exotic architecture.

Edit:

With mafso's trick for getting the most significant bit of the pointer difference, this can be done in O(log(n)), where n is the required distance. First declare two internal functions that assume p < q:

// computes the maximum value bit of the pointer difference
//
// assumes that p < q
static inline
uintmax_t ptrdiff_maxbit(char const* p, char const* q) {
  uintmax_t len = 1;
  while (p+len <= q-len)
    len <<= 1;
  return len;
}

// compute the pointer difference
//
// assumes that p < q
// assumes that len2 is a power of two
// assumes that the difference is strictly less than 2*len2
static inline
uintmax_t ptrdiff_bounded(char const* p, char const* q, uintmax_t len2) {
  if (len2 < 2) return len2;
  uintmax_t len = (len2 >> 1);
  p += len;
  q -= len;
  for (; len; len >>= 1)
    if (p + len <= q) {
      len2 |= len;
      p += len;
    }
  return len2;
}

Then define the function that computes the difference in bytes and adds a convention in case the difference is not representable in intmax_t:

static inline
intmax_t ptrdiff_byte(void const* p0, void const* q0) {
  char const * p = p0;
  char const * q = q0;
  if (p == q) return 0;
  if (p < q) {
    uintmax_t ret = ptrdiff_bounded(p, q, ptrdiff_maxbit(p, q));
    if (ret > (-(INTMAX_MIN+1))+UINTMAX_C(1)) return INTMAX_MIN;
    else return -ret;
  } else {
    uintmax_t ret = ptrdiff_bounded(q, p, ptrdiff_maxbit(q, p));
    if (ret > INTMAX_MAX) return INTMAX_MAX;
    else return ret;
  }
}

Finally, a macro that scales the byte difference by the size of the type that Q points to:

#define ptrdiff(P, Q) (ptrdiff_byte((P), (Q))/(intmax_t)sizeof(*(Q)))
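
A quick usage sketch (assuming the definitions above are in the same translation unit; the array and indices are arbitrary):

#include <stdio.h>

int main(void) {
  double a[1000];
  double *p = &a[10];
  double *q = &a[997];
  printf("%jd\n", ptrdiff(q, p));   /* 987  */
  printf("%jd\n", ptrdiff(p, q));   /* -987 */
  return 0;
}
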
Philander answered 10/12, 2013 at 13:23 Comment(4)
You don't need anything more exotic than an array covering half the address space. Subtracting the beginning from the end of a 2.5 GB char array on a 32-bit system is sure to overflow PTRDIFF_MAX. But most systems provide at most half the address space to the user, so this is somewhat exotic.Lacylad
+1 for the algorithm (although it is mostly of academic interest); I was able to improve it to O((log(n))^2): Say p<q; check p==q and p+1==q (if *p is valid, then so is p+1), then check p+(1<<k) < q-(1<<k) while incrementing k. Choose the maximal k with that property. Go on recursively with dist( p+(1<<k), q-(1<<k) ) + (1 << (k+1)). Unfortunately, this is still not O(log n)…Inspired
@mafso, right. But I think you can transform this to O(log n). First use your trick to find k. Then don't search recursively between p+(1<<k) and q-(1<<k), but do a normal binary search between them. Since these two addresses are sufficiently far away from p and q, you will not overrun p or q. (modulo some details)Philander
@Lacylad My 32-bit system (Windows) certainly cannot allocate more than SIZE_MAX/2 at once. Are you sure this is true for most systems?Hymenium

I remember, back in the old days, some 16-bit 80x86 compilers had "large" or "huge" data models, where pointers were 32 bits but integers were still only 16 bits. These compilers allowed you to create arrays larger than 65536 bytes, but, with integers being only 16 bits, accessing elements outside the first 64K required pointer manipulation (which was rather strange: a pointer consisted of a 16-bit segment value and a 16-bit offset value, with the real address being ((segment << 4) + offset)).
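
For illustration, the real-mode address computation looks like this (a sketch; the function name and types are mine, not any compiler's actual far-pointer representation):

#include <stdint.h>

/* 20-bit linear address formed from a 16-bit segment and a 16-bit
 * offset, as on real-mode 80x86 */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}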

I don't know how compliant these compilers were, but they would have had to define SIZE_MAX as something like 1M (since that's the largest object you can address given the strange pointer model), while ptrdiff_t would have been a 16-bit integer (which wouldn't be compliant, as its range is only -32768 to +32767).

So a sane C implementation on sane hardware wouldn't have any reason to make PTRDIFF_MAX less than SIZE_MAX. But there might be exotic hardware (which, in the case of the 80x86, wasn't really exotic at the time) that allows you to define large arrays but doesn't allow you to access all of them "at once". In that case, PTRDIFF_MAX could well be below SIZE_MAX/2.

Tocantins answered 10/12, 2013 at 12:57 Comment(1)
Yeah, there are several environments where storage is "segmented", and hence the practical limit on pointer difference is less than the full address size. Any time you subtract two pointers in different segments the result is meaningless. The net of this is basically that you should never expect good results from subtraction unless the two pointers are known to be addressing within the same structure or array. And even that is suspect if the compiler is doing some finagling to get arrays larger than a segment.Convene
