For iterating through an array, should we be using size_t or ptrdiff_t?
In this blog entry by Andrey Karpov, entitled "About size_t and ptrdiff_t", he shows an example:

for (ptrdiff_t i = 0; i < n; i++)
  a[i] = 0;

However, I'm not sure that's right; it seems it should be

for (size_t i = 0; i < n; i++)
  a[i] = 0;

Is this correct?

I know we should also likely be using something like memset, but let's avoid that entirely; I'm only asking about the type.

Vadim answered 7/7, 2018 at 21:41 Comment(8)
ptrdiff_t is a signed type, so it may not be very safe to index an array with it.Headship
Depends on what n is and what a is. If a is a built-in array, size_t is more appropriate.Giles
@PaulAnkman that also sounds like a good reason. Also, reading this answer by AnT, it seems that in one possible C implementation ptrdiff_t may actually be larger than size_t, which would seem to make this less efficient and more dangerous.Vadim
Not sure what container::size_type is (guessing that's C++?), but this question is about C, not C++.Vadim
@EvanCarroll Sorry, missed that. The first part of my comment still stands though.Giles
The linked article is totally bogus, full of mistakes and misconceptions. Of course you're right. Forget about that article and the guy who wrote it.Transgress
In this particular case, all values of i are >=0, so size_t is definitely more appropriate.Cade
I just read the referenced article and it's all about using size_t and ptrdiff_t for pointer arithmetic and array indexing, though it does have one unfortunate example where it suggested for (ptrdiff_t i = 0; i < n; i++) without giving a type for n. After reading the whole article it is clear that size_t is perfectly safe for all unsigned pointer arithmetic, but you should always use ptrdiff_t for anything that involves signed entities in the expression. Not the strongest article on the subject, I am sure, but I can't provide a better example at the moment.Cade
In a blog post, I argue that you should always refrain from allocating memory blocks larger than PTRDIFF_MAX (*), because doing so will make compilers such as Clang and GCC generate nonsensical code even if you do not subtract pointers to that block in a way that causes the result to overflow.

(*) Even if malloc succeeds when you pass it a value larger than PTRDIFF_MAX. The crux of the problem is that GCC and Clang generate code that behaves correctly only when linked with a malloc that enforces this limit, but Glibc provides a malloc function that does not implement this limitation.

If you follow that constraint (which I encourage you to do: that's the message of the blog post), then both types are equally correct.

This said, since only positive offsets need to be represented, size_t would be the natural choice in your example.

Bearded answered 7/7, 2018 at 22:0 Comment(12)
This is an interesting wrinkle I have been blissfully unaware of. I have never been tasked with building an application that required a single allocation as large as half the range of a size_t. I am sure they are out there, but I think they are rare.Cade
@jwdonahue: Maybe you'd be interested in Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts Are Broken — a post from Google about how arrays got big enough that index arithmetic really did overflow and break.Strachey
@JonathanLeffler, LOL, I knew somebody would dredge up a hyper-scale example ;). I will read it, but the OP's question involves an example that does not reference memory allocation or any kind of signed pointer arithmetic.Cade
@JonathanLeffler, I would point out, now I've read that article, they were very stupidly using signed integer math where it was not needed and should not have been used. Appropriately sized unsigned types (size_t in C/C++) would never have exhibited the referenced bug. To be fair, size_t wasn't around when those algorithms were first documented, but then neither was C.Cade
PTRDIFF_MAX is not the max. allowed size of any object, but the max. value of a ptrdiff_t, just as SIZE_MAX is just the max. value a size_t can represent. Neither is related to the max. allocation size. That would be guaranteed by the implementation for static/automatic objects and by the stdlib for dynamically allocated objects (or whatever mechanism the environment provides, if any). Nevertheless, size_t is guaranteed to hold the max. allowed array index and the size of the largest possible object in bytes. ptrdiff_t is in fact not (also see 6.5.6p9).Affranchise
"but Glibc provides a malloc function that does not implement this limitation." - If that is true, it is clearly a bug in glibc. However, until proven otherwise, I rather think something else was broken in the code. You might be confusing this with the optimistic allocation done by modern OSes like Linux: they simply behave as if there is enough memory available, up to the machine's virtually addressable memory, assuming you won't use it all. No physical memory is mapped until the first write.Affranchise
I would like to point out that malloc is not mentioned in the OP's post, only the appropriateness of using ptrdiff_t rather than size_t where it is obvious that all index values should be positive. size_t is in fact the correct type if n is not bounded by a smaller unsigned type.Cade
@Olaf Acknowledgement that glibc allows to allocate more than 2GiB on 32-bit platforms, that this contradicts compiler assumptions, together with decision not to change: gcc.gnu.org/bugzilla/show_bug.cgi?id=67999#c8Bearded
@PascalCuoq: This could be because such a restriction does not really make sense. IIRC, 32-bit Linux allowed up to 3GiB of (logical) application address space. Going back to the C standard: there is no requirement that a ptrdiff_t be able to hold all sizes of objects, not even the difference of two pointers into the same array; see 6.5.6p9 in the standard. I agree, though, that it is unfortunate. Nevertheless it has been reasonable for small systems (16/24/32) in the past and it will be again once we reach the 64-bit size_t limit. Nevertheless both sides are correct. Forget portability.Affranchise
@Olaf Have you read the blog post, or the bug report? They show that GCC and Clang generate wrong code for defined programs. The bug report contains a discussion, involving GCC maintainers and a libc maintainer, of what should be fixed between the compilers and the libc. Your remark is not adding anything to the discussion.Bearded
@Olaf letting 32-bit programs have 3GiB of virtual address space does not mean that malloc needs to let them have it in one block. Programs can still be allowed to allocate three times 1GiB.Bearded
Ok, I see now. Yeah, that seems to stem from wrongly scaling down the result after the operation. I can't reproduce this currently, as I don't have an x86-32 environment (and for 64-bit the 48-bit virtual address-range limit protects against these problems). I should test this for other architectures if I find the time some day. But I don't see an easy way to get this right, so I understand the gcc people not fixing it, especially as it is of decreasing relevance. Nevertheless: thanks for your time.Affranchise
Use of ptrdiff_t is ok since a[i] is translated as *(a + i).

If you take two pointers, p1 and p2, it is encouraged that you use:

ptrdiff_t d = p2 - p1; // Assuming p2 - p1 is valid.

Given that, p2 == p1 + d, i.e. <ptr type> + ptrdiff_t is a valid expression. Whether ptrdiff_t or size_t is better as the index type is a matter of opinion and/or coding style used in a team.

Elouise answered 7/7, 2018 at 22:26 Comment(3)
I don't believe this proof is sound, and it's confusing to present types and variables together. The question is about types: of course pointer - pointer = ptrdiff_t, so pointer + ptrdiff_t = pointer. But in our case the types are bounded. For example, in your example above you start with ptrdiff_t = uintptr_t - uintptr_t. That's not guaranteed to work though, not even on the same object. ptrdiff_t + uintptr_t = uintptr_t is even less likely to work. On the matter of values and not types, you're right: if any construction works, the others are likely to work too (afaik), but in the question we're creating the memory accessor.Vadim
re: a[i] = 0; it's my contention that because i can be either of two types (in the question, size_t and ptrdiff_t), and ptrdiff_t can be larger than size_t and negative, it's less type-safe to go outside of size_t. But that doesn't sound to me like coding style or opinion.Vadim
@EvanCarroll, as long as the array size is smaller than PTRDIFF_MAX, you can use either ptrdiff_t or size_t without any difference in the behavior of the program. When your array size exceeds PTRDIFF_MAX, use of size_t is your only choice. It sounds like that's what you are trying to say. For such extreme cases, I can see that coding style or opinion does not matter.Elouise
The answer to the OP's question is yes, size_t is most appropriate for the example code, where no pointer values are subtracted from each other and there are no cross-compiler/library compatibility issues around malloc behavior. Regardless of differences in heap managers, in C an array can be SIZE_MAX bytes in length, and that requires a size_t to represent. Nothing in the standard requires a heap manager to be able to allocate all of a process's memory space on the heap, or to allocate up to SIZE_MAX bytes for that matter, but an array can be SIZE_MAX bytes in length, hence size_t is appropriate.

Even if n is signed, using a ptrdiff_t for i isn't going to help: if n is negative, the initial i < n test fails anyway, because i is initialized to zero. There is no index into a that a size_t index cannot access. The only place ptrdiff_t is needed is where you subtract one pointer value from another, and the OP isn't asking about that.

Cade answered 8/7, 2018 at 0:21 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.