Should pointer comparisons be signed or unsigned in 64-bit x86?

When writing x86-64 user-space assembly and comparing two pointer values, should we use signed conditions such as jl and jge or unsigned conditions such as jb and jae?

Intuitively I think of pointers as unsigned, running from 0 to 2^64-1 in the case of a 64-bit process, and I think this model is accurate for 32-bit code. I guess that's how most people think about them.

In 64-bit code, however, I don't think you can ever validly cross over the signed discontinuity at 0x7FFFFFFFFFFFFFFF (2^63 - 1). Many interesting memory regions tend to be clustered near signed 0 (often code and static data, and sometimes the heap, depending on the implementation), or near the maximum address of the lower half of the canonical address space (something like 0x00007fffffffffff on most systems today) for stack locations, and for the heap on some implementations1.

So I'm not sure which way they should be treated: signed has the advantage that it is safe around 0 since there is no discontinuity there, and unsigned has the same advantage near 2^63. However, in practice you don't see any addresses anywhere close to 2^63, since the virtual address space of current commodity hardware is limited to less than 50 bits. Does that point towards signed?
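To make the two discontinuities concrete, here is a minimal C sketch with plain integers standing in for addresses (illustrative values only; the uint64_t-to-int64_t conversion relies on the usual two's-complement behavior of mainstream compilers):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Raw 64-bit values standing in for addresses; nothing is dereferenced. */
    uint64_t below = 0x7fffffffffffffffULL;  /* 2^63 - 1 */
    uint64_t above = 0x8000000000000000ULL;  /* 2^63     */

    /* Unsigned order is monotonic across 2^63 ...                     */
    printf("unsigned: %d\n", below < above);                   /* 1 */
    /* ... but signed order flips there, because 2^63 is INT64_MIN.    */
    printf("signed:   %d\n", (int64_t)below < (int64_t)above); /* 0 */

    /* Conversely, signed order is monotonic across 0, which is where
     * unsigned arithmetic has its wraparound.                         */
    return 0;
}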


1 ... and sometimes the heap and other mapped regions are not close to either the bottom or top of the address space.

Gerlac answered 7/12, 2017 at 4:49 Comment(7)
Does the processor have negative memory addresses? How do you build a memory with an address of -7? Addresses go from 0 to a positive number N. – Malfeasance
@Malfeasance - well that's begging the question of course. They have negative addresses if you consider them signed, and they don't if you consider them unsigned (the latter by definition). Maybe you mean: is the sign (top) bit ever set on memory addresses in x86? In that case the answer is "yes". I do agree that "intuitively" addresses seem positive, but since things like addressing math don't care about signedness, whether a memory can have an address of -7 is more a philosophical question than a practical one. – Gerlac
Note also I'm not really asking whether memory addresses "can be negative", since I don't think that's interesting (I just think of them being "in hex" for the purposes of display, etc. - whether 0xFFFFFFFFFFFF1234 is a small negative number or a big positive one doesn't really matter, but sure, let's say it is positive). My question is more about whether comparisons specifically should be done in a signed manner or unsigned, which doesn't necessarily follow directly from whether the values themselves are "signed" or "unsigned". – Gerlac
Linux x86-64 user-space addresses are in the low half of the canonical split. (But at the top of the bottom, up against the 0x7FF....) Only the vDSO pages are mapped into the upper part of virtual memory, because high addresses are the kernel addresses. But anyway, addresses aren't clustered around signed 0. (Also, -pie executables load at 0x5555... + some amount of ASLR.) – Numidia
Yes, many addresses are clustered around zero, although on the positive side of zero as you point out. I forgot that most of the user addresses were still < 0x7FF..., will update. – Gerlac
Oops, I mean user-space stack addresses are just below 0x00007fffffffffff (in the low 47 bits of virtual address space, see an ascii diagram); in a position-dependent executable, user-space static addresses (code/data) are near 0. – Numidia
"0x7FFFFFFFFFFFFFFF bytes should be enough for everyone"? – Baddie

TL:DR: intptr_t might be best in some cases because the signed-overflow boundary is in the middle of the "non-canonical hole". Treating a value as negative instead of huge may be better if wrapping from zero to 0xFF...FF or vice versa is possible, but pointer+size for any valid size can't wrap a value from INT64_MAX to INT64_MIN.

Otherwise you probably want unsigned for the "high half" (high bit set) to compare as above the low half.
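As a sketch of what that means in practice (the two address values below are made up: one resembles a low-half user-space address, the other a high-half vsyscall-style address; neither comes from a real mapping, and the signed casts again assume two's-complement conversion):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t low  = 0x00007fffffffe000ULL;  /* hypothetical low-half (user) address */
    uint64_t high = 0xffffffffff600000ULL;  /* hypothetical high-half address       */

    /* uintptr_t view: the high half compares above the low half.       */
    printf("%d\n", (uintptr_t)low < (uintptr_t)high);  /* 1 */
    /* intptr_t view: high-half addresses are negative, so they compare
     * below every low-half address.                                    */
    printf("%d\n", (intptr_t)low < (intptr_t)high);    /* 0 */
    return 0;
}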


It depends exactly what you want to know about two pointers!

A previous edit of your question gave ptrA < ptrB - C as the use-case you're interested in. e.g. an overlap check with ptrA < ptrB - sizeA, or maybe an unrolled SIMD loop condition with current < endp - loop_stride. Discussion in comments has been about this kind of thing, too.

So what you're really doing is forming ptrB - C as a pointer that's potentially outside the object you were interested in, and which may have wrapped around (unsigned). (Good observation that stuff like this may be why C and C++ make it UB to form pointers outside of objects, but they do allow one-past-the-end which has unsigned wrapping at the end of the highest page, if the kernel even lets you map it.) Anyway, you want to use a signed comparison so it "still works" without having to check for wraparound, or check the sign of C or any of that stuff. This is still a lot more specific than most of the question.
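In C-ish terms, a sketch only (ends_before and the parameter names are made up to mirror the notation above; this is the asm-level idea written in C, not a claim about well-defined C semantics):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* "Does [ptrA, ptrA + sizeA) end before ptrB begins?", in the
 * ptrA < ptrB - sizeA style: form the (possibly out-of-object)
 * value ptrB - sizeA, then compare signed.  In asm this maps to
 * mov/sub to form the right-hand side, then cmp / jl.           */
static bool ends_before(const char *ptrA, size_t sizeA, const char *ptrB) {
    return (intptr_t)ptrA < (intptr_t)ptrB - (intptr_t)sizeA;
}

With the signed compare, a ptrB that is numerically smaller than sizeA just makes the right-hand side a small negative number, so the check still gives the intended answer with no explicit wraparound test.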

Yes, for "related" pointers derived from the same object with reasonable sizes, signed compare is safe on current hardware, and could only break on unlikely / distant-future machines with hardware support for full 64-bit virtual addresses. Overlap checks are also safe with unsigned if both pointers are in the low half of the canonical range, which I think is the case for user-space addresses on all the mainstream x86-64 OSes.


As you point out, unsigned ptrA < ptrB - C can "fail" if ptrB - C wraps (unsigned wraparound). This can happen in practice for static addresses that are closer to 0 than the magnitude of C.

Usually the low 64kiB is not mappable (e.g. on Linux, most distros ship with the sysctl vm.mmap_min_addr = 65536, or at least 4096, but some systems have it = 0 for WINE). Still, I think it's normal for kernels not to give you the zero page unless you request that address specifically, because mapping it would stop NULL derefs from faulting (and faulting on a NULL deref is normally highly desirable for security and debuggability reasons).

This means the loop_stride case is usually not a problem. The sizeA version can usually be done with ptrA + sizeA < ptrB, and as a bonus you can use LEA to add instead of copy + subtract. ptrA+sizeA is guaranteed not to wrap unless you have objects that wrap their pointer from 2^64-1 to zero (which works even with a page-split load at the wraparound, but you'll never see it in a "normal" system because addresses are normally treated as unsigned.)
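A sketch of that rearrangement, with the same made-up names as before: the addition happens on the side that can't wrap for any valid object, and it maps onto lea / cmp / jb in asm (LEA to add into a fresh register instead of copy + subtract):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same question as before, but rearranged as ptrA + sizeA < ptrB so
 * only unsigned arithmetic is needed; ptrA + sizeA can't wrap for a
 * valid object.  In asm: lea rcx, [rax + rdx] / cmp rcx, rbx / jb.  */
static bool ends_before2(const char *ptrA, size_t sizeA, const char *ptrB) {
    return (uintptr_t)ptrA + sizeA < (uintptr_t)ptrB;
}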


So when can it fail with a signed compare? When ptrB - C has signed wraparound on overflow. Or if you ever have pointers to high-half objects (e.g. into Linux's vDSO pages), a compare between a high-half and low-half address might give you an unexpected result: you will see "high-half" addresses as less than "low-half" addresses. This happens even though the ptrB - C calculation doesn't wrap.

(We're only talking about asm directly, not C, so there's no UB, I'm just using C notation for sub or lea / cmp / jl.)

Signed wraparound can only happen near the boundary between 0x7FFF... and 0x8000..., but that boundary is extremely far from any canonical address. I'll reproduce a diagram of the x86-64 address space (for current implementations where virtual addresses are 48 bits) from another answer. See also Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)?.

Remember, x86-64 faults on non-canonical addresses. That means it checks that a 48-bit virtual address is properly sign-extended to 64 bits, i.e. that bits [63:48] match bit 47 (numbering from 0).

+----------+
| 2^64-1   |   0xffffffffffffffff
| ...      |                       high half of canonical address range
| 2^64-2^47|   0xffff800000000000
+----------+
|          |
| unusable |   Not to scale: this is 2^15 times larger than the top/bottom ranges.
|          |
+----------+
| 2^47-1   |   0x00007fffffffffff
| ...      |                       low half of canonical range
| 0        |   0x0000000000000000
+----------+
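That sign-extension rule is easy to check in code. A sketch in C (assuming the usual arithmetic right shift for negative signed values, which every mainstream x86-64 compiler provides):

#include <stdbool.h>
#include <stdint.h>

/* True iff bits [63:48] of addr are copies of bit 47, i.e. the address
 * is canonical under 48-bit virtual addressing.                        */
static bool is_canonical48(uint64_t addr) {
    /* Shift bit 47 up into the sign bit (as unsigned, so no UB), then
     * arithmetic-shift back down: a canonical address survives the
     * round trip unchanged.                                            */
    return (uint64_t)((int64_t)(addr << 16) >> 16) == addr;
}

/* is_canonical48(0x00007fffffffffffULL) -> true   (top of low half)
 * is_canonical48(0xffff800000000000ULL) -> true   (bottom of high half)
 * is_canonical48(0x0000800000000000ULL) -> false  (inside the hole)    */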

Intel has proposed a 5-level page-table extension for 57-bit virtual addresses (i.e. another 9-bit level of tables), but that still leaves most of the address space non-canonical. i.e. any canonical address would still be 2^63 - 2^57 away from signed wraparound.

Depending on the OS, all your addresses might be in the low half or the high half. e.g. on x86-64 Linux, high ("negative") addresses are kernel addresses, while low (signed positive) addresses are user-space. But note that Linux maps the kernel's vsyscall page into user space very near the top of virtual address space (though it leaves pages unmapped at the very top: e.g. ffffffffff600000-ffffffffff601000 [vsyscall] in a 64-bit process on my desktop). The vDSO pages, by contrast, are near the top of the bottom-half canonical range, 0x00007fff.... Even in a 32-bit process, where in theory the whole 4GiB is usable by user-space, the vDSO is a page below the highest page, and mmap(MAP_FIXED) didn't work on that highest page. (Perhaps because C allows one-past-the-end pointers?)

If you ever take the address of a function or variable in the vsyscall page, you can have a mix of positive and negative addresses. (I don't think anyone ever does that, but it's possible.)

So signed address comparison could be dangerous if you have a mix of high-half and low-half addresses (i.e. no kernel/user split separating signed positive from signed negative), or if your code is running in the distant future when/if x86-64 has been extended to full 64-bit virtual addresses, so that an object could span the signed-wraparound boundary. The latter seems unlikely, and if you can get a speedup from assuming it won't happen, it's probably a good idea.

This means signed-compare is already dangerous with 32-bit pointers, because 64-bit kernels leave the whole 4GiB usable by user-space. (And 32-bit kernels can be configured with a 3:1 user/kernel split.) There's no unusable non-canonical hole, so in 32-bit mode an object can span the signed-wraparound boundary. (The same applies in the ILP32 x32 ABI: 32-bit pointers in long mode.)
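To see the 32-bit hazard concretely, a small sketch with made-up addresses for an object that happens to straddle the 2^31 boundary (the unsigned-to-signed casts again assume the usual two's-complement conversion):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Hypothetical addresses of the start and one-past-the-end of a
     * single object; a 64-bit kernel can legitimately map user memory
     * here for a 32-bit process.                                      */
    uint32_t start = 0x7ffffff0u;
    uint32_t end   = start + 0x20u;   /* 0x80000010 */

    printf("unsigned: %d\n", start < end);                    /* 1: correct           */
    printf("signed:   %d\n", (int32_t)start < (int32_t)end);  /* 0: end "below" start */
    return 0;
}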


Performance advantages:

Unlike 32-bit mode, there are no CPUs where jge is faster than jae in 64-bit mode, or any other such combination. (And different conditions for setcc / cmovcc never matter.) So any perf difference comes only from the surrounding code, unless you can do something clever with adc or sbb instead of a cmov or setcc.

Sandybridge-family can macro-fuse test / cmp (and sub, add, and various other non-read-only instructions) with both signed and unsigned branch conditions (not every JCC, but that isn't a factor here). Bulldozer-family can fuse cmp / test with any JCC.

Core2 can only macro-fuse cmp with unsigned compares, not signed, but Core2 can't macro-fuse at all in 64-bit mode. (It can macro-fuse test with signed-compares in 32-bit mode, BTW.)

Nehalem can macro-fuse test or cmp with signed or unsigned compares (including in 64-bit mode).

Source: Agner Fog's microarch pdf.

Numidia answered 7/12, 2017 at 7:45 Comment(14)
Is it fair to summarize that as saying that today, and for the foreseeable future, signed comparisons are safer? The intro says "sometimes it can be safe to use signed", which seems to indicate that unsigned is always safe and only sometimes signed is safe, but the rest of the answer seems to indicate the opposite. – Gerlac
BTW, thinking about this question makes it clear that the UB in C or C++ with respect to forming out-of-bounds pointers, even if you use them only in "innocent" places like comparisons, actually makes a lot of sense. I used to think that was mostly for exotic or archaic platforms where pointers had an unusual representation or were handled unusually in hardware, but on every platform you could have this wrapping issue, and restricting pointers to live within the bounds of the region they point to nicely avoids it. – Gerlac
@BeeOnRope: Signed isn't future-proof if we consider the possibility of fully 64-bit virtual addresses and an OS (or bare-metal setup) that lets objects span the signed wraparound point, the way 32-bit pointers already can. Unsigned is always safe if you use it carefully (i.e. make sure you don't wrap by forming pointers too far below the start of an object, where "too far" depends on how many pages around NULL are guaranteed unused). I'm not sure any kind of blanket statement is really appropriate. – Numidia
Also, a signed compare between high and low addresses gives different results. You didn't limit the question to user-space code for any particular purpose, so we have to consider kernels manipulating pointers, and IDK what they might want to know about pointers. – Numidia
The question was intended to be limited to user-space, sorry if that wasn't clear (it's also not clear that unsigned gives "better" or "worse" results there anyway). I guess I don't see how you can use unsigned "always safely" even today: how could you even know how many pages are unused, or how linkers and loaders work, on every platform? That's also the stuff that changes frequently across platforms and across time. Solutions like "always add" are error-prone and not general anyway (what do you do if you have to add a signed value - now you need a runtime check to avoid wrapping?). – Gerlac
I guess the chance of having an error on a hypothetical future machine with more than 10 quintillion bytes of virtual address space, where the OS chooses to somehow allocate objects across the 2^63 boundary, just seems like a very small risk to balance against the other risks today. Anyway, it would be nice to see some kind of summary in your answer, since I can't really pin down the advice (but the answer is otherwise of high quality). – Gerlac
@BeeOnRope: Usually in asm you know the semantic meaning of the thing you want to add or subtract to/from a pointer, and based on that you can choose a way to do it that you know won't wrap, can't you? Anyway, now that you've narrowed the question down to mostly user-space, it's possible to write some kind of conclusion instead of just pointing out a bunch of specific stuff. – Numidia
Usually, of course. But you have to be aware of it all the time and not make mistakes. The same is true in C and C++, yet unsigned use causes plenty of bugs there (even if only 1 out of 1,000 subtractions from an unsigned value causes a bug, that's pretty bad). There are enough cases where it isn't easy, for it to matter: where you want to subtract and not add because the other side is used in another context (e.g., rax is used "as is" in a loop to index something, so you want to compare against rbx - C at termination, and not rax + C < rbx). – Gerlac
And there are various cases where you calculate a difference between two values and you don't know which is larger, so you end up with a signed difference - and if you use that in a pointer calculation you are playing with fire if you do it unsigned. Sometimes one value should be larger than the other, or some value should be positive, but you accept all inputs; if you treat it as signed you just get the right answer, but unsigned you might get a catastrophically wrong answer. None of this applies to 32 bits, mind you. – Gerlac
@BeeOnRope: Updated with an attempt to make a conclusion (or TL:DR at the start, really). Needs another proof-read, gotta go curling. Feel free to edit if needed. – Numidia
With the vDSO, uintptr_t compared with < yields the correct answer, unlike intptr_t - is that right? – Avocation
@MaximEgorushkin: oh, yes, if you want to consider the high half as "above" the low half. – Numidia
Maybe a little table with pointers, operations and results would summarize it nicely. – Avocation
@MaximEgorushkin: I added another paragraph to the TL:DR section to remind us about the term "high half" implying unsigned. IDK if an exhaustive table of all 4 cases of sign pairs is really needed. If you think it's a good idea, please leave an edit and let me know; I might want to re-edit. – Numidia
