Using the extra 16 bits in 64-bit pointers
I read that a 64-bit machine actually uses only 48 bits of the address (specifically, I'm using an Intel Core i7).

I would expect that the extra 16 bits (bits 48-63) are irrelevant for the address and would be ignored. But when I try to access such an address, I get a signal EXC_BAD_ACCESS.

My code is:

int *p1 = &val;
int *p2 = (int *)((long)p1 | 1ll << 48); // set bit 48, which should be irrelevant
int v = *p2; // Here I receive a signal EXC_BAD_ACCESS.

Why is this so? Is there a way to use these 16 bits?

This could be used to build a more cache-friendly linked list. Instead of using 8 bytes for the next pointer and 8 bytes for the key (due to alignment restrictions), the key could be embedded into the pointer.

Ruelas answered 24/4, 2013 at 17:40 Comment(4)
Those bits are not ignored, but checked to see if the address is canonical.Belen
How many bits are used depends on the architecture. For example, iOS on ARM64 only uses 33 bits for addresses. On x86_64 currently only 48 bits are usedHeterozygous
You can pack structs if you want, so you don't waste bytes on padding. x86 has fast unaligned accesses.Reckon
Can I use some bits of pointer (x86_64) for custom data? And how if possible?Heterozygous

The high-order bits are reserved in case the address space is widened in the future, so you can't simply use them like that:

The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations (...) The architecture definition allows this limit to be raised in future implementations to the full 64 bits, extending the virtual address space to 16 EB (2^64 bytes). This is compared to just 4 GB (2^32 bytes) for the x86.

http://en.wikipedia.org/wiki/X86-64#Architectural_features

More importantly, according to the same article [Emphasis mine]:

... in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup). Further, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form."

As the CPU checks the high bits even though they're unused, they're not really "irrelevant". You need to make sure that the address is canonical before using the pointer.
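As a small illustration (my own sketch, not from the original answer, assuming the current 48-bit virtual addresses), a canonical-form check can be written by sign-extending the low 48 bits and comparing against the original value:

```cpp
#include <cstdint>

// An address is canonical (with 48-bit virtual addresses) when bits 63..48
// are all copies of bit 47, i.e. sign-extending the low 48 bits reproduces
// the original value. Note: arithmetic right shift of a negative value is
// implementation-defined before C++20, but all mainstream x86-64 compilers
// implement it as expected.
bool is_canonical48(uint64_t addr) {
    int64_t sext = (int64_t)(addr << 16) >> 16; // drop and re-extend the top 16 bits
    return (uint64_t)sext == addr;
}
```

With this helper, a typical user-space address passes and an address with a stray bit 48 set (as in the question) fails the check.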

Some recent CPUs can optionally ignore the high bits, checking only that the topmost bit matches bit #47 (PML4) or #56 (PML5). Intel Linear Address Masking (LAM) can be enabled by the kernel on a per-process basis, for user-space, the kernel, or both. AMD UAI (Upper Address Ignore) is similar, as is ARM64's Top Byte Ignore (TBI). This makes it more efficient to store data in pointers, and easier: you don't have to manually strip the tag out before dereferencing or before passing the pointer to a function that isn't aware of the tagging.


That said, in x86_64 you're still free to use the high 16 bits if needed (if the virtual address is not wider than 48 bits, see below), but you have to check and fix the pointer value by sign-extending it before dereferencing.

Note that casting the pointer value to long is not the correct way to do this, because long is not guaranteed to be wide enough to store a pointer. You need to use uintptr_t or intptr_t.

int *p1 = &val; // original pointer
uint8_t data = ...;
const uintptr_t MASK = (1ULL << 48) - 1; // keep only the low 48 bits

// === Store data into the pointer ===
// Note: To be on the safe side and future-proof (because future implementations
//     can increase the number of significant bits in the pointer), we should
//     store values from the most significant bits down to the lower ones
int *p2 = (int *)(((uintptr_t)p1 & MASK) | ((uintptr_t)data << 56));

// === Get the data stored in the pointer ===
data = (uintptr_t)p2 >> 56;

// === Dereference the pointer ===
// Sign extend first to make the pointer canonical
// Note: Technically this is implementation defined. You may want a more
//     standard-compliant way to sign-extend the value
intptr_t p3 = (intptr_t)((uintptr_t)p2 << 16) >> 16;
val = *(int *)p3;

WebKit's JavaScriptCore, Mozilla's SpiderMonkey engine, and LuaJIT use this in the NaN-boxing technique: if the value is a NaN, the low 48 bits store the pointer to the object and the high 16 bits serve as tag bits; otherwise it's a double value.
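As a rough sketch of the NaN-boxing idea (my own illustration, not the actual JSC/SpiderMonkey/LuaJIT encoding): any 64-bit pattern with all quiet-NaN bits set can never come from an ordinary double, so the low 48 bits are free for a payload:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical NaN-box: bit patterns with all QNAN_MASK bits set cannot be
// produced by ordinary (non-NaN) doubles, so we reuse the low 48 bits.
// A real engine must also normalize NaNs coming out of arithmetic so they
// don't collide with boxed pointers.
constexpr uint64_t QNAN_MASK = 0x7ff8000000000000ULL;

uint64_t box_pointer(uint64_t ptr48) {      // ptr48 must fit in 48 bits
    return QNAN_MASK | ptr48;
}
bool is_boxed_pointer(uint64_t bits) {
    return (bits & QNAN_MASK) == QNAN_MASK;
}
uint64_t unbox_pointer(uint64_t bits) {
    return bits & ((1ULL << 48) - 1);       // strip the tag bits
}
uint64_t box_double(double d) {             // ordinary doubles pass through
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);    // bit-pattern, no conversion
    return bits;
}
```

An ordinary double like 3.14 never has all the quiet-NaN bits set, so the two cases can be distinguished with a single mask-and-compare.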

Previously Linux also used the 63rd bit of the GS base address to indicate whether the value was written by the kernel

In reality you can usually use the 48th bit too, because most modern 64-bit OSes split kernel and user space in half, so bit 47 is always zero in user space and you have 17 top bits free for use


You can also use the lower bits to store data. It's called a tagged pointer. If int is 4-byte aligned then the 2 low bits are always 0 and you can use them like in 32-bit architectures. For 64-bit values you can use the 3 low bits because they're already 8-byte aligned. Again you also need to clear those bits before dereferencing.

int *p1 = &val; // the pointer we want to store the value into
int tag = 1;
const uintptr_t MASK = ~0x03ULL;

// === Store the tag ===
int *p2 = (int *)(((uintptr_t)p1 & MASK) | tag);

// === Get the tag ===
tag = (uintptr_t)p2 & 0x03;

// === Get the referenced data ===
// Clear the 2 tag bits before using the pointer
intptr_t p3 = (uintptr_t)p2 & MASK;
val = *(int*)p3;
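A generic wrapper along those lines (my own sketch, not part of the original answer) can use `alignof` to determine at compile time how many low bits are actually free for the tag:

```cpp
#include <cstdint>

// Low-bit tagged pointer: alignof(T) guarantees the low bits of any valid
// T* are zero, so they can hold a small tag.
template <typename T>
struct Tagged {
    static constexpr uintptr_t TAG_MASK = alignof(T) - 1;
    static_assert((alignof(T) & TAG_MASK) == 0,
                  "alignment must be a power of two");

    uintptr_t bits;

    Tagged(T *p, uintptr_t tag) : bits((uintptr_t)p | (tag & TAG_MASK)) {}
    T        *ptr() const { return (T *)(bits & ~TAG_MASK); } // clear tag first
    uintptr_t tag() const { return bits & TAG_MASK; }
};
```

With `int` (4-byte aligned) this gives 2 tag bits; with 8-byte-aligned types such as `double` or most structs it gives 3.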

One famous user of this is the V8 engine with SMI (small integer) optimization. The lowest bit in the address will serve as a tag for type:

  • if it's 1, the value is a pointer to the real data (objects, floats, or bigger integers). The next higher bit (w) indicates whether the pointer is weak or strong. Just clear the tag bits and dereference it
  • if it's 0, it's a small integer. In 32-bit V8, or 64-bit V8 with pointer compression, it's a 31-bit int: do a signed right shift by 1 to restore the value. In 64-bit V8 without pointer compression it's a 32-bit int stored in the upper half
   32-bit V8
                           |----- 32 bits -----|
   Pointer:                |_____address_____w1|
   Smi:                    |___int31_value____0|
   
   64-bit V8
               |----- 32 bits -----|----- 32 bits -----|
   Pointer:    |________________address______________w1|
   Smi:        |____int32_value____|0000000000000000000|

https://v8.dev/blog/pointer-compression
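The 64-bit (non-compressed) Smi layout above can be sketched like this (my own illustration of the scheme, not V8's actual API):

```cpp
#include <cstdint>

// 64-bit V8-style Smi (no pointer compression): the 32-bit integer lives in
// the upper half of the word and the whole lower half, including the low
// tag bit, is zero; tagged pointers have the low bit set to 1.
uint64_t encode_smi(int32_t v) {
    return (uint64_t)(uint32_t)v << 32;  // low 32 bits (incl. tag bit) are 0
}
bool is_smi(uint64_t bits) {
    return (bits & 1) == 0;              // tag bit 0 => small integer
}
int32_t decode_smi(uint64_t bits) {
    return (int32_t)(bits >> 32);        // recover the upper-half integer
}
```

Negative values survive the round trip because the decode truncates back to a signed 32-bit integer.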


So, as commented below, Intel has published PML5, which provides a 57-bit virtual address space; if you're on such a system you can use only the 7 high bits

You can still use some workarounds to get more free bits, though. First, you can use 32-bit pointers on a 64-bit OS. On Linux, if the x32 ABI is enabled then pointers are only 32 bits long. On Windows, just clear the /LARGEADDRESSAWARE flag and pointers have only 32 significant bits, so you can use the upper 32 bits for your purposes. See How to detect X32 on Windows?. Another way is to use pointer compression tricks: How does the compressed pointer implementation in V8 differ from JVM's compressed Oops?

You can further get more bits by requesting the OS to allocate memory only in the low region. For example if you can ensure that your application never uses more than 64MB of memory then you need only a 26-bit address. And if all the allocations are 32-byte aligned then you have 5 more bits to use, which means you can store 64 - 21 = 43 bits of information in the pointer!
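Under those assumptions (hypothetical numbers from the paragraph above: a 64 MB arena, so 26 address bits, and 32-byte-aligned allocations, so the 5 low bits are always zero), packing 43 bits of payload next to the pointer could look like this sketch:

```cpp
#include <cstdint>

// Hypothetical layout: 26 address bits minus 5 alignment bits leaves only
// 21 significant pointer bits, so 64 - 21 = 43 bits remain for payload.
constexpr int ADDR_BITS  = 26;                     // 2^26 = 64 MB arena
constexpr int ALIGN_BITS = 5;                      // 32-byte alignment
constexpr int PTR_BITS   = ADDR_BITS - ALIGN_BITS; // 21
constexpr int DATA_BITS  = 64 - PTR_BITS;          // 43

uint64_t pack(uint64_t offset, uint64_t data) {
    // offset is a 32-byte-aligned offset into the arena; data fits 43 bits
    return (data << PTR_BITS) | (offset >> ALIGN_BITS);
}
uint64_t unpack_offset(uint64_t w) {
    return (w & ((1ULL << PTR_BITS) - 1)) << ALIGN_BITS;
}
uint64_t unpack_data(uint64_t w) {
    return w >> PTR_BITS;
}
```

The arena base address would be added back when dereferencing; only the offset is stored.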

I guess ZGC is one example of this. It uses only 42 bits for addressing, which allows for 2^42 bytes = 4 × 2^40 bytes = 4 TB

ZGC therefore just reserves 16 TB of address space (but does not actually use all of this memory) starting at address 4 TB.

A first look into ZGC

It uses the bits in the pointer like this:

 6                 4 4 4  4 4                                             0
 3                 7 6 5  2 1                                             0
+-------------------+-+----+-----------------------------------------------+
|00000000 00000000 0|0|1111|11 11111111 11111111 11111111 11111111 11111111|
+-------------------+-+----+-----------------------------------------------+
|                   | |    |
|                   | |    * 41-0 Object Offset (42-bits, 4TB address space)
|                   | |
|                   | * 45-42 Metadata Bits (4-bits)  0001 = Marked0
|                   |                                 0010 = Marked1
|                   |                                 0100 = Remapped
|                   |                                 1000 = Finalizable
|                   |
|                   * 46-46 Unused (1-bit, always zero)
|
* 63-47 Fixed (17-bits, always zero)

For more information on how to do that see


Side note: using a linked list for cases where the keys are tiny compared to the pointers is a huge waste of memory, and it's also slower due to bad cache locality. In fact, you shouldn't use linked lists in most real-life problems

Heterozygous answered 25/8, 2013 at 7:4 Comment(18)
One very very VERY important warning: The reason why canonical form exists is specifically to make it difficult to re-use those 16 bits for other purposes. One day, they'll open up all 64 bits, and then your code will break.Wolfy
@Wolfy you can use from the most significant bits instead of right from bit 48. That reduces the chance for the code to be broken in the not very near future. It's extremely unlikely that personal CPUs will have full 64-bit bus width in the predictable futureHeterozygous
anyway, using the low-order bits will always be safe and should be used instead if one doesn't need so many bitsHeterozygous
WARNING! The code "intptr_t p3 = ((intptr_t)p2 << 16) >> 16;" is undefined behavior if any of those top 16 bits aren't zero, because C++ considers it to be signed overflow. You need to use unsigned. Unfortunately, to do sign extension, you'd need to use signed numbers. Also unfortunately, signed right-shift is implementation-defined. Anyway, you want to use either intptr_t p3 = (intptr_t)((uintptr_t)p2 << 16) >> 16; which works on all known x86-64 compilers, or if you want truly well-defined, use division: intptr_t p3 = (intptr_t)((uintptr_t)p2 << 16) / 65536; godbolt.org/g/5P4tJFBig
Also, Use -fsanitize-undefined to get the compiler to generate code which catches UB. Example => godbolt.org/g/N8ax6qBig
In the x32 ABI, uintptr_t is a 32-bit type. But there aren't spare high bits in 32-bit pointers anyway, so that hardly matters. Mostly just a warning that #ifdef __x86_64__ doesn't always mean 64/48-bit pointers. (You can still use the low bits of aligned pointers in x32, though.)Reckon
@PeterCordes the OP isn't using x32 ABI. And even on x32 ABI you can use a few low bits for the tagHeterozygous
I know the OP isn't using x32, but your suggestion to use uintptr_t got me thinking about types. I guess that's irrelevant because the question is only asking about 64-bit pointers. You're right that long is wrong (e.g. breaks on the Windows ABI) and uintptr_t correct: it's always the width of a pointer even if that's not 64 bits.Reckon
Generally an excellent comment - thanks. I would prefer that the side note on linked-list should be removed. It's a C++ specific comment and only relevant for doubly-linked lists. In general, singly linked lists are a very important data type for application programmer and this advice could easily be misconstrued. It breaks the flow of an otherwise very good post and putting in the appropriate caveats so it will not mislead would make it bigger, more intrusive and even less useful.Conchoidal
@Conchoidal it's not about C++ but any languages that allows linked lists. There are very few cases where singly-linked lists are useful, most are for elements that are huge and are removed/inserted very often. The cost to manage memory allocation in this case is far higher than the cost to move it in a linear list. Every time you allocate 2 bytes a huge block of 16 or 32 bytes or more is reserved for you, and then the cost of the pointer which are many times bigger than the data itselfHeterozygous
@Heterozygous Your claim that there are few cases is not helpful. The important use cases arise when it is advantageous to share the tails of the lists. A good example arises when doing search algorithms when sharing common trails make the algorithm practical. Another good example arises in parsing ambiguous context-free grammars e.g. Earley's algorithm. Such storage sharing makes these practical.Conchoidal
PML5 is already documented, maybe even available in hardware already. An extra level of page tables gives us 57-bit virtual addresses, if the OS chooses to enable it. Leaving 7 high bits. (And for 16-byte aligned allocations, 4 low bits which can be cleared more easily and efficiently.) Also note that best-case having to modify a pointer before dereferencing takes pointer-chasing latency from 4 cycles to 6: 1 ALU op plus defeating SnB-family's fast-path for simple [reg+0..2047] addressing modes.Reckon
Any idea why V8 chose their low-bit flag such that both pointer and small-integer need an extra ALU operation to make it usable? If 0 means pointer, it's directly usable, no instruction needed. A right-shift will shift out a 1 as easily as a 0. It also introduces an extra 1 or 2 cycles of latency for the pointer case if this pointer was loaded from memory: 1 for and reg, -2 itself, plus another 1 on Intel CPUs for defeating the special-case [reg+0..2047] load-use latency fast pathReckon
TL:DR: unless I'm missing something, V8's way seems like the opposite of what you should suggest. Especially for values that could be pointers to pointers (part of a linked list or tree, where pointer-chasing load-use latency is relevant). 4 vs. 6 cycle pointer-chasing latency is a huge difference. I guess it's still relevant to describe it as something that a practical real-world implementation really is doing, though.Reckon
@PeterCordes that's a good question. I guess because they use both low bits for the tag, the second lowest bit indicates that the reference is weak or strong. But yes they can encode it so that strong pointers (which is the most common type of pointers) don't need any ALU operations. Perhaps you should ask them about that?Heterozygous
Ah, that makes a lot more sense. Unless there are cases where you could statically prove the pointers were all strong (and thus could omit an and from that path), it would require another test and branch to skip the and. That may not be worth it vs. just doing it unconditionally even at the cost of latency. If these are pointers to integers, the deref might often not be part of a loop-carried dep chain.Reckon
@Heterozygous it's almost certainly faster to store the spare 16 physical memory bits in the low 16 bits of the packed pointer. They can be accessed with just movzwl and the pointer value can be accessed with shrq $imm8. This is faster than the alternative of shrq $imm8 and movabs + andq for the data and pointer values respectively.Glister
default alignment of new on Windows is 16 bytes, which gives 4 bottom bits to play with. Related, some versions of Java make all pointers 8 byte aligned, allowing them full use of 32GB of RAM in 32 bit pointers (Compressed Oops)Use

I guess no-one has mentioned the possible use of bit fields ( https://en.cppreference.com/w/cpp/language/bit_field ) in this context, e.g.

template<typename T>
struct My64Ptr
{
    signed long long ptr : 48; // as per phuclv's comment, we need the type to be signed to be sign extended
    unsigned long long ch : 8; // ...and, what's more, as Peter Cordes pointed out, it's better to mark signedness of bit field explicitly (before C++14)
    unsigned long long b1 : 1; // Additionally, as Peter found out, types can differ by sign and it doesn't mean the beginning of another bit field (MSVC is particularly strict about it: other type == new bit field)
    unsigned long long b2 : 1;
    unsigned long long b3 : 1;
    unsigned long long still5bitsLeft : 5;

    inline My64Ptr(T* ptr) : ptr((long long) ptr)
    {
    }

    inline operator T*()
    {
        return (T*) ptr;
    }
    inline T* operator->()
    {
        return (T*)ptr;
    }
};

My64Ptr<const char> ptr ("abcdefg");
ptr.ch = 'Z';
ptr.b1 = true;
ptr.still5bitsLeft = 23;
std::cout << ptr << ", char=" << char(ptr.ch) << ", byte1=" << ptr.b1 << 
  ", 5bitsLeft=" << ptr.still5bitsLeft << " ...BTW: sizeof(ptr)=" << sizeof(ptr);

// The output is: abcdefg, char=Z, byte1=1, 5bitsLeft=23 ...BTW: sizeof(ptr)=8
// With all signed long long fields, the output would be: abcdefg, char=Z, byte1=-1, 5bitsLeft=-9 ...BTW: sizeof(ptr)=8 

I think it may be quite a convenient way to try to make use of these 16 bits, if we really want to save some memory. All the bitwise (& and |) operations and the cast to a full 64-bit pointer are done by the compiler (though, of course, executed at run time).

Martinsen answered 25/8, 2020 at 14:10 Comment(8)
you need long long ptr : 48 instead of unsigned long long to sign-extend the pointerHeterozygous
thanks, phuclv. I have updated the code accordingly. Unfortunately it makes usage of other fields slightly less convenient, because they also have to be signed (bit field requires all the same types)Martinsen
I'd recommend using the top 7 for your own data, not the 7 or 8 bits just above the 48. The top bits can be more cheaply extracted with just a shift, not leaving any high garbage that needs clearing. (With PML5 for another level of page tables, virtual addresses are 57 bits wide, leaving only 7 unused bits. But if you assume your pointers are all user-space in the lower canonical range, you can use the 8 high bits and zero-extend by using unsigned long long ptr, always clearing the top significant bit.Reckon
Note that the signedness of a bitfield is no guaranteed unless you make it explicit, before C++14. So signed long long would be better. (See the Notes at the bottom of en.cppreference.com/w/cpp/language/bit_field)Reckon
I didn't see cppreference mention that all the bitfield member have to be the same type. clang for x86-64 System V (godbolt.org/z/djP86v) still packs them into one 8-byte object when you have a mix of signed long long ptr and unsigned int, even when there's a type difference not at a byte boundary. Is that not guaranteed by ISO C++? Oh apparently not; MSVC make the struct 16 bytes when it has signed long long and unsigned int members. But still 8 when it's signed and unsigned long long members: godbolt.org/z/6GWrM4Reckon
So it's an ABI choice whether or not members of different type are packed together or whether that starts a new chunk. IDK about only differing in signedness, that might also be a choice where x86-64 SysV and Windows x64 both happen to choose the same. But anyway, those are the only x86-64 ABIs so you can still use it. (Except for x32, ILP32 in long mode where all 32 bits of pointers are significant, but you could store them in 8-byte ptr:tag pairs.)Reckon
Thanks, Peter, I have updated the code snippet (48-bit pointer is explicitly signed and other fields are unsigned, and still 8 bytes total)Martinsen
Update: hardware support for ignoring some high bits still check the top bit, so only using the 6 bits below that lets you deref pointers without any masking on systems that support Intel LAM. (phoronix.com/news/Torvalds-Cleans-Up-LAM-Linux-64 has a bitfield diagram for the LAM57 version which is future-compatible with PML5). But AMD UAI (phoronix.com/news/AMD-Linux-UAI-Zen-4-Tagging) is I think more like ARMv8.5's Top Byte Ignore, so yeah, use all 7 high bits (above the PML5 most significant bit).Reckon

A standards-compliant way to canonicalize AMD/Intel x64 pointers (based on the current documentation of canonical pointers and 48-bit addressing) is

int *p2 = (int *)(((uintptr_t)p1 & ((1ull << 48) - 1)) |
    ~(((uintptr_t)p1 & (1ull << 47)) - 1));

This first clears the upper 16 bits of the pointer. Then, if bit 47 is 1, this sets bits 47 through 63, but if bit 47 is 0, this does a logical OR with the value 0 (no change).
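A quick sanity check (my own, assuming a bit-pattern-preserving `uintptr_t` conversion) that this branchless form agrees with plain sign extension of the low 48 bits:

```cpp
#include <cstdint>

// Branchless canonicalization from the answer, as a function on raw bits:
// clear the top 16 bits, then OR in all-ones above bit 47 iff bit 47 is set.
uint64_t canonicalize(uint64_t p) {
    return (p & ((1ULL << 48) - 1)) | ~((p & (1ULL << 47)) - 1);
}

// Reference implementation: copy bit 47 into bits 48..63 via a shift pair
// (arithmetic right shift, as implemented by mainstream x86-64 compilers).
uint64_t sign_extend48(uint64_t p) {
    return (uint64_t)((int64_t)(p << 16) >> 16);
}
```

Both agree regardless of whatever garbage sits in the top 16 bits, and both leave already-canonical addresses unchanged.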

Centripetal answered 21/5, 2020 at 8:18 Comment(11)
Note that in user-space in most OSes, you can count on your pointer being in the low half of virtual address space, where sign extension is equivalent to zero-extension. So you actually just need the AND-mask part.Reckon
If you were going to do the full general case, it would be more efficient to just redo sign-extension with ((intptr_t)p1 << 16) >> 16. (Or use uintptr_t for the left shift if you care about compilers that don't define the behaviour of shifting bits out the top of a signed integer). That also avoids needing any 64-bit constants, although it has worse critical-path latency than (uintptr_t)p1 & ((1ULL<<48) - 1)Reckon
@PeterCordes The problem with that is not just the undefined behavior of the left shift on signed types that you point out, but also (in C, and any C++ prior to C++20) the implementation-defined behavior of right shift of negative numbers (it is NOT guaranteed to be arithmetic shift, so it isn't guaranteed to sign-extend the number at all!).Centripetal
If you were worried about that level of portability, you couldn't make assumptions about pointer bit-patterns and what conversion from T* to uintptr_t does to the bits. So you couldn't really be doing this in the first place. All the x86-64 compilers anyone cares about use arithmetic right-shift on signed types. GNU C documents this, and MSVC of course does it that way, too, and I'd be shocked if any work differently unless intentionally Deathstation 9000. Avoiding UB is potentially useful because of overly aggressive optimizers, unlike with relying on implementation-defined behaviour.Reckon
Apparently it's possible to write a fully portable arithmetic right shift which GCC/clang can even compile to a single instruction: github.com/Rupt/c-arithmetic-right-shift/blob/master/sar.c (MIT license) / godbolt.org/z/YasMG5WYe includes a fully safe version using it which does compile to two shifts. (Or AArch64 sbfx). Unfortunately your code compiles with 2 64-bit constants, two ANDs, a NEG, and an OR.Reckon
@PeterCordes There is a different between platform-specific portability (in this case, we know the platform and its pointer behavior) and compiler-specific portability (in this case, we DON'T know the compiler or its right-shift behavior). That said, I believe C++20 has moved in the right direction by mandating arithmetic-shift-right for signed integers, and if this question were tagged C++ (and C++20 had been released when I answered), your shifting approach would be a feasible alternative for some situations, with appropriate caveats.Centripetal
ISO C leaves the door wide open for implementation choices that nobody actually wants. Like logical right shifts on signed types, and char* to uintptr_t conversion that doesn't preserve the bit-pattern. I'm arguing that if we assume a compiler that's trying to be useful, not actively hostile, we can in practice assume arithmetic right shifts on x86-64 because the hardware can do them efficiently so there's no reason for a sane implementation to pick anything else. An ISO C compliant compiler could I think rotate or NOT the bits when converting between pointers and uint64_t, if it wanted.Reckon
All ISO C says is that conversion from pointer-to-void to (u)intptr_t and back must produce a pointer that compares equal to the original. (And pointer<->integer conversion in general is implementation-defined). I'll grant you that it's more plausible (at least not so intentionally programmer-hostile) for an implementation to use logical right shifts than to implement pointer-to-integer conversion in a way that changes the bit-pattern on x86-64. To work around this, the char* object representation is also accessible via memcpy, but implementations could make deref do some bit-manip...Reckon
Also, note that my approach (2 ANDs against constants, 1 BITWISE-NOT, and 1 OR) is already pretty efficient; back of the envelope, 2 shift operations probably only retire 1 cycle faster, given the dependencies involved. Anyway, given the OP was asking for how to use the bits, rather than how to micro-optimize instruction cycles, having a solution that could quietly (but legally) fail according to the quirks of specific compilers (for C or pre-C++20) is probably counterproductive.Centripetal
Using up 2 registers for constants is bad, and so is running two 10-byte movabs instructions for every deref if the compiler doesn't dedicate two registers to the constants (the I-cache footprint is bad). Even if the right shift did run as logical instead of arithmetic on some hypothetical deathstation 9000, it would still work anyway except for kernel code because user-space addresses are in the low half (on all the major OSes.) Not relying on arithmetic right shift is a fun exercise in bit-manipulation, but it's not something I think is reasonable for production use.Reckon
Let us continue this discussion in chat.Centripetal

According to the Intel Manuals (volume 1, section 3.3.7.1), linear addresses have to be in canonical form. This means that indeed only 48 bits are used and the extra 16 bits are sign-extended. Moreover, the implementation is required to check whether an address is in that form and, if it is not, to generate an exception. That's why there is no way to use those additional 16 bits.

The reason it is done this way is quite simple. Currently a 48-bit virtual address space is more than enough (and, because of CPU production costs, there is no point in making it larger), but undoubtedly the additional bits will be needed in the future. If applications/kernels were to use them for their own purposes, compatibility problems would arise, and that's what CPU vendors want to avoid.

Stipend answered 25/4, 2013 at 10:0 Comment(1)
there is no way to use those additional 16 bits is not correct. There are several ways that can be used in the foreseeable futureHeterozygous
  1. Try to print out your changed pointer after setting the bit:

     int var{ 1 };
     int* p{ &var };
     cout << p;
     p = (int*)((uintptr_t)p | 1ll << 50);
     cout << " shifted: " << p;
    

The output showed that the pointer value changed, but what is the "Access violation" error?

  1. This error means that someone is trying to access memory that was not reserved: https://mcmap.net/q/161745/-what-does-access-violation-mean and when you dereference at the third line, you get this error. For example, I got this error when i == 10:

    for (int i = 1; i < 64; i++) {
        p = (int*)((uintptr_t)p | 1ll << i);
        int v = *p;
    }
    
Sketchy answered 26/10, 2023 at 10:28 Comment(3)
You're missing the point. For i == 48 or greater, the address is non-canonical so there's no way it could be reserved in the first place. The hardware fault in #GP(0) rather than #PF (page fault), because that address can't ever be valid no matter what's in the page tables; it's outside the virtual address range that the page tables cover. See also Why in x86-64 the virtual address are 4 bits shorter than physical (48 bits vs. 52 long)? . Your test with small i is doing something qualitatively different from setting high bits.Reckon
Also, I'm surprised you got a fault with i == 10; that's just setting a bit in the offset-within-page part of an address. For any valid p you can dereference, p | (1<<10) should also be a valid address you can deref without faulting.Reckon
Oh, you're updating the same p every time in a loop, so the low bits of the pointer are set and you're doing an unaligned dword (4-byte) load. So just setting all the offset-within-page bits and then doing a 4-byte load starting at the 2nd-last byte of a page will fault. (With i==11, though, when I tested; that makes sense where i==10 doesn't, since 4K pages means 12 page-offset bits.) You start with i=1 so the lowest bit of the pointer doesn't get set, so it's still aligned by 2. The fault address was 0x7fffffffeffe when I tested with int x = 0; int *p = &x; locals.Reckon

Physical memory is 48 bit addressed. That's enough to address a lot of RAM. However between your program running on the CPU core and the RAM is the memory management unit, part of the CPU. Your program is addressing virtual memory, and the MMU is responsible for translating between virtual addresses and physical addresses. The virtual addresses are 64 bit.

The value of a virtual address tells you nothing about the corresponding physical address. Indeed, because of how virtual memory systems work there's no guarantee that the corresponding physical address will be the same moment to moment. And if you get creative with mmap() you can make two or more virtual addresses point at the same physical address (wherever that happens to be). If you then write to any of those virtual addresses you're actually writing to just one physical address (wherever that happens to be). This sort of trick is quite useful in signal processing.

Thus when you tamper with the 48th bit of your pointer (which points at a virtual address), the MMU can't find that new address in the table of memory allocated to your program by the OS (or by yourself using malloc()). It raises an interrupt in protest; the OS catches that and terminates your program with the signal you mention.

If you want to know more I suggest you Google "modern computer architecture" and do some reading about the hardware that underpins your program.

Orville answered 24/4, 2013 at 18:27 Comment(2)
On current x86_64 implementations virtual memory is actually 48 bit addressed (Intel Manuals, vol 1, 3.3.7.1) the remaining 16 bits are sign extended. The size of the physical address range is implementation-specific (Intel Manuals, vol 3, 3.3.1).Dilapidate
Related: Why in x86-64 the virtual address are 4 bits shorter than physical (48 bits vs. 52 long)? - the upper limit on phys address space is set by the page table entry format, the 48 significant bits of virtual addresses is set by the page-table depth. (4 level, or 5 levels with PML5 for 57-bit virtual addresses.)Reckon

© 2022 - 2024 — McMap. All rights reserved.