On AMD64 compliant architectures, addresses need to be in canonical form before being dereferenced.
From the Intel manual, section 3.3.7.1:
In 64-bit mode, an address is considered to be in canonical form if address bits 63 through to the most-significant implemented bit by the microarchitecture are set to either all ones or all zeros.
Now, the most significat implemented bit on current operating systems and architectures is the 47th bit. This leaves us with a 48-bit address space.
Especially when ASLR is enabled, user programs can expect to receive an address with the 47th bit set.
If optimizations such as pointer tagging are used and the upper bits are used to store information, the program must make sure the 48th to 63th bits are set back to whatever the 47th bit was before dereferencing the address.
But consider this code:
int main()
{
int* intArray = new int[100];
int* it = intArray;
// Fill the array with any value.
for (int i = 0; i < 100; i++)
{
*it = 20;
it++;
}
delete [] intArray;
return 0;
}
Now consider that intArray
is, say:
0000 0000 0000 0000 0111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1100
After setting it
to intArray
and increasing it
once, and considering sizeof(int) == 4
, it will become:
0000 0000 0000 0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
The 47th bit is in bold. What happens here is that the second pointer retrieved by pointer arithmetic is invalid because not in canonical form. The correct address should be:
1111 1111 1111 1111 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
How do programs deal with this? Is there a guarantee by the OS that you will never be allocated memory whose address range does not vary by the 47th bit?