x86-64 canonical address?

Asked 15/9, 2014 at 16:11 Answered 27/9, 2019 at 10:21

Tags: assembly, x86-64, intel, memory-address, virtual-address-space

While reading an Intel manual, I came across the following:

On processors that support Intel 64 architecture, the IA32_SYSENTER_ESP field and the IA32_SYSENTER_EIP field must each contain a canonical address.

What is a 'canonical address'?

Tantalus asked 15/9, 2014 at 16:11

I've never heard of it, but Google showed me this. I don't know if that's what you were looking for or not. – Testis 15/9, 2014 at 16:14

My guess is a physical memory address rather than one translated by the page table. – Charming 15/9, 2014 at 16:22

@marko: no, canonical-or-not only applies to virtual addresses. With 4-level page tables (see "Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)?"), there's only enough room to translate 48 bits, and canonical = correctly sign-extended to 64 bits. An extension adding the option of a 5th level of page tables is coming in Ice Lake, I think, widening the virtual address space to 57 bits. With non-volatile DIMMs, demand for huge virtual and physical address spaces is growing. – Anzio 21/4, 2019 at 18:2

I suggest that you download the full software developer's manual. The documentation is available in separate volumes, but that link gives you all seven volumes in a single massive PDF, which makes it easier to search for things.

The answer is in section 3.3.7.1. The first line of that section states:

In 64-bit mode, an address is considered to be in canonical form if address bits 63 through to the most-significant implemented bit by the microarchitecture are set to either all ones or all zeros.

It goes on from there...

You can use CPUID (leaf 0x80000008, EAX bits 15:8) to query the supported virtual address width on that CPU (i.e. the width "implemented by the microarchitecture"). Or you can normally just assume 48-bit.
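As a minimal sketch, assuming you already have the raw EAX value returned by CPUID leaf 0x80000008 (Python cannot execute CPUID itself, so the value below is hypothetical), the two address widths can be decoded like this. The function name `address_widths_from_cpuid` is made up for illustration. On Linux, the same information is shown in the "address sizes" line of /proc/cpuinfo.

```python
def address_widths_from_cpuid(eax):
    """Decode EAX from CPUID leaf 0x80000008 (value obtained elsewhere).

    Bits 7:0  = physical address width in bits
    Bits 15:8 = linear (virtual) address width in bits
    """
    physical_bits = eax & 0xFF
    linear_bits = (eax >> 8) & 0xFF
    return physical_bits, linear_bits


# Hypothetical EAX value for a CPU with 39-bit physical / 48-bit virtual addressing.
print(address_widths_from_cpuid(0x3027))  # (39, 48)
```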

I.e. a canonical virtual address is 48 bits correctly sign-extended to 64. If the high bits don't match, it's non-canonical and will fault if you attempt to dereference it.

(Or with Intel's upcoming 5-level page table extension, 57 bits sign-extended to 64).
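A small sketch of that rule; `is_canonical` and its `va_bits` parameter are illustrative names, not anything from the manual. It checks that bits 63 down to `va_bits - 1` are all zeros or all ones:

```python
def is_canonical(addr, va_bits=48):
    """Return True if the unsigned 64-bit value addr is canonical for an
    implementation with va_bits of virtual address, i.e. bits 63 down to
    va_bits - 1 are either all zeros or all ones."""
    if not 0 <= addr < (1 << 64):
        raise ValueError("addr must be an unsigned 64-bit value")
    top = addr >> (va_bits - 1)              # the (65 - va_bits) highest bits
    all_ones = (1 << (65 - va_bits)) - 1
    return top == 0 or top == all_ones


print(is_canonical(0x00007FFFFFFFFFFF))      # True  (top of the lower half)
print(is_canonical(0xFFFF800000000000))      # True  (bottom of the upper half)
print(is_canonical(0x0000800000000000))      # False (in the non-canonical gap)
print(is_canonical(0x0000800000000000, 57))  # True  with 57-bit addressing
```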

Mcnair answered 15/9, 2014 at 16:28

This answer is less detailed than the previous ones but, IMHO, easier to understand:

While 64-bit processors have 64-bit wide registers, systems generally do not implement all 64-bits for addressing (16 exabytes of theoretical physical memory).

Thus most architectures define an unimplemented region of the address space which the processor will consider invalid for use. x86-64 (...) define the most-significant valid bit of an address, which must then be sign-extended (...) to create a valid address. The result of this is that the total address space is effectively divided into two parts, an upper and a lower portion, with the addresses in-between considered invalid. (...) Valid addresses are termed canonical addresses (invalid addresses being non-canonical).

From https://www.bottomupcs.com/virtual_memory_is.xhtml

Sign-extended means the most significant implemented bit is copied into all of the upper address bits: in the upper half those bits are 11111..., in the lower half they are 00000....
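The sign-extension step can be sketched as follows (a hypothetical helper, assuming 48-bit addressing by default): keep the low `va_bits` bits and copy the highest of them into every bit above.

```python
MASK64 = (1 << 64) - 1


def sign_extend(addr, va_bits=48):
    """Copy bit va_bits - 1 into bits 63..va_bits, producing a canonical address."""
    low_mask = (1 << va_bits) - 1
    low = addr & low_mask
    if (low >> (va_bits - 1)) & 1:
        # Sign bit set: fill the upper bits with ones (upper half).
        return low | (MASK64 & ~low_mask)
    # Sign bit clear: upper bits stay zero (lower half).
    return low


print(hex(sign_extend(0x0000800000000000)))  # 0xffff800000000000
```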

Lyudmila answered 27/9, 2019 at 10:21

Heh, I just edited the top answer to add something similar before scrolling down and seeing this. Still upvoted this; good summary, and useful link to more details. – Anzio 27/9, 2019 at 10:39

Section 3.3.7.1 of the Intel manual covers this in five (difficult to digest) paragraphs; for me it's page 74 of the four-volume set you can download from Intel's site, or go directly here: https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

What these paragraphs say is that implementations use fewer than the full 64 address bits, and a canonical address is one whose unused upper bits are all copies of the most significant implemented bit. There are different implementation widths, such as 48-bit or 57-bit. (57-bit requires an extra level of page tables, increasing the cost of page walks. See https://en.wikipedia.org/wiki/Intel_5-level_paging for more about this new CPU feature, which can be left disabled.)

A 48-bit implementation would have a high half canonical address starting at

0xFFFF800000000000

while the lower half would end at

0x00007FFFFFFFFFFF

If bits 63 down to the most significant implemented bit are all ones or all zeros, the address is canonical. In a 57-bit implementation, I'd immediately know I'm looking at a canonical address when I see 0xFF____ or 0x00____. (The low bit of the top byte is a significant address bit, and the other 7 are copies of it, i.e. correctly sign-extended.)
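A quick way to check the boundary values above for both widths, assuming the two halves split exactly at the sign bit (a sketch; `canonical_halves` is an illustrative name):

```python
def canonical_halves(va_bits):
    """Return (end of lower half, start of upper half) for va_bits-wide addressing."""
    lower_end = (1 << (va_bits - 1)) - 1          # sign bit clear, all lower bits set
    upper_start = (1 << 64) - (1 << (va_bits - 1))  # bits 63..va_bits-1 set, rest clear
    return lower_end, upper_start


for bits in (48, 57):
    lo, hi = canonical_halves(bits)
    print(f"{bits}-bit: lower half ends at {lo:#018x}, upper half starts at {hi:#018x}")
# 48-bit: lower half ends at 0x00007fffffffffff, upper half starts at 0xffff800000000000
# 57-bit: lower half ends at 0x00ffffffffffffff, upper half starts at 0xff00000000000000
```

The 57-bit line shows the top byte being either 0x00 or 0xFF, as described above.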

Maybe a helpful way to remember this is that the word canonical itself means conforming to a general rule or accepted way of doing something. In general, no one needs as many addresses as 64 bits can provide, so the upper bits are not used as real address bits. Also, if something is canon, as in Star Trek or comic books, it's the way things were originally seen or done.

Now, to answer WHY we have canonical addresses: no one currently needs to address up to 16 exabytes (the theoretical limit of a 64-bit address), so the second paragraph of that section says the Intel architecture "defines" a 64-bit linear address, but implementations support fewer bits. Just in case, the third paragraph says the implementation will still check those upper bits and, if the address is NOT in canonical form, generate a general-protection exception.
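For a sense of scale, the arithmetic is just powers of two. A small sketch (the `format_size` helper is hypothetical) that prints 2^48, 2^57 and 2^64 bytes in binary units; the "16 exabytes" above is strictly 16 EiB, i.e. 2^64 bytes:

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def format_size(n_bytes):
    """Express n_bytes in the largest binary unit that divides it exactly."""
    unit = 0
    while unit < len(UNITS) - 1 and n_bytes >= 1024 and n_bytes % 1024 == 0:
        n_bytes //= 1024
        unit += 1
    return f"{n_bytes} {UNITS[unit]}"


for bits in (48, 57, 64):
    print(f"{bits}-bit: {format_size(1 << bits)}")
# 48-bit: 256 TiB
# 57-bit: 128 PiB
# 64-bit: 16 EiB
```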

The main reason for checking for canonical addresses instead of silently ignoring the upper bits is to make sure software is forward compatible with future hardware that supports more virtual address bits.
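To see why silently ignoring the upper bits would hurt forward compatibility, here is a sketch of a *hypothetical* CPU that simply truncates addresses (`translate_ignoring_high_bits` is made up; real x86-64 CPUs fault on non-canonical addresses instead). A value with a stray bit set at bit 48, for example extra data stashed in the "unused" high bits, would reach one address with 48-bit addressing and a different one with 57-bit addressing:

```python
def translate_ignoring_high_bits(addr, va_bits):
    """Hypothetical CPU behaviour: drop every address bit at or above va_bits."""
    return addr & ((1 << va_bits) - 1)


# Real pointer 0x12345678 with extra data placed in bit 48 (an assumption for illustration).
tagged = (0x01 << 48) | 0x0000_1234_5678

print(hex(translate_ignoring_high_bits(tagged, 48)))  # 0x12345678
print(hex(translate_ignoring_high_bits(tagged, 57)))  # 0x1000012345678
```

Because real hardware raises an exception for such a non-canonical value today, software cannot come to depend on the high bits being ignored.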

Ruling answered 21/4, 2019 at 16:49
