TLB usage with multiple page sizes in x86_64 architecture
Does anybody know if TLBs (L1 and L2) support simultaneous accesses with multiple page sizes in modern x86_64 microprocessors (Intel Sandy Bridge, AMD Bulldozer)? Does the x86 core pipeline provide information about page size to the MMU?

Best regards,

Alex

Usk answered 15/11, 2012 at 13:11 Comment(8)
What do you mean by "simultaneous accesses"? What do you mean by "L1" and "L2" in the context of TLBs? These are commonly used to refer to caches, not TLBs.Slayton
In modern x86 processors there are also two-level TLBs. For example, AMD Bulldozer has an L1 DTLB with 32 entries, fully associative, and an L2 TLB with 1024 entries, 8-way associative.Usk
OK, I see what you mean. The answer to your question is 100% implementation dependent. There's no guarantee that Sandybridge will behave the same as Ivybridge or Nehalem. There's no guarantee that AMD will behave the same as either one.Slayton
By the way, what do you mean "support"? What do you mean "simultaneous"? Does that mean a single access that crosses page boundaries? Does that mean configuring more than one page size in the OS?Slayton
By "simultaneous" I mean that at the same time there are multiple page-size descriptors in the TLB, i.e. 4K and 2M descriptors present in the TLB simultaneously. That is possible only if the LSU provides page-size information to the MMU along with the address-translation request, so it can be used for the "tag bits" in the CAM search.Usk
I've never thought about a single access crossing page boundaries. Is that possible in x86?Usk
From what I have observed in Linux, one can have virtual pages of multiple sizes at the same time in a single application. I doubt that the core flushes the TLB when a huge-page access follows a small-page one; that would be too costly. So I think the TLB is capable of preserving descriptors with different page sizes.Usk

This is not a question of what the TLBs allow, but rather of what the architecture allows. The architecture says that you can mix small (4k), large (2M) and huge (1G) pages in the same page hierarchy, by setting the PS bit in the page directory entry at the appropriate level.

Not all levels of TLBs will necessarily be able to cache pages of all sizes, but that shouldn't stop you from mixing pages if you so wish.

Now, there's nothing in the x86 pipe before the MMU that should actually require data about the page size. That is all encoded in the page hierarchy itself.


Regarding page splits, if you have a page boundary at address x, and you have a memory access that starts at x - 1 that is more than 1 byte wide, it'll access both pages. This will work even if the two pages are different sizes.

Slayton answered 16/11, 2012 at 14:30 Comment(5)
Actually I am interested in TLB behaviour when there are multiple page sizes in the access sequence. More precisely, I want to know whether there will be a TLB flush when a 2M access comes after a series of 4K accesses. I fully understand that this is implementation dependent, as the TLB itself is (x86_64 can be implemented without a TLB at all).Usk
If the translation request does not include information about its page size, then it is impossible to determine the tag bits for the associative search in the TLB (the tag bits should be equivalent to the page number).Usk
Re your 1st comment, this is not only implementation dependent, but it's also dependent on what the other thread is doing, as the TLBs are shared. Note that there are also various microarchitectural events that may be invisible to the software that may cause a TLB flush. The answer is that in most Intel processors that I am familiar with, accessing pages of different sizes doesn't trigger a TLB flush.Slayton
Re your 2nd comment, the TLB should keep the page size as part of the TLB (for instance, by keeping a tag + mask for address matching). Each linear address matches only one physical address, so if a linear address hits a TLB entry with one size, it will only hit that one entry. There should be no confusion.Slayton
I think that your last comment fully answers the original question. That is, the TLB keeps page-size information per entry and applies the tag mask accordingly. So there can be entries of different page sizes in the TLB at the same time, and no additional information is needed from the LSU. Thank you for the discussion.Usk

The TLB is typically divided in two: code and data. Each of these might be divided into a number of levels, typically an L1 and possibly an L2. Each level might support a single page size or mixed page sizes.

For example, on my processor I have an I-L1 TLB for 2 MB/4 MB pages (mixed), a D-L1 TLB for 2 MB/4 MB pages (mixed), an I-L1 TLB for 4 KB pages, a D-L1 TLB for 4 KB pages, and finally a D-L2 TLB for 4 KB pages.

When a TLB supports mixed pages, the TLB stores the page size associated with a particular virtual address tag.

When a TLB level has multiple separate caches per page size, the lookup is performed in parallel since the page size is yet unknown.

In either case, if the L1 TLB misses, the L2 will be checked before attempting a page table walk.

Now that the specifics are out of the way, we can finally answer your question. You can use multiple page sizes at the same time; however, they can never overlap (the OS will not let you map two virtual pages at the same location). In fact, the kernel internally uses multiple page sizes for various things.

Depending on the OS, using multiple page sizes in a user-space process can be easy or painful. See Linux Huge page support and Windows Large Pages support for further details. Other OSes will have details about this in their documentation.

Recommendation answered 19/3, 2015 at 21:5 Comment(2)
So, during the translation of a single virtual address, it's not possible for hits to occur in the TLB for multiple page sizes (1GB, 2MB, 4KB) simultaneously, correct? @Nicholas FrechetteTurbidimeter
Each virtual address lives within a single memory page determined during its allocation. Once the page is freed and returned to the OS to be unmapped, the OS will purge any stale entries from the TLB. This is done on every core (you can search for "TLB shootdown"). It prevents another core from seeing a potentially stale entry. From that point on, that address can no longer hit in the TLB until it is allocated again with its new page size.Recommendation
