Who performs the TLB shootdown?


I read this SO question describing what a TLB shootdown is. I'm trying to understand whether this operation is performed by the kernel, by the processor, or by both.

My questions are:

Does a TLB shootdown happen upon context switch? I would assume not, because multiple processes need to be able to execute concurrently on multiprocessor systems. Is this assumption correct?
When exactly does a TLB shootdown happen?
Who performs the actual TLB shootdown? Is it the kernel (if so, where can I find the code that performs the flushing?), the CPU (if so, what triggers the action?), or both (the kernel executes an instruction that causes an interrupt, which in turn causes the CPU to perform the TLB shootdown)?

Pinkard answered 9/5, 2018 at 15:14 Comment(0)


The x86 TLBs are not shared across cores and are not synchronized among themselves at the hardware level.
It is the OS that instructs a processor to flush its TLB.
Instructing the "current" processor amounts to calling a function; instructing another processor amounts to sending an inter-processor interrupt (IPI).

The term "TLB shootdown" refers specifically to this particularly expensive case where, to keep the system consistent, the OS has to tell other processors to invalidate their TLBs so that they observe the same mapping as a specific processor.

I think this is only necessary if the new mapping affects memory that is shared; otherwise, each processor is executing a different process, each with its own mapping.

During a context switch, the TLB is flushed to remove the old mappings; this must be done regardless of which processor the scheduled program last ran on.
Since the processor is flushing its own TLB, this is not a TLB shootdown.
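To make the difference between a local flush and a shootdown concrete, here is a minimal user-space sketch in C. It is a toy simulation, not kernel code: the per-core TLBs are plain arrays, all names (sim_core, tlb_fill, flush_local, send_flush_ipi, shootdown) are hypothetical, and the "IPI" is modeled as a direct function call on the target core's state.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCORES    4
#define TLB_SLOTS 8

/* One cached translation: virtual page -> physical frame. */
struct tlb_entry {
    bool     valid;
    uint64_t vpage;
    uint64_t pframe;
};

/* Hypothetical model of a core: each core owns a private TLB. */
struct sim_core {
    struct tlb_entry tlb[TLB_SLOTS];
    unsigned         ipis_received;
};

static struct sim_core cores[NCORES];

/* Cache a translation in one core's TLB (direct-mapped slot choice). */
void tlb_fill(int core, uint64_t vpage, uint64_t pframe)
{
    struct tlb_entry *e = &cores[core].tlb[vpage % TLB_SLOTS];
    e->valid  = true;
    e->vpage  = vpage;
    e->pframe = pframe;
}

/* Does this core currently cache a translation for vpage? */
bool tlb_has(int core, uint64_t vpage)
{
    const struct tlb_entry *e = &cores[core].tlb[vpage % TLB_SLOTS];
    return e->valid && e->vpage == vpage;
}

/* Local invalidation: the core drops its own entry. Just a function call. */
void flush_local(int core, uint64_t vpage)
{
    struct tlb_entry *e = &cores[core].tlb[vpage % TLB_SLOTS];
    if (e->valid && e->vpage == vpage)
        e->valid = false;
}

/* Stand-in for an IPI: the target core runs its own flush handler. */
void send_flush_ipi(int target, uint64_t vpage)
{
    cores[target].ipis_received++;
    flush_local(target, vpage);
}

/* Shootdown: flush locally, then interrupt every other core.
 * Returns the number of simulated IPIs sent. */
unsigned shootdown(int initiator, uint64_t vpage)
{
    unsigned sent = 0;

    flush_local(initiator, vpage);
    for (int c = 0; c < NCORES; c++) {
        if (c == initiator)
            continue;
        send_flush_ipi(c, vpage);
        sent++;
    }
    return sent;
}
```

In this model, flush_local(0, page) leaves the copy cached by core 1 untouched, while shootdown(0, page) removes it everywhere at the cost of one simulated IPI per other core. A real OS would typically try to interrupt only the cores that may actually hold the mapping; the sketch interrupts all of them for simplicity.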

Shared areas that must be kept consistent across processors at all times include kernel pages, memory-mapped I/O, and shared memory-mapped files.

Executing invlpg or invpcid, moving to cr0, cr3 (including during a hardware task switch), or cr4, and performing a VMX transition all invalidate TLB entries.
For the exact granularity and semantics, see Section 4.10.4 of the Intel manual, Volume 3.

Cita answered 9/5, 2018 at 16:0 Comment(0)


When exactly does a TLB shootdown happen?

It happens when the operating system or hypervisor requests it.

At the ISA level, certain operations can perform TLB shootdowns (see the Intel manual V3 4.10.4 and AMD manual V2 5.5.2), thereby invalidating one or more TLB entries in one or more local or remote TLB caches (those of other logical cores of the same CPU and all other kinds of processors that have TLBs and share the same physical memory address space).

Note also that any paging structure entry can be cached even if it has not been accessed by any retired instruction. This can happen due to speculative execution or MMU prefetching. Therefore, in general, any entry can be cached or invalidated at any time. Of course, there are specific guarantees given so that the MMU caches can be managed and kept coherent with in-memory paging structures.

Who performs the actual TLB shootdown? Is it the kernel (if so, where can I find the code that performs the flushing?), the CPU (if so, what triggers the action?), or both (the kernel executes an instruction that causes an interrupt, which in turn causes the CPU to perform the TLB shootdown)?

As I said before, the CPU itself can invalidate any entry at any time. In addition, software running at current privilege level (CPL) 0 can perform any of the operations related to TLB management.

An Introduction to TLB Invalidation in the Linux Kernel

The Linux kernel defines its TLB-invalidation functions in architecture-dependent code; for x86, see /arch/x86/mm/tlb.c and /arch/x86/include/asm/tlbflush.h. That's because different architectures offer wildly different mechanisms for managing the TLBs. To see some examples of when the Linux kernel performs TLB invalidations, refer to the tlb_flush_reason enum (comments are mine):

enum tlb_flush_reason {

    // The memory descriptor structure mm of the current process is about to change.
    // This occurs when switching between threads of different processes.
    // Note that when mm changes, the ASID changes as well (CR3[11:0]).
    // I'd rather not discuss when context switches occur because it's a whole different topic.
    // TLB shootdown only occurs for the current logical core.
    // The kernel sometimes can optimize away TLB flushes on a process-context switch.
    TLB_FLUSH_ON_TASK_SWITCH,

    // Another logical core has sent a request to the current logical core
    // to perform a TLB shootdown on its TLB caches.
    // This occurs due to a KVM hypercall. See TLB_REMOTE_SEND_IPI.
    TLB_REMOTE_SHOOTDOWN,

    // Occurs when one or more pages have been recently unmapped.
    // Affects only the local TLBs.
    TLB_LOCAL_SHOOTDOWN,

    // This occurs when making changes to the paging structures.
    // Affects only the local TLBs.
    TLB_LOCAL_MM_SHOOTDOWN,

    // Occurs when the current logical core uses a KVM hypercall to request
    // from other logical cores to perform TLB shootdowns on their respective TLBs.
    TLB_REMOTE_SEND_IPI,

    // This equals the number of reasons. Currently not used.
    NR_TLB_FLUSH_REASONS,
};

There are other cases where the kernel flushes TLBs. It's hard to make a complete list and I don't think anyone has made a list like that.

The Linux kernel implements a lazy TLB flushing technique. The basic idea is that when the paging structures of a process are modified, the kernel attempts to delay TLB shootdowns until a thread from that process is about to be scheduled to execute in user mode.
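The deferral idea can be sketched with a generation counter. This is only an illustration under my own assumptions, not a copy of the kernel's logic: sim_mm, sim_cpu, mm_modify_page_tables and cpu_enter_user are hypothetical names, and the model tracks a single address space per CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address-space descriptor: the counter is bumped on every
 * change to the paging structures instead of flushing right away. */
struct sim_mm {
    uint64_t tlb_gen;
};

/* Hypothetical per-CPU state: the last generation this CPU flushed for. */
struct sim_cpu {
    uint64_t seen_gen;
    unsigned flushes;
};

/* Record a page-table change; no flush happens here. */
void mm_modify_page_tables(struct sim_mm *mm)
{
    mm->tlb_gen++;
}

/* Called just before a thread of mm runs in user mode on this CPU.
 * Flushes only if the CPU is behind, and returns whether it flushed. */
bool cpu_enter_user(struct sim_cpu *cpu, const struct sim_mm *mm)
{
    if (cpu->seen_gen == mm->tlb_gen)
        return false;             /* TLB is already up to date */

    cpu->flushes++;               /* stand-in for the real flush */
    cpu->seen_gen = mm->tlb_gen;  /* catch up with the address space */
    return true;
}
```

With this scheme, several page-table changes made while the process is not running collapse into a single flush the next time it is about to run in user mode.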

The Linux kernel currently uses one of the following four methods to flush the TLBs associated with the current logical core when required:

Write to CR3 the current value of CR3. While this does not change the value in CR3, it instructs the logical core to flush all non-global TLB entries that have the same PCID as the one in CR3.
Disable CR4.PGE, then write to CR4 the current value of CR4, and then reenable CR4.PGE. This has the effect of flushing all TLB entries for all PCIDs and global entries. This method is not used if INVPCID is supported.
Invalidate TLB entries for a given PCID and virtual address using the INVPCID instruction type 0.
Invalidate all TLB entries including globals and all PCIDs using the INVPCID instruction type 2.

Other types of INVPCID are currently not used.
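Since the real instructions only execute at CPL 0, the effect of the four methods can only be illustrated from user space with a small model. The following C sketch is such a model under stated assumptions: the TLB is a flat array, every name (sim_tlb, tlb_insert, sim_reload_cr3, sim_toggle_cr4_pge, sim_invpcid_addr, sim_invpcid_all) is hypothetical, and I assume, per my reading of the Intel manual, that INVPCID type 0 leaves global translations in place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SIZE 16

struct tlb_entry {
    bool     valid;
    bool     global;  /* global translations are not tied to a PCID */
    uint16_t pcid;
    uint64_t vaddr;
};

struct sim_tlb {
    struct tlb_entry e[TLB_SIZE];
};

/* Place a translation in a given slot of the model TLB. */
void tlb_insert(struct sim_tlb *t, size_t slot, uint16_t pcid,
                uint64_t vaddr, bool global)
{
    t->e[slot] = (struct tlb_entry){ true, global, pcid, vaddr };
}

size_t tlb_count_valid(const struct sim_tlb *t)
{
    size_t n = 0;
    for (size_t i = 0; i < TLB_SIZE; i++)
        n += t->e[i].valid;
    return n;
}

/* Method 1: rewrite CR3 with its current value.
 * Drops non-global entries tagged with the current PCID. */
void sim_reload_cr3(struct sim_tlb *t, uint16_t cur_pcid)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        if (!t->e[i].global && t->e[i].pcid == cur_pcid)
            t->e[i].valid = false;
}

/* Method 2: toggle CR4.PGE. Drops everything, globals included. */
void sim_toggle_cr4_pge(struct sim_tlb *t)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        t->e[i].valid = false;
}

/* Method 3: INVPCID type 0. Drops non-global entries matching
 * one (PCID, virtual address) pair. */
void sim_invpcid_addr(struct sim_tlb *t, uint16_t pcid, uint64_t vaddr)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        if (!t->e[i].global && t->e[i].pcid == pcid &&
            t->e[i].vaddr == vaddr)
            t->e[i].valid = false;
}

/* Method 4: INVPCID type 2. Drops all PCIDs and globals. */
void sim_invpcid_all(struct sim_tlb *t)
{
    sim_toggle_cr4_pge(t);  /* same observable effect in this model */
}
```

In this model, methods 2 and 4 have the same outcome, which matches the remark above that the CR4.PGE method is not needed when INVPCID is available.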

Apart from software-initiated invalidations of TLB entries, the Intel manual Volume 3 Section 4.10.2.2 states the following for the P6 microarchitecture and most later microarchitectures:

Processors need not implement any TLBs. Processors that do implement TLBs may invalidate any TLB entry at any time. Software should not rely on the existence of TLBs or on the retention of TLB entries.

There is no such statement in the AMD manual as far as I know. However, it also gives no guarantees regarding the retention of TLB entries, so we can conclude that the same holds for AMD processors.

Sunlight answered 10/5, 2018 at 1:57 Comment(3)

Both: /arch/x86/mm/tlb.c and /arch/x86/include/asm/tlbflush.h are x86-specific. I have no idea why you gave /arch/x86/mm/tlb.c as an example of "architecture-independent" code. – Decameter 29/4, 2020 at 8:36

Minor quibble: I would not say "[a TLB shootdown] can happen at any time, even if the operating system or hypervisor did not request it." I would call that a TLB invalidation or miss or perhaps a TLB fill that gets a value different from (a) TLB entries for the same virtual address in other TLBs or (b) the translation in the current TLB at some other time. // TLB shootdown is a SW construct or algorithm, only mentioned in HW manuals to show how SW can do it. At least up until you add TLB shootdown instructions (like ARMv8.4-A TLBI broadcast across coherence domains). – Religieuse 4/3, 2021 at 22:32

P6 added "SW should not rely on the existence of TLBs or on the retention of TLB entries" because earlier processors like P5 did guarantee retention, and a minimum TLB capacity/associativity (with no speculative TLB misses). Which allowed SW to do things like switch between virtual address spaces that had no virtual addresses in common (because retention allowed you to briefly use stale TLB entries), whereas since P6 SW doing this was encouraged to have at least one page, mapping the code executing CR3 change, identity mapped in old and new virtual address spaces. // Makes me a bit sad. – Religieuse 4/3, 2021 at 22:38
