I am curious about the vastly different performance characteristics of running x86-64 binaries on the Apple M1 platform using Rosetta 2 vs. emulation, for example what Docker Desktop currently does using QEMU.
I understand why emulation is so slow. One explanation for why Rosetta 2 is so fast is given in this Twitter thread: https://twitter.com/ErrataRob/status/1331735383193903104
The gist of that explanation is that ARM and x86 normally have different (and incompatible) memory ordering models: x86 guarantees a strong ordering of loads and stores, while ARM allows them to be reordered, so emulating x86 on ARM requires significant overhead to preserve the x86 guarantees. The M1 chip addresses this with a hardware optimization that lets it access memory under either ordering model. Effectively, when Rosetta 2-translated instructions are being run, a flag is set to let the processor know to use the x86-style memory ordering.
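To make the ordering issue concrete, here is a minimal C++ sketch of the classic "message passing" pattern. This is only an illustration of the general problem, not a description of how Rosetta 2 or QEMU actually work internally, and the function name `message_passing_once` is my own invention. On x86, ordinary stores and loads already keep the order the pattern needs, so compiled x86 code contains no explicit barrier here. On a weakly ordered ARM core the same guarantee has to be requested explicitly, which is what the release/acquire annotations below stand for.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper for illustration: one thread publishes a value,
// another waits for a flag and then reads the value.
int message_passing_once() {
    int payload = 0;                 // plain data written before the flag
    std::atomic<bool> ready{false};  // flag that publishes the data
    int observed = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the payload store must become visible before the flag.
        // x86 hardware orders stores this way by default; weakly ordered
        // ARM needs a release store or a barrier to get the same effect.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: the payload load must not be moved ahead of the flag load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    producer.join();
    consumer.join();
    return observed;  // always 42 with release/acquire ordering
}
```

A translator that only sees x86 machine code cannot tell which plain loads and stores were relying on this ordering, so without hardware help it would have to treat every memory access conservatively. As I understand it, that is the overhead the M1's x86-style ordering mode is supposed to avoid.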
Assuming this explanation is reasonable (and if anyone has better-sourced reporting than the above Twitter thread I would appreciate it in the comments for inclusion), is it technically plausible that this optimization could be leveraged for full hardware emulation, for example running x86-64 Linux Docker containers, or running a full x86-64 Windows desktop virtual machine a la VMware Fusion/VirtualBox? Or, does the separate operating system layer in those scenarios preclude being able to leverage the memory ordering optimization?
Separately, is this processor mode (flags or instructions) documented and published for 3rd-party use, or is it private to Apple only?
Edit, as of 2023-01-31 for future reference:
Apple released support for Rosetta 2 Linux VMs using the Virtualization framework, perhaps sometime around June 2022.
Also, Docker Desktop for Mac released a beta feature for x86 containers using this feature in version 4.16.0 on 2023-01-12.