Is pipelining/OoOE available on modern x86 processors when running in real mode?

When running a boot-loader program on a modern-day x86 processor, the processor will be running in real-address mode. Will its instruction pipelining features be active in real mode, or not?

Forta answered 15/6, 2016 at 7:41 Comment(1)
I am asking whether pipelining is available in real mode when booting a system. - Forta

Yes. The out-of-order core in modern microarchitectures operates basically the same regardless of mode, so pipelining and out-of-order execution stay active in real mode. Most of the difference is in the decoders. See Agner Fog's microarch pdf and other links in the tag wiki for details of how modern CPUs actually work internally.

It would probably take extra silicon to behave differently in 16bit mode, since it's very similar to 32bit mode with paging disabled, but with a different default address-size and operand-size.

I've read that AMD CPUs are slightly slower when segments have a non-zero base. (Or I guess in 16bit mode: when segment registers themselves are set to non-zero values, since in 16bit mode they're used directly, rather than being selectors for descriptors.)
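To make "used directly" concrete, here is a minimal sketch of the real-mode address calculation. The function name real_mode_linear is made up for illustration, and the model deliberately ignores A20 masking and "unreal mode" tricks:

```c
#include <stdint.h>

/* Real-mode address translation (simplified): the 16-bit segment value is
   shifted left by 4 and added to the 16-bit offset. There is no descriptor
   table lookup, so a "non-zero base" simply means a non-zero segment value.
   Hypothetical helper; A20 masking is not modelled. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, 07C0:0000 and 0000:7C00 both name linear address 0x7C00, where a BIOS loads the boot sector.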


Keep in mind that many common 16bit idioms, like the loop instruction, are slow on many modern CPUs.

Also, partial-register slowdowns can easily interfere with out-of-order execution if you aren't careful. Intel P6-family and SnB-family CPUs rename partial registers separately, so writing to AX doesn't have a false dependency on the full contents of EAX/RAX. The cost comes later, when an instruction reads the full register and the partial result has to be merged: that stalls on CPUs before SnB, and causes just a minor slowdown on SnB-family CPUs before Haswell.

All other microarchitectures treat mov ax, 5 as a read-modify-write of eax, so it doesn't break the dependency chain on the old value of eax. This can be a huge problem for out-of-order execution if you aren't careful.
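A small C model may help show where that dependency comes from. It only reproduces the architectural result of each instruction, not any renaming hardware, and the function names are made up for illustration:

```c
#include <stdint.h>

/* Architectural effect of `mov ax, imm16`: only bits 15:0 change, so the
   new EAX is a function of the old EAX as well as the immediate. A CPU that
   does not rename AX separately has to wait for old_eax to be ready. */
uint32_t write_ax(uint32_t old_eax, uint16_t imm)
{
    return (old_eax & 0xFFFF0000u) | imm;
}

/* Architectural effect of `mov eax, imm32`: the whole register is
   overwritten, so the result does not depend on the old value. */
uint32_t write_eax(uint32_t old_eax, uint32_t imm)
{
    (void)old_eax; /* unused on purpose: no dependency on the old value */
    return imm;
}
```

In 16bit code, writing the full 32-bit register (which needs an operand-size prefix) is the form that has no input dependency on the old register value.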

Read Agner Fog's manuals to learn more.

16bit addressing modes might not perform well, I forget. 32bit code doesn't need them to be fast, and 64bit code can't use 16bit addresses at all. (The address-size prefix in 64bit code means address-size = 32bits.)


VEX-coded instructions (including all AVX and some BMI1 and BMI2 integer instructions like blsr and pext) aren't available in real or VM86 mode. This Intel forum topic (dead link, see footnote 1) suggested that may be due to existing software (NTVDM) using those byte sequences as a deliberate trap to protected mode (i.e. the same illegal operand encodings of LDS/LES that VEX prefixes reuse). Making VEX-coded instructions still generate #UD in those modes is thus important for backwards compatibility. Michael Petch commented:

0xc4 0xc4 0x60 (VM version number) and 0xc4, 0xc4, 0x58 were in pretty common use in 16-bit code in the mid 90s even before NTVDM. They were commonly used by those of us trying to determine if we were running in SoftPC. Back then they were sparsely documented as BOP codes. Microsoft semi-documented them with the NT Device driver kit back in the mid 90s. This wasn't surprising because NTVDM was based on SoftPC. I pulled out the old NT DDK CD and they can be found in the file ISVBOP.h
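The encoding overlap can be sketched in a few lines of C. Outside 64-bit mode, 0xC4 and 0xC5 are the LES and LDS opcodes, and the byte that follows would be their ModRM byte. LES/LDS require a memory operand, so a ModRM with mod = 11 (register form) has no valid meaning; protected mode reuses exactly those encodings as VEX prefixes, while real and VM86 mode keep raising #UD. The helper name below is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `opcode` followed by `next` would be treated as a VEX
   prefix in 16/32-bit protected mode: the opcode is C4h or C5h and the top
   two bits of the next byte are 11b, which would be the illegal register
   form of LES/LDS. In real or VM86 mode the same bytes raise #UD instead.
   Hypothetical helper for illustration; it does not decode a whole
   instruction. */
bool looks_like_vex(uint8_t opcode, uint8_t next)
{
    if (opcode != 0xC4 && opcode != 0xC5)
        return false;
    return (next >> 6) == 3;
}
```

The BOP sequences quoted above start with 0xC4 0xC4, and the second 0xC4 has mod = 11, so they fall in this illegal-LES space. That is why they worked as a trap, and why they must keep faulting in real mode.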

SSE is still available in real mode, though, if you enable it by setting the relevant control-register bits in CR0 and CR4.
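As a rough sketch of which bits are involved: SSE needs CR0.EM clear and CR4.OSFXSR set, CR0.MP is normally set alongside, and CR4.OSXMMEXCPT is commonly set too so that unmasked SIMD floating-point exceptions are reported as #XM. The C below only computes the new register values; the macro and function names are made up, and actually reading or writing CR0/CR4 takes privileged mov instructions in assembly, which are not shown. Check CPUID for SSE support first.

```c
#include <stdint.h>

#define CR0_MP         (1u << 1)  /* monitor coprocessor */
#define CR0_EM         (1u << 2)  /* x87 emulation; must be clear for SSE */
#define CR4_OSFXSR     (1u << 9)  /* enables FXSAVE/FXRSTOR and SSE */
#define CR4_OSXMMEXCPT (1u << 10) /* unmasked SIMD FP exceptions raise #XM */

/* New CR0 value: clear EM, set MP, leave every other bit unchanged. */
uint32_t cr0_for_sse(uint32_t cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP;
}

/* New CR4 value: set OSFXSR and OSXMMEXCPT, leave other bits unchanged. */
uint32_t cr4_for_sse(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```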

(VEX/EVEX are available in 16-bit protected mode, but not real or virtual-8086 mode. See the related Q&A "Is x86 32-bit assembly code valid x86 64-bit assembly code?")


Footnote 1: dead link, not archived in wayback machine. Intel may have just reorganized their forum URLs, but I didn't go looking.

Ubald answered 15/6, 2016 at 9:26 Comment(3)
I always learn so darn much from reading your answers, even when I already know the answer to the question! :) For example, I was not aware that VEX-encoded instructions weren't available in real mode. - Butadiene
@CodyGray: SO would be a pretty boring place if answers didn't go outside the lines into related information :D. I'd been meaning to get to the bottom of why some instructions weren't available in real mode for a while now. The instruction set manual documents that fact for some BMI/BMI2 integer instructions, but not for AVX2 instructions like VPBLENDD. I wasn't sure until finally searching while writing this answer. I wondered if it had something to do with not being available in 16bit operand size, and ignoring operand-size prefixes vs. requiring one would take transistors. - Ubald
0xc4 0xc4 0x60 (VM version number) and 0xc4, 0xc4, 0x58 were in pretty common use in 16-bit code in the mid 90s even before NTVDM. They were commonly used by those of us trying to determine if we were running in SoftPC. Back then they were sparsely documented as BOP codes. Microsoft semi-documented them with the NT Device driver kit back in the mid 90s. This wasn't surprising because NTVDM was based on SoftPC. I pulled out the old NT DDK CD and they can be found in the file ISVBOP.h. - Lambency
