How do I enable SSE for my freestanding bootable code?

(This question was originally about the CVTSI2SD instruction and the fact that I thought it didn't work on the Pentium M CPU, but in fact it's because I'm using a custom OS and I need to manually enable SSE.)

I have a Pentium M CPU and a custom OS which so far used no SSE instructions, but I now need to use them.

Trying to execute any SSE instruction raises interrupt 6, invalid opcode (which on Linux would cause a SIGILL, but this isn't Linux). The Intel architectures software developer's manual (which I'll refer to from now on as the IASDM) calls it #UD - Invalid Opcode (UnDefined Opcode).

Edit: Peter Cordes actually identified the right cause and pointed me to the solution, which I summarize below:

If you're running an ancient OS that doesn't support saving XMM regs on context switches, the SSE-enabling bit in one of the machine control registers won't be set.

Indeed, the IASDM mentions this:

If an operating system did not provide adequate system level support for SSE, executing an SSE or SSE2 instructions can also generate #UD.

Peter Cordes pointed me to the SSE OSDev wiki, which describes how to enable SSE by writing to both CR0 and CR4 control registers:

clear the CR0.EM bit (bit 2) [ CR0 &= ~(1 << 2) ]
set the CR0.MP bit (bit 1) [ CR0 |= (1 << 1) ]
set the CR4.OSFXSR bit (bit 9) [ CR4 |= (1 << 9) ]
set the CR4.OSXMMEXCPT bit (bit 10) [ CR4 |= (1 << 10) ]
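
The four steps above can be sketched in C roughly as follows. This is only a minimal sketch, not the OSDev wiki's code: the helper names (`cr0_for_sse`, `cr4_for_sse`, `enable_sse`) and the `KERNEL_RING0_BUILD` macro are made up for illustration, and the inline assembly assumes a GCC-compatible compiler running at privilege level 0.

```c
#include <stdint.h>

/* Bit positions as given in the list above. */
#define CR0_MP          ((uintptr_t)1 << 1)
#define CR0_EM          ((uintptr_t)1 << 2)
#define CR4_OSFXSR      ((uintptr_t)1 << 9)
#define CR4_OSXMMEXCPT  ((uintptr_t)1 << 10)

/* Pure helper: new CR0 value with EM cleared and MP set. */
static inline uintptr_t cr0_for_sse(uintptr_t cr0)
{
    cr0 &= ~CR0_EM;
    cr0 |= CR0_MP;
    return cr0;
}

/* Pure helper: new CR4 value with OSFXSR and OSXMMEXCPT set. */
static inline uintptr_t cr4_for_sse(uintptr_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

/* KERNEL_RING0_BUILD is a hypothetical macro: define it only when this
 * code really runs at privilege level 0, because accessing CR0/CR4 faults
 * anywhere else. */
#ifdef KERNEL_RING0_BUILD
static inline void enable_sse(void)
{
    uintptr_t cr0, cr4;

    __asm__ volatile ("mov %%cr0, %0" : "=r"(cr0));
    cr0 = cr0_for_sse(cr0);
    __asm__ volatile ("mov %0, %%cr0" : : "r"(cr0) : "memory");

    __asm__ volatile ("mov %%cr4, %0" : "=r"(cr4));
    cr4 = cr4_for_sse(cr4);
    __asm__ volatile ("mov %0, %%cr4" : : "r"(cr4) : "memory");
}
#endif
```

Keeping the bit arithmetic in separate pure functions makes it possible to check the values on a development machine before trying the privileged part on the target.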

Note that writing to these registers from protected mode requires privilege level 0. The answer to this question explains how to test it: in protected mode, that is, when bit 0 (PE) of CR0 is set to 1, bits 0 and 1 of the CS selector give the current privilege level, and both should be 0.
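
The privilege test just described amounts to a couple of bit masks. A small sketch, assuming the CR0 and CS values have already been read by other means (the function names are hypothetical, and virtual-8086 mode is deliberately ignored here):

```c
#include <stdint.h>

#define CR0_PE_BIT 1u   /* bit 0 of CR0: protected mode enabled */

/* In protected mode, the two low bits of the CS selector hold the
 * current privilege level (0 = most privileged, 3 = least). */
static inline unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 3u;
}

/* Returns 1 if writing CR0/CR4 should be permitted: either protected
 * mode is off (real mode has no privilege levels), or CPL is 0.
 * Assumption: virtual-8086 mode is not in use. */
static inline int may_write_control_regs(uint32_t cr0, uint16_t cs)
{
    if ((cr0 & CR0_PE_BIT) == 0)
        return 1;
    return cpl_from_cs(cs) == 0;
}
```

For example, a typical flat ring-0 code selector such as 0x08 gives CPL 0, while a ring-3 selector such as 0x1B gives CPL 3.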

Finally, the custom OS must properly handle XMM registers during context switches, by saving and restoring them when necessary.
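
One common way to do that saving and restoring is the FXSAVE/FXRSTOR instruction pair, which works on a 512-byte, 16-byte-aligned memory image. The following is only a sketch under assumptions: `struct task`, `fx_save`, `fx_restore` and `switch_fx_state` are hypothetical names, the inline assembly is GCC-style, and the switch is done eagerly on every task change.

```c
#include <stdint.h>
#include <stdalign.h>

/* FXSAVE/FXRSTOR image: 512 bytes, must be 16-byte aligned. */
struct fx_state {
    alignas(16) uint8_t bytes[512];
};

/* Hypothetical task control block; a real kernel has many more fields. */
struct task {
    struct fx_state fx;
    int id;
};

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Store the current x87/MMX/SSE state into *s. */
static inline void fx_save(struct fx_state *s)
{
    __asm__ volatile ("fxsave %0" : "=m"(*s));
}

/* Load x87/MMX/SSE state from *s. The image must be a valid one, for
 * instance one previously written by fx_save. */
static inline void fx_restore(const struct fx_state *s)
{
    __asm__ volatile ("fxrstor %0" : : "m"(*s));
}

/* Eager switch: save the outgoing task's state, load the incoming one. */
static inline void switch_fx_state(struct task *prev, struct task *next)
{
    fx_save(&prev->fx);
    fx_restore(&next->fx);
}
#endif
```

A new task needs an initialized image before its first `fx_restore`. A kernel that wants to avoid the cost on every switch can instead set CR0.TS and do the save/restore lazily from the resulting device-not-available exception.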

Aurelia answered 22/7, 2015 at 12:24 Comment(14)
CVTSI2SD (Convert Dword Integer to Scalar Double-Precision FP Value) belongs to the SSE2 instruction set, and this is confirmed in the Intel Software Developer Manuals. – Transgress
I have a program which uses it and it crashes on a real Pentium M. Also, its Intel user manual (of which I have a paper copy) does not include that instruction. – Aurelia
What is the cause of the crash - SIGILL ("illegal instruction") or something else? – Galloromance
Can you please run the application under GDB, and give us the error and the output of (gdb) disas /r at the crash site? – Transgress
Are you sure you don't actually have a Pentium III-M? – Transgress
I got an interruption 6, which from the Intel user manual means "invalid opcode (undefined opcode)". – Aurelia
Can you post the value of eax after executing mov eax, 1 / cpuid? – Trousseau
cpuid returns 0xA7E9FBBF, that is 0010 0111 1110 1001 1111 1011 1011 1111 in binary. – Aurelia
Would it be possible that SSE instructions could be disabled/forbidden during runtime? I found no references to that, but I get interruption 6 when I run something newer than MMX instructions. – Aurelia
@IwillnotexistIdonotexist unfortunately my setup is a bit complex, I compile on one machine and run it via a custom kernel on another, so I cannot easily run GDB (although it should be possible), but I'm trying to slowly obtain information about it. – Aurelia
@anol: Ahhh, that's probably it. If you're running an ancient OS that doesn't support saving XMM regs on context switches, the SSE-enabling bit in one of the machine control registers won't be set. In that case all instructions that touch xmm regs will fault with undefined instruction. – Lockwood
Wow, is that possible? How can I obtain more information about that? I tried searching for it but every website mentioned people who actually wanted their compiler to avoid emitting SSE code, not hardware deactivation of SSE. So I thought it was not possible. – Aurelia
I updated my answer with a link. Yeah, it's a thing. It got more discussion in really old docs from when SSE was brand new. Introducing new architectural state that must be saved on context switches was a Big Deal. Presumably there are similar bits for 256b ymm regs, because an OS that only saves/restores the low 128 would be a big problem. – Lockwood
@Aurelia it's reversed actually: you don't disable SSE in hardware, you enable it (or not, as happened here). – Mixed
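
As an aside on the CPUID value quoted in the comments above: its bit pattern is consistent with the EDX feature word of CPUID leaf 1, where bits 23 to 26 report MMX, FXSR, SSE and SSE2. Treating the value as EDX is an assumption (the comment asked for EAX), and the macro and function names below are made up for illustration.

```c
#include <stdint.h>

/* Feature bits in EDX after CPUID with EAX = 1. */
#define CPUID1_EDX_PAE   (1u << 6)
#define CPUID1_EDX_MMX   (1u << 23)
#define CPUID1_EDX_FXSR  (1u << 24)
#define CPUID1_EDX_SSE   (1u << 25)
#define CPUID1_EDX_SSE2  (1u << 26)

/* Returns 1 if every bit in mask is set in the feature word. */
static inline int edx_has(uint32_t edx, uint32_t mask)
{
    return (edx & mask) == mask;
}
```

Under that assumption, 0xA7E9FBBF has the FXSR, SSE and SSE2 bits all set, which fits the eventual diagnosis: the CPU supports the instructions, but the OS had not enabled them.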

If you're running an ancient or custom OS that doesn't support saving XMM regs on context switches, it won't have set the SSE-enabling bits in the machine control registers. In that case all instructions that touch xmm regs will fault.

http://wiki.osdev.org/SSE explains how to alter CR0 and CR4 to allow SSE instructions to run on bare metal without #UD.

Note that VEX prefixes won't decode in real-mode, so you can't enable AVX there even if your CPU supports it. You have to be in protected or long mode if you want AVX on CPUs that support it.


My first thought on your old version of the question was that you might have compiled your program with -mavx, -march=sandybridge or equivalent, causing the compiler to emit the VEX-encoded version of everything.

CVTSI2SD   xmm1, r/m32            ; SSE2
VCVTSI2SD  xmm1, xmm2, r/m32      ; AVX
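
For reference, the kind of C source that usually ends up as one of these instructions is a plain integer-to-double conversion. This is a sketch with a hypothetical function name; the exact instruction depends on the compiler, target and flags (for example, 32-bit builds without SSE math typically use x87 instead).

```c
/* With GCC or Clang targeting x86-64, this conversion is typically
 * compiled to CVTSI2SD, or to VCVTSI2SD when built with -mavx. */
double int_to_double(int x)
{
    return (double)x;
}
```

Checking the disassembly of such a function is a quick way to see which encoding a given set of compiler flags produces.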

See https://stackoverflow.com/tags/x86/info for links, including to Intel's insn set ref manual.

Most real-world kernels are built with options that stop the compiler from using SSE or x87 instructions on its own, for example gcc -mgeneral-regs-only. Or in older GCC, -mno-sse -mno-mmx and avoid any use of float or double types to avoid x87. This is so kernels only have to save/restore integer registers on interrupts and system calls, only doing the SIMD/FP state on a full context switch to a different user-space task. Before that option existed and was used, Linux kernel code that used double could silently corrupt user-space state!

If you have a freestanding program that isn't trying to context-switch between user-space tasks, go ahead and let the compiler use SSE / AVX.


Related: "Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?)" has some details on checking for AVX and AVX512 support (both also introduce new architectural state, so the OS has to set a bit or the hardware will fault). It comes at the problem from the other angle, but the links should indicate how to activate or disable AVX support.
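
For AVX, the usual check combines two CPUID leaf 1 ECX bits (OSXSAVE, bit 27, and AVX, bit 28) with bits 1 and 2 of XCR0 as read by XGETBV. A minimal sketch of that logic, with the register values passed in as parameters and all names hypothetical:

```c
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)   /* OS has set CR4.OSXSAVE */
#define CPUID1_ECX_AVX     (1u << 28)   /* CPU supports AVX */
#define XCR0_SSE_STATE     (1ull << 1)  /* OS saves XMM state */
#define XCR0_AVX_STATE     (1ull << 2)  /* OS saves upper YMM halves */

/* Returns 1 if AVX instructions can be expected to run without #UD.
 * xcr0 is only meaningful when OSXSAVE is set, since XGETBV itself
 * faults otherwise; callers should pass 0 in that case. */
static inline int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE_STATE | XCR0_AVX_STATE;

    if ((cpuid1_ecx & CPUID1_ECX_OSXSAVE) == 0)
        return 0;
    if ((cpuid1_ecx & CPUID1_ECX_AVX) == 0)
        return 0;
    return (xcr0 & need) == need;
}
```

On bare metal, the same bits show what the kernel itself has to do: set CR4.OSXSAVE, then write XCR0 with the SSE and AVX state bits enabled.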

Lockwood answered 22/7, 2015 at 12:34 Comment(10)
I'm using an old GCC and it does not seem to have that architecture. And I looked at the assembly and also tried inserting the instruction directly via asm(), so there's little chance of that being the case. – Aurelia
It's -mavx, oops. Look at the disassembly. If it's vcvt..., and your program dies with SIGILL, then it's an AVX problem. Otherwise, you're probably getting a SIGSEGV, not SIGILL. If it's SIGILL, then there's something weird going on, and you should run it under gdb, so it stops at the exact instruction that faulted. – Lockwood
I tried compiling in some assembly code containing specifically CVTSI2SD, to ensure it is the culprit, and I got interruption 6. But it also happened with CVTSI2SS, so it's actually an SSE-related issue. Instructions containing references to %xmmN registers do not work. – Aurelia
There's an answer on stackoverflow.com/questions/6121792/… about detecting OS support. But IIRC, Intel's suggested way to detect OS support is to try running an SSE instruction and see if it faults with #UD. Yeah, pretty terrible, esp. for a library or something, because that forces the calling program to handle SIGILL! IIRC, the machine control registers that the OS has to set aren't even readable by unprivileged code. – Lockwood
I'm trying to enable it as described in the link, but so far no good. The assembly code seems to write the right values to CR0/CR4, but trying something such as movss afterwards still fails. – Aurelia
I don't know any more than that, sorry. Better look at the full Intel manuals, www-ssl.intel.com/content/www/us/en/processors/…. You are writing to CR0/CR4 in priv level 0, right? mov cr* faults (with #GP(0)) if priv level isn't 0, according to Intel's insn set ref manual. IDK if there are any other requirements, like running in protected mode, for using SSE. Or maybe edit this question into: "How do I enable SSE for my freestanding bootable code?" – Lockwood
@IwillnotexistIdonotexist: He never said his custom kernel was based on Linux. He took my suggestion for a question title, which really doesn't suggest Linux. Anol: you should prob. not call it SIGILL if you aren't actually running a Unix kernel that uses those signals. The way you were talking about interrupts, with different code numbers from Linux signals, was my clue you weren't talking about Linux user-space at all. – Lockwood
@anol: I made an edit like what I was suggesting. Hopefully that will attract some expert help from OS types. You might want to mention what state you have the CPU in when you try to use the mov cr instructions. (real mode? 32bit protected mode?) – Lockwood
@PeterCordes That clears things up. My clues that this was Linux were precisely SIGILL and talk of a custom kernel (which I interpreted as Linux with a personalized kernel). – Transgress
@PeterCordes Could you please complement your solution with the details I added? Just for completeness' sake. Also, maybe move the CVTSI2SD part down, since it is not relevant for the actual problem I had but could be useful for other people. – Aurelia

I suggest that you consult Intel's manual when you have such questions.

It's clearly stated in the manual that CVTSI2SD is an SSE2 instruction.

Continental answered 22/7, 2015 at 12:28 Comment(2)
What does CPUID report for supported instructions? less /proc/cpuinfo | grep flags if you're on Linux. – Galloromance
@Aurelia Within the manual, the section for you to read is Volume 3, Section 13.1 PROVIDING OPERATING SYSTEM SUPPORT FOR SSE EXTENSIONS. This section has a very clear description and list of what must be done for SSE support. Michael, you should definitely add a reference to this section in your answer. – Transgress
