Combining prefixes in SSE

Asked 8/3, 2010 at 20:9 Answered 4/12, 2019 at 13:59

Solved assembly x86 sse machine-code prefixes

In SSE the prefixes 066h (operand size override) 0F2H (REPNE) and 0F3h (REPE) are part of the opcode.

In non-SSE 066h switches between 32-bit (or 64-bit) and 16-bit operation. 0F2h and 0F3h are used for string operations. They can be combined so that 066h and 0F2h (or 0F3h) can be used in the same instruction, because this is meaningful. What is the behavior in an SSE instruction? For instance, we have (ignoring mod/rm for now):

0f 58      addps
66 0f 58   addpd
f2 0f 58   addsd
f3 0f 58   addss

But what is this?

66 f2 0f 58

And how about?

f2 66 0f 58

Not to mention the following which has two conflicting REP prefixes:

f2 f3 0f 58

What is the spec for these?

Congou answered 8/3, 2010 at 20:9 Comment(1)

Isn't f3 0f 58 ADDSS rather than ADDPD? – Severini 9/3, 2010 at 11:14

I do not remember having seen any specification on what you should expect in the case of wildly combining random prefixes, so I guess CPU behaviour may be "undefined" and possibly CPU-specific. (Clearly, some things are specified in e.g. Intel's docs, but many cases aren't covered). And some combinations may be reserved for future use.

My naive assumptions would generally have been that additional prefixes would be no-ops but there's no guarantee. That seems reasonable given that e.g. some optimising manuals recommend multi-byte NOP (canonically 90h) by prefixing with 66h, e.g.:

db 66h, 90h; 2-byte NOP
db 66h, 66h, 90h; 3-byte NOP
db 66h, 66h, 66h, 90h; 4-byte NOP

However, I also know that CS and DS segment override prefixes have aquired novel functions as SSE2 branch hint prefixes (predict branch taken = 3Eh = DS override; predict branch not taken = 2Eh = CS override) when applied to conditional jump instructions.

Anyway, I looked at your examples above, always setting XMM1 to all 0 and XMM7 to all 0FFh by

pxor xmm1, xmm1    ; xmm1 <- 0s
pcmpeqw xmm7, xmm7 ; xmm7 <- FFs

and then the code in question, with xmm1, xmm7 arguments. What I observed (32bit code on Win64 system and Intel T7300 Core 2 Duo) was:

1) no change observed for addsd by adding 66h prefix

db 66h 
addsd xmm1, xmm7 ;total sequence = 66 F2 0F 58 CF

2) no change observed for addss by adding 0F2h prefix

db 0f2h     
addss xmm1,xmm7 ;total sequence = F2 F3 0F 58 CF

3) However, I observed a change by prefixing addpd by 0F2h:

db 0f2h    
addpd xmm1, xmm7 ;total sequence = F2 66 0F 58 CF

In this case, the result in XMM1 was 0000000000000000FFFFFFFFFFFFFFFFh instead of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh.

So my conclusion is that one shouldn't make any assumptions and expect "undefined" behaviour. I wouldn't be surprised, however, if you could find some clues in Agner fog's manuals.

Severini answered 9/3, 2010 at 11:46 Comment(2)

In the last case, apparently, the 0F2h took precedence over the 066h, and converted the instruction into addsd, which is why only one chunk was written. – Congou 11/3, 2010 at 10:52

That'd be one hypothesis, yes. However, I think it'd a bad idea if anybody wanted to rely on such behaviour. – Severini 11/3, 2010 at 11:59

Intel's SDM vol.2 manual (the instruction set reference) refers to these as mandatory prefixes. Think of them as part of the opcode.

But yes, they are prefixes and can be mixed with other prefixes ahead of the actual escape-byte+opcode. In fact a REX prefix must go after other prefixes.

As usual, using multiple conflicting prefixes from the same group happens to decode with the last one taking priority on current Intel hardware. I think Intel manuals say that doing this can give unpredictable behaviour so it's not guaranteed or future proof. It's not a meaningful thing to do; if you want to pad an instruction to make it longer for alignment reasons, I think repeating the same prefix a couple times is safe.

B.8 SSE INSTRUCTION FORMATS AND ENCODINGS

The SSE instructions use the ModR/M format and are preceded by the 0FH prefix byte. In general, operations are not duplicated to provide two directions (that is, separate load and store variants).

The following three tables (Tables B-22, B-23, and B-24) show the formats and encodings for the SSE SIMD floating-point, SIMD integer, and cacheability and memory ordering instructions, respectively. Some SSE instructions require a mandatory prefix (66H, F2H, F3H) as part of the two-byte opcode. Mandatory prefixes are included in the tables.

And also

2.1.2 Opcodes

A primary opcode can be 1, 2, or 3 bytes in length. An additional 3-bit opcode field is sometimes encoded in the ModR/M byte. Smaller fields can be defined within the primary opcode. Such fields define the direction of operation, size of displacements, register encoding, condition codes, or sign extension. Encoding fields used by an opcode vary depending on the class of operation.

Two-byte opcode formats for general-purpose and SIMD instructions consist of one of the following:

An escape opcode byte 0FH as the primary opcode and a second opcode byte.

A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, and a second opcode byte (same as previous bullet).

For example, CVTDQ2PD consists of the following sequence: F3 0F E6. The first byte is a mandatory prefix (it is not considered as a repeat prefix). Three-byte opcode formats for general-purpose and SIMD instructions consist of one of the following:

An escape opcode byte 0FH as the primary opcode, plus two additional opcode bytes.

A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, plus two additional opcode bytes (same as previous bullet).

For example, PHADDW for XMM registers consists of the following sequence: 66 0F 38 01. The first byte is the mandatory prefix.

Draftee answered 4/12, 2019 at 13:59 Comment(0)

Recommended topics

Hot tags