Can someone explain this directly assembled x86 JMP opcode?
At school we have been using a bootstrap program to run stand-alone programs without an operating system. I have been studying this program and when protected mode is enabled there is a far jump executed by directly assembling the opcode and operands as data within the program. This was for the GNU assembler:


         /* this code immediately follows the setting of the PE flag in CR0 */

.byte   0x66, 0xEA
.long   TARGET_ADDRESS
.word   0x0010          /* descriptor #2, GDT, RPL=0 */

First of all, why would one want to do this (instead of the instruction mnemonic)?

I have been looking at the Intel manuals, but am still a little confused by the code. Specifically in Volume 2A, page 3-549, there is a table of opcodes. The relevant entry:

EA *cp*    JMP ptr16:32    Inv.    Valid    Jump far, absolute, address given in operand

The actual opcode is obvious, but the first byte, 0x66, has me confused. Referring to the table in the Intel manual, the cp apparently means that a 6-byte operand will follow. And obviously 6 bytes follow in the next two lines. 0x66 encodes an 'Operand-size override prefix'. What does this have to do with the cp in the table? I was expecting there to be some hex value for the cp, but instead there is this override prefix. Can someone please clear this up for me?

Here is a dump from od:

c022    **ea66    0000    0001    0010**    ba52    03f2    c030

TARGET_ADDRESS was defined as 0x00010000.
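As a rough cross-check of the dump, here is a small Python sketch that undoes od's word grouping and splits the instruction into its fields. It assumes the dump came from something like `od -x` on a little-endian machine, and that the leading `22 c0` bytes belong to the preceding instruction; the helper names are made up for illustration.

```python
import struct

# Words exactly as printed in the dump above (assumed: 16-bit little-endian grouping).
OD_WORDS = [0xC022, 0xEA66, 0x0000, 0x0001, 0x0010]


def words_to_bytes(words):
    """Undo od's word grouping: each 16-bit word is stored low byte first."""
    return b"".join(struct.pack("<H", w) for w in words)


def decode_far_jmp(code):
    """Decode `66 EA <offset32> <selector16>` and return (offset, selector)."""
    if code[0] != 0x66 or code[1] != 0xEA:
        raise ValueError("not a 0x66-prefixed far JMP")
    # The 6-byte operand: 32-bit offset first, then the 16-bit selector.
    offset, selector = struct.unpack_from("<IH", code, 2)
    return offset, selector


raw = words_to_bytes(OD_WORDS)
jmp_bytes = raw[2:]  # skip 22 c0 (assumed tail of the previous instruction)
offset, selector = decode_far_jmp(jmp_bytes)

print(" ".join(f"{b:02x}" for b in jmp_bytes))  # 66 ea 00 00 01 00 10 00
print(hex(offset), hex(selector))               # 0x10000 0x10
```

The decoded offset matches TARGET_ADDRESS, and the last two bytes are the 16-bit selector from the `.word` line.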

I am also confused a bit by the significance of the last two bytes. However, that seems to be another question altogether. It is getting quite late, and I have been staring at code and the Intel manuals for hours, so I hope I got my point across.

Thanks for looking!

Ways answered 13/2, 2009 at 7:49 Comment(2)
People use opcodes (instead of instructions) for 2 reasons. The first reason is when the assembler is "less than adequate" and doesn't provide support for the instruction they need (this is/was common when new instructions are added and older assemblers don't support them yet). The second reason is when the assembler does support the instruction they need but the programmer doesn't know how to convince the assembler to generate it. Basically, it's either bad tools (including old tools, confusing syntax and/or bad documentation) or bad programmers.Inequity
Note: My comment above is "in general" and applies to all assemblers. I don't use GAS, and have no idea if it supports the "32-bit far jump in 16-bit code" instruction or not (or how good/bad the documentation is).Inequity
I
13

The 0x66 is an operand-size override prefix: it switches the far JMP (0xEA) from the current code segment's default operand size to the other one. Right after PE is set, CS still carries its 16-bit real-mode attributes, so the default operand size is 16 bits and a bare 0xEA would be jmp ptr16:16 with a four-byte operand (16-bit offset, 16-bit selector). With the prefix it becomes jmp ptr16:32, so six bytes follow: a 32-bit offset and then the 16-bit segment selector. That is what the cp in the table describes; it is not a byte value of its own. The selector picks a descriptor in either the GDT or the LDT, which makes this what is traditionally called a "far jump": a jump that crosses into another segment in the x86 architecture. Here the selector 0x0010 refers to descriptor #2 in the GDT, with RPL 0. If you look earlier in that program, you'll likely see how the GDT is defined in terms of each segment's base address and limit (see the Intel manual for the GDT and LDT layout; each descriptor is an 8-byte entry).
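To make the selector fields concrete, here is a minimal Python sketch that splits a selector into its index, table indicator and RPL, following the standard x86 selector layout (bits 15..3 index, bit 2 TI, bits 1..0 RPL). The function name and the `DESCRIPTOR_SIZE` constant are only illustrative.

```python
DESCRIPTOR_SIZE = 8  # each GDT/LDT descriptor occupies 8 bytes


def decode_selector(selector):
    """Split a 16-bit segment selector into (index, table, rpl)."""
    index = selector >> 3                        # bits 15..3: descriptor index
    table = "LDT" if selector & 0x4 else "GDT"   # bit 2: table indicator (TI)
    rpl = selector & 0x3                         # bits 1..0: requested privilege level
    return index, table, rpl


index, table, rpl = decode_selector(0x0010)
print(index, table, rpl)        # 2 GDT 0
print(index * DESCRIPTOR_SIZE)  # 16: byte offset of that descriptor in the table
```

Because the low three bits are zero here, the selector value 0x10 is also the byte offset of the descriptor within the GDT.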

Inexpugnable answered 13/2, 2009 at 7:59 Comment(3)
Ah, this makes sense now. Earlier, when the GDT is defined the first entry is null (like the manual says), but the second is the code segment. After re-reading some parts of the manual I am seeing how this works. Thanks for clearing this up.Ways
Then again, I am still curious why the author chose to do this instead of using the mnemonics.Ways
It's the operand-size prefix, but it changes it from jmp ptr16:16 into jmp ptr16:32. This answer claims the no-prefix version would be jmp rel16 or jmp rel32, but that's a different opcode, E9 not EA. EA is always a far-jmp with immediate offset and segment.Shadowy
R
2

I run into this a bit. Some assemblers will only jump to a label. In this case the person wants to make an absolute jump to a specific hard-coded offset. I am guessing jmp TARGET_ADDRESS won't work, so they just emitted the instruction as raw bytes to get around this issue.

Rete answered 14/5, 2009 at 16:58 Comment(0)
E
0

0x66 is the operand-size override prefix: it selects the non-default operand size for the current code segment. Assuming the current code segment is 16-bit, the new instruction pointer will be 32-bit, not 16-bit. If the current code segment is 32-bit, 0x66 makes the target instruction pointer 16-bit. The current code size attribute depends on the CS selector in use and the descriptor attributes loaded for it from the GDT/LDT. In real mode the code segment size is usually 16-bit, except in special cases of "unreal" mode.
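The toggle described above can be sketched in a few lines of Python. This is only a model of the rule for the far JMP (opcode EA) in 16-bit and 32-bit code segments, not of the CPU's full decoding logic, and both function names are hypothetical.

```python
def effective_operand_size(cs_default_bits, has_66_prefix):
    """Return the operand size in bits, given the code segment's default size."""
    if cs_default_bits not in (16, 32):
        raise ValueError("code segment default must be 16 or 32 bits")
    if has_66_prefix:
        # The prefix selects whichever size is NOT the default.
        return 32 if cs_default_bits == 16 else 16
    return cs_default_bits


def far_jmp_operand_bytes(cs_default_bits, has_66_prefix):
    """Length of the EA operand: the offset plus a 2-byte selector."""
    return effective_operand_size(cs_default_bits, has_66_prefix) // 8 + 2


print(far_jmp_operand_bytes(16, False))  # 4 -> jmp ptr16:16
print(far_jmp_operand_bytes(16, True))   # 6 -> jmp ptr16:32
print(far_jmp_operand_bytes(32, True))   # 4 -> jmp ptr16:16
```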

Extol answered 19/8, 2014 at 8:58 Comment(1)
Unreal mode is real mode with cached descriptors still set with limit > 64k from switching to protected mode and back. A 32-bit CS definitely means protected mode, but if paging is disabled it still uses physical addresses directly.Shadowy
