Get size of assembly instructions
F

4

9

I need to read instructions one-by-one from a small code segment in memory and I have to find out the size of the instructions which I have in memory.

The following is just an example of raw disassembled code to explain my problem:

 (gdb) disas /r 0x400281,+8
 Dump of assembler code from 0x400281 to 0x400289:
    0x0000000000400281:  48 89 c7       movq   %rax, %rdi
    0x0000000000400284:  b0 00          movb   $0, %al
    0x0000000000400286:  e8 f2 48 00 00 callq  0x10001f30a
 End of assembler dump.

I know the memory address of the first instruction (p = 0x0000000000400281 in this case) and I can read every memory address from p. The problem is that I cannot tell whether the value of *(p + offset) is an opcode or not, and I know that instructions do not have a fixed size.

So, can I get the size of every assembly instruction? Or can I know if the value that I read is opcode or information?

Flambeau answered 21/5, 2014 at 15:55 Comment(8)
You need a disassembler library.Myogenic
Every disassembler has this knowledge (among much, much more), but you don't want to write an x86 disassembler yourself and I can't recommend a library that does it (and even if I could, it would be off topic).Harrar
There is no general solution to your problem. You will need to implement an instruction decoder that understands x86_64 instructions. I suggest looking at the LLVM project for library support.Hasson
Sounds like an XY problem. Why are you parsing the instructions? What is the overall task you're trying to solve?Piatt
@IgorSkochinsky Yes, it could be an XY problem. I need to know which instructions are call instructions and I want to change their offsets.Flambeau
That's still not an explanation.Piatt
how did you end up solving the problem?Aluminum
@IgorSkochinsky: Sure it is. And if that's his problem, he likely has to parse instructions to find the CALLs he wants to patch, unless he has magic access to knowledge about where the CALL instructions are. See my answer (it will even tell him which instructions are CALLs).Buyers
B
16

@AlexisWilke's response is right: this is messy. He provides the right insights and references to do the work, too.

I have done this work in C. The code follows; this is used in production contexts.

Caveats: It handles a good part of the traditional x86 instruction set, but not all of it; in particular, it handles none of the instructions involving the vector register sets. It also contains decoding for a few "virtual" instructions that we happen to use in our code. I don't think extending this to x86-64 would be difficult, but it would get messier. Lastly, this is lifted directly from our code, but I don't make any guarantees that it will compile out of the box.

/* (C) Copyright 2012-2014 Semantic Designs, Inc.
   You may freely use this code provided you retain this copyright message
*/

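#include <stdio.h>
#include <stdbool.h>

typedef unsigned char BYTE;   // assumed definition; on Windows, BYTE comes from <windows.h>
typedef unsigned int natural;

static bool trace_clone_checking = false; // assumed definition of the tracing flag used below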

natural InstructionLength(BYTE* pc)
{ // returns length of instruction at PC
   natural length=0;
   natural opcode, opcode2;
   natural modrm;
   natural sib;
   BYTE* p=pc;

   while (true)
    {  // scan across prefix bytes
       opcode=*p++;
       switch (opcode)
       {  case 0x64: case 0x65: // FS: GS: prefixes
      case 0x36: // SS: prefix
      case 0x66: case 0x67: // operand size, address size overrides
      case 0xF0: case 0xF2: // LOCK, REPNE prefixes
          length++;
              break;
          case 0x2E: // CS: prefix, used as HNT prefix on jumps
          case 0x3E: // DS: prefix, used as HT prefix on jumps
              length++;
              // goto process relative jmp // tighter check possible here
              break;
           default: 
              goto process_instruction_body;
       } 
    }

process_instruction_body:
switch(opcode) // switch on main opcode
{
       // ONE BYTE OPCODE, move to next opcode without remark
       case 0x27: case 0x2F:
       case 0x37: case 0x3F:
       case 0x40: case 0x41: case 0x42: case 0x43: case 0x44: case 0x45: case 0x46: case 0x47:
       case 0x48: case 0x49: case 0x4A: case 0x4B: case 0x4C: case 0x4D: case 0x4E: case 0x4F:
       case 0x50: case 0x51: case 0x52: case 0x53: case 0x54: case 0x55: case 0x56: case 0x57:
   case 0x58: case 0x59: case 0x5A: case 0x5B: case 0x5C: case 0x5D: case 0x5E: case 0x5F:
       case 0x90: // nop
       case 0x91: case 0x92: case 0x93: case 0x94: case 0x95: case 0x96: case 0x97: // xchg
   case 0x98: case 0x99:
       case 0x9C: case 0x9D: case 0x9E: case 0x9F:
       case 0xA4: case 0xA5: case 0xA6: case 0xA7: case 0xAA: case 0xAB: // string operators
       case 0xAC: case 0xAD: case 0xAE: case 0xAF:
   /* case 0xC3: // RET handled elsewhere */ 
       case 0xC9:
       case 0xCC: // int3
       case 0xF5: case 0xF8: case 0xF9: case 0xFC: case 0xFD: 
          return length+1; // include opcode

       case 0xC3: // RET
           if (*p++ != 0xCC)
              return length+1;
           if (*p++ != 0xCC)
              return length+2;
           if (*p++ == 0xCC
               && *p++ == 0xCC)
            return length+5;
        goto error;

    // TWO BYTE INSTRUCTION
    case 0x04: case 0x0C: case 0x14: case 0x1C: case 0x24: case 0x2C: case 0x34: case 0x3C:
    case 0x6A:
    case 0xB0: case 0xB1: case 0xB2: case 0xB3: case 0xB4: case 0xB5: case 0xB6: case 0xB7:
        case 0xC2:
           return length+2;

    // TWO BYTE RELATIVE BRANCH
       case 0x70: case 0x71: case 0x72: case 0x73: case 0x74: case 0x75: case 0x76: case 0x77:
       case 0x78: case 0x79: case 0x7A: case 0x7B: case 0x7C: case 0x7D: case 0x7E: case 0x7F:
       case 0xE0: case 0xE1: case 0xE2: case 0xE3: case 0xEB:
           return length+2;

       // THREE BYTE INSTRUCTION (NONE!)

   // FIVE BYTE INSTRUCTION:
       case 0x05: case 0x0D: case 0x15: case 0x1D: 
       case 0x25: case 0x2D: case 0x35: case 0x3D:
       case 0x68:
       case 0xA9:
       case 0xB8: case 0xB9: case 0xBA: case 0xBB: case 0xBC: case 0xBD: case 0xBE: case 0xBF:
        return length+5;

   // FIVE BYTE RELATIVE CALL
   case 0xE8:
         return length+5;

   // FIVE BYTE RELATIVE BRANCH
   case 0xE9:
         if (p[4]==0xCC)
                return length+6; // <jmp near ptr ...  int 3>
         return length+5; // plain <jmp near ptr>

       // FIVE BYTE DIRECT ADDRESS
       case 0xA1: case 0xA2: case 0xA3: // MOV AL,AX,EAX moffset...
         return length+5;
         break;

      // ModR/M with no immediate operand
      case 0x00: case 0x01: case 0x02: case 0x03: case 0x08: case 0x09: case 0x0A: case 0x0B:
      case 0x10: case 0x11: case 0x12: case 0x13: case 0x18: case 0x19: case 0x1A: case 0x1B:
      case 0x20: case 0x21: case 0x22: case 0x23: case 0x28: case 0x29: case 0x2A: case 0x2B:
      case 0x30: case 0x31: case 0x32: case 0x33: case 0x38: case 0x39: case 0x3A: case 0x3B:
      case 0x84: case 0x85: case 0x86: case 0x87: case 0x88: case 0x89: case 0x8A: case 0x8B: case 0x8D: case 0x8F:
      case 0xD1: case 0xD2: case 0xD3:
      case 0xFE: case 0xFF: // misinterprets JMP far and CALL far, not worth fixing
        length++; // count opcode
            goto modrm;

      // ModR/M with immediate 8 bit value
      case 0x80: case 0x82: case 0x83:
      case 0xC0: case 0xC1: 
      case 0xC6:  // with r=0?
          length+=2; // count opcode and immediate byte
            goto modrm;

      // ModR/M with immediate 32 bit value
      case 0x81: 
      case 0xC7:  // with r=0?
        length+=5; // count opcode and immediate dword
            goto modrm;

      case 0x9B: // FSTSW AX = 9B DF E0
           if (*p++==0xDF)
              { if (*p++==0xE0)
               return length+3;
            printf("InstructionLength: Unimplemented 0x9B tertiary opcode %2x at %x\n",*p,p);
                goto error;
          }
           else { printf("InstructionLength: Unimplemented 0x9B secondary opcode %2x at %x\n",*p,p);
                  goto error;
            }

      case 0xD9: // various FP instructions
           modrm=*p++;
           length++; //  account for FP prefix
           switch (modrm)
           {  case 0xC9: case 0xD0: 
          case 0xE0: case 0xE1: case 0xE4: case 0xE5: 
              case 0xE8: case 0xE9: case 0xEA: case 0xEB: case 0xEC: case 0xED: case 0xEE:
              case 0xF8: case 0xF9: case 0xFA: case 0xFB: case 0xFC: case 0xFD: case 0xFE: case 0xFF:
                  return length+1;
          default:  // r bits matter if not one of the above specific opcodes
                  switch((modrm&0x38)>>3)
                  {  case 0: goto modrm_fetched;  // fld
                 case 1: return length+1; // fxch
                 case 2: goto modrm_fetched; // fst
                 case 3: goto modrm_fetched; // fstp
                 case 4: goto modrm_fetched; // fldenv
                 case 5: goto modrm_fetched; // fldcw
                 case 6: goto modrm_fetched; // fnstenv
                 case 7: goto modrm_fetched; // fnstcw
                  }
                  goto error; // unrecognized 2nd byte
           }

      case 0xDB: // various FP instructions
           modrm=*p++;
           length++; //  account for FP prefix
           switch (modrm)
           {  case 0xE3: 
                  return length+1;
          default:  // r bits matter if not one of the above specific opcodes
#if 0
                  switch((modrm&0x38)>>3)
                  {  case 0: goto modrm_fetched;  // fld
                 case 1: return length+1; // fxch
                 case 2: goto modrm_fetched; // fst
                 case 3: goto modrm_fetched; // fstp
                 case 4: goto modrm_fetched; // fldenv
                 case 5: goto modrm_fetched; // fldcw
                 case 6: goto modrm_fetched; // fnstenv
                 case 7: goto modrm_fetched; // fnstcw
                  }
#endif
                  goto error; // unrecognized 2nd byte
           }

      case 0xDD: // various FP instructions
           modrm=*p++;
           length++; //  account for FP prefix
           switch (modrm)
           {  case 0xE1: case 0xE9: 
              return length+1;
          default:  // r bits matter if not one of the above specific opcodes
                  switch((modrm&0x38)>>3)
                  {  case 0: goto modrm_fetched;  // fld
                 // case 1: return length+1; // fisttp
                 case 2: goto modrm_fetched; // fst
                 case 3: goto modrm_fetched; // fstp
                 case 4: return length+1; // frstor
                 case 5: return length+1; // fucomp
                 case 6: goto modrm_fetched; // fnsav
                 case 7: goto modrm_fetched; // fnstsw
                  }
                  goto error; // unrecognized 2nd byte
           }

      case 0xF3: // funny prefix REPE
           opcode2=*p++;  // get second opcode byte
           switch (opcode2)
       {  case 0x90: // == PAUSE
          case 0xA4: case 0xA5: case 0xA6: case 0xA7: case 0xAA: case 0xAB: // string operators
             return length+2;
              case 0xC3: // (REP) RET
                 if (*p++ != 0xCC)
                    return length+2; // only (REP) RET
                 if (*p++ != 0xCC)
                    goto error;
                 if (*p++ == 0xCC)
                    return length+5; // (REP) RET CLONE IS LONG JUMP RELATIVE
                 goto error;
              case 0x66: // operand size override (32->16 bits)
         if (*p++ == 0xA5) // "rep movsw"
                    return length+3;
                 goto error;
              default: goto error;
           }

      case 0xF6: // funny subblock of opcodes
            modrm=*p++;
            if ((modrm & 0x30) == 0) // only TEST (/0, /1) takes an immediate; NOT (/2) and NEG (/3) do not
               length++; // 8 bit immediate operand
            goto modrm_fetched; 

      case 0xF7: // funny subblock of opcodes
            modrm=*p++;
            if ((modrm & 0x30) == 0)
               length+=4; // 32 bit immediate operand
            goto modrm_fetched; 

      // Intel's special prefix opcode
      case 0x0F:
        length+=2; // add one for special prefix, and one for following opcode
            opcode2=*p++;
        switch(opcode2) 
        { case 0x31: // RDTSC
             return length;

          // CMOVxx
          case 0x40: case 0x41: case 0x42: case 0x43: case 0x44: case 0x45: case 0x46: case 0x47: 
              case 0x48: case 0x49: case 0x4A: case 0x4B: case 0x4C: case 0x4D: case 0x4E: case 0x4F:
              goto modrm;

              // JC relative 32 bits
              case 0x80: case 0x81: case 0x82: case 0x83: case 0x84: case 0x85: case 0x86: case 0x87: 
              case 0x88: case 0x89: case 0x8A: case 0x8B: case 0x8C: case 0x8D: case 0x8E: case 0x8F:
                  return length+4; // account for subopcode and displacement

          // SETxx rm32
              case 0x90: case 0x91: case 0x92: case 0x93: case 0x94: case 0x95: case 0x96: case 0x97: 
              case 0x98: case 0x99: case 0x9A: case 0x9B: case 0x9C: case 0x9D: case 0x9E: case 0x9F:
                  goto modrm;

              case 0xA2: // CPUID
                  return length; // CPUID is just 0F A2; both bytes are already counted above

              case 0xAE: // LFENCE, SFENCE, MFENCE
                  opcode2=*p++;
                  switch (opcode2)
                  { case 0xE8: // LFENCE
                case 0xF0: // MFENCE
                    case 0xF8: // SFENCE
                  return length+1;
                    default: 
                      printf("InstructionLength: Unimplemented 0x0F, 0xAE tertiary opcode in clone  %2x at %x\n",opcode2,p-1);
                  goto error;
                  }

              case 0xAF: // imul
              case 0xB0: // cmpxchg 8 bits
                  goto error;

              case 0xB1: // cmpxchg 32 bits
              case 0xB6: case 0xB7: // movzx
              case 0xBC: /* bsf */ case 0xBD: // bsr
              // case 0xBE: case 0xBF: // movsx 
              case 0xC1: // xadd
              case 0xC7: // cmpxchg8b
                  goto modrm;

              default:
                  printf("InstructionLength: Unimplemented 0x0F secondary opcode in clone %2x at %x\n",opcode2,p-1);
                  goto error;
    } // switch

 // ALL THE REST OF THE INSTRUCTIONS; these are instructions that runtime system shouldn't ever use
     default: 
     /* case 0x26: case 0x36: // ES: SS: prefixes
        case 0x9A:
        case 0xC8: case 0xCA: case 0xCB: case 0xCD: case 0xCE: case 0xCF:
        case 0xD6: case 0xD7:
        case 0xE4: case 0xE5: case 0xE6: case 0xE7: case 0xEA: case 0xEB: case 0xEC: case 0xED: case 0xEF:
        case 0xF4: case 0xFA: case 0xFB:
         */
     printf("InstructionLength: Unexpected opcode %2x\n",opcode);
         goto error;
    }

modrm:
    modrm=*p++;
modrm_fetched:
    if (trace_clone_checking)
       printf("InstructionLength: ModR/M byte %x %2x\n",pc,modrm);
    if (modrm >= 0xC0)
       return length+1;  // account for modrm opcode
    else
    {  /* memory access */
        if ((modrm & 0x7) == 0x04)
    { /* instruction with SIB byte */
                length++; // account for SIB byte
                sib=*p++; // fetch the sib byte
                if ((sib & 0x7) == 0x05)
                   {  if ((modrm & 0xC0) == 0x40)
                     return length+1+1; // account for MOD + byte displacement
                  else return length+1+4; // account for MOD + dword displacement
                   }
            }
        switch(modrm & 0xC0)
        {  case 0x0:
          if ( (modrm & 0x07) == 0x05)
                  return length+5; // 4 byte displacement
              else return length+1; // zero length offset
           case 0x80:
              return length+5;  // 4 byte offset
          default:
      return length+2;  // one byte offset
        }
   }

error:
    {  printf("InstructionLength: unhandled opcode at %8x with opcode %2x\n",pc,opcode);
    }
    return 0; // can't actually execute this
}
Buyers answered 24/5, 2014 at 9:34 Comment(3)
Please add x86-64 support.Bern
If I added a bounty would it motivate you?Bern
Not enough in the short term. I probably have to extend this logic to 64 bits sometime in next 2 years, but I don't know when I have to do it. In the near term, there are lots of other things presently demanding my attention.Buyers
W
5

Decoding instructions is not conceptually complicated. However, because the Intel family of processors is CISC, with a large set of variable-length encodings, the task is rather daunting.

First of all, you should not write it in assembler, because that would take you a year or two (unless you have the time for that). Since you only need to scan the code, not print out the results, you can do the work much faster than an actual disassembler would. That being said, you'll run into the same main problems.

The manuals are here:

http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html?iid=tech_vt_tech+64-32_manuals

I suggest this one:

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

Then, all you have to do is read one byte and understand it. You have a table on page 770 that shows you the encoding from the op-code to the instruction.

So for example, 0x33 represents an XOR with Gv,Ev as parameters. G means a general register selected by the ModR/M byte that follows. The v is the size (word, dword or qword, depending on the operand size). The E means the operand is described by the ModR/M byte after that one opcode byte (the same byte serves both G and E). So you will have to read that one byte to determine the addressing mode, and from that you can determine the register (which can be ignored) and the size of the address part. The Ev operand may be another register (then no extra byte), or it could be a memory operand, optionally with a SIB byte and a displacement (1, 2 or 4 bytes). Immediate data (1, 2, 4 or, rarely, 8 bytes), if the instruction takes any, comes after that. Pretty simple, right? Note that ALL instructions that have a ModR/M byte use the exact same encoding, so you have to implement that just once. Also, the order in which bytes are added after the instruction code is always exactly the same.

For 64-bit instructions there is an extra prefix byte, REX (0x40 to 0x4F), which comes immediately before the opcode, after any other prefixes. That one defines additional modes and support for the extended registers. All of that is described in detail in the document I mentioned earlier.

More or less, you need your parser to understand the ModR/M, SIB, prefixes, and voilà. It's not that complicated. Then the first byte tells you the instruction (first 2 bytes if the first byte is 0x0F...)

Some instructions also support prefixes to tweak the size of the operands and other similar things. As far as I know, only the 0x66 (op size) and 0x67 (addr size) have an effect on the size of the address and immediate data. The other prefixes will not affect the number of bytes used by the instruction so you can simply ignore them (well count them, but no need to know what follows).


All of that said, using the LLVM library (As someone mentioned in the comments) is probably a better/easier option, although it may be much bigger than what you'd need if your stuff is limited.

Whenever answered 24/5, 2014 at 7:58 Comment(4)
LLVM is way too big a sledgehammer for OP's request. You can do this in a few hundred lines of C. See my answer.Buyers
Yeah, I wrote my own disassembler too. For 6502 (way back) and then Intel, something like Pentium 4 or so. It's just that you have tons of instructions (if you want to handle them all, that is). Plus if you want to be correct, you'd need to know the processor because some codes work differently in different versions (0x0F for example).Whenever
@IraBaxter: Using a library that someone else maintains has the huge advantage that you don't have to keep updating it for new extensions like AVX and AVX512 which redefine existing illegal sequences into new prefixes. Or new SSE* extensions where new mandatory prefixes make new instructions. Besides LLVM, the GNU opcodes library is used by GDB and objdump, according to gnu.org/software/binutilsTuning
@PeterCordes: Agreed, as long as a) it does what you need in a straightforward way and b) they keep it up to date. Likely LLVM and GNU stay pretty up to date. They are still pretty big hammers (In my particular case, I didn't need to decode every instruction, just what my compiler could generate).Buyers
T
3

There is XED library from Intel to work with x86/x86_64 instructions: https://github.com/intelxed/xed, and it is the only correct way to work with intel machine codes both in x86 and x86_64 modes. It is used by Intel (and was part of their Pin): https://software.intel.com/en-us/articles/xed-x86-encoder-decoder-software-library

XED User Guide (2014): https://software.intel.com/sites/landingpage/pintool/docs/67254/Xed/html/main.html
XED2 User Guide (2011): https://software.intel.com/sites/landingpage/pintool/docs/56759/Xed/html/main.html

xed_decode function will provide you all information about instruction: https://intelxed.github.io/ref-manual/group__DEC.html https://intelxed.github.io/ref-manual/group__DEC.html#ga9a27c2bb97caf98a6024567b261d0652

And xed_ild_decode will only decode instruction for its length: https://intelxed.github.io/ref-manual/group__DEC.html#ga4bef6152f61997a47c4e0fe4327a3254

XED_DLL_EXPORT xed_error_enum_t xed_ild_decode(xed_decoded_inst_t *xedd,
                                               const xed_uint8_t *itext,
                                               const unsigned int bytes)

This function just does instruction length decoding. It does not return a fully decoded instruction.

Parameters

  • xedd the decoded instruction of type xed_decoded_inst_t . Mode/state sent in via xedd; See the xed_state_t .
  • itext the pointer to the array of instruction text bytes
  • bytes the length of the itext input array. 1 to 15 bytes, anything more is ignored.

Returns:

xed_error_enum_t indicating success (XED_ERROR_NONE) or failure. Only two failure codes are valid for this function: XED_ERROR_BUFFER_TOO_SHORT and XED_ERROR_GENERAL_ERROR. In general this function cannot tell if the instruction is valid or not. For valid instructions, XED can figure out if enough bytes were provided to decode the instruction. If not enough were provided, XED returns XED_ERROR_BUFFER_TOO_SHORT. From this function, the XED_ERROR_GENERAL_ERROR is an indication that XED could not decode the instruction's length because the instruction was so invalid that even its length may vary across implementations.

To get length from xedd struct, filled by xed_ild_decode, use xed_decoded_inst_get_length: https://intelxed.github.io/ref-manual/group__DEC.html#gad1051f7b86c94d5670f684a6ea79fcdf

static XED_INLINE xed_uint_t xed_decoded_inst_get_length(const xed_decoded_inst_t *p)

Return the length of the decoded instruction in bytes.

Example code ("Apache License, Version 2.0", by Intel 2016): https://github.com/intelxed/xed/blob/master/examples/xed-ex-ild.c

#include "xed/xed-interface.h"
#include <stdio.h>

int main()
{
    xed_bool_t long_mode = 1;
    xed_decoded_inst_t xedd;
    xed_state_t dstate;
    unsigned char itext[15] = { 0xf2, 0x2e, 0x4f, 0x0F, 0x85, 0x99,
                                0x00, 0x00, 0x00 };

    xed_tables_init(); // one time per process

    if (long_mode) 
        dstate.mmode=XED_MACHINE_MODE_LONG_64;
    else 
        dstate.mmode=XED_MACHINE_MODE_LEGACY_32;

    xed_decoded_inst_zero_set_mode(&xedd, &dstate);
    xed_ild_decode(&xedd, itext, XED_MAX_INSTRUCTION_BYTES);
    printf("length = %u\n",xed_decoded_inst_get_length(&xedd));

    return 0;
}

Any other solution, like manual prefix/opcode parsing or using a third-party disassembler, may give you wrong results in some rare cases. We don't know which library is used inside Intel to verify their hardware instruction decoders, but xed is the library used by their software decoders in various binary tools. The ild decoder of xed has more than 1600 lines of code: https://github.com/intelxed/xed/blob/master/src/dec/xed-ild.c, and should be more precise than any other library.

Teasel answered 28/5, 2017 at 15:34 Comment(1)
It may give wrong results for some newer proprietary third-party x86/x86_64 extensions like VIA's and AMD's: github.com/intelxed/xed/issues/44. But xed's instruction database is very good: github.com/intelxed/xed/tree/master/datafiles (and not well-documented: github.com/intelxed/xed/issues/39 github.com/intelxed/xed/pull/42 and groups.google.com/forum/#!topic/golang-dev/GFv83tlhb4A; but there is github.com/intelxed/xed/pull/42/commits/… misc/engineering-notes.txt)Teasel
A
2

There's a small disassembly library called udis86: http://udis86.sourceforge.net/.

It's small and has decent documentation. If you set the translator to NULL via ud_set_syntax, then the function ud_disassemble should only decode the instruction and return the number of bytes.

Aluminum answered 23/9, 2015 at 0:41 Comment(0)
