Metamorphic Code Examples

Asked 11/4, 2012 at 20:14 Answered 24/8, 2020 at 16:53

I understand the concept of Polymorphic and Metamorphic code but I recently read the Wikipedia page on both (for whatever reason I hadn't done this previously!). Now I really want to have a go at writing some metamorphic code for myself.

I am a master of no language, dabbler of many. I know some PHP, MySQL, c/c++, Java, Bash scripting, Visual Basic 6, VBScripting, Perl, JavaScript.

Can anyone provide an example of metamorphic code in any of these languages? I would like to see a working example, even where the output of the program is just "Hello World", to understand through example how this is happening (I am struggling to work out by thought alone how these techniques can be achieved). Any language would do really, those are just preferred ones.

Additionally, searching the Internet has only returned a limited number of examples in c/c++ (not even complete working examples, more partial snippets of code), is that because the other languages I have suggested aren't low level enough to have the power/flexibility required to make metamorphic code?

Recall answered 11/4, 2012 at 20:14 Comment(3)

Another helpful tip for Poly/Metamorphic programming: Use a compiled language like C/C++. Do not use an interpreted language like PHP, Perl, VB, etc. – Lattermost 11/11, 2012 at 20:5

Maybe you noticed as much, but there are examples of assembler metamorphosis in the Wikipedia-linked Symantec PDF at the bottom of the Metamorphic Code article. Also, more sample code seems to exist for quines, which look like a good place to start insofar as metamorphic code seems (more) understandable relative to quines. – Accelerando 10/5, 2013 at 7:37

@Accelerando +1 from me, yes I have briefly skimmed the Symantec PDF, I need to read it again. And yes, I do find quines very interesting. I understand their concept already though and they are easier to come by (via a search engine for example). I didn't include quines directly in my question but I agree they are relevant (and fascinating! :D ) – Recall 10/5, 2013 at 8:42

Below is an example of what I believe would classify as metamorphic code written in C. I'm afraid I don't have a great deal of experience writing portable C code, so it may require some modification to compile on other platforms (I'm using an old version of Borland on Windows). Also, it relies on the target platform being x86 since it involves some machine code generation. In theory it should compile on any x86 OS though.

How it works

Each time the program is run, it generates a randomly modified copy of itself, with a different filename. It also prints out a list of offsets that have been modified so you can see it actually doing something.

The modification process is very simplistic. The source code is just interspersed with sequences of assembly instructions that effectively do nothing. When the program is run, it finds these sequences and randomly replaces them with different code (which obviously also does nothing).

Hardcoding a list of offsets obviously isn't realistic for something that other people need to be able to compile, so the sequences are generated in a way that makes them easy to identify in a search through the object code, hopefully without matching any false positives.

Each sequence starts with a push operation on a certain register, followed by a set of instructions that modify that register, and ends with a pop operation to restore the register to its initial value. To keep things simple, in the original source all of the sequences are just PUSH EAX, eight NOPs, and POP EAX. In all subsequent generations of the app, though, the sequences will be entirely random.

Explaining the code

I've split the code up into multiple parts so I can try to explain it step by step. If you want to compile it yourself, you'll just need to join all the parts together.

First some fairly standard includes:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <time.h>   /* time(), used to seed rand() in replacejunk */

Next we have defines for various x86 opcodes. These will typically be combined with other values to generate a full instruction. For example, the PUSH define (0x50) by itself is PUSH EAX, but you can derive the values for other registers by adding an offset in the range 0 to 7. Same thing for POP and MOV.

#define PUSH 0x50
#define POP  0x58
#define MOV  0xB8
#define NOP  0x90
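The "opcode plus register number" arithmetic can be tried out on its own. The following is a minimal standalone sketch, not part of the program: the demo_* names and the register enum are made up for illustration, and the register numbering is the standard x86 one (EAX=0, ECX=1, EDX=2, EBX=3, ESP=4, EBP=5, ESI=6, EDI=7).

```c
#include <assert.h>

/* Hypothetical names for illustration; standard x86 register numbers. */
enum { DEMO_EAX = 0, DEMO_ECX = 1, DEMO_EDX = 2, DEMO_EBX = 3,
       DEMO_ESP = 4, DEMO_EBP = 5, DEMO_ESI = 6, DEMO_EDI = 7 };

/* PUSH r32 is the single byte 0x50 plus the register number. */
unsigned char demo_push(unsigned reg) {
  assert(reg < 8);
  return (unsigned char)(0x50 + reg);
}

/* POP r32 is the single byte 0x58 plus the register number. */
unsigned char demo_pop(unsigned reg) {
  assert(reg < 8);
  return (unsigned char)(0x58 + reg);
}

/* MOV r32, imm32 starts with 0xB8 plus the register number; the four
   bytes of the immediate value follow it (not produced here). */
unsigned char demo_mov_imm(unsigned reg) {
  assert(reg < 8);
  return (unsigned char)(0xB8 + reg);
}
```

For example, demo_push(DEMO_EBX) gives 0x53 (PUSH EBX) and demo_pop(DEMO_EBX) gives 0x5B (POP EBX).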

The next six defines are the first bytes of several two-byte instructions (the code calls them prefixes). The second byte encodes the operands and will be explained in more detail later.

#define ADD  0x01
#define AND  0x21
#define XOR  0x31
#define OR   0x09
#define SBB  0x19
#define SUB  0x29

const unsigned char prefixes[] = { ADD,AND,XOR,OR,SBB,SUB,0 };

JUNK is a macro that inserts our sequence of junk operations anywhere we want in the code. As I explained before, it initially just writes out PUSH EAX, eight NOPs, and POP EAX. JUNKLEN is the number of NOPs in that sequence, not the full length of the sequence.

And in case you're not aware, __emit__ is a pseudo-function that injects literal values directly into the object code. I suspect it may be something you need to port if you're using a different compiler.

#define JUNK __emit__(PUSH,NOP,NOP,NOP,NOP,NOP,NOP,NOP,NOP,POP)
#define JUNKLEN 8

Some global variables where our code will be loaded. Global variables are bad, but I'm not a particularly good coder.

unsigned char *code;
int codelen;

Next we have a simple function that will read our object code into memory. I never free that memory because I just don't care.

Notice the JUNK macro calls inserted at random points. You're going to see a lot more of these throughout the code. You can insert them almost anywhere, but if you're using a real C compiler (as opposed to C++) it'll complain if you try to put them before or in-between variable declarations.

void readcode(const char *filename) {
  FILE *fp = fopen(filename, "rb");    JUNK;
  fseek(fp, 0L, SEEK_END);             JUNK;
  codelen = ftell(fp);
  code = malloc(codelen);              JUNK;
  fseek(fp, 0L, SEEK_SET);
  fread(code, codelen, 1, fp);         JUNK;
}

Another simple function to write the application out again after it has been modified. For the new filename we just replace the last character of the original filename with a digit that is incremented each time. No attempt is made to check whether the file already exists and that we're not overwriting a crucial piece of the operating system.

void writecode(const char *filename) {
  FILE *fp;
  int lastoffset = strlen(filename)-1;
  char lastchar = filename[lastoffset];
  char *newfilename = strdup(filename);  JUNK;
  lastchar = '0'+(isdigit(lastchar)?(lastchar-'0'+1)%10:0);
  newfilename[lastoffset] = lastchar;
  fp = fopen(newfilename, "wb");         JUNK;
  fwrite(code, codelen, 1, fp);          JUNK;
  fclose(fp);
  free(newfilename);
}
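The filename rule used above (replace the last character with a digit, incrementing it if it already is one) can be checked separately. This is a small sketch under my own assumptions: demo_next_name is a hypothetical helper that writes into a caller-supplied buffer instead of using strdup, and it is not part of the program.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies name into out and replaces its last character
   with a digit. A non-digit becomes '0'; a digit d becomes (d+1) mod 10.
   Returns 0 on success, -1 if name is empty or out is too small. */
int demo_next_name(const char *name, char *out, size_t outlen) {
  size_t len;
  char last;
  assert(name != NULL && out != NULL);
  len = strlen(name);
  if (len == 0 || len + 1 > outlen) return -1;
  memcpy(out, name, len + 1);          /* includes the terminating NUL */
  last = name[len - 1];
  out[len - 1] = (char)('0' + (isdigit((unsigned char)last)
                                 ? (last - '0' + 1) % 10
                                 : 0));
  return 0;
}
```

So "prog.exe" becomes "prog.ex0", then "prog.ex1", and after "prog.ex9" the name wraps back to "prog.ex0", which is why later generations eventually overwrite earlier ones.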

This next function writes out a random instruction for our junk sequence. The reg parameter represents the register we're working with - what will be pushed and popped at either end of the sequence. The offset is the offset in the code where the instruction will be written. And space gives the number of bytes we have left in our sequence.

Depending on how much space we have, we may be restricted in which instructions we can write out: with fewer than 2 bytes left only a NOP fits, and with fewer than 5 bytes left a MOV does not. Otherwise we choose at random between a MOV and one of the two-byte instructions. NOP is just a single byte. MOV is five bytes: our MOV opcode (with the reg parameter added), and 4 random bytes representing the number moved into the register.

For the two byte sequences, the first is just one of our prefixes chosen at random. The second is a byte in the range 0xC0 to 0xFF where the least significant 3 bits represent the primary register - i.e. that must be set to the value of our reg parameter.

int writeinstruction(unsigned reg, int offset, int space) {
  if (space < 2) {
    code[offset] = NOP;                         JUNK;
    return 1;
  }
  else if (space < 5 || rand()%2 == 0) {
    code[offset] = prefixes[rand()%6];          JUNK;
    code[offset+1] = 0xC0 + rand()%8*8 + reg;   JUNK;
    return 2;
  }
  else {
    code[offset] = MOV+reg;                     JUNK;
    *(short*)(code+offset+1) = rand();
    *(short*)(code+offset+3) = rand();          JUNK;
    return 5;
  }
}
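The operand byte of the two-byte instructions can be illustrated in isolation. The sketch below is standalone and uses hypothetical demo_* names; it assumes the register-to-register form described above, where 0xC0 marks register-direct operands, bits 5-3 hold the source register and bits 2-0 the destination (primary) register.

```c
#include <assert.h>

/* Hypothetical helper: builds the second byte of a register-to-register
   instruction. 0xC0 selects register-direct mode, bits 5-3 hold the source
   register and bits 2-0 the destination (the "primary") register. */
unsigned char demo_modrm(unsigned src, unsigned dst) {
  assert(src < 8 && dst < 8);
  return (unsigned char)(0xC0 + src * 8 + dst);
}

/* Writes a two-byte instruction (opcode, then operand byte) into out. */
void demo_encode(unsigned char out[2], unsigned char opcode,
                 unsigned src, unsigned dst) {
  out[0] = opcode;
  out[1] = demo_modrm(src, dst);
}
```

For example, demo_encode(buf, 0x31, 0, 0) produces the bytes 31 C0, which is XOR EAX, EAX. Whatever the source register, the low three bits always name the register that gets modified, which is why the junk generator only has to pin those bits to reg.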

Now we have the equivalent function for reading back one of these instructions. Assuming we've already identified the reg from the PUSH and POP operations at either end of the sequence, this function can attempt to validate whether the instruction at the given offset is one of our junk operations and that the primary register matches the given reg parameter.

If it finds a valid match, it returns the instruction length, otherwise it returns zero.

int readinstruction(unsigned reg, int offset) {
  unsigned c1 = code[offset];
  if (c1 == NOP)
    return 1;                     JUNK;
  if (c1 == MOV+reg)
    return 5;                     JUNK;
  if (c1 && strchr((const char*)prefixes,c1)) {  /* c1 == 0 would match the array's terminator */
    unsigned c2 = code[offset+1]; JUNK;
    if (c2 >= 0xC0 && c2 <= 0xFF && (c2&7) == reg)
      return 2;                   JUNK;
  }                               JUNK;
  return 0;
}

This next function is the main loop that searches for and replaces the junk sequences. It starts by looking for a PUSH opcode followed by a POP opcode on the same register eight bytes later (or whatever JUNKLEN was set to).

void replacejunk(void) {
  int i, j, inc, space;
  srand(time(NULL));                                 JUNK;

  for (i = 0; i < codelen-JUNKLEN-2; i++) {
    unsigned start = code[i];
    unsigned end = code[i+JUNKLEN+1];
    unsigned reg = start-PUSH;

    if (start < PUSH || start >= PUSH+8) continue;   JUNK;
    if (end != POP+reg) continue;                    JUNK;

If the register turns out to be ESP, we can safely skip it because we'll never use ESP in our generated code (stack operations on ESP need special consideration that isn't worth the effort).

    if (reg == 4) continue; /* register 4 is ESP */

Once we've matched a likely looking PUSH and POP combination, we then try to read the instructions in-between. If we successfully match the length of bytes we're expecting, we consider that a match that can be replaced.

    j = 0;                                           JUNK;
    while (inc = readinstruction(reg,i+1+j)) j += inc;
    if (j != JUNKLEN) continue;                      JUNK;

We then pick one of 7 registers at random (as explained before we don't consider ESP), and write out the PUSH and POP operations for that register at either end of the sequence.

    reg = rand()%7;                                  JUNK;
    reg += (reg >= 4);
    code[i] = PUSH+reg;                              JUNK;
    code[i+JUNKLEN+1] = POP+reg;                     JUNK;

Then all we need to do is fill in the space in-between using our writeinstruction function.

    space = JUNKLEN;
    j = 0;                                           JUNK;
    while (space) {
      inc = writeinstruction(reg,i+1+j,space);       JUNK;
      j += inc;
      space -= inc;                                  JUNK;
    }

And here's where we display the offset that we just patched.

    printf("%d\n",i);                                JUNK;
  }
}
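Two pieces of the loop above can be sketched on their own: the boundary search for a PUSH/POP pair on the same register, and the trick for picking one of seven registers while skipping ESP. This is a simplified standalone illustration with hypothetical demo_* names; unlike replacejunk it only checks the two boundary opcodes and does not validate the instructions in between.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PUSH 0x50
#define DEMO_POP  0x58
#define DEMO_ESP  4

/* Maps r in 0..6 onto the register numbers 0..3 and 5..7, skipping ESP (4). */
unsigned demo_pick_register(unsigned r) {
  assert(r < 7);
  return r + (r >= DEMO_ESP);
}

/* Returns the offset of the first PUSH reg ... POP reg pair at or after start
   whose two opcodes are junklen+1 bytes apart, or len if there is none.
   Pairs on ESP are ignored. */
size_t demo_find_candidate(const unsigned char *code, size_t len,
                           size_t junklen, size_t start) {
  size_t i;
  assert(code != NULL);
  for (i = start; i + junklen + 1 < len; i++) {
    unsigned first = code[i];
    unsigned last = code[i + junklen + 1];
    unsigned reg;
    if (first < DEMO_PUSH || first >= DEMO_PUSH + 8) continue;
    reg = first - DEMO_PUSH;
    if (reg == DEMO_ESP) continue;
    if (last == DEMO_POP + reg) return i;
  }
  return len;
}
```

With JUNKLEN set to 8, a buffer holding NOP, PUSH EAX, eight NOPs, POP EAX, RET yields offset 1, the position of the PUSH.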

Finally we have the main function. This just calls the functions previously described. We read in the code, replace the junk, then write it out again. The argv[0] argument contains the application filename.

int main(int argc, char* argv[]) {

  readcode(argv[0]);     JUNK;
  replacejunk();         JUNK;
  writecode(argv[0]);    JUNK;

  return 0;
}

And that's all there is to it.

Some final notes

When running this code, obviously you need to make sure the user has the appropriate permissions to write out a file in the same location as the original code. Then once the new file has been generated, you'll typically need to rename it if you're on a system where the file extension is important, or set its execute attributes if that is needed.

Finally, I suspect you may want to run the generated code through a debugger rather than just executing it directly and hoping for the best. I found that if I copied the generated file over the original executable, the debugger was happy to let me step through it while still viewing the original source code. Then whenever you get to a point in the code that says JUNK, you can pop into the assembly view and look at the code that has been generated.

Anyway, I hope my explanations have been reasonably clear, and this was the kind of example you were looking for. If you have any questions, feel free to ask in the comments.

Bonus update

As a bonus, I thought I'd also include an example of metamorphic code in a scripting language. This is quite different from the C example, since in this case we need to mutate the source code, rather than the binary executable, which is a little easier I think.

For this example, I've made extensive use of PHP's goto statement. Every line starts with a label, and ends with a goto pointing to the label of the following line. That way each line is essentially self contained, and we can happily shuffle them and still have the program work exactly as before.

Conditions and loop structures are a little more complicated, but they just need to be rewritten in the form of a condition that jumps to one of two different labels. I've included comment markers in the code where the loops would be to try and make it easier to follow.

Example code on ideone.com

All the code does is echo the shuffled copy of itself, so you can easily test it on ideone just by cutting and pasting the output back into the source field and running it again.

If you wanted it to mutate even more, it would be fairly easy to do something like replace all the labels and variables with a different set of random strings every time the code was run. But I thought it best to try and keep things as simple as possible. These examples are just meant to demonstrate the concept - we're not actually trying to avoid detection. :)
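The label-and-goto idea is not specific to PHP. As a rough illustration in C (a sketch of the same principle, not a translation of the linked script), here is a hypothetical function demo_sum_to whose statements each carry a label and end in a jump. The lines after the initial goto are deliberately out of order, and any other ordering of them behaves the same.

```c
#include <assert.h>

/* Sums the integers 1..n. Every statement has its own label and ends with a
   jump, so the labelled lines can be shuffled without changing the result.
   The loop "for (i = 1; i <= n; i++) total += i;" is rewritten as a
   condition (a3) that jumps to one of two labels. */
int demo_sum_to(int n) {
  int i, total;
  assert(n >= 0);
  goto a1;
a5: i++;                 goto a3;
a2: i = 1;               goto a3;
a6: return total;
a4: total += i;          goto a5;
a1: total = 0;           goto a2;
a3: if (i <= n)          goto a4; else goto a6;
}
```

Calling demo_sum_to(5) returns 15 regardless of how the six labelled lines are arranged, as long as the entry jump to a1 stays first.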

Keg answered 4/5, 2013 at 23:43 Comment(7)

A fantastic answer with great details and breakdown. Many thanks for your sterling effort James. I have had a quick read all the way through. When I get a chance I will post a reply to your answer, after I have altered this code to compile on x86_64 Linux (I have just started with the ASM parts first, for example there is no __emit__ on Linux, it'd probably be __volatile__). I will post a response containing the Linux port of your code. – Recall 9/5, 2013 at 11:11

This stackoverflow answer looks promising: #731928 The point about optimisations is good too. I would definitely recommend compiling without optimisation. – Keg 9/5, 2013 at 12:38

Oh, I love the PHP example! That has made my day :D Bravo! Plenty of food for thought here between these two different examples 100 points well deserved sir! – Recall 11/5, 2013 at 9:37

Thank you. I really enjoyed working on this. It was a great question. – Keg 11/5, 2013 at 10:11

Cool, well it's not over yet. I will keep updating it and posting my answers evolving it as time goes on. I first asked this question two years ago, so progress is a bit slow though ;) – Recall 11/5, 2013 at 13:50

I have finally found some more time to look at this question. I have examined the excellent PHP example provided and made my own extension to it. I will look at the C example in the coming weeks and update again. I use Linux and the code doesn't run on Linux, so my aim will be to adjust for Linux then post that back. – Recall 29/9, 2013 at 14:57

Here's a discussion of getting it to work on more mainstream compilers like gcc. The only real trick is replacing __emit__ with an asm block. – Springy 18/10, 2017 at 16:32

Publicly available metamorphic code samples are limited by several factors:

1) Expertise: Metamorphic coding is an extremely advanced technique in computer programming. The number of programmers capable of writing coherent, clean metamorphic code suitable for sampling is very small.
2) Financial Incentives: Metamorphic coding has limited use in commercial applications. As a result, programmers with sufficient skill to create metamorphic code have little professional exposure to, or incentive to learn, metamorphic coding techniques.
3) Legitimacy: Metamorphic coding has large applications in potent virus creation. Hence any responsible professional who created metamorphic code would have ethical issues with freely distributing samples, as an amateur hacker may be able to use the code to enhance a malicious attack. Conversely, any hacker competent enough to create metamorphic code would have no incentive to advertise that skill: should one of their attacks be uncovered, they would then be on a very short list of suspects based on competency.
4) Secrecy: Lastly, and probably the most realistic reason metamorphic code is so difficult to find, is that any programmer who demonstrates competency in metamorphic programming, and is not apprehended by authorities for cyber crimes, is likely to be recruited by a government security agency, private security firm, or anti-virus company, and the programmer's subsequent research/knowledge is then subject to a non-disclosure agreement to maintain a competitive edge.

Why only C/C++ examples?

You mention finding only C/C++ code examples of poly/metamorphic programming and inferred that only languages close to the hardware can be poly/metamorphic. This is true for the strictest definitions of poly/metamorphic code. Interpreted languages can have poly/metamorphic behavior but rely on a statically compiled interpreter to execute, hence a large portion of the 'run-time signature' is not mutable. Only compiled low level languages offer the computational flexibility to have a highly mutable 'run-time signature.'

Here is some 'polymorphic' PHP code I wrote. PHP being an interpreted language and not a compiled language makes true polymorphism impossible.

PHP Code:

<?php
// Programs functional Execution Section
system("echo Hello World!!\\n");
// mutable malicious payload goes here (if you were devious)

// Programs Polymorphic Engine Section
recombinate();
?>
<?php

function recombinate() {
  $file      = __FILE__;                    //assigns file path to $file using magic constant
  $contents  = file_get_contents($file);    //gets file contents as string
  $fileLines = explode("\n", $contents);    //splits into file lines as string array
  $varLine   = $fileLines[2];               //extracts third file line as string
  $charArr   = str_split($varLine);         //splits third line into char array
  $augStr    = augmentStr($charArr);        //recursively augments char array
  $newLine   = implode("",$augStr);         //rebuilds char array into string
  $fileLines[2] = $newLine;                 //inserts string back into array
  $newContents  = implode("\n",$fileLines); //rebuilds array into single string
  file_put_contents($file,$newContents);    //writes out augmented file
  sleep(1);                                 //let the CPU rest
  $pid = pcntl_fork();                      //forks process
  if($pid) {                                //if in parent:
    exit(0);                                //exit parent process
  }                                         //WARNING: creates 'Zombie' child process
  else {                                    //else in child process
    system("nohup php -f " .$file . " 2> /dev/null"); //executes augmented file
    exit(0);                                //exits child process
  }
}

function augmentStr($inArr) {
  if (mt_rand(0,6) < 5) {               //determines mutability
    /*$startIndex & $endIndex define mutable parts of file line as Xs
     * system("echo XXXXX ... XXXXX\\n");
     * 01234567890123            -7654321
     */
    $startIndex  = 13;
    $endIndex    = count($inArr)-7;
    $targetIndex = mt_rand($startIndex,$endIndex);     //choose mutable index
    $inArr[$targetIndex] = getSafeChar(mt_rand(0,62)); //mutate index
    $inArr = augmentStr($inArr);               //recurse
  }
  return $inArr;
}

function getSafeChar($inNum) {      //cannot use escaped characters
  $outChar;                 //must be a standard PHP char
       if ($inNum >=  0 && $inNum <= 9 ) { $outChar = chr($inNum + 48); }
  else if ($inNum >= 10 && $inNum <= 35) { $outChar = chr($inNum + 55); }
  else if ($inNum >= 36 && $inNum <= 61) { $outChar = chr($inNum + 61); }
  else if ($inNum == 62)                 { $outChar = " ";              }
  else                                   { $outChar = " ";              }
  return $outChar;
}

?>

WARNING: Creates a zombie process, know how to kill a zombie process before running code

Information Finding Techniques:

This article contains more specific information than Wikipedia. This article does not, however, contain true source code. If you would like my advice: though it is highly unlikely that you will find sample source code, you may be able to find sufficient academic documentation to create your own metamorphic code. Consider this to start (Google Scholar):

When reading academic articles/papers be sure to look at the sources at the end of the document, as these sources may also have valuable information.

Best of luck in your quest for knowledge!

Lattermost answered 5/8, 2012 at 5:3 Comment(4)

I'm not ignoring your great answer by the way, I have been busy of late and no time to fully read it. I will get back to you when I can though. In the mean time I up-voted it a while back for the effort. Cheers! – Recall 27/4, 2013 at 16:52

@javano I would suggest considering putting a high bounty on this question, not so that I would get the reputation, but in an attempt to get another user to provide some small sample poly/metamorphic code in a compiled language. – Lattermost 27/4, 2013 at 17:18

I am back on this question now as I have some more time on my hands; I completely agree with your points 1-4 explaining why I am having difficulties locating working code. I was shocked to be honest though because I figured I would be able to find some purely for academic purposes, there isn't much you can't find on the Internet these days (bomb designs, complex encryption algorithms, reverse engineering tools, etc), so I do find it a bit odd. Still, good argument. I understand why a compiled language like C is likely required for true metamorphism but your PHP example is a good one! – Recall 4/5, 2013 at 15:29

I don't agree with points 3 and 4. I think they are pure conjecture with no actual data to back them up. – Sodomy 9/11, 2015 at 12:10

This answer is not finished, I will continue to expand on it over time, until this question's answer is complete

Scripted Example - PHP

I have made my own copy of the PHP script James Holderness provided, so that I could see for myself through demonstration how a metamorphic script could work. A full write-up of the code is here: http://null.53bits.co.uk/index.php?page=php-goto-replicator

Simply put, when the script is first executed it copies itself to a new file with a random file name and with the lines of code in a new random order. It then forks a new process that executes the new copy of the script file, and the original copy exits. Now there is a new copy of the script running, identical to the original apart from its random file name and the order of its lines. This is a perpetual process: reordering and replicating, then executing a new instance (process) and killing the previous one.

I aimed to extend James Holderness's PHP answer a little, into a working self replicating and morphing code example.

This is the raw PHP code I have come up with;

<?php goto a01;
a01: $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';    goto a02;
a02: $randomString = __DIR__."/";                                                       goto a03;
a03: $i = 0;                                                                            goto a04;
a04: if ($i < 10)                                                                       goto a05; else goto a07;
a05:   $randomString .= $characters[rand(0, strlen($characters) - 1)];                  goto a06;
a06:   $i++;                                                                            goto a04;
a07: $randomString .= ".php";                                                           goto a08;
a08: $ARGS=Array("-f",$randomString);                                                   goto a09;
a09: $handle_out = fopen("$randomString", "w");  goto l01;
l01: $filename = __FILE__;                       goto l02;
l02: $contents = file_get_contents($filename);   goto l03;
l03: $lines = explode("\n",$contents);           goto l04;
l04: $collection = array();                      goto l05;
l05: $pattern = '%^[^:]+:.*goto [^;]+;$%';       goto l06;
l06: $i = 0;                                     goto l07;
l07: if ($i < count($lines)-1)                   goto l08; else goto l23;
l08:   $line = $lines[$i];                       goto l09;
l09:   $line = trim($line);                      goto l10;
l10:   if (substr($line,0,2) != '//')            goto l11; else goto l22;
l11:     if (preg_match($pattern, $line) === 1)  goto l12; else goto l13;
l12:       $collection[] = $line;                goto l22;
l13:       shuffle($collection);                 goto l14;
l14:       $j = 0;                               goto l15;
l15:       if ($j < count($collection))          goto l16; else goto l19;
l16:         echo $collection[$j]."\n";          goto l17;
l17:         fwrite($handle_out, $collection[$j]."\n");    goto l18;
l18:         $j++;                               goto l15;
l19:       $collection = array();                goto l20;
l20:       fwrite($handle_out, $line."\n");      goto l21;
l21:       echo $line."\n";                      goto l22;
l22:   $i++;                                     goto l07;
l23: fclose($handle_out);                        goto f01;
f01: $pid = pcntl_fork();                        goto f02;
f02: if ($pid == -1)                             goto f03; else goto f04;
f03:   die("Could not fork a new child\n");      goto f03;
f04: if ($pid)                                   goto f05; else goto f06;
f05:   exit(0);                                  goto f05;
f06: $sid = posix_setsid();                      goto f07;
f07: if ($sid < 0)                               goto f08; else goto f09;
f08:   die("Child posix_setsid error\n");        goto f08;
f09: sleep(10);                                  goto f10;
f10: pcntl_exec(PHP_BINARY, $ARGS);
l24: exit(0);                                    goto l24;
?>

Recall answered 29/9, 2013 at 14:55 Comment(0)

As per my comment on the 29th of September 2013:

I will look at the C example in the coming weeks and update again. I use Linux and the code doesn't run on Linux, so my aim will be to adjust for Linux then post that back.

Roughly a month short of 7 years later (on-time as usual!), thanks to the global pandemic I have finally found time to port the answer supplied by @james-holderness to Linux (and modified to be 64-bit Intel compliant).

A full write up is available at the following location with additional explanation and some other variations on the same theme: https://github.com/jwbensley/metamorphic_polymorphic/tree/master/meta_001_c

A full copy of the code is also copied below:

#include <ctype.h>     // isdigit()
#include <inttypes.h>  // intN_t, PRIuN, uintN_t
#include <sys/stat.h>  // chmod()
#include <stdio.h>     // fp, fclose(), fopen(), fprintf(), fread(), fseek(),
                       // ftell(), fwrite(), printf(),
                       // SEEK_END, SEEK_SET, stderr
#include <stdlib.h>    // EXIT_FAILURE, EXIT_SUCCESS, free(), malloc()
#include <string.h>    // strchr(), strcmp(), strdup(), strlen()


#define B_PUSH_RAX ".byte 0x50\n\t" // push rax
#define B_PUSH_RBX ".byte 0x53\n\t" // push rbx
#define B_POP_RAX  ".byte 0x58\n\t" // pop rax
#define B_POP_RBX  ".byte 0x5b\n\t" // pop rbx
//#define B_NOP  ".byte 0x90\n\t" // nop
/*
 REX.W + 0x90+r64 == XCHG RAX, r64
 "xchg rax, rax" is NOP (0x90)
 Hackaround by using REX.W flag + opcode 0x87 + r64 to avoid the use of 0x90
*/
#define B_NOP ".byte 0x48,0x87,0xc0\n\t" // REX.W xchg rax,rax

#define H_PUSH     0x50 // push + reg
#define H_POP      0x58 // pop + reg
//#define H_NOP      0x90 // nop
#define H_NOP_0    0x48 // --------------------
#define H_NOP_1    0x87 // REX.W xchg rax,rax |
#define H_NOP_2    0xC0 // --------------------

#define JUNK_ASM __asm__ __volatile__ (B_PUSH_RBX B_PUSH_RAX B_NOP B_NOP B_POP_RAX B_POP_RBX)
#define JUNKLEN 10


int local_rand () {
  
  /*
    This program has such a short lifetime, srand(time(0)) simply isn't random
    enough within a single execution of the program
  */
  int digit = 0;  // zeroed: fread() below only fills one byte of it
  FILE *fp;
  fp = fopen("/dev/urandom", "r");
  fread(&digit, 1, 1, fp);
  fclose(fp);
  
  return digit;

}


void insert_junk(uint8_t *file_data, uint64_t junk_start) {

  JUNK_ASM;

  /*
  This is the matrix of ModRM bytes for register-to-register operands on Intel.
  The column selects the reg field (bits 5-3), the row the r/m field (bits 2-0).
  For example, with opcode 0x87 ("xchg r/m, r"):
  0x87 0xC0 == "xchg eax, eax"
  0x87 0xC8 == "xchg eax, ecx"

      EAX ECX EDX EBX ESP EBP ESI EDI
  EAX C0  C8  D0  D8  E0  E8  F0  F8
  ECX C1  C9  D1  D9  E1  E9  F1  F9
  EDX C2  CA  D2  DA  E2  EA  F2  FA
  EBX C3  CB  D3  DB  E3  EB  F3  FB
  ESP C4  CC  D4  DC  E4  EC  F4  FC
  EBP C5  CD  D5  DD  E5  ED  F5  FD
  ESI C6  CE  D6  DE  E6  EE  F6  FE
  EDI C7  CF  D7  DF  E7  EF  F7  FF
  */

  /*
   The junk assembly instructions use the following pattern so that they can be identified:
   r1 = random register from RAX, RBX, RCX or RDX
   r2 = a different random register from RAX, RBX, RCX, RDX
   push r1
   push r2
   xchg r1, r2
   xchg r1, r2
   pop r2
   pop r1
  */


  uint8_t reg_1 = (local_rand()%4);
  uint8_t reg_2 = (local_rand()%4);
  
  while(reg_2 == reg_1) {
    reg_2 = (local_rand()%4);
  }

  uint8_t push_r1 = 0x50 + reg_1;
  uint8_t push_r2 = 0x50 + reg_2;

  uint8_t pop_r1 = 0x58 + reg_1;
  uint8_t pop_r2 = 0x58 + reg_2;

  uint8_t nop[3] = {0x48,0x87,0xC0};
  nop[2] += reg_1;
  nop[2] += (reg_2 * 8);

  file_data[junk_start] = push_r1;
  file_data[junk_start + 1] = push_r2;

  file_data[junk_start + 2] = nop[0];
  file_data[junk_start + 3] = nop[1];
  file_data[junk_start + 4] = nop[2];
  file_data[junk_start + 5] = nop[0];
  file_data[junk_start + 6] = nop[1];
  file_data[junk_start + 7] = nop[2];
  file_data[junk_start + 8] = pop_r2;
  file_data[junk_start + 9] = pop_r1;

}


int32_t load_file(uint8_t **file_data, uint32_t *file_len, const char *filename) {

  JUNK_ASM;

  FILE *fp = fopen(filename, "rb");
  if (fp == NULL) {
    fprintf(stderr, "Error opening file %s for reading\n", filename);
    return(EXIT_FAILURE);
  }

  fseek(fp, 0L, SEEK_END);
  if (ftell(fp) < 1) {
    fprintf(stderr, "File %s 0 bytes in length\n", filename);
    fclose(fp);
    return EXIT_FAILURE;  // nothing to load, *file_len would be left unset
  } else {
    *file_len = ftell(fp);
  }

  *file_data = malloc(*file_len);
  if (*file_data == NULL) {
    fprintf(stderr, "Failed to malloc space for file\n");
    fclose(fp);
    return(EXIT_FAILURE);
  }

  fseek(fp, 0L, SEEK_SET);
  if (fread((void*)*file_data, *file_len, 1, fp) != 1) {
    fprintf(stderr, "Failed to fread() file %s\n", filename);
    free(*file_data);     // free the buffer, not the caller's pointer variable
    *file_data = NULL;
    fclose(fp);
    return EXIT_FAILURE;
  }
  fclose(fp);

  printf("Loaded %" PRIu32 " bytes from %s\n\n", *file_len, filename);
  return EXIT_SUCCESS;

}


void replace_junk(uint8_t *file_data, uint64_t file_len)
{

  JUNK_ASM;

  // Stop early enough that a full JUNKLEN-byte sequence always fits in the buffer
  for (uint64_t i = 0; i + JUNKLEN <= file_len; i += 1) {

    // Start of the junk ASM
    if (file_data[i] >= H_PUSH && file_data[i] <= (H_PUSH + 3)) {
      if (file_data[i + 1] >= H_PUSH && file_data[i + 1] <= (H_PUSH + 3)) {
        if (file_data[i + 2] == H_NOP_0) {
          if (file_data[i + 3] == H_NOP_1) {

            printf(
              "Junk found, from offset 0x%" PRIX64 " (%" PRIu64 ") to "
              "0x%" PRIX64 " (%" PRIu64 ")\n",
              i, i, (i + JUNKLEN), (i + JUNKLEN)
            );

            printf("Replaced: ");
            for (uint64_t j = i; j < (i + JUNKLEN); j += 1) {
              printf("0x%" PRIX32 " ", file_data[j]);
            }
            printf("\n");

            insert_junk(file_data, i);

            printf("With: ");
            for (uint64_t j = i; j < (i + JUNKLEN); j += 1) {
              printf("0x%" PRIX32 " ", file_data[j]);
            }
            printf("\n");

            printf("\n");
          }
        }
      }
    }
  }

  printf("All existing junk sequences have been replaced\n\n");
}


int32_t write_file(uint8_t *file_data, uint32_t file_len, const char *filename) {

  JUNK_ASM;

  FILE *fp;  
  int lastoffset = strlen(filename)-1;
  char lastchar = filename[lastoffset];
  char *newfilename = strdup(filename);
  lastchar = '0'+(isdigit(lastchar)?(lastchar-'0'+1)%10:0);
  newfilename[lastoffset] = lastchar; 
  
  fp = fopen(newfilename, "wb");
  if (fp == NULL){
    fprintf(stderr, "Error opening file %s for writing\n", newfilename);
    free(newfilename);
    return(EXIT_FAILURE);
  }

  fwrite(file_data, file_len, 1, fp);
  if (ferror (fp))
    fprintf(stderr, "Unable to write to new file %s\n", newfilename);

  fclose(fp);

  if (chmod(newfilename, S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH) != 0) {
    fprintf(stderr, "Unable to chmod file %s\n", newfilename);
    free(newfilename);
    return EXIT_FAILURE;
  }

  printf("Wrote modified binary data to %s\n\n", newfilename);
  free(newfilename);
  return EXIT_SUCCESS;

}


int main(int argc, char* argv[]) {

  JUNK_ASM;

  // Load this file into memory
  uint8_t  *file_data = NULL;
  uint32_t file_len;
  if (load_file(&file_data, &file_len, argv[0]) != EXIT_SUCCESS)
    return EXIT_FAILURE;

  // Replace the existing junk ASM sequences with new ones
  replace_junk(file_data, file_len);

  // Write to disk as new file
  if (write_file(file_data, file_len, argv[0]) != EXIT_SUCCESS) {
    free(file_data);
    return EXIT_FAILURE;
  }

  free(file_data);
  return EXIT_SUCCESS;

}
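The search in replace_junk only inspects the first four bytes of a candidate. A stricter check is possible; the following is a hedged sketch rather than part of the port. demo_is_junk is a hypothetical helper that assumes the 10-byte layout described in the insert_junk comment (push r1, push r2, two identical REX.W xchg instructions, pop r2, pop r1, with r1 and r2 in the range 0..3), and the caller must make sure 10 bytes are readable at p.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stricter matcher: returns 1 if the 10 bytes at p look like
   push r1, push r2, two identical REX.W xchg instructions, pop r2, pop r1
   with r1 and r2 in the range 0..3, otherwise 0. */
int demo_is_junk(const uint8_t *p) {
  assert(p != NULL);
  if (p[0] < 0x50 || p[0] > 0x53) return 0;            /* push r1 */
  if (p[1] < 0x50 || p[1] > 0x53) return 0;            /* push r2 */
  if (p[2] != 0x48 || p[3] != 0x87) return 0;          /* REX.W xchg */
  if (p[4] < 0xC0) return 0;                           /* register operands */
  if (p[5] != 0x48 || p[6] != 0x87) return 0;          /* second xchg */
  if (p[7] != p[4]) return 0;                          /* same operands */
  if (p[8] != 0x58 + (p[1] - 0x50)) return 0;          /* pop r2 */
  if (p[9] != 0x58 + (p[0] - 0x50)) return 0;          /* pop r1 */
  return 1;
}
```

This accepts both the sequence emitted by JUNK_ASM (53 50 48 87 C0 48 87 C0 58 5B) and the sequences written by insert_junk, while rejecting windows whose pops do not mirror the pushes.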

Recall answered 24/8, 2020 at 16:53 Comment(0)
