How to write and execute PURE machine code manually without containers like EXE or ELF?

Asked 11/3, 2011 at 1:21 Answered 21/10, 2019 at 15:10

I just need a hello world demo to see how machine code actually works.

Though Windows' EXE and Linux's ELF are near machine code, they're not PURE.

How can I write/execute PURE machine code?

Onega answered 11/3, 2011 at 1:21 Comment(3)

What do you mean by pure? Something that doesn't have headers? If you write your code at a low enough level, you don't need to link in any libraries, and so the executable will just be your code with the file structure set up so the OS can load it. – Frigid 11/3, 2011 at 1:22

Nothing other than the instructions run by the OS. – Onega 11/3, 2011 at 2:1

Related but specifically loaded into the virtual memory of another program: #3615892 – Mobility 23/5, 2015 at 13:32

You can write in PURE machine code manually WITHOUT ASSEMBLY

Linux/ELF: https://github.com/XlogicX/m2elf. This is still a work in progress, I just started working on this yesterday.

Source file for "Hello World" would look like this:

b8    21 0a 00 00   #moving "!\n" (plus two NUL bytes) into eax
a3    0c 10 00 06   #storing eax at 0x0600100c (bytes 12-15 of the buffer)
b8    6f 72 6c 64   #moving "orld" into eax
a3    08 10 00 06   #storing eax at 0x06001008 (bytes 8-11)
b8    6f 2c 20 57   #moving "o, W" into eax
a3    04 10 00 06   #storing eax at 0x06001004 (bytes 4-7)
b8    48 65 6c 6c   #moving "Hell" into eax
a3    00 10 00 06   #storing eax at 0x06001000 (bytes 0-3, start of the buffer)
b9    00 10 00 06   #moving pointer to start of the buffer into ecx
ba    10 00 00 00   #moving string size (16 bytes, including the 2 NULs) into edx
bb    01 00 00 00   #moving stdout file descriptor (1) into ebx
b8    04 00 00 00   #moving sys_write syscall number (4) into eax
cd    80            #int 0x80: ask the Linux kernel to write to stdout
b8    01 00 00 00   #moving sys_exit syscall number (1) into eax
cd    80            #int 0x80: exit (ebx still holds 1, so the exit status is 1)

WIN/MZ/PE:

shellcode2exe.py (takes asciihex shellcode and creates a legit MZ PE exe file) script location:

https://web.archive.org/web/20140725045200/http://zeltser.com/reverse-malware/shellcode2exe.py.txt

dependency:

https://github.com/radare/toys/tree/master/InlineEgg

Extract it, then run:

python setup.py build
sudo python setup.py install

Jacindajacinta answered 25/8, 2014 at 10:5 Comment(6)

ok, m2elf now supports memory allocations; I just tested "Hello World" in pure machine code, it works. PoC of that is at the bottom of the README on the above mentioned github page – Jacindajacinta 25/8, 2014 at 14:33

Where did you learn about this? Do you have some resources you can share, I'm interested in it :-) – Thorner 5/9, 2015 at 14:34

For ELF header: elf.h, Ange Albertini infographic on ELF, and assemble->hexdump->analyze hacking. – Jacindajacinta 6/9, 2015 at 15:25

For machine code, I've read Vol II of the Intel manual (I've also read Vol I and III, but Vol II is the one that digs into the instructions). I demo some odd x86 tricks on some of my blog posts (xlogicx.net). Just email me if you want to discuss further. – Jacindajacinta 6/9, 2015 at 15:26

Also, if you want to use m2elf.pl without having to set much up, remnux v6 has it built in now (remnux.org) – Jacindajacinta 6/9, 2015 at 15:29

For the hexdump->analyze part you could also use fq instead of hexdump. fq can tell you what each bit is used for if the format is supported. – Sarpedon 7/12, 2023 at 10:49

Real Machine Code

What you need to run the test: Linux x86 or x64 (in my case I am using Ubuntu x64)

Let's Start

This x86 assembly moves the value 666 into the eax register and then returns:

movl $666, %eax
ret

Let's make the binary representation of it:

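The opcode of mov with an immediate value into a register (movl is a mov with a 32-bit operand size) is, in binary, 1011

The width bit is 1 (full operand size, i.e. 32 bits here; 0 would mean an 8-bit move)

The register field for eax is 000

The number 666 as a signed 32-bit binary number is 00000000 00000000 00000010 10011010

666 converted to little endian (least significant byte first) is 10011010 00000010 00000000 00000000

The ret (return) instruction in binary is 11000011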

So finally our pure binary instructions will look like this:

1011(movl)1(width)000(eax)10011010000000100000000000000000(666) 11000011(ret)

Putting it all together:

1011100010011010000000100000000000000000
11000011

To execute it, the binary code has to be placed in a memory page with execute permission. We can do that using the following C code:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Allocate size bytes of executable memory. */
unsigned char *alloc_exec_mem(size_t size)
{
    void *ptr;

    ptr = mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC,
               MAP_PRIVATE | MAP_ANON, -1, 0);

    if (ptr == MAP_FAILED) {
            perror("mmap");
            exit(1);
    }

    return ptr;
}

/* Read up to buffer_size bytes, encoded as 1's and 0's, into buffer. */
void read_ones_and_zeros(unsigned char *buffer, size_t buffer_size)
{
    unsigned char byte = 0;
    int bit_index = 0;
    int c;

    while ((c = getchar()) != EOF) {
            if (isspace(c)) {
                    continue;
            } else if (c != '0' && c != '1') {
                    fprintf(stderr, "error: expected 1 or 0!\n");
                    exit(1);
            }

            byte = (byte << 1) | (c == '1');
            bit_index++;

            if (bit_index == 8) {
                    if (buffer_size == 0) {
                            fprintf(stderr, "error: buffer full!\n");
                            exit(1);
                    }
                    *buffer++ = byte;
                    --buffer_size;
                    byte = 0;
                    bit_index = 0;
            }
    }

    if (bit_index != 0) {
            fprintf(stderr, "error: left-over bits!\n");
            exit(1);
    }
}

int main()
{
    typedef int (*func_ptr_t)(void);

    func_ptr_t func;
    unsigned char *mem;
    int x;

    mem = alloc_exec_mem(1024);
    func = (func_ptr_t) mem;

    read_ones_and_zeros(mem, 1024);

    x = (*func)();

    printf("function returned %d\n", x);

    return 0;
}

Source: https://www.hanshq.net/files/ones-and-zeros_42.c

We can compile it using:

gcc source.c -o binaryexec

To execute it:

./binaryexec

Then type the first instruction:

1011100010011010000000100000000000000000

and press Enter,

then type the return instruction:

11000011

and press Enter.

Finally, press Ctrl+D to signal end of input; the program then calls the code and prints:

function returned 666

Bicentenary answered 21/10, 2019 at 15:10 Comment(4)

where did you get the opcode for MOV etc.? Also why do you need the width command in machine language? – Kellda 27/4, 2020 at 16:3

Also, do you know if there is a basic cross-browser file format that can simply be double-clicked to execute this machine code on any platform (assuming no external dependencies)? – Kellda 27/4, 2020 at 17:5

@B''HBi'ezras--BoruchHashem: Intel's or AMD's manuals are a good source for x86. Intel's vol.2 manual isn't written as a tutorial / intro (but there's some of that in vol.1), but vol.2 does have intro chapters that explain what the entries mean. After you've read that, you can look at just the instruction entries in the manual, like felixcloutier.com/x86/mov. movl $666, %eax is mov r32, imm32 in Intel's notation, opcode B8+ rd (so the register number is the low 3 bits of the opcode byte). – Catton 12/2 at 19:11

See also wiki.osdev.org/X86-64_Instruction_Encoding . And stackoverflow.com/tags/x86/info for more links to docs and guides. – Catton 12/2 at 19:12

Everyone knows that the applications we usually write run on an operating system and are managed by it.

The operating system itself runs directly on the machine, so I think that is the PURE machine code you mean.

So, you need to study how an operating system works.

Here is some NASM assembly code for a boot sector that prints "Hello world" on the bare machine, with no operating system underneath.

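   org 0x7c00            ; the BIOS loads the boot sector at 0000:7C00
   xor ax, ax
   mov ds, ax            ; DS = 0, so DS:SI matches the org above
   cld                   ; make lodsb move forward through the string
   mov si, msg
boot_loop:
   lodsb                 ; AL = byte at DS:SI, SI = SI + 1
   or al, al
   jz go_flag            ; stop at the terminating 0 byte
   mov ah, 0x0E          ; BIOS teletype output of the character in AL
   int 0x10
   jmp boot_loop

go_flag:
   jmp go_flag           ; nothing left to do: loop forever

msg   db 'hello world', 13, 10, 0

   times 510-($-$$) db 0 ; pad the sector to 510 bytes
   db 0x55               ; boot signature, bytes 510 and 511
   db 0xAA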

And you can find more resources here: http://wiki.osdev.org/Main_Page.


If you have NASM installed and a floppy drive, you can run:

nasm boot.asm -f bin -o boot.bin
dd if=boot.bin of=/dev/fd0

Then you can boot from this floppy and you will see the message. (NOTE: the floppy must come first in your computer's boot order.)

In fact, I suggest you run that code in a full virtual machine such as Bochs or VirtualBox, because it is hard to find a machine with a floppy drive.

So, the steps are:
1. Install a full virtual machine.
2. Create a virtual floppy with the command bximage.
3. Write the bin file to that virtual floppy.
4. Boot your virtual machine from that virtual floppy.

NOTE: https://wiki.osdev.org has some basic information about this topic.

Carbonado answered 11/3, 2011 at 1:48 Comment(6)

Is there an easier-to-run demo that just does something pretty easy? – Onega 11/3, 2011 at 3:43

You can do it all with GUI tools once you have compiled the assembly code with NASM, and you can also do it on MS Windows. You need the following software: 1. Floppy image writer, 2. Oracle VM VirtualBox. The key points in VirtualBox are: first, add a floppy controller and make sure the floppy is selected in the boot order list; then load the floppy image file created by Floppy image writer. For how to use the two tools, read their manuals or search online. It's not hard. – Carbonado 11/3, 2011 at 4:51

"Pretty easy"? The guy gave you 16 lines of assembler and a command to run it. Writing your own operating system from scratch doesn't get any easier than that. I would recommend going the virtual machine route than the floppy. For one thing, floppy drives are hard to come by these days. For another, booting your own kernel on a real machine could damage the machine, but it can't damage an emulator. Finally, it's much easier to test without having to reboot your machine all the time. – Impresario 9/6, 2011 at 6:25

Assembly is not 'pure' machine code, it is an abstraction of machine code. – Kathyrnkati 8/11, 2013 at 16:14

@Onega That's as easy as machine code gets, if you think that's too hard then you probably don't have enough experience as a programmer to be attempting machine code. – Kathyrnkati 8/11, 2013 at 16:16

Adding to this method, this link looks like a great way to start your first program that's a little more challenging than hello world: 99-bottles-of-beer.net/language-assembler-(intel-8086)-45.html – Disregard 9/12, 2013 at 1:46

It sounds like you're looking for the old 16-bit DOS .COM file format. The bytes of a .COM file are loaded at offset 100h in the program segment (limiting them to a maximum size of 64k - 256 bytes), and the CPU simply starts executing at offset 100h. There are no headers or any required information of any kind, just raw CPU instructions.

Agone answered 11/3, 2011 at 1:37 Comment(4)

Could you (or anyone else) provide an Hello World example of a program written that way? – Entablature 31/7, 2012 at 14:55

Sure, here's an example: 99-bottles-of-beer.net/language-assembler-(intel-8086)-45.html – Agone 31/7, 2012 at 20:2

Greg, that example is just more assembly code, not pure machine code, and beer is gross – Jacindajacinta 11/4, 2016 at 20:23

@XlogicX: Boo hoo, use nasm -f bin hello.asm to get a flat binary of machine code bytes corresponding to the asm instructions, or in that case I guess MASM or TASM syntax. Some Code Golf answers in x86 machine code do show the actual machine code (as a hexdump), like If a program terminates and there is no one to see it, does it halt? (which takes advantage of the fact that .COM mixes code and data and makes self-modifying code possible, although of course it's a performance disaster when running slow isn't your goal.) – Catton 12/2 at 19:33

The OS is not running the instructions; the CPU does (except if we're talking about a virtual machine OS, which do exist; I'm thinking about Forth or such things). The OS, however, does require some metainformation to know that a file does in fact contain executable code, and what it expects its environment to look like. ELF is not just near machine code. It is machine code, together with some information that tells the OS it is supposed to make the CPU actually execute that thing.

If you want something simpler than ELF but still *nix, have a look at the a.out format, which is much simpler. Fun fact: the format is named after the default output file name of *nix C compilers and linkers, a.out, used when no output name is specified, even though modern compilers write that file in ELF format.

Dru answered 25/3, 2012 at 0:23 Comment(5)

Is the a.out format the absolute simplest way to make an executable from pure machine language? Is it cross-platform? – Kellda 27/4, 2020 at 16:46

@bluejayke: It's the simplest format that used to be supported by *nixes. However, these days it's hardly supported anymore. It is definitely not cross-platform. If you want to go for the simplest, bare-metal "format", you could write a headerless, raw instruction stream at the reset vector address of a CPU and write that to the BIOS/UEFI firmware flash. However, you'll then have to implement a whole BIOS first in order to do something meaningful. – Dru 27/4, 2020 at 20:28

@bluejayke: As far as the simplest format goes, in good old DOS that would be the COM format, which is also just a raw instruction stream (just like firmware). However, those work only in DOS. – Dru 27/4, 2020 at 20:29

Oh interesting, if a whole BIOS were indeed written, would it be possible to load it within another operating system though? – Kellda 28/4, 2020 at 7:1

@bluejayke: If you were to load it into a machine emulator, then yes. That's essentially what all VMs like VirtualBox, VMWare, Qemu and so on do. However a BIOS is not the kind of program that you can execute as a regular process in the OS. For two reasons: 1st processes running under a modern OS are prevented from doing everything possible; the low level accesses a BIOS does are disallowed. 2nd many of the things a BIOS tries to do in the first place, would severely disrupt the tasks the OS already does. – Dru 28/4, 2020 at 8:12

The next program is a Hello World program I wrote in 16-bit machine code (Intel 8086). If you want to learn machine code, I suggest that you learn assembly first, because each assembly instruction is translated into one machine code instruction. As far as I know, I am one of the few people in the world still programming in machine code instead of assembly.

BTW, to run it, save the file with a ".com" extension and run it in DOSBox!

So, this is a Hello World program.

Tiaratibbetts answered 4/1, 2019 at 9:57 Comment(4)

Explanation: "B4 00 B0 10 CD 10" sets the video mode to EGA, 16 colors, 640x350. "EB" is a short jmp instruction; it can jump 127 bytes forward and 128 backward. "4B" is how far to jump. "24 30" ($0) is a marker the printing routine uses to find where this string starts. The "00" (null) at the end of the string tells the print routine that it has reached the end of the string. "AC 3C 24 72 02 EB F9" tells the routine to load the next byte from memory into AL (the low byte of AX), and if AL equals "24" ($), then go to "AC 38 D0 74 02 EB F2"...... – Jeneejenei 4/1, 2019 at 10:8

... that checks if the AL register equals the DL register (I moved the 0 or 1 character into it), and if it does, then go to "AC 3C 00 74 06 B4 0E CD 10 EB F5 C3", which basically prints the next byte until the next byte is "00", in which case it returns. "B2 31 B3 0C" sets the DL register to character 1 and BL (used for color) to 0C (light red), and "E8" tells it to call the routine at relative offset "CB FF". The rest is just user input, setting the video mode back to 03, and exiting ("B4 4C CD 21"). – Jeneejenei 4/1, 2019 at 10:14

Hi. I am looking for a way to see what this code: [[[ function incrementX(obj) { return 1 + obj.x; } incrementX({x: 42}); ]]] inside the square brackets would look like in machine code. I was able to generate the AST and the bytecode from it, but not the machine code. Do you happen to know how to do that? – Pellagra 25/7, 2022 at 13:58

@גיאכהן where and how do I learn to program in machine code, or rather, how do I learn to write code directly in machine code? I really want to know how things work at the innermost level! – Scientistic 3/9, 2022 at 13:21

On Windows (at least 32-bit Windows), you can execute RAW INSTRUCTIONS using a .com file.

For instance, if you take this string and save it in Notepad with a .com extension:

X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

It will print a string and set off your antivirus software.

Ambitendency answered 11/3, 2011 at 1:29 Comment(4)

This is not machine code. That's just an EICAR test string to test antivirus. It will be detected as a virus for testing purposes. – Entablature 31/7, 2012 at 14:57

It's all valid X86 when run as a com file. – Ambitendency 31/7, 2012 at 23:2

It's a common misconception that EICAR is JUST a string. Patrick is completely correct. This program uses int 21 to print the $-terminated string "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!". But since the INT 21 instruction (CD 21) is not entirely printable ASCII, the code leading up to the string modifies itself so that INT 21 can be used. Read the following link for a fascinating step-by-step analysis: thestarman.pcministry.com/asm/eicar/eicarcom.html – Jacindajacinta 11/4, 2016 at 20:19

It should also be noted that it's really not "all valid X86 when run as a com file". It may be sent to a processor for execution, but there is no guarantee the processor will be able to do anything with the code; worse still, it might run an undocumented instruction, such as a manufacturer code, which may render a device unusable (theoretically, although quite improbable). – Fawkes 17/4, 2017 at 14:58

When targeting an embedded system, you can make a binary image of the ROM or RAM that is strictly the instructions and associated data from the program, and you can often write that binary into flash/ROM and run it.

Operating systems want to know more than that, and developers often want to leave more than that in their file so they can debug it or do other things with it later (e.g. disassemble it with recognizable symbol names). Also, whether embedded or on an operating system, you may need to separate .text from .data from .bss from .rodata, etc., and file formats like ELF provide a mechanism for that; the preferred use case is to load that ELF file with some sort of loader, be it the operating system or something programming the ROM and RAM of a microcontroller.

.exe has some header info as well. As mentioned, .com didn't: it was loaded at offset 0x100 and execution branched there.

To create a raw binary from an executable, for example a gcc-created ELF file, you can do something like:

objcopy file.elf -O binary file.bin

If the program is segmented (.text, .data, etc.) and those segments are not back to back, the binary can get quite large. Again using embedded as an example, if the ROM is at 0x00000000 and data or bss is at 0x20000000, then even if your program only has 4 bytes of data, objcopy will create a 0x20000004-byte file, filling in the gap between .text and .data (as it should, because that is what you asked it to do).

What is it you are trying to do? Reading an ELF, Intel HEX or SREC file is quite trivial, and from that you can see all the bits and bytes of the binary. Disassembling the ELF (or whatever format) will also show you that in a human-readable form (objdump -D file.elf > file.list).

Odisodium answered 23/3, 2011 at 1:7 Comment(0)

To produce pure machine code, you can use any language that can write files. Even VB.NET can write 8-, 16-, 32- and 64-bit values, switching between integer types as it writes.

You can even have VB write out machine code in a loop as needed, for something like setpixel, where x,y change and you have your ARGB colors.

Or, write your VB.NET program normally in Windows and use NGEN.exe to create a native image of it. That compiles the program ahead of time to machine code specific to the processor (e.g. IA-32), so the JIT compiler isn't needed at run time (the native image still runs under the .NET runtime, though).

Alisonalissa answered 30/5, 2012 at 9:19 Comment(0)

These are nice responses, but knowing why someone would want to do this might guide the answer better. I think the most important reason is to get full control of the machine, especially over its cache writing, for maximum performance, and to prevent any OS from sharing the processor, virtualizing your code (thus slowing it down) or, especially these days, snooping on your code. As far as I can tell, assemblers don't handle these issues, and M$/Intel and other companies treat this like an infringement or as something "for hackers". This is very wrong-headed, however. If your assembler code is handed over to an OS or proprietary hardware, true optimization (potentially at GHz frequencies) will be out of reach. This is a very important issue for science and technology, as our computers cannot be used to their full potential without hardware optimization, and often compute several orders of magnitude below it. There probably is some workaround, or some open-source hardware that enables this, but I have yet to find it. Penny for anyone's thoughts.

Florrieflorry answered 3/6, 2017 at 6:42 Comment(0)
