Why does inserting characters into an executable binary file cause it to "break"?

Why does inserting characters into an executable binary file cause it to "break"?

And is there any way to add characters without breaking the compiled program?

Background

I've known for a long time that it is possible to use a hex editor to change code in a compiled executable file and still have it run as normal...

Example

For example, in the application I was looking at, Facebook could be changed to Lacebook and the program would still execute just fine.
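
A same-length edit like this works because nothing moves: every byte after the edit keeps its offset. A minimal Python sketch of such an in-place patch, assuming the whole file fits in memory; `patch_same_length` is a hypothetical helper, not part of any hex editor:

```python
def patch_same_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of `old` with `new` without moving any other byte.

    Refuses replacements of a different length, since those would shift the
    offset of everything that follows the edit.
    """
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the original")
    if old not in data:
        raise ValueError("pattern not found")
    patched = data.replace(old, new)
    assert len(patched) == len(data)  # file size, and every offset, unchanged
    return patched
```

Applied to a file (the name `app` is a placeholder; work on a copy), this would be `Path("app").write_bytes(patch_same_length(Path("app").read_bytes(), b"Facebook", b"Lacebook"))` with `from pathlib import Path`.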

But it Breaks with new Characters

I'm also aware that if new characters are added, the program breaks: either it won't run or it crashes immediately. For example, adding My in front of Facebook has this effect.
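
Prepending My moves every later byte two positions forward. A toy sketch (the blob and the stored `login_ptr` offset are hypothetical, standing in for an address baked into the machine code) showing that an offset recorded before the edit now lands on the wrong bytes:

```python
def read_cstring(data: bytes, offset: int) -> bytes:
    """Read a NUL-terminated string starting at `offset`."""
    end = data.index(b"\x00", offset)
    return data[offset:end]

# Hypothetical toy "binary": two strings, plus an offset that some
# instruction would hold, fixed when the program was built.
blob = b"Facebook\x00Login\x00"
login_ptr = blob.index(b"Login")         # 9

print(read_cstring(blob, login_ptr))     # b'Login'

# Insert two bytes before "Facebook", as in the question.
edited = b"My" + blob
print(read_cstring(edited, login_ptr))   # b'k': the stale offset hits the tail of "Facebook"
```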

What I know

What I don't know

  • I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type in the name of the program and press return you are basically instructing the operating system to "execute" that file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it 'Go!'
  • I understand why having extra characters in a text string of the binary file would cause problems

What I'd like to know

  1. Why do the extra characters cause the program to break?
  2. What thing determines that the program is broken? The OS? Does the OS also keep this program sandboxed so that it doesn't crash the whole system nowadays?
  3. Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?
Friede asked 31/12, 2013 at 1:08 Comment(4)
The program counter is typically absolute, so if you move stuff around, everything breaks. – Burnham
Very good question, but it requires a big answer. If I'm still around later, I will attempt to answer it in full. For now, think about functions as being in memory. Every time you call a function, you are telling the code to jump to a particular location. If you add extra bytes before it, you shift the function's code by X bytes, so the call lands on the wrong bytes and will definitely not do what you think it will. All function calls use hard-coded addresses or offsets. – Razz
On Q3: No. Suppose you insert "hello" before the string "Facebook". Then every string after it shifts up by 5 positions. You'd need to find every string pointer that points to a string after your change and increment it by 5. (And "data" is not only easily recognizable text strings!) If you're unlucky, you may also have to increase the "data section" size at various points. – Clearcole
If you look at the bit patterns of F (46) and L (4C), you will notice that they have the same number of set bits, so it looks like there is a checksum somewhere that works on bits. If you try 34 (4), 43 (C) or 64 (d), it might work too. Anything else will break the checksum. – Adventurous
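
The bit-count observation in the last comment is easy to check. This sketch only counts the set bits in each hex byte value that comment mentions; all five turn out to have three set bits, but that alone does not show a checksum exists. Note that 0x34 is the character '4' ('$' is 0x24).

```python
def popcount(v: int) -> int:
    """Number of 1 bits in v."""
    return bin(v).count("1")

# Hex byte values mentioned in the comment.
values = [0x46, 0x4C, 0x34, 0x43, 0x64]

for v in values:
    print(f"0x{v:02X} {chr(v)!r} {v:08b} set bits: {popcount(v)}")
```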

I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type in the name of the program and press return you are basically instructing the operating system to "execute" that file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it 'Go!'

Modern operating systems just map the file into the process's memory. They don't bother loading a page of it from disk until that page is actually needed.
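
Python's standard `mmap` module exposes the same OS facility. A minimal sketch, assuming an existing non-empty file; `peek` is a hypothetical helper name, and normally only the pages covering the requested range are read from disk:

```python
import mmap

def peek(path: str, offset: int, length: int) -> bytes:
    """Map `path` read-only and return `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are faulted in on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]
```

On a typical Linux system, `peek("/bin/ls", 0, 4)` should return `b'\x7fELF'`, the ELF magic number.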

Why do the extra characters cause the program to break?

Because they shift everything after the insertion point, so the offsets recorded elsewhere in the file point at the wrong places and the loader winds up loading the wrong things. Also, jumps in the code wind up going to the wrong place, perhaps into the middle of an instruction.

What thing determines that the program is broken? The OS? Does the OS also keep this program sandboxed so that it doesn't crash the whole system nowadays?

It depends on exactly what gets screwed up. It may be that you move a header and the loader notices that some parameters in the header have invalid data.
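
What a loader checks depends on the format. This sketch uses a made-up format (the `TOY0` magic, the 12-byte header and the `build`/`load` helpers are all hypothetical) to show both outcomes: a shifted header is rejected, while a shift inside the data goes unnoticed and simply yields the wrong bytes.

```python
import struct

# Hypothetical toy format: 4-byte magic, then little-endian (offset, size)
# of a single data section. Real formats (ELF, PE, Mach-O) are far richer.
MAGIC = b"TOY0"
HEADER = struct.Struct("<4sII")  # 12 bytes, no padding with "<"

def build(data: bytes) -> bytes:
    """Create a toy image whose data section starts right after the header."""
    return HEADER.pack(MAGIC, HEADER.size, len(data)) + data

def load(image: bytes) -> bytes:
    """Return the data section, raising ValueError if the header is inconsistent."""
    if len(image) < HEADER.size:
        raise ValueError("file too short for header")
    magic, offset, size = HEADER.unpack_from(image, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    if offset < HEADER.size or offset + size > len(image):
        raise ValueError("section lies outside the file")
    return image[offset:offset + size]
```

With `img = build(b"hello")`, `load(b"My" + img)` raises (bad magic), but `load(img[:12] + b"My" + img[12:])` quietly returns `b"Myhel"`, which is the "runs, then misbehaves" case.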

Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?

Probably not reliably. At a minimum, you'd need to reliably identify every section of code that needs to be adjusted. That can be surprisingly difficult, particularly if someone has deliberately tried to make it so.
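
To see why, here is what a correct fix-up needs in the easy case where you already know where every pointer lives. A sketch under that (unrealistic) assumption; `insert_and_fix` and the flat list of pointers are hypothetical:

```python
from typing import List, Tuple

def insert_and_fix(blob: bytes, pointers: List[int], pos: int,
                   extra: bytes) -> Tuple[bytes, List[int]]:
    """Insert `extra` at `pos` and shift every pointer that targets a byte after pos.

    A pointer equal to `pos` is left alone, so it now points at the start of
    the inserted bytes (e.g. "Facebook" becomes "MyFacebook").
    Only correct if `pointers` lists *every* reference into the blob, which is
    exactly what you usually cannot know for a real executable.
    """
    n = len(extra)
    new_blob = blob[:pos] + extra + blob[pos:]
    new_pointers = [p + n if p > pos else p for p in pointers]
    return new_blob, new_pointers
```

The arithmetic is trivial; the hard part is building a complete `pointers` list, because in real machine code the references are spread across instructions, tables and computed values.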

Udometer answered 31/12, 2013 at 1:17 Comment(0)

When a program is compiled into machine code, it includes many references to the addresses of instructions and data in the program's memory. The compiler (together with the linker) determines the layout of all the program's memory and puts these addresses into the program. The executable file is also organized into sections, and a table of contents at the beginning records where each section starts and how many bytes it contains.

If you insert something into the program, the address of everything after it is shifted up. But the parts of the program that contain references to code and data locations are not updated; they continue to point to the original addresses. The table that records the sizes of the sections is also no longer correct, because you increased the size of whichever section you modified.
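
The section-table part of the repair can be sketched too. Assuming a hypothetical table of `(name, offset, size)` entries (the `Section` type and `adjust_sections` are illustrative, not any real format's API), inserting `n` bytes at file position `pos` must grow the section containing `pos` and move every section after it:

```python
from typing import List, NamedTuple

class Section(NamedTuple):
    name: str
    offset: int  # file offset where the section starts
    size: int    # number of bytes in the section

def adjust_sections(sections: List[Section], pos: int, n: int) -> List[Section]:
    """Section table as it would have to look after inserting n bytes at pos."""
    adjusted = []
    for s in sections:
        if s.offset <= pos < s.offset + s.size:
            adjusted.append(s._replace(size=s.size + n))      # edited section grows
        elif s.offset > pos:
            adjusted.append(s._replace(offset=s.offset + n))  # later sections move
        else:
            adjusted.append(s)                                # earlier sections stay
    return adjusted
```

Real formats add further constraints (for example, sections often have to start at aligned offsets), so this is only the bookkeeping step, not a working patcher.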

Vermin answered 31/12, 2013 at 1:17 Comment(0)

The format of a machine-language executable file is based on hard offsets, rather than on parsing a byte stream (like textual program source code). When you insert a byte somewhere, the file format continues to reference information which follows the insertion point at the original offsets.

Offsets may occur in the file format itself, such as the header which tells the loader where things are located in the file and how big they are.
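
As a concrete case, the 64-bit ELF header (the format used on Linux) stores the file offsets of the program header table (`e_phoff`) and the section header table (`e_shoff`) as plain integers at fixed positions. A minimal parser with Python's `struct`, assuming a little-endian ELF64 file (32-bit and big-endian ELF are not handled); `elf64_offsets` is just an illustrative name:

```python
import struct

# ELF64 header: e_ident[16] followed by fixed-size fields, little-endian.
ELF64_HDR = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes

def elf64_offsets(data: bytes) -> dict:
    """Return the entry point and header-table locations from an ELF64 header."""
    if len(data) < ELF64_HDR.size:
        raise ValueError("too short for an ELF64 header")
    (ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
     e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = ELF64_HDR.unpack_from(data, 0)
    # ident[4] == 2 means ELFCLASS64, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    return {"entry": e_entry, "phoff": e_phoff, "phnum": e_phnum,
            "shoff": e_shoff, "shnum": e_shnum}
```

On a typical 64-bit Linux system, `with open("/bin/ls", "rb") as f: print(elf64_offsets(f.read(64)))` shows these fields. Insert a byte anywhere before those tables and `e_phoff` and `e_shoff` still hold the old numbers, one byte short of where the tables now are.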

Hard offsets also occur in the machine code itself, such as in instructions that refer to the program's data, or in branch instructions.

Suppose an instruction says "branch 200 bytes down from where we are now", and you insert a byte into those 200 bytes (because a character string happens to be there that you want to alter). Oops; the branch still covers 200 bytes.
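
The arithmetic for fixing one such branch can be sketched as follows. It assumes a hypothetical encoding where the displacement is measured from the address of the branch itself (real instruction sets differ, e.g. some measure from the next instruction); `relocate_branch` is an illustrative name:

```python
def relocate_branch(branch_addr: int, disp: int, pos: int, n: int) -> int:
    """Return the displacement a PC-relative branch needs after n bytes are
    inserted at position pos.

    Assumes target = branch_addr + disp. Bytes at pos and beyond move forward
    by n, so the branch and its target each move if they sit at or after pos.
    """
    target = branch_addr + disp
    new_branch = branch_addr + n if branch_addr >= pos else branch_addr
    new_target = target + n if target >= pos else target
    return new_target - new_branch
```

Only an insertion between the branch and its target changes the displacement (200 becomes 201 for one byte); insertions entirely before or after leave it alone. Doing this for real means first finding and decoding every branch in the program.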

On some machines, you couldn't even fix the branch up to 201 bytes, because the target would be misaligned and the branch would cause a CPU exception; you would have to add, say, four bytes and patch it to 204 (along with myriad other things needed to make the file sane).
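
The alignment rule can be sketched as well, assuming a fixed-width instruction set with 4-byte instructions where branch displacements must be multiples of 4 (the helper names are hypothetical):

```python
def align_up(n: int, alignment: int = 4) -> int:
    """Round a non-negative n up to the next multiple of alignment (a power of two)."""
    return (n + alignment - 1) & ~(alignment - 1)

def is_valid_displacement(disp: int, alignment: int = 4) -> bool:
    """True if a branch displacement keeps the target instruction-aligned."""
    return disp % alignment == 0
```

Inserting one byte means padding the insertion to `align_up(1) == 4` bytes, which turns the 200-byte branch into 204 rather than the misaligned 201.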

Vasiliu answered 31/12, 2013 at 1:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.