How to make a C++ EXE larger (artificially)

19

17

I want to make a dummy Win32 EXE file that is much larger than it should be. By default, a boilerplate Win32 EXE file is 80 KB; I want a 5 MB one for testing some other utilities.

My first idea was to add a resource, but as it turns out, embedded resources are not the same as 5 MB of code when it comes to memory allocation. Could I reference a large library and end up with a huge EXE file? If not, perhaps I could script a few thousand similar methods like AddNum1, AddNum2, etc.?

Any simple ideas are very appreciated.

Templet answered 1/10, 2010 at 15:0 Comment(5)
Could you give us an idea of what problem you are looking to solve? – Forsterite
The question doesn't make the purpose clear, so the answers won't help others much. – Exempt
To add more context to the question: – Templet
I am calling CreateProcess. When doing so, I need it to allocate more memory than a simple (empty) Win32 project does. In this case, I want CreateProcess to load the target Win32 EXE and allocate 5 MB of memory to it. – Templet
Please use the edit button to add more detail to your post. – Feverroot
5

Use a big array of constant data, like explicit strings:

const char *dummy_data[] = {
    "blajkhsdlmf..(long script-generated random string)..",
    "kjsdfgkhsdfgsdgklj..(etc...)...jldsjglkhsdghlsdhgjkh",
};

Unlike variable data, constant data often falls in the same memory section as the actual code, although this may be compiler- or linker-dependent.

Edit: I tested the following and it works on Linux:

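#include <stdio.h>
#include <stdlib.h>

/* Prints a C/C++ source file that defines 5000 random strings of 1000
   lowercase letters each (about 5 MB of string data in total). */
int main(void)
{
    int i, j;

    /* const char * keeps the output valid C++ as well as C */
    puts("const char *dummy_data[] = {");
    for (i = 0; i < 5000; i++) {
        fputs("    \"", stdout);
        for (j = 0; j < 1000; j++) putchar('a' + rand() % 26); /* random letter */
        puts("\",");
    }
    puts("};");
    return 0;
}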

Both this code and its output compile cleanly.

Pepita answered 1/10, 2010 at 15:18 Comment(3)
I tried something like this and ended up with a C2026 error. Looks like there is a 16K limit on arrays? – Templet
If your strings are 1K long, then you only need 5K elements in the array, which makes the array size 20K (it's an array of pointers to constant strings). – Pepita
I would've just used inline assembly to create a NOP slide somewhere. It would be cleaner, more semantically correct, and somewhat more self-documenting. Not to mention that it's less likely to get messed up by the compiler. – Donn
18

What about simply defining a large static char array?

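extern char const bigarray[5*1024*1024] = { 1 }; // extern: in C++ a namespace-scope const has internal linkage and may be discarded if unused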

See also my other answer in this thread, where I suggest statically linking to big libraries. This will surely pull in real code if you reference enough of the libraries' code.

EDIT: Added a non-zero initialization, as all-zero data is treated specially by the compiler/linker: it typically goes into a zero-filled section (.bss) that takes up no space in the file.

EDIT: Added reference to my other answer.

EDIT: Added const qualifier, so bigarray will be placed amongst code by many compilers.
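A minimal C++17 sketch of the same idea, with a read access so the optimizer sees a real use of the array. The names `kPadBytes`, `kPadding` and `padding_byte` are hypothetical, and whether linker options such as /OPT:REF or --gc-sections can still drop unreferenced data is toolchain-dependent:

```cpp
#include <cstddef>

// Hypothetical padding size: 5 MiB, as in the answer above.
constexpr std::size_t kPadBytes = 5 * 1024 * 1024;

// `extern` gives the const array external linkage (a namespace-scope const is
// internal in C++), and the non-zero first element means the data cannot be
// placed in a zero-filled (.bss-style) section, so its bytes are stored in
// the executable file.
extern const char kPadding[kPadBytes] = {1};
static_assert(sizeof(kPadding) == kPadBytes, "padding has the requested size");

// Reading through a volatile index hides which element is accessed, so the
// compiler cannot replace the array with a single known value.
char padding_byte(std::size_t i) {
    volatile std::size_t idx = i % kPadBytes;
    return kPadding[idx];
}
```

Calling `padding_byte` once from code that is actually reachable, such as `main`, keeps the reference alive in the final executable.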

Womble answered 1/10, 2010 at 15:2 Comment(11)
Not quite what I want to do. I want the EXE on disk to be larger, not the memory usage. Thank you though. – Templet
If you never use it, it should never get loaded into physical memory. So unless you're concerned about the impact it has on the available virtual address space, don't worry about it. – Gait
@Phil: You say you want the size larger only on the disk and not the memory usage, but then in the actual question you say memory allocation should be 5 MiB. Am I missing something? – Standpipe
Tyler, I retract my initial response :) This looks like it may work well. Testing... – Templet
@legends2k: Sorry, the size on disk is important too. I am using CreateProcess, which allocates memory. I need it to allocate as much memory as the size of the file on disk. – Templet
Does it matter whether it's 5 MB of code or 5 MB of data? Generating 5 MB of code is a lot harder. – Manon
As far as I understand, this code will not make the executable bigger, only the memory allocated at runtime? – Shopkeeper
@Klaim, static POD objects are allocated at link time, which means they are in the executable. – Womble
Is it true for all compilers? – Shopkeeper
@Shopkeeper I know of no exception. It's also true that many compilers will place const static POD objects together with code in a read-only section. I added the const in my code example now. – Womble
Thanks, I thought it was not guaranteed, but now that I think about my experience in embedded software, I remember that the size of the executable depended on the number of elements in a const static table... Thanks for the confirmation. – Shopkeeper
9
char big[5*1024*1024] = {1};

You need to initialize it to something other than 0, or the compiler/linker may optimize it.

Manon answered 1/10, 2010 at 15:10 Comment(2)
This will only initialize the first element; the rest will be zero. #201601 – Platinocyanide
That's true, but for the purposes of this question it doesn't matter exactly what it's initialized to. Setting the first element to a non-zero value seems to be enough to prevent the compiler from optimizing that variable. In other words, when you set it to all zeros the compiler simply says "there should be 5 million zeroes here", whereas this forces it to say "there's a one, followed by a zero, followed by a zero..." – Manon
9

If it's only the file size you want to increase, append a text file of the required size to the end of the EXE.

I used to do this when customers would complain about small EXEs. They didn't realize that small EXEs are just as professional as larger ones. In fact, some languages, usually BASIC compilers, have a bloat() command to increase the size of EXEs.

EDIT: Found an old link to a piece of code that people use: http://www.purebasic.fr/english/viewtopic.php?f=12&t=38994

An example: https://softwareengineering.stackexchange.com/questions/2051/what-is-the-craziest-stupidest-silliest-thing-a-client-boss-asked-you-to-do/2698#2698
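A minimal C++17 sketch of appending padding to an existing file, the programmatic equivalent of appending a text file of the required size. `append_padding` is a hypothetical helper, not part of any library, and the 64 KiB chunk size is an arbitrary choice. Note that bytes appended after the last PE section (an "overlay") are not mapped by the Windows loader, so this grows the file on disk but not the memory set up for the image:

```cpp
#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <vector>

// Appends `count` copies of `fill` to the end of the file at `path` (for
// example an .exe), writing in 64 KiB chunks. Returns false on any I/O error.
bool append_padding(const std::filesystem::path& path, std::size_t count,
                    char fill = '\0') {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out) return false;
    const std::vector<char> chunk(64 * 1024, fill);
    while (count > 0 && out) {
        const std::size_t n = std::min(count, chunk.size());
        out.write(chunk.data(), static_cast<std::streamsize>(n));
        count -= n;
    }
    out.flush();
    return static_cast<bool>(out);
}
```

For example, `append_padding("someapp.exe", 5 * 1024 * 1024)` would add 5 MiB of zero bytes; run it as a post-build step rather than on a running executable.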

Joyance answered 1/10, 2010 at 15:45 Comment(5)
What??!! Customers complaining about small EXEs? I don't think I've ever dealt with a customer that dumb. – Polythene
Yep, believe it or not! It's similar to a heavy camera: the heavier it is, the 'better' it must be! Bloat initial program releases, and with each successive update claim smaller memory footprints due to further optimizations! ;) – Joyance
Isn't there some checksum validation for EXEs that will fail if you append a file? – Roundish
Not unless you've programmed the EXE to check itself. – Joyance
There is one advantage to heavy cameras: they are less prone to camera shake (Newton's F=ma and all that!). Can't really say the same about large EXEs though :-) – Samarium
8

Fill the EXE file with NOPs in assembler.
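A sketch of the NOP idea using inline assembly, assuming GCC or Clang targeting x86/x86-64 with the GNU assembler's `.fill` directive (0x90 is the single-byte x86 NOP). MSVC does not support inline assembly for x64 targets, so the `#else` branch is a plain fallback that adds no padding. `run_nop_slide` is a hypothetical name:

```cpp
// run_nop_slide executes a long run of NOPs and returns its argument unchanged.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
// noinline keeps a single out-of-line copy; used tells the compiler to emit
// the function even if nothing in this file calls it.
__attribute__((noinline, used))
int run_nop_slide(int x) {
    // .fill repeat, size, value: emits 1 MiB of 0x90 bytes (x86 NOP) into this
    // function's machine code. Raise the repeat count for a bigger binary.
    __asm__ __volatile__(".fill 1048576, 1, 0x90");
    return x;
}
#else
// Fallback for other compilers/targets: no padding is emitted.
int run_nop_slide(int x) { return x; }
#endif
```

Because the padding sits inside a real function in the code section, it is loaded with the image like any other code, unlike data appended to the end of the file.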

Rozele answered 1/10, 2010 at 15:32 Comment(0)
6

How about just adding binary zeroes to the end of the .exe?

Polacca answered 1/10, 2010 at 15:20 Comment(1)
Why not add some hex zeros? Those ones are bigger :P – Koch
5

You can create big static arrays of dummy data. That would increase your EXE size, though it would not be real code.

Crux answered 1/10, 2010 at 15:2 Comment(8)
That does seem like the simplest and most controllable way to do something like this. – Pelvic
I thought of this too when I saw the question, but won't it be optimized out? – Standpipe
I was thinking of including a boatload of Windows libraries to make it bigger. Any merit to that? – Templet
@Templet Maybe, if you can link them statically. Otherwise it's just a bunch of exports/references. – Tootle
@sth: I know one can turn optimizations off, but I think there should be some other way to do it without losing optimizations, like adding a binary resource (say, a 5 MB resource via a .rc file) to the binary. – Standpipe
The reason is that when testing, optimizations might be needed, i.e. to match the actual non-bloated code. Also, an .rc file should work since the OP said it's Win32. – Standpipe
A resource will not work. Somehow the Win32 CreateProcess function KNOWS resources are not allocated in the same memory space. – Templet
PowerBASIC has a compiler macro named #BLOAT that does this. Maybe other compilers can do this as well? I know people use this technique on trojans to attempt to match the real app's size. – Tootle
3

I've found that even with optimizations, string literals are kept as-is in the compiled executable file.

So the way to go is:

  • go to http://lipsum.org/
  • generate a lot of text
  • add a .cpp file to your program
  • add a static const string that has the generated text as its value
  • compile
  • check the size.

If your compiler has a limit on string literal length, just use one string per paragraph.

The added size should then be easy to estimate.
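A minimal C++17 sketch of the steps above, with a short placeholder standing in for the generated text. The names `kLoremText`, `kLoremBytes` and `lorem_length` are hypothetical. It uses `extern` instead of `static`, on the assumption that an unreferenced internal-linkage constant may be dropped by an optimizing compiler, and splits the text into adjacent literals (one per paragraph), which the compiler concatenates:

```cpp
#include <cstddef>
#include <cstring>

// Placeholder text standing in for the paragraphs generated at lipsum.org.
// Adjacent string literals are concatenated by the compiler, so long text can
// be split into one literal per paragraph to stay under per-literal limits.
// `extern` gives the array external linkage (a namespace-scope const is
// internal in C++), so the compiler emits it even if this file never reads it.
extern const char kLoremText[] =
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    "Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. "
    "Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris.";

// Roughly the number of bytes the text adds to the binary, including the '\0'.
constexpr std::size_t kLoremBytes = sizeof(kLoremText);

// Number of characters in the text, excluding the terminating '\0'.
std::size_t lorem_length() {
    return std::strlen(kLoremText);
}
```

`sizeof` on the array gives the added size directly, which makes the "check the size" step easy to compare against the actual growth of the EXE.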

Shopkeeper answered 1/10, 2010 at 15:29 Comment(0)
3

You could try creating some sort of recursive template that would generate a lot of different instantiations. This could possibly cause a big increase in code size.
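A sketch of the template approach, assuming C++17. Instead of explicit recursion it uses `std::make_index_sequence` to stamp out many instantiations of a hypothetical `bloat_fn<N>`, and stores each instantiation's address in a table so the compiler has to emit every one as a separate function. Each body here is tiny, so reaching megabytes would need larger bodies or many more instantiations (at the cost of compile time):

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Number of distinct instantiations (an arbitrary choice; larger values mean
// more code and slower compiles).
constexpr std::size_t kCount = 512;

// Each N produces a separate function whose body differs by the constant it
// adds, so identical-code folding cannot merge the instantiations.
template <std::size_t N>
int bloat_fn(int x) {
    return x + static_cast<int>(N);
}

// Builds a table holding the addresses of bloat_fn<0> .. bloat_fn<kCount - 1>.
// Taking each address forces an out-of-line copy of every instantiation.
template <std::size_t... Is>
constexpr std::array<int (*)(int), sizeof...(Is)> make_table(std::index_sequence<Is...>) {
    return {{&bloat_fn<Is>...}};
}

constexpr auto kTable = make_table(std::make_index_sequence<kCount>{});

// Calls an instantiation chosen at run time, so as long as call_bloat itself
// is linked in, none of the instantiations can be removed.
int call_bloat(std::size_t i, int x) {
    return kTable[i % kCount](x);
}
```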

Dotted answered 1/10, 2010 at 15:30 Comment(1)
Also compilation time; templates are one of the biggest reasons C++ compiles so slowly. – Radiosonde
2

Use Boost and compile the executable with debug information.

Pearlinepearlman answered 1/10, 2010 at 15:48 Comment(0)
1

Write a program that generates a lot of code.

printf("000000000");
printf("000000001");
// ...
printf("010000000");
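A minimal C++ sketch of such a generator: it returns the source of a hypothetical function `bloated_print` containing `count` printf calls, each printing a distinct zero-padded 9-digit number as in the snippet above. The function and file names are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Returns C++ source for `void bloated_print()` whose body contains `count`
// printf calls printing "000000000", "000000001", ... in order.
std::string make_printf_source(std::size_t count) {
    std::ostringstream out;
    out << "#include <cstdio>\n\nvoid bloated_print() {\n";
    char number[32];
    for (std::size_t i = 0; i < count; ++i) {
        std::snprintf(number, sizeof number, "%09zu", i);
        // Emits a line such as:    std::printf("000000042\n");
        out << "    std::printf(\"" << number << "\\n\");\n";
    }
    out << "}\n";
    return out.str();
}
```

Writing, say, `make_printf_source(100000)` to a file such as bloat.cpp, adding it to the project and calling `bloated_print()` once keeps the generated code in the final EXE; how many calls are needed for 5 MB depends on the compiler and settings.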
Roundish answered 1/10, 2010 at 15:15 Comment(2)
Yes, that's the most obvious way to produce a lot of extra code (as opposed to just static data). You can also do it using copy-and-paste, leaning on the paste key. – Snort
@ChrisW: if you're using copy-and-paste, exponential copy-and-paste is better than leaning on a key: Ctrl-A, C, V, V, repeat log(n) times – Roundish
1

If all else fails, you could still create an assembly language source file where you have an appropriate number of db statements emitting bytes into the code segment, and link the resulting code object to your program as extern "C" { ... }.

You might need to play with the compiler/linker to prevent the linker from optimizing away that dummy "code" object.

Impropriety answered 1/10, 2010 at 15:28 Comment(0)
1

I admit I'm a Linux/UNIX guy. Is it possible to statically link an executable on Windows? You could then reference some heavy libraries and blow up your code size as much as you want without writing too much code yourself.

Another idea I pondered while reading your comment on my first answer is appending zeros to your file. As I said, I'm no Windows expert, so this might not work.

Womble answered 1/10, 2010 at 15:36 Comment(2)
"Is it possible to statically link an executable in Windows?" Yes, it is, but the linker will only link/include the objects from the library that are needed (referenced) by the application. – Snort
@Snort: Maybe there is an option like "--whole-archive" for ld on Linux to force the linker to include everything? – Ecbolic
1

Add a 5MB (bmp) image.

Shimberg answered 1/10, 2010 at 16:42 Comment(0)
1

After applying all the methods listed here, compile with the debug flag and the highest optimization flag (gcc -g -O3).

Hermaphroditus answered 1/10, 2010 at 20:13 Comment(0)
0

Use #define to define lots of macros that hold very long strings, and use those macros in many places in your program.

Machinery answered 1/10, 2010 at 15:32 Comment(0)
0

You could do this:

REM generate gibberish of the desired size
dd if=/dev/random of=entropy count=5000k bs=1
REM copy the entropy to the end of the file
copy /b someapp.exe + entropy somefatapp.exe

If you put these commands in a batch file, you could even add it as a post-build step so it happens automatically.

You can generally append as much data as you want to the end of an EXE. All the code and resources are located via offsets from the beginning of the file, so increasing its size shouldn't affect them.

(I'm assuming you have dd on Windows. If not, get it.)

Hydrodynamic answered 1/10, 2010 at 17:4 Comment(0)
0

Write a code generator that generates arbitrary random functions. The only trick then is making sure they don't get optimized out, and with separate compilation that shouldn't be hard.

Apple answered 1/10, 2010 at 20:7 Comment(0)
0

Statically link wxWidgets to your application. It will instantly become 5 MB in size.

Afton answered 1/10, 2010 at 20:29 Comment(0)
