Why are generated binaries so large?

Asked 9/5, 2011 at 6:41 Answered 9/5, 2011 at 7:6

Why are the binaries that are generated when I compile my C++ programs so large (as in easily 10 times the size of the source code files)? What advantages does this offer over interpreted languages for which such compilation is not necessary (and thus the program size is only the size of the code files)?

Parnas answered 9/5, 2011 at 6:41 Comment(3)

and what mode - debug/release etc.? This can have a significant impact on the size of the binary - also whether you statically link libraries can have an impact as well. – Jellify 9/5, 2011 at 7:23

What headers do you include? Every bit of included template code gets compiled into your executable and the standard headers are filled with templates. – Ase 9/5, 2011 at 9:0

interpreted languages need something else though: they need the interpreter. Now, you can write a "hello world" in C which takes a couple of kilobytes, or you can write it in Python, which takes maybe 100 bytes... plus a 3MB interpreter. There's no such thing as a free lunch :) – Lolalolande 9/5, 2011 at 9:20

Modern interpreted languages do typically compile the code to some manner of representation for faster execution... it might not get written out to disk, but there's certainly no guarantee that the program is represented in a more compact form. Some interpreters go the whole hog and generate machine code anyway (e.g. Java JIT). Then there's the interpreter itself sitting in memory which can be large.

A few points:

- The more sophisticated the commands in the source code, the more machine code operations might be required to execute them. Thus, higher-level language features tend to have a higher ratio of compiled code to source code. That's not necessarily a bad thing: think of it as "I only have to say a little about what I want done and it infers all those necessary steps". The challenge in programming is to ensure they are necessary - that requires good library and program design.
- The compiler often deliberately trades some executable size for faster expected execution speed: inline vs out-of-line code is part of this compromise, though for small functions neither may be consistently more compact.
- More sophisticated run-time environments (e.g. adding support for C++ exceptions) can involve a bit of extra code that runs when the program first starts, to construct the necessary environment for that language feature.
- Library features may not be comparable. As well as the sort of add-on libraries you're very likely to have had to track down yourself and be very aware of using (e.g. XML, PDF parsing, OpenGL), languages often quietly use supporting libraries for what seem like language features and functions. Any of these can be surprisingly large.
  - For example, many interpreters just expose the C library's printf() function or something similar, while for output formatting C++ has ostream - a more complex, extensible and type-safe system with (for better or worse) persistent state across function calls, routines to query and set that state, an additional layer of customisable buffering, customisable character types and localisation, and generally a lot of small inline functions that can lead to smaller or larger programs depending on the exact use and compiler settings. What's best depends on your application and memory vs performance goals.
- Built-in language statements may be compiled differently. Consider a switch on an integer expression with 100 case labels spread randomly between 1 and 1000: one compiler/language might decide to "pack" the 100 cases and do a binary search for a match, another might use a sparsely populated array of 1000 elements and do direct indexing (which wastes space in the executable but typically makes for faster code). So, it's hard to draw conclusions based on executable size.
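To make the switch example concrete, here is a minimal sketch of the two dispatch strategies written out by hand. It is not what any particular compiler emits: the names (`PackedTable`, `SparseTable`, `make_packed`, `make_sparse`) are hypothetical, the labels are evenly spaced rather than random to keep the code short, and a real jump table would hold code addresses or offsets rather than one-byte case indices.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumCases = 100;  // number of case labels
constexpr int kMaxLabel = 1000;         // largest possible label value
constexpr std::int8_t kNoCase = -1;     // marks "no matching case"

// Strategy A: "packed". Only the 100 labels are stored (sorted), and a
// binary search maps a value to its case index.
struct PackedTable {
    std::array<std::int16_t, kNumCases> labels;

    int find(int value) const {
        auto it = std::lower_bound(labels.begin(), labels.end(), value);
        if (it != labels.end() && *it == value)
            return static_cast<int>(it - labels.begin());
        return -1;
    }
};

// Strategy B: "sparse". One slot per possible value 0..1000, so a lookup
// is a bounds check plus a single array index.
struct SparseTable {
    std::array<std::int8_t, kMaxLabel + 1> slot;

    int find(int value) const {
        if (value < 0 || value > kMaxLabel) return -1;
        return slot[static_cast<std::size_t>(value)];
    }
};

// Assumed labels for illustration: 7, 17, 27, ..., 997.
PackedTable make_packed() {
    PackedTable t{};
    for (std::size_t i = 0; i < kNumCases; ++i)
        t.labels[i] = static_cast<std::int16_t>(7 + 10 * i);
    return t;
}

// Build the sparse table from the same labels.
SparseTable make_sparse(const PackedTable& packed) {
    SparseTable t{};
    t.slot.fill(kNoCase);
    for (std::size_t i = 0; i < kNumCases; ++i)
        t.slot[static_cast<std::size_t>(packed.labels[i])] =
            static_cast<std::int8_t>(i);
    return t;
}
```

Both tables answer every lookup identically, but `sizeof(SparseTable)` is several times `sizeof(PackedTable)`: the same source-level switch can cost very different amounts of space depending on which strategy is chosen.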

Typically, memory usage and execution speed become increasingly important as the program gets larger and more complex. You rarely see systems such as operating systems, enterprise web servers or full-featured commercial word processors written in interpreted languages, because the memory and speed overhead of interpretation doesn't scale well to programs of that size.

Sundowner answered 9/5, 2011 at 7:3 Comment(0)

Interpreted languages assume an interpreter is available while compiled programs are in most cases standalone.

Dichloride answered 9/5, 2011 at 6:49 Comment(5)

To expand, this means that an interpreted "program" expects all the libraries, etc. to exist on the system already, ergo your "program" will only have your code. Languages like C++ will store the code you have referenced in your binary, so that it is a self-contained unit. – Eyewash 9/5, 2011 at 6:57

@William: compiled code often depends on shared libraries / DLLs at run-time, so it's not quite that clear-cut. – Sundowner 9/5, 2011 at 7:4

@Tony: so do interpreted languages. The interpreter often depends on much the same shared libs/DLLs, so if you add up all the dependencies, an interpreted language will generally depend on more code. – Lolalolande 9/5, 2011 at 9:20

@Tony: Yes, you're right. It's been a while since I did unmanaged code and I was thinking about statically linked libraries. – Eyewash 9/5, 2011 at 9:24

@jalf: definitely that's the general trend - just saying it's not totally black and white. @William: no worries. Cheers. – Sundowner 9/5, 2011 at 10:2

Take a trivial case: suppose you have a one-line program

print("hello world")

What does that "print" do? Surely it's clear that you're asking some other code to do some work? And that code isn't free: the sum total of what needs to run is much more than the lines of code you write. In more realistic programs you exploit many sophisticated libraries managing windows and other UI features, networks, databases and so on. Whether that code is bundled into your application, loaded from DLLs or present in the interpreter, it's got to be somewhere.

There are plenty of trade-offs between compilation and interpretation, and intermediate solutions such as Java's compilation/byte-code interpretation approach. For example, you might consider

- the run-time cost of interpreting the source every time you run versus running the compiled code
- the portability advantages of interpreters - with a compiled language you need to compile separate versions of an app for different platforms.

Snowbird answered 9/5, 2011 at 7:0 Comment(0)

Usually, programs are written in higher-level languages. For these programs to be executed by the CPU, they have to be converted to machine code. This conversion is done by a compiler or an interpreter.

A Compiler makes the conversion just once, while an Interpreter typically converts it every time a program is executed.

Interpreted programs typically run much slower than compiled programs because the interpreter must analyze each statement in the program each time it is executed and then perform the desired action, whereas the compiled code just performs the action within a fixed context determined by the compilation (which is the reason for the large binary files).
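The difference between analyzing the source on every run and translating it once can be sketched with a toy language. Everything here is hypothetical and only for illustration: the "language" is a sequence of `+`, `-` and `*` steps applied to a number, and the "compiled" form is a decoded vector of operations, which is closer to bytecode than to real machine code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One decoded instruction of the toy language: an operator and an operand.
struct Op {
    char kind;  // '+', '-' or '*'
    long arg;
};

// Apply a single decoded instruction to the accumulator.
inline long apply(const Op& op, long acc) {
    switch (op.kind) {
        case '+': return acc + op.arg;
        case '-': return acc - op.arg;
        case '*': return acc * op.arg;
        default:  return acc;  // unknown operators are ignored in this sketch
    }
}

// "Interpreter": re-parses the source text on every single run.
long interpret(const std::string& source, long input) {
    std::istringstream in(source);
    long acc = input;
    Op op{};
    while (in >> op.kind >> op.arg)
        acc = apply(op, acc);
    return acc;
}

// "Compiler": parses the source once into a decoded program.
std::vector<Op> compile(const std::string& source) {
    std::istringstream in(source);
    std::vector<Op> program;
    Op op{};
    while (in >> op.kind >> op.arg)
        program.push_back(op);
    return program;
}

// Runs an already-decoded program; no text parsing happens here.
long run(const std::vector<Op>& program, long input) {
    long acc = input;
    for (const Op& op : program)
        acc = apply(op, acc);
    return acc;
}
```

For a source string such as `"+ 3 * 2 - 1"`, `interpret(source, 4)` and `run(compile(source), 4)` give the same result, but only the first pays the parsing cost on each call. The decoded program also occupies more bytes than the 11-character source text, a small-scale version of the size trade-off the question asks about.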

Another disadvantage of interpreters is that they must be present in the environment as additional software to run the source code.

Zinnia answered 9/5, 2011 at 7:6 Comment(0)
