Why Compile to an Object File First?

Over the last year I've started programming in Fortran, working at a research university. Most of my prior experience is in web languages like PHP or old ASP, so I'm a newbie when it comes to compile commands.

I have two different codes I'm modifying.

One has an explicit step creating .o files from the modules (e.g. gfortran -c filea.f90) before creating the executable.

The other creates the executable file directly (sometimes creating .mod files, but no .o files, e.g. gfortran -o executable filea.f90 fileb.f90 mainfile.f90).
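For reference, my understanding is that the first code's full sequence looks something like this (same file names as above; the real Makefile's flags may differ):

gfortran -c filea.f90                              # compile: filea.f90 -> filea.o (plus filea.mod if it defines a module)
gfortran -c fileb.f90                              # compile: fileb.f90 -> fileb.o
gfortran -c mainfile.f90                           # compile: mainfile.f90 -> mainfile.o
gfortran -o executable filea.o fileb.o mainfile.o  # link: combine the objects into the executable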

  • Is there a reason (other than, maybe, Makefiles) that one method is preferred over the other?
Tubate answered 12/3, 2011 at 16:40 Comment(5)
Generally we use object files to "cache" compilation, making it easy to reuse old compiled code and minimize compilation time. – Buller
"Code" is a non-countable noun in this context; thus, "two codes" is incorrect. – Gustaf
Using "code" as a count noun for programs and libraries is an unfortunately common usage among physicists. Corrections are unlikely to take in that community. – Counterstamp
@Counterstamp But we will try, gosh darnit. We will try. – Florida
Ha, thanks for the correction. I had never called these programs "codes" until I started working here. – Tubate

Compiling to object files first is called separate compilation. There are many advantages and a few drawbacks.

Advantages:

  • it is easy to turn object files (.o) into libraries and link against them later (see the sketch after this list)
  • many people can work on different source files at the same time
  • faster compilation: you don't compile the same files again and again when their source hasn't changed
  • object files can be made from different language sources and linked together at some later time; the object files just have to use the same format and compatible calling conventions
  • separate compilation enables distribution of system-wide libraries (OS libraries, language standard libraries or third-party libraries), whether static or shared
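For the first point, a minimal sketch using the standard ar and gfortran tools (the library name is made up for illustration):

gfortran -c filea.f90 fileb.f90                  # produce filea.o and fileb.o
ar rcs libmystuff.a filea.o fileb.o              # bundle the objects into a static library
gfortran -o executable mainfile.o -L. -lmystuff  # later: link the main program against it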

Drawbacks:

  • there are some optimizations (such as optimizing functions away) that the compiler cannot perform across separately compiled files, and that the linker does not care about. However, many compilers now offer "link-time optimization", which largely negates this drawback (see the sketch after this list). It remains an issue for system and third-party libraries, especially shared ones: it is impossible to optimize away parts of a component that may change at each run, though other techniques such as JIT compilation may mitigate this.
  • in some languages, the programmer has to provide some kind of header for the use of others who will link against the object. For example, in C you have to provide .h files to go with your object files. But that is good practice anyway.
  • in languages with text-based includes, like C or C++, if you change a function prototype you have to change it in two places: once in the header file and once in the implementation file.
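A minimal sketch of link-time optimization with gfortran (the -flto flag is a standard GCC option; exact behavior varies by version):

gfortran -flto -c filea.f90
gfortran -flto -c fileb.f90
gfortran -flto -c mainfile.f90
gfortran -flto -o executable filea.o fileb.o mainfile.o  # cross-file optimization happens here, at link time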
Frasch answered 12/3, 2011 at 16:48 Comment(10)
Some linkers can in fact perform inlining or other optimization at the assembly level. – Cytosine
There are compilers that can optimize across object files. Newer VC versions do that. Nevertheless, a good answer, +1 from me. – Xenia
+0 for not mentioning templates in drawbacks. – Inquiry
@ybungalobill, templates? In Fortran?!? – Apology
@Tomalak: Frankly, I don't know whether (real) compilation is deferred until the link stage or the linker is that smart. – Xenia
@SK: the question is tagged C++ too. – Inquiry
@sbi: Fair point. I assume the typical build process still applies, but I never really thought about that: I guess it's possible that the lines between compilation and linking are blurred slightly by toolchains with link-time optimisation enabled. – Gustaf
@ybungalobill, anyway, instantiated function templates are marked as weak symbols and collapsed by the linker. – Apology
One of the motivating advantages over all-in-one compilation is reduced memory overhead (linking does need more memory than compiling, but still not as much as an all-in-one compilation unless you're doing link-time optimization). – Sacerdotalism
@Toby Speight: the relative memory use of linking and compiling depends very much on the language, the toolchain and of course the source code itself. C linking historically used much less memory than compiling; the balance tipped the other way with C++ because of many factors (templates, LTO, security issues, etc.). – Frasch

When you have a project with a few hundred source files, you don't want to recompile all of them every time one changes. By compiling each source file into a separate object file and recompiling only those source files affected by a change, you spend the minimum amount of time going from a source-code change to a new executable.

make is the common tool used to track such dependencies and recreate your binary when something changes. Typically you describe what each source file depends on (these dependencies can often be generated by your compiler, in a format suitable for make) and let make handle the details of producing an up-to-date binary; a sketch follows.
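A minimal sketch of such a Makefile, using the file names from the question (module dependencies are simplified, and recipe lines must be indented with real tabs):

executable: filea.o fileb.o mainfile.o
        gfortran -o executable filea.o fileb.o mainfile.o

# Each object is rebuilt only when its own source changes.
# If mainfile.f90 USEs modules from the other files, its rule
# should also depend on their objects (or .mod files).
filea.o: filea.f90
        gfortran -c filea.f90
fileb.o: fileb.f90
        gfortran -c fileb.f90
mainfile.o: mainfile.f90
        gfortran -c mainfile.f90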

Tameka answered 12/3, 2011 at 16:46 Comment(0)

The .o file is an object file. It's an intermediate representation of the final program.

Specifically, the .o file typically contains compiled code, but what it does not have is the final addresses of all of the different routines or data.

One of the things that a program needs before it can be run is something similar to a memory image.

For example, say you have your main program, and it calls a routine SQUARE. (This is faux Fortran; I haven't touched it in decades, so work with me here.)

PROGRAM MAIN
INTEGER X, Y
INTEGER SQUARE   ! declare the external function's result type
X = 10
Y = SQUARE(X)
WRITE(*,*) Y
END

Then you have the SQUARE function.

INTEGER FUNCTION SQUARE(N)
INTEGER N
SQUARE = N * N
END

These are individually compiled units. You can see that when MAIN is compiled it does not KNOW where "SQUARE" is, or what address it is at. It needs to know that so that when it executes the microprocessor's JUMP SUBROUTINE (JSR) instruction, the instruction has someplace to go.

The .o file has the JSR instruction already, but it doesn't have the actual target address. That comes later, in the linking or loading phase (depending on your application).

So MAIN's .o file has all of the code for MAIN, plus a list of references that it wants resolved (notably SQUARE). SQUARE is basically stand-alone, it doesn't have any references, but at the same time it has no address saying where it will exist in memory yet.
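You can see these unresolved references with the standard nm tool (a sketch; the file names are hypothetical, and gfortran typically emits lowercase symbol names with a trailing underscore):

gfortran -c main.f90 square.f90   # compile the two units above into main.o and square.o
nm main.o      # lists something like "U square_": an Undefined reference for the linker to fill in
nm square.o    # lists something like "T square_": the symbol is defined (in the Text section) here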

The linker will take all of the .o files and combine them into a single executable. In the old days, compiled code would literally be a memory image. The program would start at some address, simply be loaded into RAM wholesale, and then executed. So in this scenario, you can see the linker taking the two .o files and concatenating them together (to get SQUARE's actual address), then going back to find the SQUARE reference in MAIN and filling in the address.

Modern linkers don't go quite that far, and defer much of that final processing to when the program is actually loaded. But the concept is similar.

By compiling to .o files, you end up with reusable units of logic that are then combined later by the linking and loading processes before execution.

The other nice aspect is that the .o files can come from different languages. As long as the calling mechanisms are compatible (i.e. how arguments are passed to and from functions and procedures), then once compiled into a .o, the source language becomes less relevant. You can link and combine C code with FORTRAN code, say.
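As a sketch of that mixed-language case (file names are made up; the two compiler drivers share one linker):

gcc -c helper.c                          # C source -> helper.o
gfortran -c main.f90                     # Fortran source -> main.o
gfortran -o executable main.o helper.o   # one link step combines objects from both languages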

In PHP et al., the process is different because all of the code is loaded into a single image at runtime. You can consider FORTRAN's .o files as similar to how you would use PHP's include mechanism to combine files into one large, cohesive whole.

Maebashi answered 12/3, 2011 at 17:00 Comment(2)
Very good. This clarified a few things for me. Wish I'd read it earlier. – Predominance
I would note that you can combine Fortran, C, C++ and perhaps even Ada or Go in one gcc compilation command as well. Not sure if that was the case 10 years ago, but the feature is not very new. – Pestiferous

Another reason, apart from compile time, is that compilation is a multi-step process.

[Diagram of the multi-step build process (compiling, assembling, linking, loading); image not reproduced here.]

The object files are just one intermediate output from that process. They will eventually be used by the linker to produce the executable file.
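A sketch of the individual steps using standard gfortran/GCC flags:

gfortran -S mainfile.f90                             # compile only: emit assembly (mainfile.s)
gfortran -c mainfile.f90                             # compile and assemble: emit the object file (mainfile.o)
gfortran -o executable mainfile.o filea.o fileb.o    # link: combine the object files into the executable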

Wildebeest answered 12/3, 2011 at 16:51 Comment(5)
Compilation is indeed a multi-step process, but including the assembler, linker and loader in it (even explicitly) is a source of so many misunderstandings about compilers. Only the first row in your diagram can be attributed to the process of compilation (and even that might be too much for some people/compilers). – Agone
You seem to have misunderstood the question. :-) – Wildebeest
The question is irrelevant with regard to my comments. I was talking about the diagram and the suggestion that it describes the steps of compilation. I make no claim against your response to the original question (which is more than fair IMO). – Agone
Because I think the confusion the chart causes is greater than the information the answer gives, and I encounter this mistake way too often with pupils. – Agone
OK. I disagree; but you're the wizard. :-) – Wildebeest

We compile to object files to be able to link them together into larger executables. That is not the only way to do it, though.

There are also compilers that don't work that way, but instead compile to memory and execute the result immediately. Back when students had to use mainframes, this was standard; Turbo Pascal also did it this way.

Puritanical answered 12/3, 2011 at 17:50 Comment(0)
