Is main() really the start of a C++ program?

Section §3.6.1/1 of the C++ Standard reads:

A program shall contain a global function called main, which is the designated start of the program.

Now consider this code:

#include <iostream>

int square(int i) { return i*i; }
int user_main()
{
    for ( int i = 0 ; i < 10 ; ++i )
        std::cout << square(i) << std::endl;
    return 0;
}
int main_ret = user_main();   // dynamic initialization: calls user_main() before main()
int main()
{
    return main_ret;
}

This sample code does what I intend it to do, i.e. print the squares of the integers from 0 to 9, before entering the main() function, which is supposed to be the "start" of the program.

I also compiled it with GCC 4.5.0 using the -pedantic option. It gives no error, not even a warning!

So my question is,

Is this code really Standard conformant?

If it's standard conformant, then does it not invalidate what the Standard says? main() is not the start of this program! user_main() executed before main().

I understand that, to initialize the global variable main_ret, user_main() executes first, but that is a different thing altogether; the point is that it does invalidate the quoted statement §3.6.1/1 from the Standard, as main() is NOT the start of the program; it is in fact the end of this program!


EDIT:

How do you define the word 'start'?

It boils down to the definition of the phrase "start of the program". So how exactly do you define it?

Bifacial asked 24/1, 2011 at 14:54

No, C++ does a lot of things to "set up the environment" prior to the call of main; however, main is the official start of the "user specified" part of the C++ program.

Some of the environment setup is not controllable (like the initial code to set up std::cout); however, some of it is controllable, like static global blocks (for initializing static global variables). Note that since you don't have full control prior to main, you don't have full control over the order in which static objects in different translation units get initialized.

Once main is entered, your code is conceptually "fully in control" of the program, in the sense that you can specify both the instructions to be performed and the order in which to perform them. Multi-threading can rearrange the order of execution, but you're still in control with C++, because you chose to have those sections of code execute (possibly) out of order.

Whitnell answered 24/1, 2011 at 16:11
+1 for this: "Note that since you don't have full control prior to main, you don't have full control on the order in which the static blocks get initialized. After main, your code is conceptually "fully in control" of the program, in the sense that you can both specify the instructions to be performed and the order in which to perform them". This also makes me mark this answer as the accepted answer... I think these are very important points that sufficiently justify main() as the "start of the program"Bifacial
@Nawaz: note that on top of no full control over initialization order, you have no control over initialization errors: you can't catch exceptions at a global scope.Hervey
@Nawaz: What are static global blocks? Will you please explain using a simple example? ThanksWalden
@meet: The objects declared at namespace level have static storage duration, and as such, these objects belonging to different translation units can be initialized in any order (because the order is unspecified by the standard). I'm not sure if that answers your question, though that is what I could say in the context of this topic.Bifacial

You are reading the sentence incorrectly.

A program shall contain a global function called main, which is the designated start of the program.

The standard is DEFINING the word "start" for the purposes of the remainder of the standard. It doesn't say that no code executes before main is called. It says that the start of the program is considered to be at the function main.

Your program is compliant. Your program hasn't "started" until main is started. The function is called before your program "starts" according to the definition of "start" in the standard, but that hardly matters. A LOT of code is executed before main is ever called in every program, not just this example.

For the purposes of discussion, your function is executed prior to the 'start' of the program, and that is fully compliant with the standard.

Farwell answered 24/1, 2011 at 16:26
Sorry, but I disagree with your interpretation of that clause.Fiona
I think Adam Davis is right,"main" is more like some sort of coding restrictions.Longheaded
@LightnessRacesinOrbit I never did follow up, but to me that sentence can be logically boiled down to "a global function called main is the designated start of the program" (emphasis added). What is your interpretation of that sentence?Farwell
@AdamDavis: I don't remember what my concern was. I can't think of one now.Fiona
This looks like a much better explanation than the accepted answer.Liberality
What constructor?Nahtanha
@DonSlowik The OP provides example code with a constructor, that is the constructor my answer is referring to.Farwell
@AdamDavis int user_main() is a function that is called to initialize int main_ret, not a ctor, which would be called to initialize an object of a (user-defined) class. But that's still OK. Not only ctors run before main; various initialization code can run before main, as described at en.cppreference.com/w/cpp/language/initialization under non-local dynamic initialization 3) ordered within a translation unit.Nahtanha
@DonSlowik You are correct, mea culpa. Edited.Farwell
So cout runs before main; is that considered running at compile time or at runtime?Menarche

Your program will not link, and thus not run, unless there is a main. However, main() does not mark the start of the program's execution: objects at file scope have constructors that run beforehand, so it would be possible to write an entire program that runs its whole lifetime before main() is reached, leaving main itself with an empty body.

In practice, to do this you would need one object that is constructed before main and whose constructor invokes the entire flow of the program.

Look at this:

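#include <iostream>

class Foo
{
public:
   Foo();

 // other stuff
};

Foo::Foo()
{
    // The whole program could live here; it runs before main() is entered.
    std::cout << "running from Foo::Foo()\n";
}

Foo foo;   // global object: its constructor runs during dynamic initialization

int main()
{
}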

The flow of your program would effectively stem from Foo::Foo().

Rail answered 24/1, 2011 at 14:58
+1. But note that if you have multiple global objects in different translation units, this will get you in trouble quickly, since the order in which their constructors are called is unspecified. You can get away with singletons and lazy initialization, but in a multithreaded environment, things get very ugly quickly. In a word, don't do this in real code.Cheffetz
Whilst you should probably give main() a proper body in your code and allow it to run the execution, the concept of objects outside that start up is what a lot of LD_PRELOAD libraries are based on.Rail
Anyway, this does not prevent main() from being executed, i.e., you cannot avoid the execution of main at all. It is true that the constructors and initializers of global objects are executed before main() is called by the runtime (as @Alexande noted), but I just see this whole matter as an abuse of the way the C++ compiler works.Sunshade
@Baltasarq: it can be abused in a good way. I used the technique of global objects whose constructors run before main to implement a quick and dirty testing framework within 20 lines of preprocessor macros. You put the .cpp files with your unit tests in one folder, you write a makefile which compile them all blindly and link them into one big executable, and you watch your test units execute before main automatically. Then, main collects the results.Cheffetz
@Alex: you can sometimes (usually?) control initialization order of static objects in different translation units through object file link order.Clarindaclarine
@Thomas: no, the standard says it is unspecified.Cheffetz
@Alex: The standard says unspecified, but as a practical matter link order (usually, depending on compiler) controls initialization order.Clarindaclarine
@Thomas: I surely would not even remotely try to rely on that. I also surely wouldn't try to manually control the build system.Cheffetz
@Alex: not so important anymore, but back in the day we would use link order to control the build image so as to decrease physical memory paging. There are other side reasons you might want to control initialization order even when it doesn't affect program semantics, such as startup performance comparison testing.Clarindaclarine

You tagged the question as "C" too, so, speaking strictly about C, your initialization should fail as per section 6.7.8 "Initialization" of the ISO C99 standard.

The most relevant in this case seems to be constraint #4 which says:

All the expressions in an initializer for an object that has static storage duration shall be constant expressions or string literals.

So, the answer to your question is that the code is not compliant to the C standard.

You would probably want to remove the "C" tag if you were only interested in the C++ standard.

Longship answered 24/1, 2011 at 15:13
@Longship could you tell us what is in that section? Not all of us have the C standard :).Subreption
Since you are so picky: Alas, ANSI C has been obsolete since 1989. ISO C90 or C99 are the relevant standards to cite.Marchelle
@Lundin: Nobody is ever picky enough :) I was reading ISO C99 but I'm pretty confident it applies to C90 too.Longship
@Ashot. You're right, added the sentence that I think is most relevant here.Longship
@Nawaz. Yes, so this could be useful for C programmers too.Longship
@Remo: +1 for providing the info that it's not valid C; i didn't know that. See this is how people learn, sometimes by plan, sometimes by chance!Bifacial
@Longship +1 for useful information. So it's the C99 standard, yeah? Please add the standard version to your answer too. Also, I think this post is very useful and the C tag must stay.Subreption
did you compile with -ansi and -pedantic? or perhaps even -W -Wall in addition?Decalcify

Section 3.6 as a whole is very clear about the interaction of main and dynamic initializations. The "designated start of the program" is not used anywhere else and is just descriptive of the general intent of main(). It doesn't make any sense to interpret that one phrase in a normative way that contradicts the more detailed and clear requirements in the Standard.

Name answered 24/1, 2011 at 15:37

The compiler often has to add code before main() to be standard compliant, because the standard specifies that initialization of globals/statics must be done before the program is executed. And as mentioned, the same goes for constructors of objects placed at file scope (globals).

Thus the original question is relevant to C as well, because in a C program you would still have the globals/static initialization to do before the program can be started.

The standards assume that these variables are initialized through "magic", because they don't say how they should be set before program initialization. I think they considered that as something outside the scope of a programming language standard.

Edit: See for example ISO 9899:1999 5.1.2:

All objects with static storage duration shall be initialized (set to their initial values) before program startup. The manner and timing of such initialization are otherwise unspecified.

The theory behind how this "magic" was to be done goes way back to C's birth, when it was a programming language intended to be used only for the UNIX OS, on RAM-based computers. In theory, the program would be able to load all pre-initialized data from the executable file into RAM, at the same time as the program itself was loaded into RAM.

Since then, computers and operating systems have evolved, and C is used in a far wider area than originally anticipated. A modern PC OS has virtual addresses, etc., and many embedded systems execute code from ROM, not RAM. So there are many situations where the RAM can't be set "automagically".

Also, the standard is too abstract to know anything about stacks and process memory etc. These things must be done too, before the program is started.

Therefore, pretty much every C/C++ program has some init/"copy-down" code that is executed before main is called, in order to conform with the initialization rules of the standards.

As an example, embedded systems typically have an option called "non-ISO compliant startup" where the whole initialization phase is skipped for performance reasons, and then the code actually starts directly from main. But such systems don't conform to the standards, as you can't rely on the init values of global/static variables.

Marchelle answered 24/1, 2011 at 15:47

main() is a user function called by the C runtime library.

see also: Avoiding the main (entry point) in a C program

Anthe answered 24/1, 2011 at 14:59

Your "program" simply returns a value from a global variable. Everything else is initialization code. Thus, the standard holds - you just have a very trivial program and more complex initialization.

Kala answered 24/1, 2011 at 15:14

Seems like an English semantics quibble. The OP refers to his block of code first as "code" and later as the "program." The user writes the code, and then the compiler writes the program.

Predicament answered 27/6, 2014 at 1:31

Ubuntu 20.04 glibc 2.31 RTFS + GDB

glibc does some setup before main so that some of its functionalities will work. Let's try to track down the source code for that.

hello.c

#include <stdio.h>

int main() {
    puts("hello");
    return 0;
}

Compile and debug:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o hello.out hello.c
gdb hello.out

Now in GDB:

b main
r
bt -past-main

gives:

#0  main () at hello.c:3
#1  0x00007ffff7dc60b3 in __libc_start_main (main=0x555555555149 <main()>, argc=1, argv=0x7fffffffbfb8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffbfa8) at ../csu/libc-start.c:308
#2  0x000055555555508e in _start ()

This already contains the line of the caller of main: https://github.com/cirosantilli/glibc/blob/glibc-2.31/csu/libc-start.c#L308.

The function has a billion ifdefs as can be expected from the level of legacy/generality of glibc, but some key parts which seem to take effect for us should simplify to:

# define LIBC_START_MAIN __libc_start_main

STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char **),
                 int argc, char **argv,
                 void (*init) (void), void (*fini) (void),
                 void (*rtld_fini) (void), void *stack_end)
{
  int result;

  /* Initialize some stuff. */

  result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
  exit (result);
}

Before __libc_start_main, we are already at _start, which we know is the entry point: passing -Wl,--verbose to gcc shows that the linker script contains:

ENTRY(_start)

and it is therefore the very first instruction executed after the dynamic loader finishes.

To confirm that in GDB, we can get rid of the dynamic loader by compiling with -static:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -static -o hello.out hello.c
gdb hello.out

and then make GDB stop at the very first instruction executed with starti and print the first instructions:

starti
display/12i $pc

which gives:

=> 0x401c10 <_start>:   endbr64 
   0x401c14 <_start+4>: xor    %ebp,%ebp
   0x401c16 <_start+6>: mov    %rdx,%r9
   0x401c19 <_start+9>: pop    %rsi
   0x401c1a <_start+10>:        mov    %rsp,%rdx
   0x401c1d <_start+13>:        and    $0xfffffffffffffff0,%rsp
   0x401c21 <_start+17>:        push   %rax
   0x401c22 <_start+18>:        push   %rsp
   0x401c23 <_start+19>:        mov    $0x402dd0,%r8
   0x401c2a <_start+26>:        mov    $0x402d30,%rcx
   0x401c31 <_start+33>:        mov    $0x401d35,%rdi
   0x401c38 <_start+40>:        addr32 callq 0x4020d0 <__libc_start_main>

By grepping the source for _start and focusing on x86_64 hits we see that this seems to correspond to sysdeps/x86_64/start.S:58:


ENTRY (_start)
    /* Clearing frame pointer is insufficient, use CFI.  */
    cfi_undefined (rip)
    /* Clear the frame pointer.  The ABI suggests this be done, to mark
       the outermost frame obviously.  */
    xorl %ebp, %ebp

    /* Extract the arguments as encoded on the stack and set up
       the arguments for __libc_start_main (int (*main) (int, char **, char **),
           int argc, char *argv,
           void (*init) (void), void (*fini) (void),
           void (*rtld_fini) (void), void *stack_end).
       The arguments are passed via registers and on the stack:
    main:       %rdi
    argc:       %rsi
    argv:       %rdx
    init:       %rcx
    fini:       %r8
    rtld_fini:  %r9
    stack_end:  stack.  */

    mov %RDX_LP, %R9_LP /* Address of the shared library termination
                   function.  */
#ifdef __ILP32__
    mov (%rsp), %esi    /* Simulate popping 4-byte argument count.  */
    add $4, %esp
#else
    popq %rsi       /* Pop the argument count.  */
#endif
    /* argv starts just at the current stack top.  */
    mov %RSP_LP, %RDX_LP
    /* Align the stack to a 16 byte boundary to follow the ABI.  */
    and  $~15, %RSP_LP

    /* Push garbage because we push 8 more bytes.  */
    pushq %rax

    /* Provide the highest stack address to the user code (for stacks
       which grow downwards).  */
    pushq %rsp

#ifdef PIC
    /* Pass address of our own entry points to .fini and .init.  */
    mov __libc_csu_fini@GOTPCREL(%rip), %R8_LP
    mov __libc_csu_init@GOTPCREL(%rip), %RCX_LP

    mov main@GOTPCREL(%rip), %RDI_LP
#else
    /* Pass address of our own entry points to .fini and .init.  */
    mov $__libc_csu_fini, %R8_LP
    mov $__libc_csu_init, %RCX_LP

    mov $main, %RDI_LP
#endif

    /* Call the user's main function, and exit with its value.
       But let the libc call main.  Since __libc_start_main in
       libc.so is called very early, lazy binding isn't relevant
       here.  Use indirect branch via GOT to avoid extra branch
       to PLT slot.  In case of static executable, ld in binutils
       2.26 or above can convert indirect branch into direct
       branch.  */
    call *__libc_start_main@GOTPCREL(%rip)

which ends up calling __libc_start_main as expected.

Unfortunately -static makes the bt from main not show as much info:

#0  main () at hello.c:3
#1  0x0000000000402560 in __libc_start_main ()
#2  0x0000000000401c3e in _start ()

If we remove -static and start from starti, we get instead:

=> 0x7ffff7fd0100 <_start>:     mov    %rsp,%rdi
   0x7ffff7fd0103 <_start+3>:   callq  0x7ffff7fd0df0 <_dl_start>
   0x7ffff7fd0108 <_dl_start_user>:     mov    %rax,%r12
   0x7ffff7fd010b <_dl_start_user+3>:   mov    0x2c4e7(%rip),%eax        # 0x7ffff7ffc5f8 <_dl_skip_args>
   0x7ffff7fd0111 <_dl_start_user+9>:   pop    %rdx

By grepping the source for _dl_start_user this seems to come from sysdeps/x86_64/dl-machine.h:L147

/* Initial entry point code for the dynamic linker.
   The C function `_dl_start' is the real entry point;
   its return value is the user program's entry point.  */
#define RTLD_START asm ("\n\
.text\n\
    .align 16\n\
.globl _start\n\
.globl _dl_start_user\n\
_start:\n\
    movq %rsp, %rdi\n\
    call _dl_start\n\
_dl_start_user:\n\
    # Save the user entry point address in %r12.\n\
    movq %rax, %r12\n\
    # See if we were run as a command with the executable file\n\
    # name as an extra leading argument.\n\
    movl _dl_skip_args(%rip), %eax\n\
    # Pop the original argument count.\n\
    popq %rdx\n\

and this is presumably the dynamic loader entry point.

If we break at _start and continue, this seems to end up in the same location as when we used -static, which then calls __libc_start_main.

When I try a C++ program instead:

hello.cpp

#include <iostream>

int main() {
    std::cout << "hello" << std::endl;
}

with:

g++ -ggdb3 -O0 -std=c++11 -Wall -Wextra -pedantic -o hello.out hello.cpp

the results are basically the same, e.g. the backtrace at main is the exact same.

I think the C++ compiler is just calling into hooks to achieve any C++ specific functionality, and things are pretty well factored across C/C++.

TODO:

Heptangular answered 29/9, 2020 at 9:19

main is called after initializing all the global variables.

What the standard does not specify is the order of initialization of all the global variables of all the modules and statically linked libraries.

Inspect answered 24/1, 2011 at 22:36

Yes, main is the "entry point" of every C++ program, excepting implementation-specific extensions. Even so, some things happen before main, notably global initialization such as for main_ret.

Aeroembolism answered 24/1, 2011 at 15:1

© 2022 - 2024 — McMap. All rights reserved.