Where are the argv strings of the main function's parameters located?

In C/C++, the main function receives its command-line arguments as strings of type char*.

int main(int argc, char* argv[]){
  return 0;
}

argv is an array of char* pointers that point to strings. Where are these strings located? Are they on the heap, on the stack, or somewhere else?

Manche asked 16/11, 2010 at 16:6 Comment(5)

I'm not getting the downvotes. There are no votes to close, and this seems to be a perfectly reasonable question. Downvoters, please explain. – Salome 16/11, 2010 at 16:12

Probably the downvotes are for asking a question that's implementation-specific without mentioning an implementation. FWIW I'm not one of the downvoters. – Flan 16/11, 2010 at 16:28

@R..: How does one know it's implementation-specific until one asks the question and gets an answer that says it's implementation-specific? – Musicology 16/11, 2010 at 16:44

@Fred: That's why I didn't join the downvoters. :-) – Flan 16/11, 2010 at 18:32

I'm curious why you want to know where these are. Idle curiosity, or some "good" reason? – Million 16/11, 2010 at 18:58

It's actually a combination of compiler dependence and operating system dependence. main() is a function just like any other C function, so the location of its two parameters, argc and argv, follows the standard calling convention for the compiler on the platform. For example, for most C compilers targeting 32-bit x86 they will be on the stack just above the return address and the saved base pointer (the stack grows downwards, remember). On x86_64 the first parameters are passed in registers, so under the System V ABI (Linux, macOS, the BSDs) argc will be in %edi and argv will be in %rsi. Code the compiler generates at the start of main then typically copies them to the stack (at least in unoptimized builds), and that is where later references point. This frees the registers for function calls made from main.

The block of char* pointers that argv points to, and the actual sequences of characters, could be anywhere. They start in some operating-system-defined location and may be copied to the stack or somewhere else by the startup (preamble) code that the linker links into the program. You'll have to look at the code for exec() and at that assembler preamble to find out.

Stereotype answered 16/11, 2010 at 16:59 Comment(4)

"main() is a function just like any other C function" Not in C++: it's illegal to call it from another function, and even though it's declared to return int, you don't actually need to return anything. – Hiero 16/11, 2010 at 17:27

@John, @JeremyP, main is also not like any other function in C. C99, at least, also permits omitting the return statement and clearly defines what happens in that case. – Artillery 16/11, 2010 at 20:18

@John, @Jens, in terms of the way the arguments are passed, main() is the same as any other function even if other special semantics are also defined by the relevant standards. – Stereotype 17/11, 2010 at 9:28

I'm not convinced they have to be. They might not be "passed" to main at all; the compiler could insert code at the start of main to go and retrieve them. As you can't call main yourself, it could do this and you'd never know. It probably doesn't in practice, though. – Hiero 17/11, 2010 at 11:40

Here's what the C standard (n1256) says:

5.1.2.2.1 Program startup
...
2 If they are declared, the parameters to the main function shall obey the following constraints:

The value of argc shall be nonnegative.

argv[argc] shall be a null pointer.

If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup. The intent is to supply to the program information determined prior to program startup from elsewhere in the hosted environment. If the host environment is not capable of supplying strings with letters in both uppercase and lowercase, the implementation shall ensure that the strings are received in lowercase.

If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc-1] represent the program parameters.

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

The last bullet is the most interesting with respect to where the string values are stored. It doesn't specify heap or stack, but it does require that the strings be writable and keep their values for the whole run of the program (static extent), which places some limits on where the string contents may be located. As others have said, the exact details will depend on the implementation.

Costmary answered 16/11, 2010 at 17:9 Comment(2)

Interesting thing I never noticed... while argv (the pointer-to-pointer argument) and the strings pointed to are modifiable, the standard does not seem to indicate that the array of pointers is modifiable. As a consequence, use of GNU getopt (with its argv permutation) on a system where it's not explicitly allowed to modify the argv array is likely undefined behavior. – Flan 16/11, 2010 at 18:39

@R: It is just not mentioned, and thus the standard does not specify what would happen when you change it, yes. But this doesn't mean that it is UB, just that it is implementation-specific. And since the specification is char** argv and not char*const* argv, one might take that to mean it may be modified. This is different from the situation with string literals, where the standard explicitly states that changing them is UB. – Artillery 16/11, 2010 at 20:29

They are compiler magic, and implementation-dependent.

Lazarus answered 16/11, 2010 at 16:8 Comment(6)

+1: This is pretty much as close as you're going to get to a non-super-detailed answer... – Coaction 16/11, 2010 at 16:11

Gotta love it how SO always seems to upvote the "witty" non-answer instead of the ones actually providing useful information, background or examples. – Blanket 2/7, 2013 at 11:59

Ah, please don't take it personally, I really didn't mean to bash you or your answer at all. I guess I should've worded that more carefully in my previous comment - sorry about that. I was merely wondering why these kinds of answers tend to get the most upvotes instead of more comprehensive (and often more useful) answers explaining the situation in greater detail - even if a complete explanation is not feasible, like here. – Blanket 12/7, 2013 at 12:33

Fair enough. I can give you my personal answer to that: a lot of times, the "proper" answer (like John Bode's, below) makes the average questioner's eyes glaze over -- hard. My initial "answer" would be "why the hell do you want to know?", but that never seems to work -- so this is my compromise. And for a lot of people, it seems to work just fine. – Lazarus 24/7, 2013 at 20:58

Such reasons may be lengthy (they barely fit in a comment). For example, for me: 1. link, which practically states that C++ devs would write public static void main(String[] args). 2. Me: "no relevance for that info", plus my C++ knowledge was limited to the fact that g++ -std=c++11 would fail to compile (it needs char**), which made me find link. 3. Me: would a lack of memory for CLI args behave the same as having no memory for a char** param, vs. a std::vector allocation? – Gilbart 15/11, 2017 at 15:54

-1 for not being specific enough. Any sufficiently advanced technology is considered as magic by someone who does not understand it, or does not have the patience to dig into the details. It's your job as the answerer to explain these details to us. – Hedgehop 29/12, 2018 at 5:50

The answer to this question is compiler-dependent. This means the C standard does not specify it, so each implementation can handle it however it likes. This is normal, since operating systems don't have a commonly accepted, standard way to start and finish processes either.

Let's imagine a simple, why-not scenario.

The process receives the arguments typed on the command line by some mechanism. argc is then just an int that is pushed onto the stack by the bootstrap function the compiler set up as the process's entry point (part of the runtime). The actual values are obtained from the operating system and could, say, be written into a block of heap memory. Then the argv vector is built, and the address of its first element is also pushed onto the stack.

Then the function main(), which must be provided by the programmer, is called, and its return value is saved for later (nearly immediate) use. The structures on the heap are freed, and the exit code obtained from main is passed back to the operating system. The process finishes.

Letty answered 16/11, 2010 at 16:29 Comment(0)

These parameters are no different from any other function's parameters. If the architecture's calling sequence requires parameters to go on the stack, they are on the stack. If, as on x86-64, some parameters go in registers, these go in registers too.

Isopleth answered 16/11, 2010 at 16:11 Comment(3)

Not sure this is necessarily true in C++. You can't call main as a normal function in C++, unlike in C, and therefore the compiler can make different arrangements for passing the parameters if it likes. – Hiero 16/11, 2010 at 16:13

The strings are not parameters though, the parameter is a pointer to an array of pointers to the strings. – Mythology 16/11, 2010 at 16:14

Probably true of argc and argv themselves, but I think the question is more about argv[0] and friends. – Haplo 16/11, 2010 at 16:17

As pmg mentions, when main is called recursively, it's up to the caller where the arguments point to. Basically the answer is the same on the original invocation of main, except that the "caller" is the C implementation/OS.

On UNIX-y systems, the strings that argv points to, the argv pointers themselves, and the process's initial environment variables are almost always stored at the very top of the stack.

Flan answered 16/11, 2010 at 16:33 Comment(1)

+1 for a real answer, though of course a partial one. And that's the case on FreeBSD/gcc. – Forsake 16/11, 2010 at 17:57

As many other answers here point out, the precise mechanism a compiler implementation uses to pass arguments to main is unspecified by the standard (as is the mechanism a compiler uses to pass any arguments to a function). Strictly speaking, the compiler need not even pass anything useful in those parameters, since the values are implementation-defined. But neither of these are particularly helpful answers.

The typical C (or C++) program is compiled for what's known as a 'hosted' execution environment (using function main() as the starting point of your program is one of the requirements for a hosted environment). The key thing to know is that the compiler arranges things so that when the executable is launched by the operating system, the compiler's runtime gets control initially - not the main() function. The runtime's initialization code performs whatever initialization is necessary, including allocating memory for the arguments to main(), then it transfers control to main().

The memory for the arguments to main() could come from the heap, could be allocated on the stack (possibly using techniques that aren't available to standard C code), or could use statically allocated memory, though that's a less likely option just because it's less flexible. The standard does require that the memory used for the strings pointed to by argv is modifiable and that modifications made to those strings persist throughout the program's lifetime.

Just be aware that before execution reaches main(), quite a bit of code has already been run that's setting up the environment for your program to run in.

Sprightly answered 16/11, 2010 at 18:27 Comment(0)

The argument list is part of the process environment, similar to (but distinct from) environment variables.

Rafe answered 16/11, 2010 at 16:9 Comment(1)

Not quite. The C standard does not know the word "process". (This is the case for many implementations of C though) – Barefaced 16/11, 2010 at 16:37

Usually it is unknown where they are.

#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
  char **foo;
  char *bar[] = {"foo", "bar"};

  (void)argv; /* avoid unused argv warning */

  foo = malloc(sizeof *foo);
  foo[0] = malloc(42);
  strcpy(foo[0], "forty two");

  /* where is foo located? stack? heap? somewhere else? */
  if (argc != 42) main(42, foo); else return 0;

  /* where is bar located? stack? heap? somewhere else? */
  if (argc != 43) main(43, bar); else return 0;
  /* except for the fact that bar elements
  ** point to unmodifiable strings
  ** this call to main is perfectly reasonable */

  return 0;
  /* please ignore memory leaks, thank you */
}

Episiotomy answered 16/11, 2010 at 16:27 Comment(0)

While you are able to access the actual parameters, I think their actual location does not matter at all.

Kohler answered 16/11, 2010 at 16:9 Comment(0)