Why is argv (argument vector) in C defined as a pointer and what is the need for defining its zeroth as the program name?
N

4

3
#include <stdio.h>
int main(int argc, char *argv[])
{
 int i;
 for(i=1;i<argc;i++)
  printf("%s%s", argv[i], (i<argc-1)? " ":"");
 printf("\n");
 return 0;
} 

Given above is a simple C program that prints its command-line arguments. Here argc is the argument count, and argv is said to be an array that contains the arguments. My question is: why is it defined as a pointer to a character array instead of a normal array? Also, what is the need for defining its zeroth element (argv[0]) as the name by which the program is invoked?

I am a beginner, so please explain it from a high-level perspective.

Naughty answered 12/2, 2014 at 15:45 Comment(7)
What I meant is what is the need for a pointer there.Naughty
I was talking to a friend about your second question last night actually - what I said was that it could be needed if the user changes the name of the program, we might need to know what it's now called, say to call another instance, update, etc.Beheld
It would be better to call it "an array of pointers to char" in other words if char * represents a C-style string then char *x[] means that x is an array of C-style stringsScenarist
To answer your other question, make your loop go from i=0, then try to invoke your command from the shell in different ways, e.g. ./foo bar baz or /path/to/foo bar baz, and experiment to see if there is any difference in how your environment transmits argv[0]Scenarist
@Brandin: If char *x[] is a parameter declaration, it means that x is a pointer to pointer to char. It may or may not point to an element of an array of char* pointers, and those char* pointers may or may not point to C-style strings; the parameter declaration doesn't specify that.Hunk
@KeithThompson Yes. This is why one can equivalently write char **x in the declaration. And yes, char * does not automatically imply a C-style string. It's also interesting to note that the definition of C-style strings does not provide any reliable way of testing whether or not something is one. Not sure if it's in the C standard or POSIX, but I think it is specified somewhere that, in the special case of argv, it must point to an array of C-style strings whose last element has the value (char *)0 in conforming implementationsScenarist
This explains it all: youtube.com/watch?v=gRdfX7ut8gwDilettante
H
4

argv is defined as a pointer rather than as an array because there is no such thing as an array parameter in C.

You can define something that looks like an array parameter, but it's "adjusted" to pointer type at compile time; for example, these three declarations are exactly equivalent:

int foo(int param[]);
int foo(int param[42]); /* the 42 is quietly ignored */
int foo(int *param);    /* this is what the above two declarations really mean */

And the definition of main can be written either as:

int main(int argc, char *argv[]) { /* ... */ }

or as

int main(int argc, char **argv) { /* ... */ }

The two are exactly equivalent (and the second one, IMHO, more clearly expresses what's actually going on).

Array types are, in a sense, second-class types in C. Code that manipulates arrays almost always does so via pointers to the elements, performing pointer arithmetic to traverse the elements.

Section 6 of the comp.lang.c FAQ explains the often confusing relationship between arrays and pointers.

(And if you've been told that arrays are "really" pointers, they're not; arrays and pointers are distinct things.)

As for why argv[0] points to the program name, that's just because it's useful. Some programs print their names in error messages; others may change their behavior depending on the name by which they're invoked. Bundling the program name with the command-line arguments was a fairly arbitrary choice, but it's convenient and it works.

Hunk answered 12/2, 2014 at 15:59 Comment(0)
E
3

The char *argv[] is a pointer that an array of char * has decayed into. For example, invoking a command like this:

$ ./command --option1 -opt2 input_file

could be viewed as:

char *argv[] = {
    "./command",
    "--option1",
    "-opt2",
    "input_file",
    NULL,
};
main(4, argv);

So basically there is an array of strings outside main, and it is passed to you in main:

    char *argv[]
    \- --/     ^
      V        |
      |   It was an array
      |
of strings

Regarding argv[0] being the invocation command, the reason is largely historical. I don't know what the first person who came up with it had in mind, but I can name at least one use for it.

Imagine a program, such as vim or gawk. These programs may install symbolic links (such as vi or awk) which point to the same program. So effectively, running vim or vi (or similarly gawk or awk) could execute the exact same program. However, by inspecting argv[0], these programs can tell how they have been called and possibly adjust accordingly.

As far as I know, neither of the programs I mentioned above actually does this, but they could. For example, vim called through a symbolic link named vi could turn on some compatibility mode. Or gawk called as awk could turn off some GNU extensions. In the modern world, if they wanted to do this, they would probably create scripts that give the correct options, though.

Erasmoerasmus answered 12/2, 2014 at 15:56 Comment(15)
char *argv[] (which, as a parameter, is exactly equivalent to char**) is not a pointer to an array of char*; it's a pointer to the first element of an array of char*. A pointer to a char* and a pointer to an array of char* would have distinct types.Hunk
vim behaves differently depending on whether it is invoked as vim, view, vimdiff, rview, rvim (and maybe as vi).Trimerous
@KeithThompson, I made the wording more precise.Erasmoerasmus
@JonathanLeffler, exactly. However, in the couple of times that I actually wrote vi instead of vim, I didn't see any difference, that's why I said: As far as I know, neither of the programs I mentioned above actually do this (with respect to vim and vi)Erasmoerasmus
"char *argv[] is a pointer that an array of char * has decayed into." -- I wouldn't put it that way. The entity that calls main isn't necessarily even written in C. An array expression decays to a pointer to its first element, but there isn't necessarily any array expression. There are two distinct rules at play: a parameter of array type is adjusted to a parameter of pointer type at compile time, and an expression of array type is converted ("decays") to a pointer value, notionally at run time.Hunk
@KeithThompson, so long as the observed behavior is so, it doesn't really matter. The entity calling main is not necessarily written in C, but if it was, that is what could have happened. (before I continue, char *argv[] and char **argv are the same (as you said so too), so I wouldn't discuss the first rule you mentioned above). More precisely, most likely, there is a char **argv = malloc(...) involved (if the main-calling entity was written in C), so there is neither an array nor decaying to pointer involved at all!Erasmoerasmus
Slightly nitpicky, but in your example, string literals do not have type const char * in C so there's no need to use strdup() to avoid that, and you should have a null pointer as the last element of your argv array.Robustious
Again, the observed behavior is the same; it looks like the entity who called main had created an array and that array has decayed into a pointer. That explains the OP's question regarding arrays being involved.Erasmoerasmus
It's unlikely that malloc is involved; free(argv) has undefined behavior.Hunk
@PaulGriffiths: String literals aren't const, but you can't modify them; you're explicitly permitted to modify the argument strings.Hunk
@KeithThompson: No argument with that, pun unintended.Robustious
@PaulGriffiths, thanks, I didn't realize string literals in C aren't actually of const char [] type. And yeah, I forgot about the NULL.Erasmoerasmus
@KeithThompson, undefined behavior doesn't mean that the libraries are forbidden from doing it. It means that the standard doesn't mandate malloc to be used, but doesn't forbid it either. In fact, the implementation of glibc, in execvpe uses alloca or malloc based on configuration.Erasmoerasmus
+1 for simply and patiently addressing each of these (somewhat minor) nits, all from people with knowledge substantial enough to find holes in an explanation that otherwise seemed to address the OP in a very clear and simple way.Avitzur
@ryyker, thanks. It's hard to find the sweet spot between being absolutely correct and not super-terrifying to a newbie. I personally liked that Keith explained the different syntaxes for pointer function arguments (in his answer), which I think is important for the OP to know.Erasmoerasmus
A
2

The questions you ask are really answered best by simply saying it's all "by definition", i.e. a set of rules designed and agreed upon by a committee.

Here is what C11 says: (see emphasized sections)

5.1.2.2.1 Program startup
1 The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters: int main(void) { /* ... */ } or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared): int main(int argc, char *argv[]) { /* ... */ } or equivalent;10) or in some other implementation-defined manner.
2 If they are declared, the parameters to the main function shall obey the following constraints:
— The value of argc shall be nonnegative.
— argv[argc] shall be a null pointer.
— If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.
The intent is to supply to the program information determined prior to program startup from elsewhere in the hosted environment. If the host environment is not capable of supplying strings with letters in both uppercase and lowercase, the implementation shall ensure that the strings are received in lowercase.
— If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc-1] represent the program parameters.
— The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

Avitzur answered 12/2, 2014 at 15:55 Comment(2)
This is not the best way to answer the question because it suggests we should simply obey committees without reason and because it does not explain why the committees did what they did. The committees that developed the C standard had reasons for what they did, there is documentation about it, and it is useful for people to understand why the language is designed the way it is.Dunedin
I have found it very helpful at times to understand that something has been decided for the sake of standardization, possibly without any great significance. This is how it is - move on.Esquire
A
0

It is not defined as a normal array because in C the size of an array's elements has to be known at compile time. The size of char * is known; the lengths of your arguments are not.

argv[0] contains the name under which the process was invoked because a program can be invoked under an arbitrary name: for example, the exec family of calls can pass whatever argv[0] it wants, and you are allowed to invoke a program via a symlink. argv[0] allows the program to offer different functionality depending on the invocation name.

Apothem answered 12/2, 2014 at 15:52 Comment(0)
