Why does main(int argc, char* argv[]) take two arguments? [duplicate]

I always thought that argc was required to mark the end of argv, but I just learned that argv[argc] == NULL by definition. Am I right in thinking that argc is totally redundant? If so, I always thought C did away with redundancy in the name of efficiency. Is my assumption wrong, or is there a historical reason behind this? If the reason is historical, can you elaborate?

Ludlow answered 13/8, 2015 at 23:6 Comment(11)
Couldn't NULL be an element of argv? That is, before the actual end of the array. – Invar
@AndrasDeak, I don't think so. An element of argv could be an empty string, that is an array of just one element, a 0-byte. – Syllabic
Yes it is redundant. The reason is "historical reasons". – Dawna
I guess you could argue it is a speed optimization to read argc instead of iterating over argv. – Dawna
Plus the ability to say if (argc < 3) { printf("error message"); return 1; } without looping the argv list first. Not to mention various other choices that might be made based on the number of arguments (read files from command-line args vs. reading stdin, etc.) – Kulak
How would you get argv to have an empty string with non empty strings following it? – Defector
@Nighthawk441 you could call it like execname 'arg1' 'arg2' '' 'arg4', in which case argv[3] is an empty string. And, as @Jens said, that's not NULL. – Invar
You could invoke a program via execv() and give it an array with null pointers in the middle of it. This would be a bad idea; the program's behavior would be undefined. The C standard specifically requires the pointers argv[0] through argv[argc-1] to be pointers to strings, which means they can't be null pointers. – Mctyre
See also What should main() return in C and C++, which quotes what the standard says. The reason for the redundancy is primarily historical (that's how it was done in C in the mid-70s, so that's how it has been done ever since). And now, of course, there's a quarter century of it being standardized behaviour, and changing it would break a lot of code. – Measures
@KeithThompson: execve() only knows the length of the argument list by coming across the first null pointer. The extras 'in the middle' simply don't count. It's a little more debatable what happens if the zeroth argument is a null pointer. The standard permits argc == 0, and still requires argv[argc] == 0. – Measures
@JonathanLeffler: You're right, and I was wrong. You could pass invalid argument pointers via any of the exec*() functions, but the end of the list is defined by a null pointer (either as an argument for the variadic functions execl, execlp, and execle, or as the last element of the array for execv and execvp). And the argc value passed to the invoked program is computed from that. (You could pass invalid pointers, which could make the invoked program unhappy, but that's a different thing.) – Mctyre

History.

Harbison & Steele (5th Edition, 9.9 "The main program") says the following:

Standard C requires that argv[argc] be a null pointer, but it is not so in some older implementations.
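To make the redundancy concrete, here is a minimal sketch that counts arguments by walking argv up to the null pointer, which is the information argc already carries. The helper name count_until_null is hypothetical (it is not from H&S or the standard), and the sketch assumes a conforming implementation where argv[argc] really is a null pointer; on the older implementations mentioned above it could run off the end of the array.

```c
#include <stddef.h>

/* Hypothetical helper: count the entries of an argument vector by
 * scanning for the terminating null pointer instead of using argc.
 * Assumes the vector is null-terminated, as Standard C requires. */
int count_until_null(char *const argv[])
{
    int n = 0;

    /* An empty string ("") is a valid, non-null entry and is counted;
     * only a null pointer ends the scan. */
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Called from main as count_until_null(argv), this should return the same value as argc on a conforming implementation. The difference is cost: the scan is linear in the number of arguments, whereas reading argc is a single load.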

Exemplification answered 13/8, 2015 at 23:48 Comment(2)
It would be helpful to have some indication of which 'older implementations' don't have argv[argc] as a null pointer — I suspect H&S don't provide that level of detail, though. They'd have to be pretty old these days. (I was never unlucky enough to come across one, but there are plenty of esoteric platforms that I've not programmed on.)Measures
FWIW I managed to find a scanned PDF of K&R 1st ed., and as far as I can tell they never mention a null sentinel at argv[argc] and all examples use argc to determine the end of the argv[] array. The 2nd Edition points out the null sentinel, but doesn't use it in any examples.Exemplification

Here's the history.

In first edition UNIX, which predates C, exec took as arguments a filename and the address of a list of pointers to NUL-terminated argument strings, with the list itself terminated by a NULL pointer. From the man page:

sys exec; name; args      / exec = 11.
name: <...\0>
...
args: arg1; arg2; ...; 0
arg1: <...\0>
...

The kernel counted up the arguments and provided the new image with the arg count followed by a list of pointers to copies of the argument strings, at the top of the stack. From the man page:

sp--> nargs
      arg1
      ...
      argn

arg1: <arg1\0>
...
argn: <argn\0>

(The kernel source is here; I haven't looked to see if the kernel actually wrote something after the pointer to the last argument.)

At some point by the 6th edition, the documentation for exec, execl, and execv began to note that the kernel placed a -1 after the arg pointers. The man page says:

Argv is not directly usable in another execv, since argv[argc] is -1 and not 0.
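As an illustration of what that restriction implies, here is a minimal sketch in modern C of how a program could build a vector suitable for execv when it cannot trust the terminator at argv[argc]. The function name copy_argv_null_terminated is hypothetical, and this is not code from 6th edition UNIX; it only shows that argc is sufficient to rebuild a properly terminated list.

```c
#include <stdlib.h>

/* Hypothetical helper: return a newly allocated copy of the argument
 * vector that ends in a null pointer. Only argc is used to find the
 * end; argv[argc] is never read, so it may hold -1, 0, or anything.
 * The strings are shared, not duplicated. The caller frees the array.
 * Returns NULL if argc is negative or allocation fails. */
char **copy_argv_null_terminated(int argc, char *const argv[])
{
    char **out;
    int i;

    if (argc < 0)
        return NULL;

    out = malloc(((size_t)argc + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    for (i = 0; i < argc; i++)
        out[i] = argv[i];
    out[argc] = NULL;   /* the terminator that execv expects */
    return out;
}
```

The resulting array could then be handed to execv in place of the original argv.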

At this point, you could argue that argc was redundant, but programs had, for some time, been using it rather than looking through the argument list for -1. For example, here's the beginning of cal.c:

main(argc, argv)
char *argv[];
{
    if(argc < 2) {
        printf("usage: cal [month] year\n");
        exit();
    }

In 7th edition, exec was changed to add a NULL pointer after the argument strings, and this was followed by a list of pointers to the environment strings, and another NULL. The man page says:

Argv is directly usable in another execv because argv[argc] is 0.

Twelvemonth answered 14/8, 2015 at 1:7 Comment(0)
