Can argv[0] contain an empty string?
K

4

9

In any C program, the command line argument argv[0] points to the name used to invoke the program. Is there any circumstance in which it will point to an empty string ""?

An example code snippet for such a case would be a good reference.

Klaxon answered 29/12, 2011 at 8:19 Comment(3)
Why not design your program so it works regardless? Or use a runtime assertion.Laurasia
Better to call it an empty string; null string can too easily be confused with a null pointer.Oversell
related: NULL: https://mcmap.net/q/175168/-when-can-argv-0-have-null/…Bouley
D
9

It's implementation-defined. C standard §5.1.2.2.1, abridged:

  • If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup. The intent is to supply to the program information determined prior to program startup from elsewhere in the hosted environment. [...]

  • If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. [...]

So when argc is greater than zero, the intent is clearly that argv[0] holds the program name, but it may be an empty string if the host environment cannot supply one. (Note that with argc equal to n, argv[0] through argv[n - 1] are never null and always point to a string. The string itself may be empty, though. If n is zero, argv[0] is null.)

In practice, of course, you just need to make sure the platforms you're targeting behave as needed.
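To make the three cases concrete, here is a minimal sketch that classifies what a program received in argv[0]. The enum and function names are hypothetical, made up for this illustration; the logic only restates the rules quoted above.

```c
#include <stddef.h>

/* Hypothetical names, introduced only for this illustration. */
enum argv0_state {
    ARGV0_NULL,   /* argc == 0, so argv[0] is the terminating null pointer */
    ARGV0_EMPTY,  /* argv[0] points to "", i.e. argv[0][0] == '\0' */
    ARGV0_NAMED   /* argv[0] points to a non-empty string */
};

enum argv0_state classify_argv0(int argc, char **argv)
{
    /* With no arguments at all, there is no program name to look at. */
    if (argc <= 0 || argv == NULL || argv[0] == NULL)
        return ARGV0_NULL;
    /* The host could not (or chose not to) supply a program name. */
    if (argv[0][0] == '\0')
        return ARGV0_EMPTY;
    return ARGV0_NAMED;
}
```

A program could call classify_argv0(argc, argv) at the top of main() and pick a fallback name for anything other than ARGV0_NAMED.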

Delmardelmer answered 29/12, 2011 at 8:32 Comment(5)
So in fact if argv[0][0] may be 0 argv[0] may 'be' empty.Latinism
@alk: argv[0] is never null when argc is greater than zero, but it may point to an empty string.Delmardelmer
@KeithThompson: You name it ... ;-)Latinism
@GMan Thanks for your answer. Could you please let me know where you get the C specification from? For example, where does the text of 5.1.2.2.1 come from?Klaxon
@SangeethSaravanaraj: This is the latest draft of the C99 standard. This is the latest available draft of the recently published 2011 standard.Unscrew
U
7

Yes.

The C language standard explicitly allows for the possibility that argv[0] can be a null pointer, or that it can point to an empty string (""). N1256 5.1.2.2.1p2:

The value of argc shall be nonnegative.

argv[argc] shall be a null pointer.

[...]

If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc-1] represent the program parameters.

On Unix-like systems, programs are invoked by one of the exec() family of functions (execl(), execlp(), etc.), which allow the caller to specify exactly what arguments are passed to the main() function. (It's even possible to invoke a program in ways that violate the requirements imposed by the C standard.)

Note that the standard says that argv[0] (assuming it's neither null nor empty) "represents the program name". The standard is deliberately vague about how it represents the program name. In particular, it needn't provide a name by which the program can be invoked (since the standard doesn't even require that programs can be invoked by name).
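If you would rather check these requirements than assume them, a sketch along the following lines may help. The function name is hypothetical. It also has to trust that argv really has at least argc + 1 elements, because C offers no way to verify the length of the array it was handed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when argc/argv satisfy the
   requirements quoted above from 5.1.2.2.1p2. */
bool args_look_conforming(int argc, char **argv)
{
    /* "The value of argc shall be nonnegative." */
    if (argc < 0 || argv == NULL)
        return false;
    /* argv[0] through argv[argc-1] must point to strings (possibly empty). */
    for (int i = 0; i < argc; ++i)
        if (argv[i] == NULL)
            return false;
    /* "argv[argc] shall be a null pointer." */
    return argv[argc] == NULL;
}
```

Note that an empty argv[0] passes this check: an empty program name is conforming, so it still has to be handled separately.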

Unscrew answered 29/12, 2011 at 8:33 Comment(10)
It can't be null, just empty. The previous paragraph requires that each argv point to a string (and that the last is null). A crappy implementation could supply them all with empty strings, but not all null.Delmardelmer
If argc == 0, then argv[0] is a null pointer. (I hadn't quoted that part when you wrote your comment.)Unscrew
It wasn't. I have upvoted, though, since our answers are similar.Delmardelmer
@KeithThompson +1 for mentioning the exec() family of functions! Thanks!!Klaxon
So do we simply assume the callers of exec invoked it according to the standard, program vulnerability aside @KeithThompson?Roselani
@Roselani I hadn't really thought about it before, but if a program assumes that its argc and argv are initialized in a conforming manner, using exec*() to invoke it could create a security hole. I haven't worked out the details. I'll note that none of the exec*() functions take argc as an argument; they all compute it from the actual argument vector. (Which eliminates one possible security hole I had thought of.) Probably most programs just assume the arguments are valid, but a sensitive program should allow for other possibilities. I'll have to think about this.Unscrew
Here's a post I made about it @KeithThompsonRoselani
@Roselani I did some experiments on Ubuntu and a couple of other systems (FreeBSD, Windows/Cygwin, Android/Termux). I was unable to invoke a program with argv[0] == NULL. At worst, either argv[0] points to an empty string or the execl() call failed, depending on the system. I don't see a way to exploit a security hole using the exec*() functions, as long as the target program is able to handle an empty (non-NULL) argv[0].Unscrew
I'm able to notice strange behavior when invoking exec without a NULL at the end and also strings without a \0. Tested MacOS. @KeithThompsonRoselani
Calling exec*() without a trailing null pointer has undefined behavior. There's pretty much no way to guard against that. (In at least some cases, gcc warns about such a call.)Unscrew
J
7

Other replies have quoted the C standard and shown that argv[0] can be NULL or can be the empty string (""). You should write your program with the assumption that this can happen, because otherwise you are creating a (small) security risk. It's easy to invoke your program and set argv to anything an attacker wants. As proof, consider the following two programs. The first one, echoargv.c, prints out the contents of argv:

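#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int i;
    /* Print each argument in quotes so an empty string is visible.
       If argc is 0 the loop body never runs and nothing is printed. */
    for (i = 0; i < argc; ++i)
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
    exit(0);
}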

The second one, argv0, invokes any other program and lets the user specify the other program's argv:

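#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: argv0 PROGRAM [ARGV0 [ARG]...]\n");
        exit(2);
    }
    /* argv[1] is the program to run; argv+2 becomes its entire argv,
       so the target's argv[0] is whatever the user typed (or nothing). */
    (void) execv(argv[1], argv+2);
    perror("execv");    /* only reached if execv failed */
    exit(1);
}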

(This is a POSIX-specific version. Non-POSIX environments may need changes.)

Here's how to use them:

$ gcc -o echoargv echoargv.c 
$ gcc -o argv0 argv0.c 
$ ./argv0 ./echoargv 
$ ./argv0 ./echoargv ''
argv[0] = ""
$ ./argv0 ./echoargv 'this is fun' 'it is fun indeed'
argv[0] = "this is fun"
argv[1] = "it is fun indeed"
$ 

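The first run of argv0 passes echoargv an empty argument vector, so argc is 0 and argv[0] is NULL, which is why nothing is printed. (This assumes a system that accepts such a call; some reject it or substitute an empty string.) The second run makes argv[0] the empty string. The third run is there just for fun: note how argv[0] doesn't need to have anything to do with the actual name of the program.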

How can this bite you? For example, you might blindly print the name of your program in a usage message:

printf("usage: %s [options] [FILE]...\n", argv[0]);

Better:

const char *program_name = "some default name"; /* (global) variable */
if (argv[0] && argv[0][0])
    program_name = argv[0];
printf("usage: %s [options] [FILE]...\n", program_name);

If you don't do this, an attacker can cause your program to segfault at will, or might get your program to report entirely wrong things to the user.
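If you want the same check in reusable form, something like the following could work. This is only a sketch: format_usage and its fallback parameter are names invented for this example, and it writes into a caller-supplied buffer so that the result is easy to inspect.

```c
#include <stdio.h>

/* Hypothetical helper: builds the usage line, falling back to a default
   name when argv[0] is missing (NULL) or empty (""). Returns whatever
   snprintf returns. */
int format_usage(char *buf, size_t size, int argc, char **argv,
                 const char *fallback)
{
    const char *name = fallback;
    /* Only trust argv[0] if it exists and is a non-empty string. */
    if (argc > 0 && argv != NULL && argv[0] != NULL && argv[0][0] != '\0')
        name = argv[0];
    return snprintf(buf, size, "usage: %s [options] [FILE]...\n", name);
}
```

Checking argc as well as argv[0] keeps the helper safe even if someone calls it with an argc of 0 and an argv that is not properly terminated.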

Joist answered 29/12, 2011 at 9:11 Comment(1)
+1 for the awesome example! It was indeed a lot of fun reading your post!! Thanks :)Klaxon
E
1

argv[0] can be null in C, for example if you invoke the main function directly (which, with some tricks, can be done in C). I don't know if C++ allows direct main invocation.

Evangelicalism answered 29/12, 2011 at 8:25 Comment(7)
Isn't the OP asking about argv[0] referring to an empty string ("", {'\0'}), not about argv[0] being NULL?Latinism
@Latinism The OP is about empty/null string and not argv[0]=NULLKlaxon
C++ does not allow you to call main(); C does.Oversell
If it can be null then it can be empty too (\0). The core of the answer is "when main is not invoked by the system". An example? Well, you can write a main in C which calls itself recursively without passing argv. In C, main is just a function. However, I think it's bad, damn bad practice to invoke main directly. But we're not talking about good or bad practices, are we? See GMan's and Keith's answers for details about it. They're better than mine (they actually give refs).Evangelicalism
@Evangelicalism +1 for mentioning direct invocation of main(). Thanks!Klaxon
@JonathanLeffler: just tested with an MS compiler, C++ doesn't allow it. IMHO it's a good design choice. In 10+ years of C programming I've seen direct main invocation just once, and I'm still wondering why... but it was in the code of a so-called genius, so I didn't have the chance to ask or argue.Evangelicalism
Yes, if you call main recursively in C, you can pass any garbage values you like. Something like argc == 2, argv[1] == NULL is likely to break any normal argument processing. But that's just a bug in your code. You can safely make assumptions about argc and argv in the initial invocation of main.Unscrew
