Manipulating argv (assumptions) and language loopholes

When parsing through argv, referring to

int main (int argc, char *argv[]) {}

I understand that argv[argc] == NULL according to the standard, but is it guaranteed that argv[i] != NULL where i < argc?

In other words, if I perform argv[i][0] where i < argc, am I guaranteed to not segfault, because argv[i] != NULL?

I think that programs could break the argv rules, consider these:

execlp("malicious_program", "ls", NULL); // program name not in argv[0]
execlp("ls", "ls", NULL, "-al", NULL); // NULL prior to end
execlp("ls", "ls"); // no NULL at end

Does the operating system provide safeguards (I think it could simply calculate argc from the number of arguments to exec), and if not, does that mean that these assumptions cannot safely be made?

When writing my own program, can I therefore use an idiom such as:

if (argv[i] == NULL) break; // never stops if there is no NULL terminator, misses args if there is a NULL in the middle

This last bit is where my question stemmed from.

Misbehavior answered 22/6, 2023 at 4:51 Comment(0)
execlp("malicious_program", "ls", NULL); // program name not in argv[0]

There's nothing wrong with this. Many programs can be run with different argv[0] values, and they use this as a flag to execute differently.

For example, if a shell is run with - as the first character of argv[0], it executes as a login shell. This is a historical artifact: there was no standard for arguments to shells, but the login system needed a way of telling the shell that it should operate in login mode.

On some systems, sh and bash are the same program; it checks argv[0] to determine whether to enable bash extensions.

If telnet is executed with some other argv[0], it's taken to be the destination hostname; this allows you to make symlinks to telnet with the name of a server, and use them as shortcuts.

Programs that don't make any special use of argv[0] (the vast majority) completely ignore it. Putting a "malicious" value there will have no effect.
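The argv[0] conventions described above can be sketched roughly as follows. This is only an illustration, not how any particular shell is implemented; the helper names base_name, is_login_invocation and invoked_as are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the last path component of argv0. */
static const char *base_name(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');
    return slash ? slash + 1 : argv0;
}

/* Login-shell convention: argv[0] begins with '-'. */
static bool is_login_invocation(const char *argv0)
{
    return argv0 != NULL && argv0[0] == '-';
}

/* True if the program was started under the given name,
 * e.g. to tell "sh" from "bash" when both are the same binary. */
static bool invoked_as(const char *argv0, const char *name)
{
    if (argv0 == NULL)
        return false;
    if (argv0[0] == '-')        /* skip the login-shell marker */
        argv0++;
    return strcmp(base_name(argv0), name) == 0;
}
```

A program would call these on argv[0] at the top of main() and branch on the result; a program that never looks at argv[0] is unaffected by whatever the caller puts there.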

Why would malicious_program care that you put ls in argv[0]? Did you write that backwards, and intend this?

execlp("ls", "malicious_program", NULL);

execlp("ls", "ls", NULL, "-al", NULL); // NULL prior to end

Any arguments after NULL will be ignored, because the NULL argument is how execlp() determines where the end of the arguments is. Variadic argument lists don't provide any way for the function to determine the actual number of arguments. So there's no way for any argument before argc to be null, because the null value determines the value of argc.
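To make the sentinel idea concrete, here is a minimal sketch of a variadic function that finds the end of its argument list the same way. The function count_until_null is a hypothetical stand-in, not the real execlp() implementation; it only shows why anything after the first null pointer is never seen.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for how a function like execlp() walks its
 * variadic list: count arguments up to, but not including, the first
 * null pointer. Nothing after that pointer is ever read. */
static size_t count_until_null(const char *arg0, ...)
{
    size_t n = 0;
    va_list ap;

    va_start(ap, arg0);
    for (const char *p = arg0; p != NULL; p = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}
```

Called as count_until_null("ls", (char *)NULL, "-al", (char *)NULL), it stops at the first null pointer and never reaches "-al". Calling it without any null pointer would read past the real arguments, which is the undefined behavior discussed here.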

execlp("ls", "ls"); // no NULL at end

This will cause undefined behavior. Without NULL, execlp() doesn't know how many arguments there are (see above), so it will read past the arguments that were actually passed and try to put whatever it finds there into argv.

As for whether it's safe to access these strings, I think it always should be. When the program loader constructs argv, it creates a brand new set of strings; it doesn't simply pass along the pointers that were provided to exec*(). In memory, the arguments are typically laid out in a single block, with each string allocated consecutively. E.g. if argv is

argv[0] = "ls"
argv[1] = "ls"
argv[2] = "filename"
argv[3] = NULL

the memory holding all the arguments will look like:

ls\0ls\0filename\0\0

and the non-null elements of argv are pointers into this block.

So these pointers will never be invalid; they will always point into this block. The strings themselves may be meaningless, of course. If you don't provide a NULL argument, exec will copy whatever garbage strings it finds into argv.
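The copying described above can be sketched like this. It is a simplified illustration of the idea, not actual loader code: pack_args is a hypothetical function that copies each source string into one block and records a pointer to each copy, so the resulting pointers never refer to the caller's memory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy the NULL-terminated list `src` into `block`,
 * storing in `out` a pointer to the start of each copy.
 * Returns the number of strings copied (the would-be argc), or -1 if
 * there are more than `max` strings or they do not fit in `cap` bytes.
 * `out` must have room for max + 1 pointers. */
static int pack_args(char *const src[], char *block, size_t cap,
                     char *out[], size_t max)
{
    size_t used = 0, n = 0;

    for (; src[n] != NULL; n++) {
        size_t len = strlen(src[n]) + 1;    /* include the '\0' */
        if (n == max || len > cap - used)
            return -1;
        memcpy(block + used, src[n], len);
        out[n] = block + used;              /* points into the block */
        used += len;
    }
    out[n] = NULL;                          /* argv[argc] == NULL */
    return (int)n;
}
```

Note that the loop relies on the source list being NULL-terminated and on every entry being a proper string; if the caller breaks either rule, the misbehavior happens during this copy, on the caller's side.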

Haplo answered 22/6, 2023 at 5:14 Comment(14)
How about something like char str[] = { 'a' }; execlp("ls", "ls", str, NULL);. I find that it is undefined in that the byte \0 will show up randomly so you get varying results with different executions. As someone on the receiving end of argv, do I simply treat it with the expectation of it being proper? – Misbehavior
This is also undefined behavior. The arguments have to be strings, which means they have to have null terminators. – Haplo
So as someone on the receiving end, do I simply assume the caller invoked me properly? – Misbehavior
Yeah, there isn't really anything you can do. The same is true of every function. – Haplo
Most likely a segmentation violation will happen when they're calling execlp() if they give invalid arguments, so it will never start the new program at all. – Haplo
I mean. Typically functions return an error if invoked improperly, in my experience. It seems like the undefined behavior here spills into the callee. I'm wondering if certain program vulnerabilities could be exposed in this way. – Misbehavior
If there's an invalid pointer in the argument list, execlp() will detect it and return an error. But the garbage could accidentally be valid, and it can't tell. That's the problem with undefined behavior, it's not always possible to detect. – Haplo
Yes right. I've found that on MacOS, execlp(...) with exactly 5 arguments and no NULL produces an EFAULT (which is strange, because there are no bad addresses), but for example, with 3 arguments, it does not, but fills the argument list up until 5, with a trailing NULL. It also doesn't catch bad, non \0 terminated strings in any case. – Misbehavior
Why do you think there are no bad addresses? It's reading past the end of the argument list, which contains arbitrary data, which may include bad addresses. – Haplo
It's purely accidental that you get an error with 5 arguments, but not with 3. When there's undefined behavior, ANYTHING can happen. – Haplo
I'm still wondering about the security implications of this. You can see my question was "can this safely be assumed" that the arguments are in standard conformant form. – Misbehavior
I've added some more explanation to the end of the answer. – Haplo
Well for example, couldn't you invoke undefined behavior in the callee via calling exec* without a NULL end or including a string without \0 end? – Misbehavior
The UB will be in the caller, not the callee. In the caller, it will search for a null terminator, and copy everything until that into the callee's copy of argv. – Haplo