Is argv[n] writable?
E

6

40

C11 5.1.2.2.1/2 says:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

My interpretation of this is that it specifies:

int main(int argc, char **argv)
{
    if ( argv[0][0] )
        argv[0][0] = 'x';   // OK

    char *q;
    argv = &q;              // OK
}

however it does not say anything about:

int main(int argc, char **argv)
{
    char buf[20];
    argv[0] = buf;
}

Is argv[0] = buf; permitted?

I can see (at least) two possible arguments:

  • The above quote deliberately mentioned argv and argv[x][y] but not argv[x], so the intent was that it is not modifiable
  • argv is a pointer to non-const objects, so in the absence of specific wording to the contrary, we should assume they are modifiable objects.
Empty answered 9/9, 2014 at 5:48 Comment(12)
Related: this answer which asserts that argv[n] is non-modifiable but does not provide any justification for that assertionEmpty
When it says that argv can be modified I take that to mean that argv[n] can be. (Of course the argv pointer itself can be modified, it's just a function-local argument.)Purposive
@Purposive Why do they bother to say that argc can be modified, that same "of course" applies to it? I think they're just talking about the local variable, not the pointers.Headed
@Purposive But it says "the strings pointed to". argv[n] is not a string; it's a pointer that points to the first character of a string.Empty
@Headed Fair point, though it seems silly to list argc and argv explicitly at all...Purposive
@MattMcNabb I am aware of that verbiage; it's also not the part I'm talking about. Nevertheless, my interpretation is not infallible.Purposive
@Purposive I certainly agree that it seems silly to list argc and argv explicitly , but maybe there's historical justification that I'm not aware of. Maybe it's relevant in the case of main being called recursively.Empty
Here's a legitimate reason why a particular implementation might require that argv[n] not be modified.Ache
@hvd good point, write it as an answer perhaps. I was active on other clc threads at the same time so I should have remembered!Empty
@MattMcNabb I posted it as a comment because it doesn't answer the question of what the standard actually requires. I don't know if the hypothetical implementation in that message would conform to the (intended) requirements of the standard.Ache
@hvd if we accept that the standard doesn't clearly specify what it requires, then the discussion has to move onto what a likely rationale would be and what situations it's supposed to cover.Empty
The compiler doesn't treat the pointer-to-pointer argv, or any of the pointers argv[x], differently from any other automatic local variables or function arguments created on the stack, and thus they are modifiable, e.g. reassignable. That makes perfect sense, since C is meant to be powerful and efficient, and with great power comes great responsibility, which is placed on the programmer. C expects that you know what you are doing, and it is fine that way.Fist
M
11

IMO, code like argv[1] = "123"; is UB (using the original argv).


"The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination." C11dr & C17dr1 §5.1.2.2.1 2

Recall that const came into C many years after C's creation.

Much like char *s = "abc"; is valid when it should be const char *s = "abc";. Use of const was not made mandatory, or else too much existing code would have been broken by its introduction.

Likewise, even if argv today should be considered char * const argv[] or some other signature with const, the lack of const in the char *argv[] does not completely specify the const-ness needs of the argv, argv[], or argv[][]. The const-ness needs would need to be driven by the spec.

From my reading, since the spec is silent on the issue, yet goes into depth about other assignments of main()'s argv = and argv[i][j] = , it is UB.

"Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior" §4 2


[edit]:

main() is a very special function in C. What is allowable in other functions may or may not be allowed in main(). The C spec details attributes of its parameters that, given the signature int argc, char *argv[], it should not need to state. main(), unlike other functions in C, can have an alternate signature int main(void) and potentially others. main() is not reentrant. As the C spec goes out of its way to detail what can be modified (argc, argv, argv[][]), it is reasonable to question whether argv[] is modifiable, given that the spec omits any assertion that code can modify it.

Given the specialty of main() and the omission of specifying that argv[] as modifiable, a conservative programmer would treat this greyness as UB, pending future C spec clarification.


If argv[i] is modifiable on a given platform, certainly the range of i should not exceed argc-1.

As "argv[argc] shall be a null pointer", assigning argv[argc] to something other than NULL appears to be a violation.

Although the strings are modifiable, code should not exceed the original string's length.

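const char *newstr = "abc";
/* Overwrite only if the argument exists and the new string fits in its storage */
if (argc > 1 && strlen(newstr) <= strlen(argv[1]))
  strcpy(argv[1], newstr);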

1 No change with C17/18. Since that version was meant to clarify many things, it reinforces that this spec is adequate and not missing an "argv array elements shall be modifiable".

Martelli answered 9/9, 2014 at 14:43 Comment(12)
"omission of any explicit definition of behaviour" - well, we would say int bar = 7; is defined despite the fact that the text "int bar = 7;" does not appear in the standard.Empty
@Matt McNabb Normally one would readily agree with your comment's line of reasoning were it not for C11dr §5.1.2.2.1 2. The spec goes out of the way to say some things are modifiable even though char *argv[] does not need that affirmation. Since the spec specifically indicates modifiability for argv, and argv[][], (2 out of 3) but not argv[], that absence is significant - hence UB by omission. IMO, it is a weakness for the spec to imply modifiability for argv[][] and to be silent on argv[].Martelli
@Matt McNabb One use of modifiable argv[][] is applying strtok() directly to argv[], since command line arguments often need to be parsed.Martelli
I disagree (see my answer). Can you point to any actual implementation in which argv[n]=buf behaves badly? Or any implementation which warns against doing it?Elvaelvah
@Elvaelvah Advertising your answer here? - hmmm. No, I cannot come up with an example, just as the spec does not specifically say it can be done - else OP would have seen that and there would have been no question. We are left with a greyness in the spec and are trying to call it black or white. In the end, until the spec is made more clear, compiler makers and C programmers will do their best. Certainly you do not see a greyness, but if you did, how would you code: the way you think it should be or conservatively? For me, I would code conservatively because of the possibility of UB.Martelli
In this I'm a provider rather than consumer. I would write my compiler to allow it. If I wanted to use it I would read the compiler source to find if it's safe. Sometimes for things like this there is no other way.Elvaelvah
@Elvaelvah Good point on what a compiler (provider) should do! If I (consumer) wanted to use a dubious spec defined ability though, I would change my needs (not use it), even if the compiler I was presently using did allow it. This avoids getting caught by compiler providers that employ embrace, enhance, extinguish.Martelli
I won't labour the point, but sometimes requirements push you into places you'd rather not be, particularly if you only own/control part of the code. I've relied on worse things than this.Elvaelvah
I came to the opposite conclusion wrt. basically the same question (here), because 1) leaving const out of definition of argv does not seem like an accidental omission to me, and 2) the standard consistently refers to argv as an array, and modifying an array ⇒ modifying its members, as the array itself cannot be modified as a whole; it is the adjustment done to array parameters (turning them into qualified pointers to the first element) that allows modifying argv itself. Is that explicit definition enough? I think so, but I could be wrong.Pisistratus
@Nominal Animal Note: int main(argc, argv) as int argc and char *argv[] had lots of history before const was invented.Martelli
Quite true, chux. Like I mentioned in that other question, there is quite a lot of code (in olden Unix-land, and in GNU land) that expects both the pointers in the argv array as well as the contents of the pointed-to strings to be modifiable. I see the standard as implying the pointers are modifiable, and that the omission of an unambiguous explicit statement (in C99 5.1.2.2.1p2) declaring it so, is just an oversight. On the other hand, I cannot see any fault in your answer here either; I just cannot help but draw different conclusions.Pisistratus
@Nominal Animal Suggest adding your answer here along with your additional insights.Martelli
I
7

The argv array is not required to be modifiable (but may be in actual implementations). This is an intentional wording which was reaffirmed in the n849 meeting in 1998:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n849.htm

PUBLIC REVIEW COMMENT #7

[...]

Comment 10.
Category: Request for information/clarification
Committee Draft subsection: 5.1.2.2.1
Title: argc/argv modifiability, part 2
Detailed description:

Is the array of pointers to char pointed to by argv modifiable?

Response Code: Q
    This is currently implictly unspecified and the committee 
    has chosen to leave it that way.

In addition, two separate proposals were made to, respectively, change and augment the wording. Both were rejected. Interested readers can find them by searching for "argv".


Trivia: an example in the Kernighan and Ritchie The C Programming Language, 2nd ed, ("K&R2") runs afoul of this. It is on page 117, and the relevant line of code is:

while (c = *++argv[0])

which increments the pointer inside the argument vector itself to step through the characters of the string.

Illdefined answered 6/1, 2023 at 14:49 Comment(1)
@Kaz, thank-you for posting the informative link. It is telling that the requests were rejected, yet I do not see this as clearly supporting "argv array is not required to be modifiable", only that the ambiguity remains, possibly dependent on the spec as a whole.Martelli
D
3

One possible reason for this wording and restricting modifications of the argv array itself is to allow an implementation to allocate the argv array and the C strings it contains with malloc and free them after returning from main. eg:

    // C startup
    [...]
    // allocate the command line argument array
    int argc = os_get_argument_count();
    char **argv = malloc(sizeof *argv * (argc + 1));
    for (int i = 0; i < argc; i++)
        argv[i] = strdup(os_get_argument(i));
    argv[argc] = NULL;
    [...]
    // calling main
    int status = main(argc, argv, envp);
    // call atexit functions, etc.
    [...]
    // release arguments
    for (int i = 0; i < argc; i++)
        free(argv[i]);
    free(argv);
    os_exit_with_status(status);

If the argv array is modified, the above hypothetical library code would have undefined behavior.

This is a theoretical example. In practice there would be no reason for the C runtime to care about the argv array contents after the call to main, and implementations that do would fail to compile and run many C programs that modify the argv array for argument parsing.

Delectable answered 9/12, 2023 at 20:1 Comment(4)
Was this: argv[0] = strdup(os_get_argument(i)); intended to be this: argv[i] = strdup(os_get_argument(i));? In this model changing argv[0] would lead to UB when attempting to free, but swapping the other pointers (as in the Q&A that brought this up) would be ok, wouldn't it?Beutner
@adabsurdum: good catch! typo fixed. Yes, swapping the pointers would be OK for this theoretical case, but if argv[1] is the only one allocated and the following ones are just pointers to the same allocated area split as if by strtok(), swapping argv[0] with subsequent pointer would lead to UB :)Delectable
This seems like a plausible example. It would be nice if the Standard were more explicit on this, but what would we do if the Standard was always crystal clear?Beutner
"... allow an implementation to allocate the argv array ..." interesting rational.Martelli
M
-1

argc is just an int and is modifiable without any restriction.

argv is a modifiable char **. It means that argv[i] = x is valid. But it does not say anything about argv[i] being itself modifiable. So argv[i][j] = c leads to undefined behaviour.

The getopt function (specified by POSIX, not by the C standard library) may reorder the pointers in argv, as the GNU implementation does by default, but never modifies the actual char arrays.

Mob answered 9/9, 2014 at 15:46 Comment(2)
Saying "x is modifiable" means that you are allowed to change x; you seem to be interpreting it as meaning "you are allowed to change what x points to, if x is a pointer, or if x is not a pointer then you are allowed to change x"Empty
As quoted in the question, “the strings pointed to by the argv array shall be modifiable by the program”, and argv[i][j] is part of a string pointed to by argv, so it is modifiable by the program.Candi
E
-2

The answer is that argv is an array and yes, its contents are modifiable.

The key is earlier in the same section:

If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.

From this it is clear that argv is to be thought of as an array of a specific length (argc). Then *argv is a pointer to that array, having decayed to a pointer.

Read in this context, the statement to the effect that 'argv shall be modifiable...and retain its contents' clearly intends that the contents of that array be modifiable.

I concede that there remains some ambiguity in the wording, particularly as to what might happen if argc is modified.


Just to be clear, what I'm saying is that I read this language as meaning:

[the contents of the] argv [array] and the strings pointed to by the argv array shall be modifiable...

So both the pointers in the array and the strings they point to are in read-write memory, no harm is done by changing them, and both preserve their values for the life of the program. I would expect that this behaviour is to be found in all the major C/C++ runtime library implementations, without exception. This is not UB.

The ambiguity is the mention of argc. It is hard to imagine any purpose or any implementation in which the value of argc (which appears to be simply a local function parameter) could not be changed, so why mention it? The standard clearly states that a function can change the value of its parameters, so why treat argc specially in this respect? It is this unexpected mention of argc that has triggered this concern about argv, which would otherwise pass without remark. Delete argc from the sentence and the ambiguity disappears.

Elvaelvah answered 11/9, 2014 at 13:59 Comment(8)
argv is a pointer which points to the first element of an array, that's what char **argv or char *argv[] means in a parameter list. argv[0] is a member of that array, but argv itself isn't. In any case , the relevant text is "the strings pointed to by the argv array", so it doesn't matter whether or not argv is called an array; as the thing being defined is the strings pointed to, not the array.Empty
@MattMcNabb: I still read "the argv [array] ... shall be modifiable" regardless of how you slice and dice it. Seems pretty obvious to me, hard to see why anyone would disagree.Elvaelvah
I don't see your justification for ignoring "the strings pointed to by". another example of the same language construct: "the man shot by the policeman died". Would you say the policeman died?Empty
I see no ambiguity in the wording, though it's possible the wording doesn't match the intent. argv is a pointer, not an array, and is that pointer object that the standard says is modifiable. The phrase "the argv array" can only refer to the array to whose first element argv points. The standard doesn't say whether that array is modifiable. That does seem like an odd omission, and I suspect it was not deliberate.Whereof
But it's not plausible that the standard would be so sloppy as to use the unqualified argv (the name of a pointer object) to refer to an array object -- especially when it uses the more precise and correct phrase "the argv array" in the same sentence. The meaning of "The parameters argc and argv ... shall be modifiable ..." is clear, and it doesn't refer to the array.Whereof
"why treat argc specially in this respect?" is because argc and argv are the parameters of main(). main() itself in a special function detailed in the spec. IMO, the peculiarities of the starting point for code necessitated these details for main(), its parameters, return value, ability to be recursively called.Martelli
@KeithThompson: I'm not sure why you say "It's not plausible that the standard would be so sloppy as to..." when there's a lot of sloppiness in the Standard. It was written in a very different era from today, before language lawyers took over everything.Dextrogyrate
The compiler doesn't treat the pointer-to-pointer argv, or any of the pointers argv[x], differently from any other automatic local variables or function arguments, and thus they are modifiable, e.g. reassignable. That makes perfect sense, since C is meant to be powerful and efficient, and with great power comes great responsibility.Fist
L
-2

It is modifiable, and a gdb session on a program compiled with gcc -std=c11 shows what happens.

The GCC documentation says: ‘ISO C11, the 2011 revision of the ISO C standard. This standard is substantially completely supported’. So, let us safely assume that the gcc compiler supports most of the features specified by the C11 standard. The gcc option -std=c11 compiles programs according to the C11 standard.

argv[] is just an array of string pointers; like the arguments of any other function, they are treated as local variables and are modifiable. Since they belong to the main() function, they will last until the program exits.

char **argv (pointer to pointer) and int argc are arguments to the main() function and are thus created on the stack.

If we run the code below under gdb and stop at a breakpoint at main():

~/proba$ gdb --args proba BBBBBBB CCCCCCC

#include <stdio.h>
    
int main(int argc, char **argv)
{
    char buf[20] = "AAAAAAA";
    argv[0] = buf;
    return 1;
}

Starting program: /home/drazen/proba/proba BBBBBBB CCCCCCC
Breakpoint 1, main (argc=3, argv=0x7fffffffdef8) at main9.c:5

If we dump the stack we see:

(gdb) x/32gx $sp
0x7fffffffddb0: 0x00007fffffffdef8  0x0000000300000000
0x7fffffffddc0: 0x0000000000000000  0x0000000000000000
0x7fffffffddd0: 0x0000000000000000  0x0000000000000000
.......

We recognize the value of argv, 0x00007fffffffdef8, and the value of argc, 0x00000003, at the top of the stack.

Since we passed two arguments, argc is 3 as expected.

But argv holds the address of the array of char pointers. So at address 0x00007fffffffdef8 is the first pointer, argv[0], with value 0x00007fffffffe25c, which should be the address of the program name with its absolute path.

At address 0x7fffffffdf00 is the second pointer, argv[1], with value 0x00007fffffffe275, which should be the address of the first program argument. At address 0x7fffffffdf08 is the third pointer, argv[2], with value 0x00007fffffffe27d, which should be the address of the second program argument.

We can see that on the stack:

........
0x7fffffffdef0: 0x0000000000000003            0x00007fffffffe25c<argv[0]>
0x7fffffffdf00: 0x00007fffffffe275<argv[1]>   0x00007fffffffe27d<argv[2]>

So argv[0] holds the address of the program name:

(gdb) x/s 0x00007fffffffe25c
0x7fffffffe25c: "/home/drazen/proba/proba"

The second pointer, argv[1], holds 0x00007fffffffe275, the address of the program argument BBBBBBB:

(gdb) x/s 0x00007fffffffe275
0x7fffffffe275: "BBBBBBB"

The third pointer, argv[2], holds 0x00007fffffffe27d, the address of the program argument CCCCCCC:

(gdb) x/s 0x00007fffffffe27d
0x7fffffffe27d: "CCCCCCC"

We see that the strings these char *argv[] pointers point to are at consecutive addresses:

(gdb) x/3s 0x00007fffffffe25c
0x7fffffffe25c: "/home/drazen/proba/proba"
0x7fffffffe275: "BBBBBBB"
0x7fffffffe27d: "CCCCCCC"

Now let's step through the few statements where we initialize a local string variable on the stack and reassign the pointer argv[0]:

(gdb) s
6       char buf[20] = "AAAAAAA";
(gdb) s
7       argv[0] = buf;
(gdb) s
8       return 1;

If we dump the stack now we notice it has slightly changed. We see that the local variable buf[] was initialized on the stack too (the string AAAAAAA is hex 0x0041414141414141), right after the main() function arguments:

(gdb) x/32gx $sp
0x7fffffffddb0: 0x00007fffffffdef8  0x0000000300000000
0x7fffffffddc0: 0x0041414141414141  0x0000000000000000
0x7fffffffddd0: 0x0000000000000000  0x5315d27018aa8e00
.....
.....
0x7fffffffdef0: 0x0000000000000003  0x00007fffffffddc0<argv[0]>
0x7fffffffdf00: 0x00007fffffffe275  0x00007fffffffe27d

The value 0x00007fffffffdef8 of argv hasn't changed, but the value of argv[0] has changed to 0x00007fffffffddc0:

.....
0x7fffffffdef0: 0x0000000000000003  0x00007fffffffddc0<argv[0]>
0x7fffffffdf00: 0x00007fffffffe275  0x00007fffffffe27d


(gdb) x/s 0x00007fffffffddc0
0x7fffffffddc0: "AAAAAAA"
(gdb) p argv[0]
$1 = 0x7fffffffddc0 "AAAAAAA"

The pointer argv[0] now points to the new memory location on the stack where the array buf, holding the string "AAAAAAA", is stored.

Llama answered 9/12, 2023 at 23:16 Comment(2)
The question is about what the language standard guarantees, not about how any particular compiler behaves in some scenarioEmpty
I tested the scenario in the question. Yes, I know different compilers may work differently, but I didn't use any compiler but gcc with -std=c11, so my intention was to show what happens in real life with a compiler which complies with the standard.Fist
