Are the pointers to strings in argv modifiable? [duplicate]

Recently (Jan 2016, in case the question persists long enough) we had the question "Are the strings in argv modifiable?".
In the comment section of an answer there, @2501 and I argued about whether it is really the characters of the strings (an example character being **argv) that are modifiable, or the pointers to the strings (an example pointer being *argv).

The appropriate standard quotation is from the C11 standard draft N1570, §5.1.2.2.1/2:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

So are the pointers to the strings as pointed to by argv modifiable?
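To make the distinction concrete, here is a minimal sketch of the three different writes being discussed. The helper name touch_three_levels and the replacement buffer are made up for illustration; the function is meant to be called with main's own argc and argv. It does not settle which of the writes the standard guarantees, it only shows what each one looks like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: performs the three kinds of modification that
 * "modifying argv" could refer to. Returns the string that the local
 * argv points at after it has been advanced by one element. */
char *touch_three_levels(int argc, char **argv)
{
    static char replacement[] = "replaced";   /* writable storage */

    assert(argv != NULL);
    if (argc < 2 || argv[0][0] == '\0')
        return NULL;

    /* 1. A character of a string (the object **argv):
     *    explicitly covered by 5.1.2.2.1p2. */
    argv[0][0] = 'X';

    /* 2. A pointer stored in the array (an object like *argv):
     *    the case this question is about. */
    argv[1] = replacement;

    /* 3. The parameter argv itself: a local variable of the function,
     *    so assigning to it never affects the caller. */
    argv = argv + 1;

    return *argv;
}
```

Only the first two writes are visible to the caller; the third changes a local copy of the pointer.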

Ethiopia answered 30/1, 2016 at 19:13 Comment(13)
Since argv is modifiable, and argv contains the pointers you speak of, yes.Grandson
@SimonShine But if argv is modifiable, doesn't that mean only argv = ... can be done and not necessarily *argv = ...?Ethiopia
Because one of the proper main function declarations is int main(int argc, char *argv[]), it seems clear that the argv parameter is an array of char * pointers, not char const * pointers. So the strings should be modifiable.Royroyal
@Royroyal As I already said in the referenced question, in C, you can also do char* str = "foo"; and *str = 'c'; is undefined behavior. So I would leave const aside here.Ethiopia
@SimonShine: argv is a local variable, so it is modifiable anyway. But the question is about *argv and the other pointers.Alexander
While the question itself is good, I personally think it is bad style to change the arguments and *argv. It might be useful for a freestanding environment, but for that the standard does not define the startup behaviour/function, so it depends on your implementation & environment anyway.Alexander
@Olaf Well, almost all nitpick questions about the standard emerge from bad style and edge cases, don't they? And doesn't the standard define int main(void) and int main(int, char**) on a hosted implementation?Ethiopia
When you do char *str = "foo" you are making str point to char const * string, which modern compilers will warn you about.Royroyal
@Royroyal I think you're drifting away from the main topic. You said the type of the pointer makes it clear modification is allowed. Solely the type of str does that as well, therefore my counter argument. It's about well-defined and undefined, not about some diagnostic not even specified in the C11 standard, AFAIK. Furthermore, the type of a string literal is char[N], if my wits are still fine.Ethiopia
@cad: Basically, it is the programmer who defines main and is responsible for making it compatible. Anyway, taking that declaration as given means *argv is in fact modifiable.Alexander
As I wrote previously, doing char *str = "string"; will give you a warning. Of course an 'ordinary' programmer could do that, but the compiler/library should comply with some standards and not pass arguments to the main function that are not of the correct type.Royroyal
@nsilent22: The warning is a symptom. The reason they warn is that writing to a string literal is undefined behaviour. The question asks exactly whether writing to *argv is also UB. (Note that the standard does not state a string literal is const char [], but only that writing to it is UB. Which makes the literal only technically const char []).Alexander
I think this is related: https://mcmap.net/q/398555/-is-argv-n-writable?rq=1Plant

As OP quoted in the question, the C11 standard explicitly states that the argc and argv variables, and the strings pointed to by the argv array, are modifiable. Whether the pointers themselves are modifiable is the question at hand. The standard does not seem to explicitly state it one way or the other.

There are two key points to note about the wording in the standard:

  1. If the pointers were supposed to be immutable, the standard could have made it clear by requiring main to be declared as int main(int argc, char *const argv[]), as haccks mentioned in another answer to this question.

    The fact that nowhere in the standard is const mentioned in association with argv seems deliberate. That is, the lack of const does not seem optional, but dictated by the standard.

  2. The standard consistently calls argv an array. Modifying an array means modifying its members. Thus, it seems obvious that the wording in the standard refers to modifying the members of the argv array when it states that argv is modifiable.

    On the other hand, array parameters in C (based on C11 draft N1570, §6.7.6.3p7) "shall be adjusted to 'qualified pointer to type'". Thus, the following code,

    int foo(int x[2], int y[2])
    {
        if (x[0] > y[0])
            x = y;
        return x[1];
    }
    

    is valid C11, since x and y are adjusted to int *x and int *y, respectively. (This is also reiterated in C11 draft N1570, §6.3.2.1p3: "... array ... is converted to an expression with type 'pointer to type' that points to the initial element of the array ...".) Obviously, the same code would not be valid if x and y were declared as local or global arrays rather than function parameters, because an array cannot be assigned to.
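The same adjustment applies to a parameter written as char *args[], which is how argv is usually declared. As a small sketch (count_args and count_fn are illustrative names, not from the standard), the function below advances its "array" parameter, something that is only possible because the parameter is really a char **:

```c
#include <assert.h>
#include <stddef.h>

/* Declared with array syntax, but the parameter type is adjusted to
 * char **, exactly as for argv in main. */
int count_args(char *args[])
{
    int n = 0;

    assert(args != NULL);
    while (*args != NULL) {
        ++args;        /* legal only because args is a pointer */
        ++n;
    }
    return n;
}

/* The adjusted type is visible here: a pointer to a function taking
 * char ** is initialized from count_args without any cast. */
int (*const count_fn)(char **) = count_args;
```

Advancing args changes only the function's local pointer; the caller's array and the pointers stored in it are untouched.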

As far as language-lawyerism goes, I'd say the standard does not state it one way or another, although it implies the pointers too should be modifiable. Thus, as an answer to OP: both.


In practice, there is a very long tradition of the pointers in the argv array being modifiable. Many libraries have initialization functions that take a pointer to argc and a pointer to the argv array, and some of them do modify the pointers in the argv array (removing options specific to the library); for example GTK+ gtk_init() and MPI_Init() (although at least OpenMPI explicitly states it does not examine or modify them). Look for the parameter declaration (int *argc, char ***argv). Assuming the function is meant to be called from main() as (&argc, &argv), the only reason for that signature is to let the library parse and remove its own command-line parameters, modifying both argc and the pointers in argv as needed.
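A rough sketch of what such an initialization function might do internally is shown below. Everything in it is hypothetical: mylib_init and the "--mylib-" prefix are invented for illustration and are not taken from GTK+ or any MPI implementation. The point is only that removing options this way overwrites pointers stored in the argv array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical library initializer using the (int *argc, char ***argv)
 * convention. Arguments starting with "--mylib-" are treated as consumed
 * by the library; the remaining pointers are shifted down over them. */
void mylib_init(int *argc, char ***argv)
{
    assert(argc != NULL && argv != NULL && *argv != NULL);
    if (*argc < 1)
        return;                        /* nothing to scan */

    char **av = *argv;
    int kept = 1;                      /* always keep av[0] */

    for (int i = 1; i < *argc; ++i) {
        if (strncmp(av[i], "--mylib-", 8) == 0)
            continue;                  /* option consumed by the library */
        av[kept++] = av[i];            /* overwrites a pointer in argv */
    }
    av[kept] = NULL;                   /* argv[argc] stays a null pointer */
    *argc = kept;
}
```

From main() this would be called as mylib_init(&argc, &argv); afterwards the program sees only the arguments the library did not claim.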

(I originally stated that the getopt() facility in POSIX relies on the pointers being modifiable -- the feature dating back to 1980, adopted by most Unix flavours, and standardized in POSIX.2 in 1997 -- but that is incorrect, as Jonathan Leffler pointed out in a comment: POSIX getopt() does not modify the actual pointers; only GNU getopt() does, and only when the POSIXLY_CORRECT environment variable is not set. Both GNU getopt_long() and BSD getopt_long() modify the pointers unless POSIXLY_CORRECT is set, but they are much younger and less widespread compared to getopt().)

In Unix land, it was considered "portable" to modify the contents of the strings pointed to by the argv[] array, and have the modified strings visible in the process list. One example of how this was useful is readproctitle in DJB's daemontools package. (Note that the strings have to be modified in place, and cannot be extended, for the changes to be visible in the process list.)
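As a minimal sketch of such an in-place modification, the hypothetical helper below (scrub_arg is an invented name) overwrites the characters of one argument without changing its length or touching the pointer. Whether the change actually shows up in the process list depends on the operating system and is not something the C standard promises.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrites every character of argv[index] with 'x',
 * in place. The string keeps its length and its terminator, and the
 * pointer stored in argv is left alone. */
void scrub_arg(int argc, char **argv, int index)
{
    assert(argv != NULL);
    if (index < 0 || index >= argc || argv[index] == NULL)
        return;
    memset(argv[index], 'x', strlen(argv[index]));
}
```

A program that receives a secret on its command line could call scrub_arg(argc, argv, 1) early in main(), after copying the value elsewhere.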

All this indicates a very long tradition, basically almost since the birth of C as a programming language, and definitely preceding the standardization of C, of treating argc, argv, the pointers in the argv array, and the contents of the strings pointed to by those pointers, as modifiable.

Because the intent of the C standard is not to define new behaviour, but to codify existing behaviour across implementations (to promote portability, reliability and so on), it seems safe to assume that it was an unintended omission on the part of the standard writers not to explicitly specify the pointers in the argv array as modifiable. Anything else would break tradition, and be explicitly contrary to the POSIX standard (which is also intended to promote portability across systems, and extends C with features not included in the ISO C standard).

Paleozoic answered 31/1, 2016 at 1:26 Comment(6)
Standard getopt() does not rely on argv being modifiable; GNU getopt() does because it permutes the argument list.Matthieu
@JonathanLeffler: Darn, true! I even checked some of the old Unix getopt() implementations, and they too keep the argument list intact. Only GNU getopt() modifies the pointer. (Although, both GNU and BSD getopt_long() do modify the pointers, even if they are marked const, unless POSIXLY_CORRECT environment variable is set.) I'll have to correct my answer.Paleozoic
I replaced the incorrect part about getopt() pointed out by @JonathanLeffler with references to GTK+ and MPI initialization functions (and the signature you can look for other library initialization functions). If you (or anybody else) find any other errors, please do point them out.Paleozoic
The const argument of point #1 is countered by the fact that const was added to C long after main(). Changing the main() signature by adding const would break existing code. Argument #2 on its own is potentially sufficient.Mojgan
String literals may be immutable yet their type is char *, not const char *. C++ changed that (since around 2006 IIRC) but C has not.Frill
It was discussed in 1998. open-std.org/jtc1/sc22/wg14/www/docs/n849.htm (look for all references to argv).Frill

Whether a pointer is modifiable or not depends on constness of the pointer. The parameter argv is declared as char *argv[] or char **argv. It depends on the environment whether they treat this as char *const argv[] or not (I am not aware of any).
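For reference, this is what the const placement means in practice. The sketch below uses an invented function, upcase_first, whose parameter is declared char *const *, which is what a parameter written as char *const args[] is adjusted to. Through that parameter the characters can be written but the pointers in the array cannot. It illustrates the type system only; it says nothing about what the startup code is allowed to pass to main.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical function: upper-cases the first character of each string
 * in a null-terminated pointer array. The pointers are const through this
 * parameter, the characters are not. */
void upcase_first(char *const *args)
{
    assert(args != NULL);
    for (; *args != NULL; ++args) {
        if ((*args)[0] != '\0')
            (*args)[0] = (char)toupper((unsigned char)(*args)[0]);
        /* *args = NULL;   would be rejected: *args is const-qualified */
    }
}
```

A plain char ** such as argv converts implicitly to char *const *, so upcase_first(argv) compiles without a cast.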

Coralline answered 30/1, 2016 at 19:30 Comment(4)
But can you really make statements about the pointers being writable just from knowing they are not const? See my comment to this question.Ethiopia
@cad; You asked in a question about modification of pointer, not the content it points to: Are the pointers to strings in argv modifiable?. char const *a and char *const a both have different meanings.Coralline
But then what do the first two sentences of your answer tell me? They basically tell me that the pointers to the strings are modifiable because they are declared accordingly. So you say they are modifiable depending on the implementation?Ethiopia
@cad; If you declare char * p = "abcd", then you can't do p[0] = 'e', but you can modify the pointer p itself: p = "Hello". But if, in some environment, char * p = "abcd" is somehow interpreted as char * const p, then it's not possible to modify p.Coralline
