Difference between (*++argv)[0] and while(c = *++argv[0])

I have the following snippet of code:

int main(int argc, char *argv[])
{   

     char line[MAXLINE];
     long lineno = 0;
     int c, except = 0, number = 0, found = 0;

     while(--argc > 0 && (*++argv)[0] == '-') //These two lines
        while(c = *++argv[0])                 //These two lines
          switch(c) {
             case 'x':
                  except = 1;
                  break;
             case 'n':
                  number = 1;
                  break;
             default:
                  printf("find: illegal option %c\n", c);
                  argc = 0;
                  found = -1;
                  break;
          }

     ...
}

The two marked lines contain these expressions:

while(--argc > 0 && (*++argv)[0] == '-')
while(c = *++argv[0])

Does the parenthesized expression (*++argv)[0] differ from *++argv[0], which appears without parentheses in while(c = *++argv[0])?

If so, how? Does (*++argv) mean a pointer to the next argument, and does *++argv[0] mean a pointer to the next character in the char array currently being pointed to?

Evvoia asked 7/1, 2010 at 14:26
I'm also interested in one thing: the expression while(c = *++argv[0]). Does it actually mean while(c = *++argv[0] != 0)? I mean, does *++argv[0] return a null pointer to c if it hasn't found a character? – Evvoia
As noted in my answer, see K&R's errata entry on this code: cm.bell-labs.com/cm/cs/cbook/2ediffs.html – Photobathic

First, K&R's errata list has an entry on this particular snippet:

117(§5.10): In the find example, the program increments argv[0]. This is not specifically forbidden, but not specifically allowed either.

Now for the explanation.

Let's say your program is named prog, and you execute it with: prog -ab -c Hello World. You want to be able to parse the arguments to say that options a, b and c were specified, and Hello and World are the non-option arguments.

argv is of type char ** (remember that an array parameter in a function is the same as a pointer). At program invocation, things look like this:

                 +---+         +---+---+---+---+---+
 argv ---------->| 0 |-------->| p | r | o | g | 0 |
                 +---+         +---+---+---+---+---+
                 | 1 |-------->| - | a | b | 0 |
                 +---+         +---+---+---+---+
                 | 2 |-------->| - | c | 0 |
                 +---+         +---+---+---+---+---+---+
                 | 3 |-------->| H | e | l | l | o | 0 |
                 +---+         +---+---+---+---+---+---+
                 | 4 |-------->| W | o | r | l | d | 0 |
                 +---+         +---+---+---+---+---+---+
                 | 5 |-------->NULL
                 +---+

Here, argc is 5, and argv[argc] is a null pointer. At the beginning, argv[0] is a char * pointing to the string "prog".
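
The C standard guarantees that argv[argc] is a null pointer, which is why the last box in the picture holds NULL. A minimal sketch that relies on this by walking the array until it hits that terminator (count_args is a hypothetical helper name, not part of the original program):

```c
#include <stddef.h>

/* Counts the entries of argv by walking to the null pointer that the
   C standard guarantees at argv[argc]. For main's argv the result
   equals argc. */
int count_args(char **argv)
{
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Called with an array laid out like the picture above, it returns 5.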

In (*++argv)[0], because of the parentheses, argv is incremented first and then dereferenced. The increment moves the argv ----------> arrow "one block down", so that it points to slot 1. Dereferencing then yields a pointer to the first command-line argument, -ab. Finally, we take the first character ([0] in (*++argv)[0]) of this string and test whether it is '-', because that denotes the start of an option.
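
The three steps of (*++argv)[0] can be written out as separate statements. This is only an illustrative sketch; first_char_of_next_arg is a hypothetical name, and argv is passed by address so that, as in the K&R loop, the caller's pointer really moves down the array:

```c
/* Performs (*++argv)[0] as three explicit steps. */
char first_char_of_next_arg(char ***pargv)
{
    ++*pargv;               /* ++argv: move to the next slot in the array */
    char *arg = **pargv;    /* *argv: the char * stored in that slot      */
    return arg[0];          /* [0]: first character of that argument      */
}
```

Starting from the picture above, successive calls return '-' (for -ab), '-' (for -c) and 'H' (for Hello), and the caller's argv ends up pointing at slot 3.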

For the second construct, we actually want to walk down the string pointed to by the current argv[0] pointer. So, we need to treat argv[0] as a pointer, ignore its first character (that is '-' as we just tested), and look at the other characters:

++(argv[0]) will increment argv[0] so that it points to the first character after the '-', and dereferencing it gives us the value of that character. So we get *++(argv[0]). But since in C, [] binds more tightly than ++, we can drop the parentheses and write the expression as *++argv[0]. We want to keep processing characters until we reach the terminating 0 (the last box in each row of the above picture).
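
Here is a small sketch of *++argv[0] in separate steps (next_char_of_current_arg is a hypothetical name). Note that it is the char * stored in argv[0] that advances, not argv itself, and that the result is a character, not a pointer:

```c
/* Performs *++argv[0]: advance the pointer stored in argv[0], then read
   the character it now points at. argv itself is left unchanged. */
char next_char_of_current_arg(char **argv)
{
    ++argv[0];          /* same as ++(argv[0]): [] binds tighter than ++ */
    return *argv[0];    /* the character now pointed at                  */
}
```

If argv[0] points to "-ab", three calls return 'a', 'b' and finally '\0', after which argv[0] points at the terminator.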

The expression

c = *++argv[0]

assigns to c the value of the current option character, and the whole expression has the value that was assigned to c. while(c) is shorthand for while(c != 0), so the line while(c = *++argv[0]) assigns the current option to c and, at the same time, tests whether we have reached the end of the current command-line argument.
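
A minimal sketch of this inner loop applied to a single argument such as -ab. It writes the comparison out explicitly; the extra parentheses around the assignment are required, because != binds more tightly than =, so c = *++argv[0] != '\0' would store 0 or 1 in c rather than the character. The helper name collect_cluster and the output buffer are assumptions for illustration:

```c
#include <stddef.h>

/* Copies the letters after the leading character of argv[0] (e.g. "ab"
   from "-ab") into out, NUL-terminated, and returns how many there were.
   out must have room for strlen(argv[0]) characters. Hypothetical helper. */
size_t collect_cluster(char **argv, char *out)
{
    int c;
    size_t n = 0;

    /* Same test as while(c = *++argv[0]), with the != 0 made explicit. */
    while ((c = *++argv[0]) != '\0')
        out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

For "-ab" it stores "ab" and returns 2; as a side effect, argv[0] is left pointing at the string's terminator.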

At the end of this loop, argv will point to the first non-option argument:

                 +---+         +---+---+---+---+---+
                 | 0 |-------->| p | r | o | g | 0 |
                 +---+         +---+---+---+---+---+
                 | 1 |-------->| - | a | b | 0 |
                 +---+         +---+---+---+---+
                 | 2 |-------->| - | c | 0 |
                 +---+         +---+---+---+---+---+---+
 argv ---------->| 3 |-------->| H | e | l | l | o | 0 |
                 +---+         +---+---+---+---+---+---+
                 | 4 |-------->| W | o | r | l | d | 0 |
                 +---+         +---+---+---+---+---+---+
                 | 5 |-------->NULL
                 +---+
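
Putting both loops together gives the following sketch. The letters buffer and the name skip_options are assumptions; K&R's program switches on each letter instead of collecting it. When the remaining count is greater than 0, the returned pointer is the first non-option argument, matching the second picture:

```c
#include <stddef.h>

/* Runs the K&R option loop on (*pargc, argv), appending every option
   letter to letters (NUL-terminated). Stores the remaining argument count
   in *pargc and returns where argv ended up. If *pargc > 0 afterwards,
   the returned pointer addresses the first non-option argument. */
char **skip_options(int *pargc, char **argv, char *letters)
{
    int argc = *pargc;
    size_t n = 0;
    int c;

    while (--argc > 0 && (*++argv)[0] == '-')
        while ((c = *++argv[0]) != '\0')
            letters[n++] = (char)c;
    letters[n] = '\0';
    *pargc = argc;
    return argv;
}
```

With prog -ab -c Hello World (argc 5), it collects "abc", leaves 2 arguments, and returns a pointer to slot 3 ("Hello"). Note that the pointers in slots 1 and 2 have been advanced to their terminators, which the picture does not show.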

Does this help?

Photobathic answered 7/1, 2010 at 15:22
@Alok Could you explain this step: if((strstr(line, *argv) != NULL) != except)? – Positive
@AbhimanyuAryan strstr(a, b) checks whether the string b occurs in a. It returns NULL if b is not in a. So strstr(line, *argv) != NULL checks whether the string pointed to by *argv is in line, and has the value 1 if it is and 0 if it isn't. except was set to 1 or 0 earlier, based on the presence of the x flag, so comparing the two with != is true for matching lines when except is 0 and for non-matching lines when except is 1. – Photobathic
Thanks a lot for the careful step-by-step explanation. I had been pulling my hair out over this for the last hour. Now it's crystal clear. Thanks. – Fluxion
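
The condition from the comments above can be isolated into a tiny function. This is a sketch only; should_print is a hypothetical name, and line and pattern stand in for K&R's line buffer and *argv:

```c
#include <string.h>

/* Mirrors (strstr(line, *argv) != NULL) != except. With except == 0 it
   is true for lines containing pattern; with except == 1 (the -x flag)
   it is true for lines that do not contain it. */
int should_print(const char *line, const char *pattern, int except)
{
    return (strstr(line, pattern) != NULL) != except;
}
```

For example, should_print("hello world", "world", 0) is 1, while the same call with except set to 1 is 0.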

Yes, you are correct.

while(--argc > 0 && (*++argv)[0] == '-')

is scanning the array (of length argc) of command-line arguments one by one, and keeps going as long as each argument starts with a - option prefix. For each of those:

while(c = *++argv[0])

is scanning through the set of switch characters that follow the first - in the current argument (e.g. t and n in -tn) until it hits the string's null terminator \0, which terminates the while loop because it evaluates as false.

This design allows both

myApp -t -n

and

myApp -tn

to work and be understood as specifying the options t and n.
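
A minimal sketch showing that the K&R loop treats both spellings the same way. The flag names OPT_T and OPT_N and the function parse_tn are hypothetical, and unknown letters are simply ignored to keep the example short:

```c
/* Hypothetical bit flags for the two options in this example. */
enum { OPT_T = 1, OPT_N = 2 };

/* K&R-style option loop that returns a bitmask of the options seen. */
unsigned parse_tn(int argc, char *argv[])
{
    unsigned flags = 0;
    int c;

    while (--argc > 0 && (*++argv)[0] == '-')
        while ((c = *++argv[0]) != '\0')
            switch (c) {
            case 't': flags |= OPT_T; break;
            case 'n': flags |= OPT_N; break;
            default:  break;   /* a real tool would report an error here */
            }
    return flags;
}
```

Both myApp -t -n and myApp -tn produce OPT_T | OPT_N: the outer loop handles separate arguments, and the inner loop handles a cluster within one argument.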

Orbit answered 7/1, 2010 at 14:32
This design is simple and mostly reasonable, apart from the fact that it modifies argc and the contents of the array argv, which is poor design since it prevents any further use of these variables. – Orbit

Incrementing argv is a very bad idea, as once you have done so it is difficult to get the original value back. It is simpler, clearer and better to use an integer index - after all argv IS an array!

To answer your question: ++argv increments the pointer. Indirection (*) is then applied to it to get the next argument, and [0] selects that argument's first character.
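
To make the integer-index suggestion concrete, here is a minimal sketch that only reads argv, so the original pointers remain available afterwards. The name scan_options and the flag encoding (bit 0 for 'x', bit 1 for 'n') are assumptions for illustration, and unknown letters are silently ignored:

```c
/* Index-based option scan: argv is read, never modified. ORs a bit into
   *flags for each recognized letter (hypothetical encoding: 1u for 'x',
   2u for 'n') and returns the index of the first non-option argument. */
int scan_options(int argc, char *argv[], unsigned *flags)
{
    int i;

    for (i = 1; i < argc && argv[i][0] == '-'; i++)
        for (const char *p = argv[i] + 1; *p != '\0'; p++)
            if (*p == 'x')
                *flags |= 1u;
            else if (*p == 'n')
                *flags |= 2u;
    return i;
}
```

For find -xn pattern it sets both bits and returns 2, and argv[1] still points at "-xn" afterwards.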

Fissure answered 7/1, 2010 at 14:34
Actually, the indirection starts with the first character after the -, and each cycle moves on to the next, to support clusters of option flags after a single - character. – Orbit
I was referring to (*++argv)[0] == '-' – Fissure

The parentheses change the order in which the expressions are evaluated.

Without parentheses, *++argv[0]:

  1. argv[0] gets the pointer to the character data currently pointed to by argv.
  2. ++ increments that pointer to the next character in the character array.
  3. * gets the character.

With parentheses, (*++argv)[0]:

  1. ++argv increments the argv pointer to point to the next argument.
  2. * dereferences it to obtain a pointer to the character data.
  3. [0] gets the first character in the character array.
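
Applying each expression to the same starting array makes the two evaluation orders visible. A sketch with hypothetical function names; argv is a parameter, so with_parens only advances its own copy, whereas without_parens changes argv[0] in the caller's array:

```c
/* (*++argv)[0]: step to the next argument, return its first character. */
char with_parens(char **argv)
{
    return (*++argv)[0];
}

/* *++argv[0]: advance the pointer stored in argv[0], return the
   character it now points at (the current argument's second character). */
char without_parens(char **argv)
{
    return *++argv[0];
}
```

With argv pointing at {"prog", "-ab", NULL}, with_parens returns '-' (the start of -ab), while without_parens returns 'r' (the second letter of prog) and leaves argv[0] pointing at "rog".
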
Imf answered 7/1, 2010 at 15:0

Yes, the two expressions differ (though only slightly). IMO, this code is a bit on the excessively clever side. You'd be better off with something like this:

for (int i = 1; i < argc; i++)
    if (argv[i][0] == '-') {
        size_t len = strlen(argv[i]);      // needs <string.h>
        for (size_t j = 1; j < len; ++j)   // start at 1 to skip the '-'
            switch (argv[i][j]) {
                case 'x':
                    except = 1;
                    break;
                // ...
            }
    }

This is pretty much equivalent to the code above, but I doubt anybody (who knows C at all) would have any difficulty figuring out what it really does.

Seraphim answered 7/1, 2010 at 14:34
But this code would not detect chains of options: you need another iterator to walk a chain of options such as -tn. – Orbit
@Alex Brown: I believe I've fixed that, though I'm not sure it's necessarily any real improvement. Allowing -tn instead of -t -n would have meant a fair amount when a typical terminal was a Teletype, but it's hardly worthwhile anymore. – Seraphim
@Jerry It's entirely worthwhile. Every command-line user expects to be able to provide single-letter options in a group, and being lazy about the code means violating those strongly held expectations. The deeper issue here is the custom-coding of this functionality, rather than the use of getopt or similar. – Prismatoid
@Novelocrat: I'm afraid I can't really agree. Quite a few command-line tools either don't allow clustered arguments at all, or have specific limitations about which arguments can and can't be clustered. Nobody with any substantial amount of experience can honestly have much expectation about this subject. Given that it only supports two arguments, neither with any associated parameter, I can see where using getopt would probably make the code more complex, so I can understand not using it, even though I agree that it probably should anyway. – Seraphim
You try taking tar -xvzf a.tar.gz away from me and see what happens. Or ls -laTr, or ps -elF, etc. – Orbit
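
Since several comments mention getopt, here is a minimal sketch of the find options parsed with POSIX getopt() from <unistd.h>, which handles clusters such as -xn by itself. The function name parse_with_getopt and its out-parameters are hypothetical; getopt, optind, and the option string syntax are the standard POSIX interface:

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Parses -x and -n with getopt(). Returns optind (the index of the first
   non-option argument), or -1 on an unrecognized option, for which getopt
   returns '?' and, with the default opterr, prints a diagnostic. */
int parse_with_getopt(int argc, char *argv[], int *except, int *number)
{
    int c;

    while ((c = getopt(argc, argv, "xn")) != -1) {
        switch (c) {
        case 'x': *except = 1; break;
        case 'n': *number = 1; break;
        default:  return -1;    /* '?' */
        }
    }
    return optind;
}
```

For find -xn pattern it sets both flags and returns 2, so argv[2] is the pattern. Unlike the K&R loop, argv itself is not advanced, although GNU getopt may reorder its elements to move non-options to the end.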

© 2022 - 2024 — McMap. All rights reserved.