Reallocation of argv
C

I was looking at the code for the GNU coreutils package, specifically the program 'yes', when I saw this block of code in the main function (line 78):

if (argc <= optind)
{
  optind = argc;
  argv[argc++] = bad_cast ("y");
}

How can the array argv be expanded this way? Obviously, just taking any code snippet out of context is a really bad idea, so I looked to see if argv is modified at all beforehand, and it doesn't seem to be except in a call to initialize_main (&argc, &argv), which doesn't seem like it takes an argument for a "new size" or anything like that (but in C, like any language, things aren't always what they seem to be).

I decided to write a simple program to test whether I could call realloc() on argv:

char** new_argv = realloc(argv, ++argc * sizeof*argv);

And it worked (with VS2013 on Windows 10). It returned a pointer to the allocated memory. Of course, that doesn't actually mean anything if it's undefined behavior.

So, long story short, my question is, how is argv allocated? Is it actually safe to realloc argv?

Cylix answered 15/8, 2015 at 16:29 Comment(2)
I'm pretty sure it's unspecified, thus UB; you'd better malloc + memcpy. Consider that coreutils is written for a specific platform, where some behaviors are known. – Dipole
argv points to an array of pointers to char (owned by the program); the last one, i.e. argv[argc], is supposed to be a null pointer, so it can be made to point to something else. – Radiothermy
M
3
argv[argc++] = bad_cast ("y");

This does not expand the argv array. It merely assigns a value to argv[argc] and then increments argc. This does break the initial guarantee that argv[argc] == NULL, but as long as the code doesn't rely on that guarantee afterwards, it's valid.

The standard guarantees that:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

It does not explicitly guarantee that the char* pointers in the array that argv points to are modifiable, but it's a reasonable assumption that they are. (Strictly speaking argv points to the first element of the array, not to the array itself, but that sentence was long enough already.)

char** new_argv = realloc(argv, ++argc * sizeof*argv);

This has undefined behavior. The first argument to realloc must be either a null pointer or a pointer to memory allocated by malloc, calloc, realloc, or equivalent. The memory pointed to by argv is allocated in some unspecified manner before main is entered. You can make a copy of the array, but you can't legally deallocate it, which is part of what realloc does. If the realloc call behaved "correctly" for you, you were just unlucky. (If you'd been lucky, your program would have crashed, which would have told you that there's a problem.)
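
To illustrate the "make a copy" route, here is a minimal sketch. The helper name copy_argv_with_extra and its parameters are made up for this example and are not part of coreutils or the standard library. It assumes you only need a larger pointer array, so the argument strings are shared rather than duplicated.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from coreutils): returns a malloc'd copy of
   the argv pointer array with `extra` appended and a fresh terminating
   null pointer. The strings are shared with the original argv, not
   copied. Returns NULL if allocation fails. The caller releases the
   result with free(); the original argv is never freed or realloc'd. */
char **copy_argv_with_extra(int argc, char **argv, char *extra, int *new_argc)
{
    size_t n = (size_t)argc;

    /* Room for the n original pointers, the extra one, and the null. */
    char **copy = malloc((n + 2) * sizeof *copy);
    if (copy == NULL)
        return NULL;

    memcpy(copy, argv, n * sizeof *copy);  /* copy the pointers only */
    copy[n] = extra;
    copy[n + 1] = NULL;                    /* keep the null terminator */
    *new_argc = argc + 1;
    return copy;
}
```

Because the new array comes from malloc, it is safe to pass it to realloc or free later, which is exactly what is not true of the original argv.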

Metzger answered 15/8, 2015 at 18:17 Comment(2)
I like this answer better because it's more in-depth and more directly answers my question. – Cylix
Excellent answer! Especially UB and unlucky / lucky part. – Dissolvent
P
5

First, argv[argc] is defined to be NULL.

Second, argc++ increments argc but returns its old value.

Thus, argv[argc++] = ... doesn't invoke undefined behaviour; it simply assigns a new value to a previously NULL pointer.
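
The two points above can be sketched as a small function. The name fill_null_slot is hypothetical and only mirrors the shape of the snippet from yes; it is not the coreutils code.

```c
#include <stddef.h>

/* Hypothetical helper (not from coreutils): stores `fallback` in the
   slot that holds the terminating null pointer and counts it as an
   argument. Returns the updated argc. */
int fill_null_slot(int argc, char **argv, char *fallback)
{
    /* At program startup argv[argc] is a null pointer
       (C11 5.1.2.2.1), so this slot exists and can be overwritten. */
    if (argv[argc] == NULL) {
        /* argc++ yields the old value as the index, then increments. */
        argv[argc++] = fallback;
    }
    return argc;
}
```

After the call the array no longer ends in a null pointer, and argv[argc] with the new argc is past the end of the array, so it must not be read or written.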

Paiz answered 15/8, 2015 at 16:37 Comment(2)
Is argv[argc] guaranteed to be NULL? – Cylix
@ThePcLuddite It is. From the Standard, 5.1.2.2.1 Program startup, paragraph 2: argv[argc] shall be a null pointer. – Radiothermy
