Questions about putenv() and setenv()

I have been thinking a little about environment variables and have a few questions/observations.

  • putenv(char *string);

    This call seems fatally flawed. Because it doesn't copy the passed string you can't call it with a local and there is no guarantee a heap allocated string won't be overwritten or accidentally deleted. Furthermore (though I haven't tested it), since one use of environment variables is to pass values to a child's environment, this seems useless if the child calls one of the exec*() functions. Am I wrong in that?

  • The Linux man page indicates that glibc 2.0-2.1.1 abandoned the above behavior and began copying the string, but this led to a memory leak that was fixed in glibc 2.1.2. It's not clear to me what this memory leak was or how it was fixed.

  • setenv() copies the string but I don't know exactly how that works. Space for the environment is allocated when the process loads but it is fixed. Is there some (arbitrary?) convention at work here? For example, allocating more slots in the env string pointer array than currently used and moving the null terminating pointer down as needed? Is the memory for the new (copied) string allocated in the address space of the environment itself and if it is too big to fit you just get ENOMEM?

  • Considering the above issues, is there any reason to prefer putenv() over setenv()?

Cockpit answered 3/5, 2011 at 17:3
  • [The] putenv(char *string); [...] call seems fatally flawed.

Yes, it is fatally flawed. It was preserved in POSIX (1988) because that was the prior art; the setenv() mechanism arrived later. Correction: the POSIX 1990 standard says in §B.4.6.1 that "Additional functions putenv() and clearenv() were considered but rejected". The Single Unix Specification (SUS) version 2 from 1997 lists putenv() but not setenv() or unsetenv(). The next revision (2004) defined both setenv() and unsetenv().

Because it doesn't copy the passed string you can't call it with a local and there is no guarantee a heap allocated string won't be overwritten or accidentally deleted.

You're correct that a local variable is almost invariably a bad choice to pass to putenv(); the exceptions are obscure to the point of almost not existing. If the string is allocated on the heap (with malloc() et al.), you must ensure that your code does not modify or free it while it is part of the environment. If your code modifies it, it is modifying the environment at the same time.
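
To make the ownership rule concrete, here is a minimal sketch assuming a POSIX system whose putenv() stores the pointer it is given rather than a copy (as POSIX specifies and current glibc does). The helper install_heap_var() and the variable DEMO_COLOR are hypothetical names used only for illustration.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "name=value" in a heap buffer and hands
   that buffer itself to putenv(), which does not copy it. The caller
   must not free or reuse the buffer while it is in the environment.
   Returns the installed buffer, or NULL on failure. */
char *install_heap_var(const char *name, const char *value)
{
    size_t len = strlen(name) + 1 + strlen(value) + 1;
    char *buf = malloc(len);
    if (buf == NULL)
        return NULL;
    snprintf(buf, len, "%s=%s", name, value);
    if (putenv(buf) != 0) {
        free(buf);          /* never installed, so still ours to free */
        return NULL;
    }
    return buf;             /* now referenced by the environment */
}
```

Because getenv() returns a pointer into that same buffer, writing through the returned pointer silently changes the environment, and freeing it would leave a dangling pointer in environ.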

Furthermore (though I haven't tested it), since one use of environment variables is to pass values to a child's environment, this seems useless if the child calls one of the exec*() functions. Am I wrong in that?

The exec*() functions make a copy of the environment and pass that to the executed process. There's no problem there.
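
A small sketch of that copy in action, assuming /bin/sh exists and the process is single-threaded: the variable is added with putenv() in the parent, and the exec'd shell sees it because execl() passes the current environ along. The names child_sees_greeting() and GREETING are illustrative, not from any library.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Static storage, so the string outlives every function that uses it. */
static char greeting[] = "GREETING=hello";

/* Adds GREETING via putenv(), then runs /bin/sh in a child and lets the
   shell report whether the exec'd program saw the variable.
   Returns the child's exit status (0 means it saw GREETING=hello),
   or -1 on failure. */
int child_sees_greeting(void)
{
    if (putenv(greeting) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* execl() passes the current environ to the new program. */
        execl("/bin/sh", "sh", "-c", "[ \"$GREETING\" = hello ]", (char *)NULL);
        _exit(127);         /* only reached if execl() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```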

The Linux man page indicates that glibc 2.0-2.1.1 abandoned the above behavior and began copying the string but this led to a memory leak that was fixed in glibc 2.1.2. It's not clear to me what this memory leak was or how it was fixed.

The memory leak arises because once you have called putenv() with a string, you cannot use that string again for any purpose, because you can't tell whether it is still in use. You could modify the value by overwriting it in place (with indeterminate results if you change the name to that of an environment variable found at another position in the environment). So, if you have allocated space, the classic putenv() leaks it if you change the variable again. When putenv() began to copy data, allocated variables became unreferenced because putenv() no longer kept a reference to the argument; but the user expected that the environment would be referencing it, so the memory was leaked. I'm not sure what the fix was; I would expect (though I'm not certain) that it was to revert to the old behaviour.
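
If you stay with classic putenv(), avoiding that leak means remembering which string you installed so you can free it once it has been replaced. A sketch of that bookkeeping, assuming a pointer-storing putenv() and that no other code installs DEMO_LEVEL; set_level() and installed_level are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping: the most recent string this module handed
   to putenv() for DEMO_LEVEL. Tracking it ourselves is the only way to
   know it is safe to free later. */
static char *installed_level = NULL;

/* Installs a freshly allocated "DEMO_LEVEL=value" string, then frees
   the string installed by the previous call. Without that free, every
   call would leak the old string (the classic putenv() leak).
   Returns 0 on success, -1 on failure. */
int set_level(const char *value)
{
    size_t len = strlen("DEMO_LEVEL=") + strlen(value) + 1;
    char *fresh = malloc(len);
    if (fresh == NULL)
        return -1;
    snprintf(fresh, len, "DEMO_LEVEL=%s", value);
    if (putenv(fresh) != 0) {
        free(fresh);
        return -1;
    }
    /* The environment now points at fresh, so the old string is
       unreferenced and ours to release (free(NULL) is a no-op). */
    free(installed_level);
    installed_level = fresh;
    return 0;
}
```

The bookkeeping only holds for strings this one helper installed; anything else that touches DEMO_LEVEL (setenv(), another putenv(), or direct writes to environ) breaks the assumption, which is why this approach gets messy quickly.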

setenv() copies the string but I don't know exactly how that works. Space for the environment is allocated when the process loads but it is fixed.

The original environment space is fixed; when you start modifying it, the rules change. Even with putenv(), the original environment is modified and could grow as a result of adding new variables, or as a result of changing existing variables to have longer values.

Is there some (arbitrary?) convention at work here? For example, allocating more slots in the env string pointer array than currently used and moving the null terminating pointer down as needed?

That is what the setenv() mechanism is likely to do. The (global) variable environ points to the start of the array of pointers to environment variables. If it points to one block of memory at one time and a different block at a different time, then the environment is switched, just like that.
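
One way to observe this from user code is to count the pointers in environ before and after setenv() adds a new name. Whether environ ends up pointing at a different block is implementation-specific, so this sketch re-reads the global on every call instead of caching it; env_count() and DEMO_NEW_VAR are illustrative names.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdlib.h>

extern char **environ;      /* POSIX: programs declare this themselves */

/* Counts entries up to the terminating NULL pointer. Reads environ
   afresh each time because setenv() may have moved the array. */
size_t env_count(void)
{
    size_t n = 0;
    if (environ == NULL)    /* some implementations allow an empty environ */
        return 0;
    for (char **p = environ; *p != NULL; p++)
        n++;
    return n;
}
```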

Is the memory for the new (copied) string allocated in the address space of the environment itself and if it is too big to fit you just get ENOMEM?

Well, yes, you could get ENOMEM, but you'd have to be trying pretty hard. And if you grow the environment too large, you may be unable to exec other programs properly: either the environment will be truncated or the exec operation will fail.
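
A failure that is much easier to trigger than ENOMEM is EINVAL from a malformed name. A sketch of checking the return value, relying only on the errors POSIX lists for setenv(); checked_setenv() is a hypothetical wrapper.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper around setenv() that reports why it failed.
   POSIX lists EINVAL (empty name, or a name containing '=') and
   ENOMEM (insufficient memory). Returns 0 on success, else the errno. */
int checked_setenv(const char *name, const char *value)
{
    if (setenv(name, value, 1) == -1) {
        int err = errno;
        fprintf(stderr, "setenv(\"%s\"): %s\n", name, strerror(err));
        return err;
    }
    return 0;
}
```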

Considering the above issues, is there any reason to prefer putenv() over setenv()?

  • Use setenv() in new code.
  • Update old code to use setenv(), but don't make it a top priority.
  • Do not use putenv() in new code.
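
For new code, the third argument of setenv() covers the common "set a default unless the user already chose something" case. A minimal sketch; default_editor(), editor_is() and DEMO_EDITOR are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Provides a default without clobbering an existing value: with
   overwrite == 0, setenv() leaves an existing variable alone and
   still returns 0. */
int default_editor(void)
{
    return setenv("DEMO_EDITOR", "vi", 0);
}

/* Returns 1 if DEMO_EDITOR is currently set to exactly `want`. */
int editor_is(const char *want)
{
    const char *cur = getenv("DEMO_EDITOR");
    return cur != NULL && strcmp(cur, want) == 0;
}
```

Passing 1 instead of 0 replaces any existing value, and unsetenv() removes the variable entirely.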
Ganda answered 3/5, 2011 at 23:16
Just in case: If calling putenv() with a local variable, then it should be a top priority to replace putenv() with setenv(). – Kelantan
@Yeow_Meng: well, sort of... the code was broken before, so it is unlikely that there are many people doing that, simply because it would be broken. – Ganda
@Yeow_Meng: Unless there is an exec() (or, improbably, an exit()) reached in the same scope. E.g. github.com/apk/c-utils/blob/… is ok, and putenv is the best choice here, because you already happen to have the NAME=value string. – Hydrogenate
@JonathanLeffler Only it is worse than that… setenv() followed by unsetenv() is a memory leak. setenv() followed by setenv() with the same key may leak as well, plus/minus quality of implementation. putenv() with manual free() before overwrite and at the time of unsetenv() could be used in a leak-free way, but then you'd need to keep track of which key-value pairs are actually heap-allocated, using an external data structure, or risk undefined behavior, etc.; it's a mess. :-) – Phineas
@ArneVogel: the setenv() and unsetenv() functions should not cause memory leaks. Internal to the *env() function package, there are a lot of unpleasant details to be managed, as you hint, and managing them requires more information than just the array of pointers exposed via environ. That exposed variable opens up a back door (though modifying it invokes undefined behaviour). It becomes a QoI (quality of implementation) issue. Poor (but simple) implementations easily leak memory. Higher quality implementations avoid most leaks, but can be forced to leak with ill-behaved programs. – Ganda
There's no arguing that this should not leak from a moral perspective, but note the putenv() rationale in POSIX 2008: "The standard developers noted that putenv() is the only function available to add to the environment without permitting memory leaks."; while these weasel words are no longer present since the 2013 edition (see ERN 1086), the ERN did not at all address the original claim that setenv() may leak. – Phineas

Read the RATIONALE section of the setenv man page from The Open Group Base Specifications Issue 6.

putenv and setenv are both supposed to be POSIX compliant. If you have code with putenv in it, and the code works well, leave it alone. If you are developing new code you may want to consider setenv.

Look at the glibc source code if you want to see an example of an implementation of setenv (stdlib/setenv.c) or putenv (stdlib/putenv.c).

Osculate answered 3/5, 2011 at 17:56

There is no special "the environment" space: setenv just dynamically allocates space for the strings (with malloc, for example) as you would normally. Because the environment doesn't contain any indication of where each string in it came from, it is impossible for setenv or unsetenv to free any space which may have been dynamically allocated by previous calls to setenv.
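
Since setenv() keeps its own copy of both name and value, the caller's buffer can be a temporary that is freed (or goes out of scope) immediately afterwards, which is exactly what putenv() forbids. A sketch; set_count_var() and DEMO_COUNT are illustrative names.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>

/* Formats n into a temporary heap buffer, hands it to setenv(), and
   frees the buffer straight away. This is safe only because setenv()
   copies the value; with putenv() it would leave a dangling pointer. */
int set_count_var(long n)
{
    char *tmp = malloc(32);
    if (tmp == NULL)
        return -1;
    snprintf(tmp, 32, "%ld", n);
    int rc = setenv("DEMO_COUNT", tmp, 1);
    free(tmp);              /* the environment holds its own copy */
    return rc;
}
```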

"Because it doesn't copy the passed string you can't call it with a local and there is no guarantee a heap allocated string won't be overwritten or accidentally deleted." The purpose of putenv is to make sure that, if you have a heap-allocated string, it's possible to delete it on purpose. That's what the rationale text means by "the only function available to add to the environment without permitting memory leaks." And yes, you can call it with a local; just remove the string from the environment (putenv("FOO=") or unsetenv) before you return from the function.

The point is that using putenv makes the process of removing a string from the environment entirely deterministic. By contrast, on some existing implementations setenv will modify an existing string in the environment in place if the new value is shorter (to avoid always leaking memory), and since it made a copy when you called setenv, you're not in control of the originally dynamically allocated string, so you can't free it when it's removed.

Meanwhile, setenv itself (or unsetenv) can't free the previous string, since - even ignoring putenv - the string may have come from the original environment instead of being allocated by a previous invocation of setenv.

(This whole answer assumes a correctly implemented putenv, i.e. not the one in glibc 2.0-2.1.1 you mentioned.)
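
The "local plus cleanup" pattern described in this answer might look like the following sketch, assuming a putenv() that stores the caller's pointer (not the copying glibc 2.0 to 2.1.1 variant). Here unsetenv() drops the entry before the buffer goes out of scope; calling putenv() with a string literal such as "DEMO_SCOPED=" would also stop the environment from referencing the local, though it leaves the variable set to an empty value. with_scoped_var() and DEMO_SCOPED are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>

/* Installs a variable from a local (stack) buffer, uses it, and
   removes it again before the buffer goes out of scope.
   Returns the value that was seen, or -1 on failure. */
int with_scoped_var(void)
{
    char entry[64];
    snprintf(entry, sizeof entry, "DEMO_SCOPED=%d", 7);
    if (putenv(entry) != 0)
        return -1;

    const char *v = getenv("DEMO_SCOPED");
    int seen = (v != NULL) ? atoi(v) : -1;

    /* The environment must not reference `entry` after we return. */
    unsetenv("DEMO_SCOPED");
    return seen;
}
```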

Patsis answered 4/5, 2011 at 3:58

Furthermore (though I haven't tested it), since one use of environment variables is to pass values to a child's environment, this seems useless if the child calls one of the exec*() functions. Am I wrong in that?

That's not how the environment is passed to the child. All of the various flavors of exec() (which you find in section 3 of the manual because they are library functions) ultimately invoke the system call execve() (which you find in section 2 of the manual). The arguments are:

   int execve(const char *filename, char *const argv[], char *const envp[]);

The vector of environment variables is passed explicitly (and may be partly constructed from the results of your putenv() and setenv() calls). The kernel copies these into the address space of the new process. Historically there was a limit to the size of your environment derived from the space available for this copy (similar to the argument limit) but I'm not familiar with the restrictions on a modern Linux kernel.
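
To show that the child receives exactly the envp vector handed to execve(), here is a sketch assuming /bin/sh is available: the parent sets DEMO_PARENT_ONLY in its own environment, yet the child, started with a hand-built envp, does not see it, and the parent never gains DEMO_MODE. run_with_env() and both variable names are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts /bin/sh with a hand-built environment. execve() gives the new
   program exactly child_envp; nothing else from the parent's environ
   is inherited, and the parent's environ is not modified.
   Returns the child's exit status (0 = both checks passed), or -1. */
int run_with_env(void)
{
    /* Set in the parent only, to show that it does not reach the child. */
    if (setenv("DEMO_PARENT_ONLY", "1", 1) != 0)
        return -1;

    char *child_argv[] = {
        "sh", "-c",
        "[ \"$DEMO_MODE\" = test ] && [ -z \"$DEMO_PARENT_ONLY\" ]",
        NULL
    };
    char *child_envp[] = { "DEMO_MODE=test", "PATH=/usr/bin:/bin", NULL };

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", child_argv, child_envp);
        _exit(127);         /* only reached if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```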

Junior answered 3/5, 2011 at 21:26

I would highly recommend against using either of these functions. Either can be used safely and without leaks, as long as you're careful and only one part of your code is responsible for modifying the environment, but it's hard to get right and dangerous if any code might be using threads and might read the environment (e.g. for timezone, locale, DNS configuration, etc.).

The only two purposes I can think of for modifying the environment are to change the timezone at runtime, or to pass a modified environment to child processes. For the former, you probably have to use one of these functions (setenv/putenv), or you could walk environ manually to change it (this might be safer if you're worried other threads could try to read the environment at the same time). For the latter use (child processes), use one of the exec-family functions that lets you specify your own environment array, or simply clobber environ (the global) or use setenv/putenv in the child process after fork but before exec, in which case you don't have to care about memory-leaks or thread-safety because there are no other threads and you're about to destroy your address space and replace it with a new process image.
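
A sketch of the "modify after fork, before exec" approach, assuming the parent is single-threaded when it calls fork() (after fork() in a multithreaded program, POSIX only guarantees async-signal-safe functions in the child, and setenv() is not one of them). run_child_only() and DEMO_CHILD_ONLY are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Adds DEMO_CHILD_ONLY in the child only, between fork() and exec.
   The parent's environment is untouched because the child works on
   its own copy of the address space. Returns the child's exit status
   (0 means the exec'd shell saw the variable), or -1 on failure. */
int run_child_only(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (setenv("DEMO_CHILD_ONLY", "yes", 1) != 0)
            _exit(126);
        execl("/bin/sh", "sh", "-c", "[ \"$DEMO_CHILD_ONLY\" = yes ]", (char *)NULL);
        _exit(127);         /* only reached if execl() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```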

Bouleversement answered 3/5, 2011 at 18:25
This might be a bit off topic, but if using vfork() to fork a child, then modify environment, then exec(), would the parent environment be modified? – Endotoxin
It's undefined. In the worst case, you don't just modify it, but horribly corrupt the parent's state. This could happen, for instance, if you use setenv in the vfork child and setenv calls malloc. Just don't even consider doing something like that. Note that there's no reason to modify the environment before calling exec; just use one of the forms of exec that lets you pass a new environment pointer. – Bouleversement
