Why does the C readdir man page say not to call free on the statically allocated result struct?

$ uname -a

Linux crowsnest 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 UTC 2011 x86_64 GNU/Linux

$ man readdir:

DESCRIPTION

The readdir() function returns a pointer to a dirent structure representing the next directory entry in the directory stream pointed to by dirp...

...[snip]...

The readdir_r() function is a reentrant version of readdir()...

...[snip]...

RETURN VALUE

On success, readdir() returns a pointer to a dirent structure. (This structure may be statically allocated; do not attempt to free(3) it.) If the end of the directory stream is reached, NULL is returned and errno is not changed. If an error occurs, NULL is returned and errno is set appropriately.

The readdir_r() function returns 0 on success. On error, it returns a positive error number. If the end of the directory stream is reached, readdir_r() returns 0, and returns NULL in *result.

I'm confused about what this means. My application of this function is to collect a dynamically allocated array of pointers to structs with data about the directory entries, and I'm wondering if I can dynamically allocate dirent structs and set the pointers to them. But this line seems to say that free should never be called on the result, so I'm wondering if I should allocate a separate dirent struct which will be part of the list and memcpy the returned result into it.

I'm also confused by the word "may" in the above man page. Does this mean that sometimes it's statically allocated, and sometimes it's not?

I'm (vaguely) familiar with what static variables mean in C, but not sure about all the rules and possible gotchas around them. Because I want to pass the dirent structs of a directory around, I would rather they be dynamically allocated. Is this what readdir_r is for? Or will the double pointer be set to point to another statically allocated dirent struct?

I'm also not entirely sure what reentrant means in this context for readdir_r. My understanding of reentrancy comes only from Scheme coroutines, and I'm not sure how that would apply to reading Unix directories.

Absa asked 23/8, 2011 at 9:23 (9 comments)
It should be noted that the documentation claiming readdir_r is reentrant is simply wrong. The author of the man page does not understand the difference between reentrant and thread-safe. readdir_r certainly uses buffered reading through the DIR struct (requesting one entry at a time from the kernel would be very slow) and thus has to hold a lock on this structure, which makes it non-reentrant. – Pyromancy
@R..: but as Posix says over and over, "a function that is not required to be reentrant is not required to be thread-safe". Therefore, a function that is required to be thread-safe is required to be reentrant, by whatever definition Posix uses for reentrant. Other people may use other definitions of reentrant: I guess that since Posix forbids any code that might re-enter readdir_r in a single thread, and it does require thread-safety, it considers it trivially reentrant in some sense. – Smile
@Steve: This bug was fixed in POSIX 2008, which no longer has the nonsensical wording. Instead, it now reads simply: "The readdir() function need not be thread-safe." – Pyromancy
@R.. ah, well, if Posix changed the definition of reentrant in 2008, it's not necessarily surprising if a man page hasn't caught up. – Smile
POSIX 2008 didn't really change the definition; it simply removed the erroneous use of the word "reentrant". – Pyromancy
@R..: erroneous? The mantra "a function that is not required to be reentrant is not required to be thread-safe" appeared all over previous versions of Posix. Are they claiming it was some kind of epic typo, or just that they wish they hadn't previously required all thread-safe functions to be reentrant? Is it a breaking or a non-breaking change for programs, i.e. were there any functions that previously were required to be reentrant, no longer are, and which it's actually permissible to reenter via recursion or a signal handler? – Smile
I suppose it's potentially also worth distinguishing between a function that's reentrant from the POV of the user, and one that's reentrant from the POV of other parts of the implementation. It's possible to write a function that calls out to two bits of code, one a user callback and the other another function in libc, and have it so that reentrancy by the program is OK, but from the libc call not (e.g. a lock is held across the libc call only). Presumably as far as the Posix standard is concerned, that function is then reentrant; it doesn't care about implementation details. – Smile
POSIX does not define "reentrant", but presumably it also entails the ability to reenter the function from a signal handler (i.e. async-signal-safety). Your example does not meet this condition unless the function blocks all signals for the duration of the lock (which is a costly but trivial way to make any function without callbacks "reentrant"). Incidentally, this operation would be free if the kernel would just put the signal mask and other process data that doesn't need security in a page mapped to userspace, so that a userspace function could perform the atomic changes on it... – Pyromancy
@R..: odd. It was never the case that all thread-safe functions in Posix were async-signal-safe, so either Posix had a definition (since removed) of reentrant that didn't include async-signal-safety, or else it used the term without definition in a way that contradicts your "presumably", or else it repeated a false statement many times all over the C library definitions. Do you know which? I always assumed that reentrant meant "reentrant, provided that the program is otherwise valid", and so non-async-signal-safe functions still qualified if other routes of reentrancy were OK. – Smile

The rule here is really simple -- you're free to make a copy of the data readdir() returns, however you don't own the buffer it puts that data in so you cannot take actions that suggest you do. (I.e., copy the data out to your own buffer; don't store a pointer to within the readdir-owned buffer.)

You asked whether you should "allocate a separate dirent struct which will be part of the list and memcpy the returned result into it": that's exactly what you should do.
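A minimal sketch of doing that copy, assuming you only need each entry's name and inode number (the two members POSIX guarantees in struct dirent). The hypothetical `struct entry` type and `collect_entries()`/`free_entries()` helpers copy those fields out of each readdir() result into memory the program owns, so the list stays valid across later readdir() calls and after closedir(). Copying fields instead of memcpy'ing sizeof(struct dirent) bytes is a deliberate choice here: POSIX leaves the declared size of d_name unspecified, so a whole-struct copy may not match the bytes the entry really occupies.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical record type: every field is storage the program owns. */
struct entry {
    ino_t ino;
    char *name;                    /* malloc'd copy of d_name */
};

/* Hypothetical helper: returns a malloc'd array of *count entries copied
   out of readdir()'s results, or NULL on error. Free with free_entries(). */
struct entry *collect_entries(const char *path, size_t *count)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return NULL;

    size_t n = 0, cap = 16;
    struct entry *list = malloc(cap * sizeof *list);
    if (list == NULL)
        goto fail;

    for (;;) {
        errno = 0;                          /* tells end-of-stream from error */
        struct dirent *ent = readdir(dir);  /* points into storage we do NOT own */
        if (ent == NULL) {
            if (errno != 0)
                goto fail;
            break;
        }
        if (n == cap) {
            struct entry *tmp = realloc(list, 2 * cap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            list = tmp;
            cap *= 2;
        }
        list[n].ino = ent->d_ino;
        list[n].name = strdup(ent->d_name); /* copy out before the next readdir() */
        if (list[n].name == NULL)
            goto fail;
        n++;
    }

    closedir(dir);                 /* fine: nothing in list points into the DIR */
    *count = n;
    return list;

fail:
    for (size_t i = 0; i < n; i++)
        free(list[i].name);
    free(list);                    /* free(NULL) is a no-op */
    closedir(dir);
    return NULL;
}

/* We allocated these, so we free them; readdir()'s result is never freed. */
void free_entries(struct entry *list, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(list[i].name);
    free(list);
}
```

On a typical Linux file system, collect_entries(".", &n) includes the "." and ".." entries; the caller hands the array to free_entries() when done.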

As for "may" in the man page: it means you cannot count on how the structure is managed, only that it will be managed for you. The details can vary from one system to the next.

Reentrant means thread-safe. readdir() uses a static entry, which makes it unsafe for multiple threads that each want to step through a directory on their own. readdir_r() instead fills in space provided by the caller, letting multiple threads act independently.

Flavourful answered 23/8, 2011 at 9:31 (1 comment)
Reentrant and thread-safe are actually quite different. malloc is thread-safe but not reentrant. Conversely, any function that uses static storage but backs up and restores the static data in automatic storage on entry/exit is reentrant but not likely to be thread-safe. – Pyromancy

The structure might be statically-allocated, it might be thread-local, it might be dynamically allocated. That's up to the implementation. But no matter what, it's not yours to free, which is why you must not free it.

readdir_r doesn't allocate anything for you: you give it a dirent, allocated however you like, and it fills it in. So it does save you a little effort compared with calling readdir and copying the data. That's not the main purpose of readdir_r, though; what it's actually for is letting different threads make calls at the same time, which you can't do with readdir.
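Roughly what that looks like in practice: a sketch with a hypothetical helper `read_all_r()` that gives each entry its own heap-allocated struct dirent, lets readdir_r() fill it in, and hands the whole array to the caller, who frees it. It assumes a Linux/glibc-style struct dirent whose d_name array already holds NAME_MAX + 1 bytes; POSIX leaves that size unspecified, so portable code sizes the buffer using pathconf() instead. Also note that glibc 2.24 and later mark readdir_r() as deprecated (the Linux man page now recommends plain readdir()), which is why the pragma silences that warning for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>

#if defined(__GNUC__)
/* glibc >= 2.24 marks readdir_r() deprecated; silence that for this sketch. */
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif

/* Hypothetical helper: returns a malloc'd array of *count malloc'd
   struct dirent pointers, all owned by the caller, or NULL on error. */
struct dirent **read_all_r(const char *path, size_t *count)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return NULL;

    size_t n = 0, cap = 16;
    struct dirent **list = malloc(cap * sizeof *list);
    if (list == NULL)
        goto fail;

    for (;;) {
        /* Assumes sizeof(struct dirent) leaves room for NAME_MAX + 1 bytes
           of d_name (true on Linux/glibc); portable code uses pathconf(). */
        struct dirent *buf = malloc(sizeof *buf);
        struct dirent *result;
        if (buf == NULL)
            goto fail;
        if (readdir_r(dir, buf, &result) != 0) {  /* positive error number */
            free(buf);
            goto fail;
        }
        if (result == NULL) {                     /* end of stream: 0 and NULL */
            free(buf);
            break;
        }
        if (n == cap) {
            struct dirent **tmp = realloc(list, 2 * cap * sizeof *tmp);
            if (tmp == NULL) {
                free(buf);
                goto fail;
            }
            list = tmp;
            cap *= 2;
        }
        list[n++] = buf;   /* readdir_r filled OUR buffer, so we may keep it */
    }

    closedir(dir);
    *count = n;
    return list;

fail:
    for (size_t i = 0; i < n; i++)
        free(list[i]);
    free(list);            /* free(NULL) is a no-op */
    closedir(dir);
    return NULL;
}
```

Each pointer in the returned array is safe to keep and to free(), because that memory came from malloc() in the caller's own code rather than from the library.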

What "reentrant" actually means is that the function can be called again before a previous call to it has returned. In general, this might happen from a different thread (which is what most people mean by "thread-safe"), from a handler for a signal that occurred during the first call, or through recursion. But the C standard (C99, at the time of writing) has no concept of threads, so when it mentions "reentrant" it means only the latter two. Posix defines "thread-safe" to require this form of reentrancy and, in addition, the thing that most people mean by thread-safe.

In Posix, every function required to be thread-safe is required to be reentrant, and readdir_r is required to be thread-safe. I think reentrancy in the weaker sense is irrelevant to readdir_r, since it doesn't call any user code that could result in recursion, and it's not async-signal-safe so it must not be called from a signal handler either.

Beware, because when some people (Java programmers) say "thread-safe", they mean that the function can be called by different threads on the same arguments at the same time, and will use locks to work correctly. Posix APIs do not mean this by thread-safe, they only mean that the function can be called on different data at the same time. Any global data that the function uses is protected by locks or otherwise, but the arguments need not be.

Smile answered 23/8, 2011 at 9:32

First question

It means readdir could have something like this:

struct dirent *
readdir(DIR *dirp)
{
    static struct dirent dirent;   /* one buffer, shared by every call */
    /* Do stuff. */

    return &dirent;
}

Clearly it would be wrong to free it: it wasn't obtained from malloc, and passing such a pointer to free() is undefined behavior.

The standard doesn't force anyone to do it like this. An implementation could use its own mechanism (perhaps malloc and free later on its own).
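To make the overwrite concrete without touching the file system, here is a self-contained toy rather than readdir itself: the hypothetical `next_name()` keeps one static buffer the way the sketch above does, so a pointer saved from one call is silently rewritten by the next, while a copy the caller made survives.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy, NOT the real readdir(): like the static-buffer sketch
   above, every call overwrites one buffer and returns a pointer to it. */
static const char *const names[] = { "alpha", "beta", "gamma" };

const char *next_name(void)
{
    static char buf[32];           /* shared by every call; never free() it */
    static size_t i = 0;

    if (i >= sizeof names / sizeof names[0])
        return NULL;               /* "end of stream" */
    snprintf(buf, sizeof buf, "%s", names[i++]);
    return buf;
}

void show_overwrite(void)
{
    const char *first = next_name();    /* buf now holds one name */
    if (first == NULL)
        return;
    char saved[32];
    snprintf(saved, sizeof saved, "%s", first);   /* our own copy */

    const char *second = next_name();   /* same buf, overwritten */
    if (second == NULL)
        return;
    /* first == second, so first now shows the second name;
       only saved still holds the first one. */
    printf("first=%s second=%s saved=%s same=%d\n",
           first, second, saved, first == second);
}
```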

Second question

"Reentrant" means that while we are inside readdir_r, the function can be safely called again (for example from a signal handler). For instance, readdir isn't reentrant. Suppose this happens:

  • You call readdir(dir); and it starts modifying dirent
  • BEFORE it is done, it is interrupted and someone else calls it (from an async context)
  • Its version modifies dirent, returns and the async context goes on its way
  • Your version returns. What does dirent contain?

Reentrant functions are a godsend: they are always safe to call.
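The usual way out, and the idea behind the `_r` suffix, is to keep no hidden state and make the caller supply the storage. Below is a toy sketch of that pattern with hypothetical names (`struct cursor`, `next_name_r()`). It is not how libc implements readdir_r, which, as the comment thread on the question points out, has to lock the DIR. Because each call touches only the cursor and buffer it is handed, calls on different cursors can interleave, or nest, without clobbering each other's results.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy illustrating the "_r" pattern: no static state at all.
   The caller owns both the position (cursor) and the output buffer. */
struct cursor {
    const char *const *names;      /* the entries being "read" */
    size_t len;
    size_t pos;                    /* index of the next entry to return */
};

/* Copies the next name into buf and sets *result = buf, or sets
   *result = NULL at end of stream. Returns 0 on success, or ERANGE if
   buf is too small (the cursor is then left unchanged). */
int next_name_r(struct cursor *c, char *buf, size_t bufsize, char **result)
{
    if (c->pos >= c->len) {
        *result = NULL;            /* end of stream is still a success */
        return 0;
    }
    size_t need = strlen(c->names[c->pos]) + 1;   /* include the '\0' */
    if (need > bufsize)
        return ERANGE;
    memcpy(buf, c->names[c->pos], need);
    c->pos++;
    *result = buf;
    return 0;
}
```

Two calls that share one cursor or one buffer would still interfere; the guarantee only covers calls working on separate state.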

Field answered 23/8, 2011 at 9:27 (3 comments)
To clarify for the OP (who has admitted to a "vague" familiarity with static variables): you cannot call readdir(), store the returned value in a pointer, call it again and store the second value. If the implementation is as described in cnicutar's answer, the second call will modify the contents of the variable pointed to by the address returned the first time. You will need to copy the struct whose address is returned by readdir into a struct that you have malloc'ed, and you can (and should) free that copy when you are done with it. – Procrastinate
@William Pursell I edited my answer; feel free to edit if you think you can clarify it further. – Field
@Field FTR, the example given isn't possible: "The returned pointer... shall not be affected by a call to readdir() on a different directory stream." The actual dirent returned has to be stashed in each DIR, not globally. – Aubarta