gcc attributes for init-on-first-use functions

I've been using the gcc const and pure attributes for functions which return a pointer to "constant" data that's allocated and initialized on the first use, i.e. where the function will return the same value each time it's called. As an example (not my usage case, but a well-known example) think of a function that allocates and computes trig lookup tables on the first call and just returns a pointer to the existing tables after the first call.

The problem: I've been told this usage is incorrect because these attributes forbid side effects, and that the compiler could even optimize out the call completely in some cases if the return value is not used. Is my usage of const/pure attributes safe, or is there any other way to tell the compiler that N>1 calls to the function are equivalent to 1 call to the function, but that 1 call to the function is not equivalent to 0 calls to the function? Or in other words, that the function only has side effects the first time it's called?
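For concreteness, here is a minimal sketch of the sort of function I mean (hypothetical names, error handling omitted; just an illustration of the pattern, not my actual code):

#include <math.h>
#include <stdlib.h>

/* Allocates and fills the table on the first call (a side effect);
 * every later call just returns the same pointer to the same data. */
__attribute__((const))              /* the attribute usage in question */
const double *sine_table(void)
{
    static double *table = NULL;
    if (!table) {
        table = malloc(360 * sizeof *table);
        for (int i = 0; i < 360; i++)
            table[i] = sin(i * 3.14159265358979323846 / 180.0);
    }
    return table;
}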

Hysterectomize asked 29/7, 2011 at 1:26 Comment(3)
Are you sure it's a problem? If the call is optimized out, then the data will be created next time instead. As long as the data can only be accessed through the return value, there should be nothing wrong with this.Whiteley
pure is GCC-specific, but const is not.Fraga
__attribute__((const)) is also a gcc-ism, but more widely supported by non-gcc compilers...Hysterectomize

I say this is correct based on my understanding of pure and const, but if anyone has a precise definition of the two, please speak up. This gets tricky because the GCC documentation doesn't lay out exactly what it means for a function to have "no effects except the return value" (for pure) or to "not examine any values except their arguments" (for const). Obviously all functions have some effects (they use processor cycles, modify memory) and examine some values (the function code, constants).

"Side effects" would have to be defined in terms of the semantics of the C programming language, but we can guess what the GCC folks mean based on the purpose of these attributes, which is to enable additional optimizations (at least, that's what I assume they are for).

Forgive me if some of the following is too basic...

Pure functions can participate in common subexpression elimination. Their defining feature is that they don't modify the environment, so the compiler is free to call them fewer times without changing the semantics of the program.

z = f(x);
y = f(x);

becomes:

z = y = f(x);

Or gets eliminated entirely if z and y are unused.

So my best guess is that a working definition of "pure" is "any function which can be called fewer times without changing the semantics of the program". However, function calls may not be moved, e.g.,

size_t l = strlen(str); // strlen is pure
*some_ptr = '\0';
// Obviously, strlen can't be moved here...

Const functions can be reordered, because they do not depend on the dynamic environment.

// Assuming x and y are not aliased by the pointers, the call to sin can be moved anywhere
*some_ptr = '\0';
double y = sin(x);
*other_ptr = '\0';

So my best guess is that a working definition of "const" is "any function which can be called at any point without changing the semantics of the program". However, there is a danger:

__attribute__((const))
double big_math_func(double x, double theta, double iota)
{
    static double table[512];
    static bool initted = false;
    if (!initted) {
        ...
        initted = true;
    }
    ...
    return result;
}

Since it's const, the compiler could reorder it...

pthread_mutex_lock(&mutex);
...
z = big_math_func(x, theta, iota);
...
pthread_mutex_unlock(&mutex);
// big_math_func might go here, if the compiler wants to

In this case, it could be called simultaneously from two processors even though it only appears inside a critical section in your code. Then the processor could decide to postpone the changes to table until after the change to initted has already gone through, which is bad news. You can solve this with memory barriers or pthread_once.
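For instance, a sketch of the pthread_once route (hypothetical init routine; the table has to move to file scope so the init routine can see it):

#include <pthread.h>

static double table[512];
static pthread_once_t table_once = PTHREAD_ONCE_INIT;

static void init_table(void)
{
    /* fill in table[] here; pthread_once guarantees this runs exactly once
     * and that its writes are visible to every thread that returns from
     * pthread_once() */
}

double big_math_func(double x, double theta, double iota)
{
    pthread_once(&table_once, init_table);
    /* ... compute the result from table[], x, theta, iota ... */
    return 0.0;   /* placeholder for the real result */
}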

I don't think this bug will ever show up on x86, and I doubt it shows up on systems that don't have multiple physical processors (as opposed to multiple cores). So it will work fine for ages and then fail suddenly on a dual-socket POWER computer.

Conclusion: The advantage of these definitions is that they make it clear what kind of changes the compiler is allowed to make in the presence of these attributes, which is (I think) somewhat vague in the GCC docs. The disadvantage is that it's not clear that these are the definitions the GCC team actually uses.

If you look at the Haskell language specification, for example, you'll find a much more precise definition of purity, since purity is so important to the Haskell language.

Edit: I have not been able to coerce GCC or Clang into moving a solitary __attribute__((const)) function call across another function call, but it seems entirely possible that in the future, something like that would happen. Remember when -fstrict-aliasing became the default, and everybody suddenly had a lot more bugs in their programs? It's stuff like that that makes me cautious.

It seems to me that when you mark a function __attribute__((const)), you are promising the compiler that the result of the function call is the same no matter when it is called during your program's execution, as long as the parameters are the same.
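By contrast, the kind of function for which that promise obviously holds is one that reads nothing but its arguments, for example (a trivial sketch):

/* Depends only on its arguments and touches no memory, so the call can
 * be duplicated, merged, or moved freely without changing the result. */
__attribute__((const))
static int clamp(int value, int lo, int hi)
{
    return value < lo ? lo : (value > hi ? hi : value);
}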

However, I did come up with a way of getting the compiler to move a const function call out of a critical section, although the way I did it could be called "cheating" of a sort.

#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

__attribute__((const))
extern int const_func(int x);

int func(int x)
{
    int y1, y2;
    y1 = const_func(x);
    pthread_mutex_lock(&mutex);
    y2 = const_func(x);
    pthread_mutex_unlock(&mutex);
    return y1 + y2;
}

The compiler translates this into the following code (from the assembly):

int func(int x)
{
    int y;
    y = const_func(x);
    pthread_mutex_lock(&mutex);
    pthread_mutex_unlock(&mutex);
    return y * 2;
}

Note that this does not happen with __attribute__((pure)) alone; the const attribute, and only the const attribute, triggers this behavior.

As you can see, the call inside the critical section disappeared. It seems rather arbitrary that the earlier call was kept, and I would not be willing to wager that the compiler won't, in some future version, make a different decision about which call to keep, or whether it might move the function call somewhere else entirely.

Conclusion 2: Tread carefully, because if you don't know what promises you are making to the compiler, a future version of the compiler might surprise you.

Greeting answered 29/7, 2011 at 2:52 Comment(11)
In your example with the critical section, wouldn't a memory barrier (which is typically part of a mutex's implementation) using store-release semantics on the unlock call prevent the re-location of the call to big_math_func from inside the critical section to outside the critical section?Stanhope
@Jason: A memory barrier only prevents the processor (not the compiler) from reordering certain operations. For comparison, the volatile keyword prevents the compiler (not the processor) from reordering certain operations. When you write code for multiprocessor systems (especially at a low level), it helps to think of both the processor and the compiler as your enemy.Greeting
A mutex is also a complete compiler barrier for any data that could be accessible from other threads, i.e. basically anything except local variables whose address you have never taken or at least never passed to the outside world.Hysterectomize
@R.. Well, technically speaking, there's nothing special about the mutex functions from the compiler's point of view. They're just ordinary function calls, which are good enough to prevent data stores being moved across them by the compiler. However, if the other function call is marked as __attribute__((const)), then you are explicitly giving the compiler permission to move it somewhere else, as long as the compiler can figure out that the result or parameters are not aliased.Greeting
Well formally there is something special about mutex functions. All that makes normal external functions compiler barriers is the inability of the compiler to do program-global optimizations powerful enough to move most stores across them. But with mutexes and other synchronization primitives, the specification actually forbids moving stores across them. This means that if compilers get sufficiently smart that they could move stores across other external functions, they'll have to have some way of special-casing synchronization primitives that forbid it.Hysterectomize
@R.. But a compiler is free to move x = y + z across a mutex, if x, y, and z are all unaliased locals. When you specify that a function is __attribute__((const)), you are telling the compiler that the function does not do any loads and stores to non-constant memory, so whether the compiler is allowed to move loads and stores across a mutex is irrelevant.Greeting
@R.. Can you come up with a citation for where the specification states that the compiler can't move stores across a mutex? I looked at IEEE 1003.1-2008 but I can't find it. (Not arbitrary stores obviously, but stores to unaliased locals...)Greeting
Of course it can move stores to unaliased locals. My point was that, even if it can determine via global code flow analysis that pthread_mutex_lock does not call any function that modifies some variable foo, it still has to be aware that locking a mutex allows for the possibility that foo is modified by other threads, and thus that stores to foo cannot be moved across lock boundaries.Hysterectomize
Here is some info on the implementation of memory/compiler barriers in the Linux Kernel: lxr.linux.no/#linux+v3.0/Documentation/memory-barriers.txt ... check out lines 1048-1274. Specifically on line 1061 it states, "All memory barriers except the data dependency barriers imply a compiler barrier. Data dependencies do not impose any additional compiler ordering." Also on line 1161 it states,"Some of the other functions in the linux kernel imply memory barriers, amongst which are locking and scheduling functions.". This is linux specific, so it could change on different platforms.Stanhope
@Jason: Those are implemented with asm __volatile__, and are GCC hacks, just like __attribute__((const)). They only work for inlined functions.Greeting
Practically speaking, the pthreads spec is written with the assumption that the compiler has no knowledge of the internals of the pthreads functions, and must thus assume that any escaped variable may be modified. The compiler has no specific knowledge of pthreads, it's just another library. For some cases of where this breaks down, see e.g. hpl.hp.com/techreports/2004/HPL-2004-209.html , which was one of the papers that led to the inclusion of threads and a memory model in C++0x and C1X.Guyot
