Yes, you need restrict. Pointer-to-const doesn't mean that nothing can change the data, only that you can't change it through that pointer.
const is mostly just a mechanism to ask the compiler to help you keep track of which stuff you want functions to be allowed to modify. const is not a promise to the compiler that a function really won't modify data.
Unlike restrict, using pointer-to-const to mutable data is basically a promise to other humans, not to the compiler. Casting away const all over the place won't lead to wrong behaviour from the optimizer (AFAIK), unless you try to modify an object that was itself defined const, which is undefined behaviour and which the compiler may have put in read-only memory (see below about static const variables). If the compiler can't see the definition of a function when optimizing, it has to assume that the function casts away const and modifies data through that pointer (i.e. that the function doesn't respect the constness of its pointer args).
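A minimal sketch of that point, assuming two hypothetical helpers (sneaky_increment and read_twice are illustration names, not from the question). Casting away const and writing is well-defined as long as the pointed-to object was not itself defined const, which is why a caller that can't see the callee has to reload afterwards:

```c
/* Hypothetical helper: accepts pointer-to-const but writes through it anyway.
   This is well-defined ONLY if the pointed-to object was not defined const. */
void sneaky_increment(const int *p)
{
    int *q = (int *)p;   /* casting away const is allowed */
    *q += 1;
}

/* Hypothetical caller: if the compiler could not see sneaky_increment's body,
   it would have to reload *p after the call instead of reusing `before`. */
int read_twice(const int *p)
{
    int before = *p;
    sneaky_increment(p);
    return *p - before;  /* 1 when p points to a mutable int */
}
```

Calling sneaky_increment on the address of a static const int would be undefined behaviour, so the sketch only applies to mutable objects.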
The compiler does know that static const int foo = 15; can't change, though, and will reliably inline the value even if you pass its address to unknown functions. (This is why static const int foo = 15; is not slower than #define foo 15 for an optimizing compiler. Good compilers will optimize it like a constexpr whenever possible.)
Remember that restrict is a promise to the compiler that things you access through that pointer don't overlap with anything else. If that's not true, your function won't necessarily do what you expect. For example, don't call foo_restrict(buf, buf, buf) to operate in-place.
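A sketch of what that looks like in practice. The function names and the four-argument signature here are assumptions for illustration (the loop body, result[i] = a[0] * b[i], is inferred from the asm further down, not copied from the question):

```c
#include <stddef.h>

/* Hypothetical: restrict promises that result, a and b don't overlap
   for the elements this function touches. */
void scale_restrict(float *restrict result, const float *restrict a,
                    const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        result[i] = a[0] * b[i];
}

/* Same loop without restrict: safe to call with overlapping arguments,
   because the compiler must assume each store may change a[0] or b[i]. */
void scale_may_alias(float *result, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        result[i] = a[0] * b[i];
}
```

With separate arrays either version works. For in-place use, only scale_may_alias(buf, buf, buf, n) is valid; there the first store changes a[0], so later iterations multiply by the updated value, which is exactly the dependency restrict tells the compiler to ignore.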
In my experience (with gcc and clang), restrict is mainly useful on pointers that you store through. It doesn't hurt to put restrict on your source pointers, too, but usually you get all the asm improvement possible from putting it on just the destination pointer(s), if all the stores your function does are through restrict pointers.
If you have any function calls in your loop, restrict on a source pointer does let clang (but not gcc) avoid a reload. See these test-cases on the Godbolt compiler explorer, specifically this one:
void value_only(int); // a function the compiler can't inline
int arg_pointer_valonly(const int *__restrict__ src)
{
// the compiler needs to load `*src` to pass it as a function arg
value_only(*src);
// and then needs it again here to calculate the return value
return 5 + *src; // clang: no reload because of __restrict__
}
gcc 6.3 (targeting the x86-64 SysV ABI) decides to keep src (the pointer) in a call-preserved register across the function call, and reload *src after the call. Either gcc's algorithms didn't spot that optimization possibility, or decided it wasn't worth it, or the gcc devs deliberately didn't implement it because they think it's not safe. I don't know which. But since clang does it, I'm guessing it's probably legal according to the C11 standard.
clang 4.0 optimizes this to only load *src once, and keep the value in a call-preserved register across the function call. Without restrict, it doesn't do this, because the called function might (as a side-effect) modify *src through another pointer.
The caller of this function might have passed the address of a global variable, for example. But any modification of *src other than through the src pointer would violate the promise that restrict made to the compiler. Since we don't pass src to value_only(), the compiler can assume it doesn't modify the value.
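To see why the reload is required when restrict is absent, here is a sketch with a hypothetical definition of value_only() (the original only declares it) that modifies a global, plus a non-restrict variant of the function above. The names counter and arg_pointer_valonly_norestrict are made up for this example:

```c
int counter = 0;  /* a global the callee can reach without being handed a pointer */

/* Hypothetical definition: has a side effect on the global. */
void value_only(int v)
{
    counter += v;
}

/* Same body as arg_pointer_valonly, but without __restrict__. */
int arg_pointer_valonly_norestrict(const int *src)
{
    value_only(*src);
    return 5 + *src;  /* must observe any change value_only made to *src */
}
```

With counter set to 1, arg_pointer_valonly_norestrict(&counter) has to return 7, because the call changed *src to 2. Passing &counter to the __restrict__ version would break the promise and be undefined behaviour; returning 6 from the stale value is one plausible outcome.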
The GNU dialect of C allows using __attribute__((pure)) or __attribute__((const)) to declare that a function has no side-effects, allowing this optimization without restrict, but there's no portable equivalent in ISO C11 (AFAIK). Of course, allowing the function to inline (by putting it in a header file or using LTO) also allows this kind of optimization, and is much better for small functions especially if called inside loops.
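A rough sketch of the attribute approach, assuming gcc or clang. ATTR_CONST, square and square_plus_src are hypothetical names for illustration; on other compilers the macro expands to nothing and the hint is simply lost:

```c
/* GNU C extension, accepted by gcc and clang. Other compilers get a no-op. */
#if defined(__GNUC__)
#  define ATTR_CONST __attribute__((const))
#else
#  define ATTR_CONST
#endif

/* Hypothetical helper: result depends only on its argument, no side effects. */
ATTR_CONST int square(int x);

int square_plus_src(const int *src)
{
    int s = square(*src);  /* attribute says the call can't modify *src */
    return s + *src;       /* so *src may stay in a register, no restrict needed */
}

int square(int x)
{
    return x * x;
}
```

In real code the definition of square would live in another translation unit; it is included here only so the sketch links on its own.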
Compilers are generally pretty aggressive about doing optimizations that the standard allows, even if they're surprising to some programmers and break some existing unsafe code which happened to work. (C is so portable that many things are undefined behaviour in the base standard; most nice implementations do define the behaviour of lots of things that the standard leaves as UB.) C is not a language where it's safe to throw code at the compiler until it does what you want, without checking that you're doing it the right way (without signed-integer overflows, etc.)
If you look at the x86-64 asm output for compiling your function (from the question), you can easily see the difference. I put it on the Godbolt compiler explorer.
In this case, putting restrict on the pointer a is sufficient to let clang hoist the load of a[0], but not gcc.
With float *restrict result, both clang and gcc will hoist the load.
e.g.
# gcc6.3, for foo with no restrict, or with just const float *restrict a
.L5:
vmovss xmm0, DWORD PTR [rsi]
vmulss xmm0, xmm0, DWORD PTR [rdx+rax*4]
vmovss DWORD PTR [rdi+rax*4], xmm0
add rax, 1
cmp rcx, rax
jne .L5
vs.
# gcc 6.3 with float *__restrict__ result
# clang is similar with const float *__restrict__ a but not on result.
vmovss xmm1, DWORD PTR [rsi] # outside the loop
.L11:
vmulss xmm0, xmm1, DWORD PTR [rdx+rax*4]
vmovss DWORD PTR [rdi+rax*4], xmm0
add rax, 1
cmp rcx, rax
jne .L11
So in summary, put __restrict__ on all pointers that are guaranteed not to overlap with something else.
BTW, restrict is only a keyword in C. Some C++ compilers support __restrict__ or __restrict as an extension, so you should #ifdef it away on unknown compilers.
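One way the #ifdef could look, as a sketch only: RESTRICT and add_arrays are hypothetical names, and the C branch assumes a C99-or-later compiler.

```c
#include <stddef.h>

#if !defined(__cplusplus)
#  define RESTRICT restrict        /* C99 and later: a real keyword */
#elif defined(__GNUC__) || defined(__clang__)
#  define RESTRICT __restrict__    /* GNU extension in C++ */
#elif defined(_MSC_VER)
#  define RESTRICT __restrict      /* MSVC spelling */
#else
#  define RESTRICT                 /* unknown compiler: drop the hint */
#endif

/* Hypothetical function using the macro: dst and src must not overlap. */
void add_arrays(float *RESTRICT dst, const float *RESTRICT src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

Dropping the qualifier on unknown compilers only costs optimization opportunities; it never changes the meaning of a correct program.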
Since
float *__restrict__ result to let them both optimize well. clang also manages to hoist the load with const float *__restrict a, but gcc doesn't. – Guillema
C and C++ but const in C and const in C++ have very different meanings. – Caruncle
const means read only; the hardware and other processes can still change it (an example is the result register of an ADC, which would be const and volatile). In C++, const means: it never changes, not by your process, not by other processes, not by hardware. – Caruncle