Realistic usage of the C99 'restrict' keyword?

Asked 14/4, 2009 at 0:9 Answered 30/6, 2022 at 19:29

239

I was browsing through some documentation and questions/answers and saw it mentioned. I read a brief description, stating that it would be basically a promise from the programmer that the pointer won't be used to point somewhere else.

Can anyone offer some realistic cases where its worth actually using this?

Sheathbill answered 14/4, 2009 at 0:9 Comment(5)

memcpy vs memmove is one canonical example. – Bungle 19/4, 2016 at 18:22

@AlexandreC.: I don't think it's a particularly applicable one, since lack of a "restrict" qualifier does not imply that the program logic will work with overloading source and destination, nor would presence of such a qualifier prevent a called method from determining whether source and destination overlap and, if so, replacing dest with src+(dest-src) which, since it is derived from src, would be allowed to alias it. – Forbiddance 19/4, 2016 at 22:25

@supercat: That's why I put it as a comment. However, 1) restrict-qualifying arguments to memcpy enables in principle a naïve implementation to be optimized aggressively, and 2) merely calling memcpy enables the compiler to assume that the arguments given to it do not alias, which could allow some optimization around the memcpy call. – Bungle 20/4, 2016 at 6:50

@AlexandreC.: It would be very hard for a compiler on most platforms to optimize a naive memcpy--even with "restrict"--to be anywhere near as efficient as version tailored to the target. Call-side optimizations would not require the "restrict" keyword, and in some cases efforts to facilitate those may be counter-productive. For example, many implementations of memcpy could, at zero extra cost, regard memcpy(anything, anything, 0); as a no-op, and ensure that if p is a pointer to at least n writable bytes, memcpy(p,p,n); will have no adverse side-effects. Such cases may arise... – Forbiddance 20/4, 2016 at 15:28

...naturally in certain kinds of application code (e.g. a sort routine swapping an item with itself), and in implementations where they have no adverse side effects, letting those cases be handled by the general-case code may be more efficient than having to add special-case tests. Unfortunately, some compiler writers seem to think it's better to require that programmers add code the compiler may not be able to optimize out, so as to facilitate "optimization opportunities" which compilers would very seldom exploit anyway. – Forbiddance 20/4, 2016 at 15:33

234

restrict says that the pointer is the only thing that accesses the underlying object. It eliminates the potential for pointer aliasing, enabling better optimization by the compiler.

For instance, suppose I have a machine with specialized instructions that can multiply vectors of numbers in memory, and I have the following code:

void MultiplyArrays(int* dest, int* src1, int* src2, int n)
{
    for(int i = 0; i < n; i++)
    {
        dest[i] = src1[i]*src2[i];
    }
}

The compiler needs to properly handle if dest, src1, and src2 overlap, meaning it must do one multiplication at a time, from start to the end. By having restrict, the compiler is free to optimize this code by using the vector instructions.

Wikipedia has an entry on restrict, with another example, here.

Waldos answered 14/4, 2009 at 0:16 Comment(6)

@Waldos - If I am not mistaking, then the problem would be only when dest is overlapping either of the source vectors. Why would there be a problem if src1 and src2 overlap? – Iroquois 25/3, 2015 at 17:53

restrict normally has an effect only when pointing to an object which is modified, in which case it asserts no hidden side effects need be taken into account. Most compilers use it to facilitate vectorization. Msvc uses run time check for data overlap for that purpose. – Twosided 21/5, 2016 at 18:24

Adding the register keyword to the for loop variable also makes it faster in addition to adding restrict. – Iguanodon 31/5, 2017 at 2:12

Actually, the register keyword is only advisory. And in compilers since about the year 2000, the i (and the n for the comparison) in the example will be optimized into a register whether or not you use the register keyword. – Toowoomba 7/3, 2018 at 19:47

For anyone else like me who is wondering what in the world "pointer aliasing" means, Cornell University writes about it here: cvw.cac.cornell.edu/vector/coding_aliasing. – Bigwig 24/6, 2021 at 20:10

@MarkFischler: At least when targeting ARM targets, GCC still makes use of the register keyword when using -O0. In some cases that keyword will allow the compiler to generate code at -O0 which would be as efficient (sometimes more efficient!) as at higher optimization level if things like stack frame setup/teardown were omitted on leaf functions that don't use the stack. – Forbiddance 8/9, 2021 at 16:33

224

The Wikipedia example is very illuminating.

It clearly shows how it allows to save one assembly instruction.

Without restrict:

void f(int *a, int *b, int *x) {
  *a += *x;
  *b += *x;
}

Pseudo assembly:

load R1 ← *x    ; Load the value of x pointer
load R2 ← *a    ; Load the value of a pointer
add R2 += R1    ; Perform Addition
set R2 → *a     ; Update the value of a pointer
; Similarly for b, note that x is loaded twice,
; because x may point to a (a aliased by x) thus 
; the value of x will change when the value of a
; changes.
load R1 ← *x
load R2 ← *b
add R2 += R1
set R2 → *b

With restrict:

void fr(int *restrict a, int *restrict b, int *restrict x);

Pseudo assembly:

load R1 ← *x
load R2 ← *a
add R2 += R1
set R2 → *a
; Note that x is not reloaded,
; because the compiler knows it is unchanged
; "load R1 ← *x" is no longer needed.
load R2 ← *b
add R2 += R1
set R2 → *b

Does GCC really do it?

GCC 4.8 Linux x86-64:

gcc -g -std=c99 -O0 -c main.c
objdump -S main.o

With -O0, they are the same.

With -O3:

void f(int *a, int *b, int *x) {
    *a += *x;
   0:   8b 02                   mov    (%rdx),%eax
   2:   01 07                   add    %eax,(%rdi)
    *b += *x;
   4:   8b 02                   mov    (%rdx),%eax
   6:   01 06                   add    %eax,(%rsi)  

void fr(int *restrict a, int *restrict b, int *restrict x) {
    *a += *x;
  10:   8b 02                   mov    (%rdx),%eax
  12:   01 07                   add    %eax,(%rdi)
    *b += *x;
  14:   01 06                   add    %eax,(%rsi)

For the uninitiated, the calling convention is:

rdi = first parameter
rsi = second parameter
rdx = third parameter

GCC output was even clearer than the wiki article: 4 instructions vs 3 instructions.

Arrays

So far we have single instruction savings, but if pointer represent arrays to be looped over, a common use case, then a bunch of instructions could be saved, as mentioned by supercat.

Consider for example:

void f(char *restrict p1, char *restrict p2) {
    for (int i = 0; i < 50; i++) {
        p1[i] = 4;
        p2[i] = 9;
    }
}

Because of restrict, a smart compiler (or human), could optimize that to:

memset(p1, 4, 50);
memset(p2, 9, 50);

which is potentially much more efficient as it may be assembly optimized on a decent libc implementation (like glibc): Is it better to use std::memcpy() or std::copy() in terms to performance?

Does GCC really do it?

GCC 5.2.1.Linux x86-64 Ubuntu 15.10:

gcc -g -std=c99 -O0 -c main.c
objdump -dr main.o

With -O0, both are the same.

With -O3:

with restrict:

3f0:   48 85 d2                test   %rdx,%rdx
3f3:   74 33                   je     428 <fr+0x38>
3f5:   55                      push   %rbp
3f6:   53                      push   %rbx
3f7:   48 89 f5                mov    %rsi,%rbp
3fa:   be 04 00 00 00          mov    $0x4,%esi
3ff:   48 89 d3                mov    %rdx,%rbx
402:   48 83 ec 08             sub    $0x8,%rsp
406:   e8 00 00 00 00          callq  40b <fr+0x1b>
                        407: R_X86_64_PC32      memset-0x4
40b:   48 83 c4 08             add    $0x8,%rsp
40f:   48 89 da                mov    %rbx,%rdx
412:   48 89 ef                mov    %rbp,%rdi
415:   5b                      pop    %rbx
416:   5d                      pop    %rbp
417:   be 09 00 00 00          mov    $0x9,%esi
41c:   e9 00 00 00 00          jmpq   421 <fr+0x31>
                        41d: R_X86_64_PC32      memset-0x4
421:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
428:   f3 c3                   repz retq

Two memset calls as expected.

without restrict: no stdlib calls, just a 16 iteration wide loop unrolling which I do not intend to reproduce here :-)

I haven't had the patience to benchmark them, but I believe that the restrict version will be faster.

C99

Let's look at the standard for completeness sake.

restrict says that two pointers cannot point to overlapping memory regions. The most common usage is for function arguments.

This restricts how the function can be called, but allows for more compile-time optimizations.

If the caller does not follow the restrict contract, undefined behavior.

The C99 N1256 draft 6.7.3/7 "Type qualifiers" says:

The intended use of the restrict qualifier (like the register storage class) is to promote optimization, and deleting all instances of the qualifier from all preprocessing translation units composing a conforming program does not change its meaning (i.e., observable behavior).

and 6.7.3.1 "Formal definition of restrict" gives the gory details.

Strict aliasing rule

The restrict keyword only affects pointers of compatible types (e.g. two int*) because the strict aliasing rules says that aliasing incompatible types is undefined behavior by default, and so compilers can assume it does not happen and optimize away.

See: What is the strict aliasing rule?

See also

C++14 does not yet have an analogue for restrict, but GCC has __restrict__ as an extension: What does the restrict keyword mean in C++?
Many questions that ask: according to the gory details, does this code UB or not?
A "when to use" question: When to use restrict and when not to
The related GCC __attribute__((malloc)), which says that the return value of a function is not aliased to anything: GCC: __attribute__((malloc))

Mcalpine answered 14/6, 2015 at 8:24 Comment(8)

Optimization based on type based aliasing is important to avoid concerns about pointers being modified as a side effect of storing into other data types. Reluctance to standardize restrict beyond c99 is extreme so it seems the widely accepted __restrict will never become a standard. – Twosided 21/5, 2016 at 18:34

@Twosided "Reluctance to standardize restrict beyond c99 is extreme so it seems the widely accepted __restrict will never become a standard": do you mean in C++? What is __restrict? – Mcalpine 21/5, 2016 at 19:59

@tim18: The "restrict" keyword can enable many optimizations that even aggressive type-based optimization cannot. Further, the existence of "restrict" in the language--unlike aggressive type-based aliasing--never makes it impossible to accomplish tasks as efficiently as would be possible in their absence (since code which would be broken by "restrict" can simply not use it, while code which is broken by aggressive TBAA must be often be rewritten in less-efficient fashion). – Forbiddance 23/5, 2016 at 22:28

restrict is an alternate spelling of restrict which is supported by several C++ compilers, including g++, MSVC++, Intel C++. Each of those compilers has a slightly different flavor, but only Intel C++ has an option to support restrict without the underscores in C++. All support __restrict in C89 as well as C99... Some support __restrict as an alternate spelling. – Twosided 25/5, 2016 at 10:45

@tim18: Surround things that contain double-underlines in backticks, as in __restrict. Otherwise the double-underlines may be misinterpreted as an indication that you're shouting. – Forbiddance 6/6, 2016 at 17:9

More important than not shouting is that the underscores have meaning directly relevant to the point you are trying to make. – Cherlycherlyn 6/8, 2018 at 4:11

GCC 7 and Clang 9 - at any optimization (-O1, -Os) __restrict__ as no more meaning, as the compiler will optimize the same. Reminds me of the register keyword. Still upvote for the epic answer. Nice job! – Horus 26/8, 2020 at 8:20

@Horus hmmm, I'm surprised that restrict has no meaning, since the compiler simply cannot know if the memories will be aliased or not, since restrict actually changes the function interface and forbids aliased callers. – Mcalpine 26/8, 2020 at 8:40

The following C99 code returns either 0 or 1, depending of restrict qualifier :

__attribute__((noinline))
int process(const int * restrict const a, int * const b) {
    *b /= (*a + 1) ;
    return *a + *b ;
}

int main(void) {
    int data[2] = {1, 2};
    return process(&data[0], &data[0]);
}

You can build realistic examples using the snippet, especially when *a is a loop condition.

Compile with gcc -std=c99 -Wall -pedantic -O3 main.c.

Instal answered 30/6, 2022 at 19:29 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags