Why can't the C compiler optimize changing the value of a const pointer assuming that two pointers to the same variable would be illegal/UB?

Asked 1/2, 2021 at 13:32 Answered 28/2, 2021 at 17:24

Solved c rust undefined-behavior strict-aliasing

Recently I stumbled over a comparison between Rust and C and they use the following code:

bool f(int* a, const int* b) {
  *a = 2;
  int ret = *b;
  *a = 3;
  return ret != 0;
}

In Rust (same code, but with Rust syntax), it produces the following Assembler Code:

    cmp      dword ptr [rsi], 0 
    mov      dword ptr [rdi], 3 
    setne al                    
    ret

While with gcc it produces the following:

   mov      DWORD PTR [rdi], 2   
   mov      eax, DWORD PTR [rsi]
   mov      DWORD PTR [rdi], 3        
   test     eax, eax                  
   setne al                           
   ret

The text claims that the C function can't optimize the first line away, because a and b could point to the same number. In Rust this is not allowed so the compiler can optimize it away.

Now to my question:

The function takes a const int* which is a pointer to a const int. I read this question and it states that modifying a const int with a pointer should result in a compiler warning and in the worst cast in UB.

Could this function result in a UB if I call it with two pointers to the same integer?

Why can't the C compiler optimize the first line away, under the assumption, that two pointers to the same variable would be illegal/UB?

Link to godbolt

Planetarium answered 1/2, 2021 at 13:32 Comment(2)

consider int foo = 0; f(&foo, &foo);. This is perfectly legal C and it works as expected with your function returning 1. – Byway 1/2, 2021 at 13:38

Essentially, the C compiler of the function does not know that the pointer address that might be passed to it will be until runtime, so you need to use restrict to tell it that updating a doesn't update the value b is pointing to, which needs to be included in the comparison to 0, hence the store to a that happens before the comparison needs to go ahead, whereas in rust the default assumption is restrict – Ametropia 28/2, 2021 at 17:21

The function int f(int *a, const int *b); promises to not change the contents of b through that pointer... It makes no promises regarding access to variables through the a pointer.

If a and b point to the same object, changing it through a is legal (provided the underlying object is modifiable, of course).

Example:

int val = 0;
f(&val, &val);

Byway answered 1/2, 2021 at 13:47 Comment(2)

Notably, changing *b through b (after casting away constness) would also be legal C. const is essentially a lint for the programmer, not a hint for the compiler -- C is not allowed to optimize calls to f by assuming that it does not change *b. restrict-qualified pointers are another matter (see Morten Jensen's answer, and my comment there). – Bedizen 2/2, 2021 at 21:7

@trentcl: Not sure if you meant “hint” or actually “lint,” but it works either way. – Malisamalison 3/2, 2021 at 0:26

Why can't the C Compiler optimize the first line away, under the assumption, that two pointers to the same variable would be illegal/UB?

Because you haven't instructed the C compiler to do so -- that it is allowed to make that assumption.

C has a type qualifier for exactly this called restrict which roughly means: this pointer does not overlap with other pointers (not exactly, but play along).

The assembly output for

bool f(int* restrict a, const int* b) {
  *a = 2;
  int ret = *b;
  *a = 3;
  return ret != 0;
}

        mov     eax, DWORD PTR [rsi]
        mov     DWORD PTR [rdi], 3
        test    eax, eax
        setne   al
        ret

... which removes/optimizes-away the assignment *a = 2

From https://en.wikipedia.org/wiki/Restrict

In the C programming language, restrict is a keyword that can be used in pointer declarations. By adding this type qualifier, a programmer hints to the compiler that for the lifetime of the pointer, only the pointer itself or a value directly derived from it (such as pointer + 1) will be used to access the object to which it points.

Pam answered 1/2, 2021 at 13:39 Comment(2)

Strictly speaking, all Rust references are restrict, even the shared ones -- so the equivalent of Aiden4's Rust code would be bool f(int * restrict a, const int* restrict b). Wikipedia omits to mention two facts about restrict that make this work: first, that the single pointer access rule only applies if the object behind the pointer is modified (so b can be both restrict and aliased, because *b is not modified); and second, that a const-qualified restrict pointer must not point to something that is modified (unlike normal const pointers). (C99 6.7.3.1p4) – Bedizen 2/2, 2021 at 21:2

@trentcl: A const restrict pointer may point to objects that are modified, if it is not used to access any of the modified portions of the object. For example, given void test(int * restrict p, int const * restrict q), code could write p[0] and read both p[1] and q[1], provided it never reads q[0], nor (for clang or gcc) tests any restrict-qualified pointer for equality with anything not derived from it. – Biomedicine 2/2, 2021 at 22:45

The function int f(int *a, const int *b); promises to not change the contents of b through that pointer... It makes no promises regarding access to variables through the a pointer.

If a and b point to the same object, changing it through a is legal (provided the underlying object is modifiable, of course).

Example:

int val = 0;
f(&val, &val);

Byway answered 1/2, 2021 at 13:47 Comment(2)

@trentcl: Not sure if you meant “hint” or actually “lint,” but it works either way. – Malisamalison 3/2, 2021 at 0:26

While the other answers mention the C side, it is still worth taking a look at the Rust side. With Rust the code you have is probably this:

fn f(a:&mut i32, b:&i32)->bool{
    *a = 2;
    let ret = *b;
    *a = 3;
    return ret != 0;
}

The function takes in two references, one mutable, one not. References are pointers that are guaranteed to be valid for reads, and mutable references are also guaranteed to be unique, so it gets optimized to

        cmp     dword ptr [rsi], 0
        mov     dword ptr [rdi], 3
        setne   al
        ret

However, Rust also has raw pointers that are equivalent to C's pointers and make no such guarantees. The following function, which takes in raw pointers:

unsafe fn g(a:*mut i32, b:*const i32)->bool{
    *a = 2;
    let ret = *b;
    *a = 3;
    return ret != 0;
}

misses out on the optimization and compiles to this:

        mov     dword ptr [rdi], 2
        cmp     dword ptr [rsi], 0
        mov     dword ptr [rdi], 3
        setne   al
        ret

Godbolt Link

Goethe answered 2/2, 2021 at 17:8 Comment(10)

So they're comparing apples to oranges. Nice one! +1 :D – Debag 3/2, 2021 at 11:58

@PaulEvans to be fair, as a C fan, it's a shame C does not have references. Switching to C++ and using references has sped parts of my code up non trivial amounts while keeping it easy to read and reason about. – Impi 3/2, 2021 at 16:11

@Impi I'm mainly a C++ programmer so obviously use references all the time. But there just syntactic sugar for pointers so in themselves offer no sped advantages. What's really great are smart pointers! Not for sped but correctness. :D – Debag 3/2, 2021 at 17:0

@PaulEvans Not a language lawyer, so I may in fact be incorrect, but I don't think references are syntactic sugar for pointers. – Gemini 3/2, 2021 at 20:11

@FrankoLeonTokalić Pretty much, yeah. You can't reseat a reference (as you can with a non-const pointer) and it must refer to an object (as opposed to a pointer set to nullptr) but under the hood it's just a pointer that you don't dereference to access what's pointed to. – Debag 3/2, 2021 at 20:33

@PaulEvans I don't think saying that a reference is just a pointer under the hood is correct, though. – Gemini 3/2, 2021 at 21:47

@FrankoLeonTokalić Take any function with a reference parameter of type T& and copy that function and change that parameter to a T*const and all usages in that function of that parameter to a dereference. Go to godbolt and look at the assembly that's generated and you'll see they're both exactly the same. – Debag 3/2, 2021 at 22:10

@PaulEvans Output of a particular compiler with particular configuration in a particular situation does not mean that references are pointers under the hood, though. In some cases, they are. My point is that it might be harmful as a general assumption. – Gemini 3/2, 2021 at 22:45

Let us continue this discussion in chat. – Debag 3/2, 2021 at 23:2

@FrankoLeonTokalić see https://mcmap.net/q/36099/-what-are-the-differences-between-a-pointer-variable-and-a-reference-variable – Ametropia 28/2, 2021 at 15:57

The function takes a const int* which is a pointer to a const int.

No, const int* is not a pointer to a const int. Anyone who says that is deluded.

int* is a pointer to an int that definitely isn't const.
const int* is a pointer to an int of unknown constness.
There is no way to express the notion of a pointer to an int that definitely is const.

If C was a better designed language, then const int * would be a pointer to a const int, mutable int * (borrowing a keyword from C++) would be a pointer to a non-const int, and int * would be a pointer to an int of unknown constness. Dropping the qualifiers (i.e., forgetting something about the pointed-to type) would be safe – the opposite of real C in which adding the const qualifier is safe. I haven't used Rust, but it appears from examples in another answer that it uses a syntax like that.

Bjarne Stroustrup, who introduced const, originally named it readonly, which is much closer to its actual meaning. int readonly* would have made it clearer that it's the pointer that's read-only, not the pointed-to object. The renaming to const has confused generations of programmers.

When I have the choice, I always write foo const*, not const foo*, as the next best thing to readonly*.

Middleoftheroad answered 8/2, 2021 at 17:30 Comment(1)

Neither const int * nor int * say anything about the constness of the underlying object. C does not keep track of such a thing. It is perfectly legal to have an int * that points to an int declared const. – Bedizen 1/3, 2021 at 12:5

It should be noted that this question is talking about optimisation on -Ofast and how it's even the case there.

Essentially, the C compiler of the function does not know the full discrete set of addresses that might be passed to it as that isn't known until link time / runtime as the function can be called from multiple translation units, and therefore it makes considerations that handle any legal address that a and b might point to, and of course that includes the case where they overlap.

Therefore, you need to use restrict to tell it that updating a (which the function allows because it is not a pointer-to-const, but even then the function could cast off const) doesn't update the value b is pointing to, which needs to be included in the comparison to 0, hence the store to a that happens before the comparison needs to go ahead, whereas on rust the default assumption is restrict. The compiler of the function does however know that *a is the same as *(a+1-1) and therefore will not produce 2 separate stores, but it does not know whether a or b overlap.

Ametropia answered 28/2, 2021 at 17:24 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags