Why does the Rust compiler not optimize code assuming that two mutable references cannot alias?

Asked 29/7, 2019 at 17:57 Answered 29/7, 2019 at 18:13

Solved rust reference compiler-optimization llvm-codegen

413

As far as I know, reference/pointer aliasing can hinder the compiler's ability to generate optimized code, since they must ensure the generated binary behaves correctly in the case where the two references/pointers indeed alias. For instance, in the following C code,

void adds(int *a, int *b) {
    *a += *b;
    *a += *b;
}

when compiled by clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) with the -O3 flag, it emits

0000000000000000 <adds>:
   0:    8b 07                    mov    (%rdi),%eax  # load a into EAX
   2:    03 06                    add    (%rsi),%eax  # load-and-add b
   4:    89 07                    mov    %eax,(%rdi)  # store into a
   6:    03 06                    add    (%rsi),%eax  # load-and-add b again
   8:    89 07                    mov    %eax,(%rdi)  # store into a again
   a:    c3                       retq

Here the code stores back to (%rdi) twice in case int *a and int *b alias.

When we explicitly tell the compiler that these two pointers cannot alias with the restrict keyword:

void adds(int *restrict a, int *restrict b) {
    *a += *b;
    *a += *b;
}

Then Clang will emit a more optimized version that effectively does *a += 2 * (*b), which is equivalent if (as promised by restrict) *b isn't modified by assigning to *a:

0000000000000000 <adds>:
   0:    8b 06                    mov    (%rsi),%eax   # load b once
   2:    01 c0                    add    %eax,%eax     # double it
   4:    01 07                    add    %eax,(%rdi)   # *a += 2 * (*b)
   6:    c3                       retq

Since Rust makes sure (except in unsafe code) that two mutable references cannot alias, I would think that the compiler should be able to emit the more optimized version of the code.

When I test with the code below and compile it with rustc 1.35.0 with -C opt-level=3 --emit obj,

#![crate_type = "staticlib"]
#[no_mangle]
fn adds(a: &mut i32, b: &mut i32) {
    *a += *b;
    *a += *b;
}

it generates:

0000000000000000 <adds>:
   0:    8b 07                    mov    (%rdi),%eax
   2:    03 06                    add    (%rsi),%eax
   4:    89 07                    mov    %eax,(%rdi)
   6:    03 06                    add    (%rsi),%eax
   8:    89 07                    mov    %eax,(%rdi)
   a:    c3                       retq

This does not take advantage of the guarantee that a and b cannot alias.

Is this because the current Rust compiler is still in development and has not yet incorporated alias analysis to do the optimization?

Is this because there is still a chance that a and b could alias, even in safe Rust?

Dichromic answered 29/7, 2019 at 17:57 Comment(4)

godbolt.org/z/aEDINX, strange – Cervelat 29/7, 2019 at 18:6

Side remark: "Since Rust makes sure (except in unsafe code) that two mutable references cannot alias" -- it is worth mentioning that even in unsafe code, aliasing mutable references are not allowed and result in undefined behavior. You can have aliasing raw pointers, but unsafe code does not actually allow you to ignore Rust standard rules. It's just a common misconception and thus worth pointing out. – Ipomoea 29/7, 2019 at 19:20

It took me a while to figure out what the example is getting at, because I'm not skilled at reading asm, so in case it helps anyone else: it boils down to whether the two += operations in the body of adds can be reinterpreted as *a = *a + *b + *b. If the pointers don't alias, they can, you can even see what amounts to b* + *b in the second asm listing: 2: 01 c0 add %eax,%eax. But if they do alias, they can't, because by the time you add *b for the second time, it will contain a different value than the first time around (the one you store on line 4: of the first asm listing). – Darsie 2/8, 2019 at 12:44

@dlukes: Yup. I commented the asm and added that *a += 2 * (*b) equivalence for future readers. – Ascot 19/3, 2022 at 18:7

484

Rust originally did enable LLVM's noalias attribute, but this caused miscompiled code. When all supported LLVM versions no longer miscompile the code, it will be re-enabled.

If you add -Zmutable-noalias=yes to the compiler options, you get the expected assembly:

adds:
        mov     eax, dword ptr [rsi]
        add     eax, eax
        add     dword ptr [rdi], eax
        ret

Simply put, Rust put the equivalent of C's restrict keyword everywhere, far more prevalent than any usual C program. This exercised corner cases of LLVM more than it was able to handle correctly. It turns out that C and C++ programmers simply don't use restrict as frequently as &mut is used in Rust.

This has happened multiple times.

Rust 1.0 through 1.7 — noalias enabled
Rust 1.8 through 1.27 — noalias disabled
Rust 1.28 through 1.29 — noalias enabled
Rust 1.30 through 1.54 — noalias disabled
Rust 1.54 through ??? — noalias conditionally enabled depending on the version of LLVM the compiler uses

Related Rust issues

Datolite answered 29/7, 2019 at 18:13 Comment(22)

This is not surprising. Its broadly-scoped claims of multi-language-friendliness notwithstanding, LLVM was specifically designed as a C++ backend and it has always had a strong tendency to choke on things that don't look enough like C++. – Hover 30/7, 2019 at 21:38

@MasonWheeler if you click through to some of the issues, you can find C code examples that use restrict and miscompile on both Clang and GCC. It’s not limited to languages that aren’t “C++ enough”, unless you count C++ itself in that group. – Datolite 30/7, 2019 at 21:44

@Shepmaster: Both clang and gcc seem to be designed around the language which is defined by the C Standard, rather than the language the C Standard was written to describe. Every standard since C89 has treated as a quality of implementation issue, outside its jurisdiction, the question of how implementations should behave if one part of the Standard or an implementation's documentation describe the behavior of some action but another part of the Standard specifies that an overlapping category of actions invoke UB. As a consequence... – Balcony 30/7, 2019 at 22:2

...there was no effort to exclude useful behaviors from the broader categories of UB, since good-quality compilers were expected to process them meaningfully anyway, and there was no point trying to write rules about what low-quality compilers could do. – Balcony 30/7, 2019 at 22:4

@MasonWheeler: I don't think LLVM was really designed around the rules of C or C++, but rather around the rules of LLVM. It makes assumptions that usually hold true for C or C++ code, but from what I can tell the design is predicated upon a static-data-dependencies model which can't handle tricky corner cases. That would be okay if it pessimistically assumed data dependencies that can't be disproven, but instead it treats as no-ops actions which would write storage with the same bit pattern as it held, and which have potential but not provable data dependencies on the read and the write. – Balcony 30/7, 2019 at 22:20

@Balcony I've read your comments a few times, but I admit I'm stumped — I have no idea what they have to do with this question or answer. Undefined behavior doesn't come into play here, this is "just" a case of multiple optimization passes interacting poorly with each other. – Datolite 31/7, 2019 at 13:41

@Shepmaster: From what I can tell, LLVM assumes that it is impossible for various combinations of references to be used in certain way that would be uncommon in C, but are nonetheless defined by the Standard. One could construct a language that would be useful for many purposes, in which such assumptions would hold in all defined cases, but C is not such a language. For example, if one had a language in which equality-comparing a pointer to one object with a pointer "just past" another would yield UB, then it would be reasonable for a compiler to assume that if two pointers compare equal... – Balcony 31/7, 2019 at 14:54

...and one is known to point just past some unknown object, no "standalone" object could be accessed by directly dereferencing the second pointer. While the C Standard does not specify any means by which one could create an object immediately preceding a stand-alone object, it explicitly recognizes that such objects could exist, and specifies the comparison of comparing a pointer "just past" such an object with a pointer "to" its successor. Such a comparison, however, will cause clang's optimizer to jump the rails. – Balcony 31/7, 2019 at 15:0

@Shepmaster: I think the simplest way to describe the fundamental problem with LLVM is that it is focused on recognizing equivalent references that can be merged to convert a program to single-static-assignment form, but can't accommodate the possibility of references that can be proven to identify the same storage, but are derived and used at different times and are not equivalent. – Balcony 31/7, 2019 at 15:46

@Balcony : That sounds very interesting. Have any good links for me to read up on? Because I don't understand. Let's say you have an array of objects. A pointer "just past" the first element of the array, would point on the second object in the array, and surely an object can be accessed through that pointer? Is that what you mean by "standalone"? Objects not in arrays? Couldn't a custom allocator arrange so that objects are "back to back" even if not in arrays? – Machinist 5/8, 2019 at 11:49

@avl_sweden: By standalone, I mean named objects that are not parts of structures or unions, e.g. static int a[1],b[1],c[1];. An implementation could place such objects in arbitrary order, consecutively or not, at its leisure. If an implementation ensured that objects whose address was taken were never placed consecutively, it could assume that a pointer just past one object could never identify another standalone object, but neither clang nor gcc does anything to prevent addressable objects being placed consecutively. – Balcony 5/8, 2019 at 15:15

@avl_sweden: As for links, I don't have any to offer, but I've done a fair amount of experimentation with the compilers at godbolt.org, and found that if code compares the address of a pointer with an address just past one array object, the compiler will reorder accesses made with that pointer across addresses made with another array object without regard for whether the arrays might be stored consecutively, even if the arrays are declared in the same translation unit and are, in fact, stored consecutively. – Balcony 5/8, 2019 at 15:18

@supercat: So LLVM makes incorrect assumptions in these cases? And this causes incorrect behavior? Is this only when using 'restrict' keyword? Do you perhaps have an example? – Machinist 7/8, 2019 at 12:48

@Machinist to reiterate, it's just a bug. The loop unrolling optimization step did (does?) not fully take noalias pointers into account when executing. It created new pointers based on input pointers, improperly copying the noalias attribute even though the new pointers did alias. – Datolite 7/8, 2019 at 12:53

@Shepmaster: Many of the problematic behaviors I've seen didn't occur by happenstance; they are a result of the optimizer making assumptions, without any particular evidence to justify them, that will usually be correct, but are not sound in the general case. A common pattern is that if a compiler can show that ref1 and ref2 have the same address, and it would be entitled to assume accesses via ref1 won't interact with ref3, it is prone to ignore the possibility that accesses via ref2 might interact with ref3. – Balcony 7/8, 2019 at 15:23

@supercat: Ah, now I get it, I think. Pointer value isn't the full story, since time of access may also affect whether or not aliasing occurs. Thanks for clarifying! – Machinist 8/8, 2019 at 14:47

@avl_sweden: Not just the timing of access, but also the timing of when pointers are used to derive other pointers. While the authors of the Standard didn't explicitly specify transitive sequencing relationships between the derivation of a pointer to part of an object and any use of the pointer, or between such derivation and any preceding conflicting actions on the parent object, I doubt they imagined anyone would fail to regard such sequencing relationships as obvious. – Balcony 8/8, 2019 at 15:20

@avl_sweden: There are some cases, though, that don't involve the timing of pointer derivation. Given int a[1],b[1],c[1],*p; at file scope, the Standard explicitly recognizes the possibility that p==b+1 could be true at the same time as either p==a or p==c, but neither clang nor gcc does so. Given int temp = a[0]+c[0]; if (p==b+1) *p+=1; return temp+a[0]+c[0]; both clang and gcc are prone to generate code that simply returns temp+temp without regard for whether they placed a[] or c[] immediately after b[]. – Balcony 8/8, 2019 at 15:59

@avl_sweden: That's not a case of a compiler failing to notice something. That's a case of an implementation making a deliberate inference that would usually happen to be correct (and which it could cause to be correct, if placed something other than a or c in the space after b) but is unsound in the general case. – Balcony 8/8, 2019 at 16:7

What does "conditionally enabled" refer to? According to godbolt, Rust 1.54 appears to generate the optimized assembly. – Villasenor 2/9, 2021 at 13:29

@Villasenor updated. rustc can be compiled to use a number of versions of LLVM. The version of rustc distributed by Rust (e.g. via rustup) uses a version of LLVM that has the bug fixed. Older versions of LLVM (such as used by Linux distros that compile rustc themselves) do not have that fix. – Datolite 3/9, 2021 at 14:27

@user4815162342: Specifically, it looks like the most recent fix (committed to master on March 22, 2021) enables it for LLVM >= 12. So if you built with LLVM 11 or older, it wouldn't enable mutable noalias. – Sterol 11/1 at 1:54

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Related Rust issues

Recommended topics

Hot tags