How to compile Rust to LLVM bitcode including dependencies?

I'm working on verifying some Rust code using SAW. SAW requires that you compile to LLVM bitcode, which you can then import and verify. I know you can generate bitcode using the --emit=llvm-bc flag to rustc, and this works great for projects without dependencies.

The issue comes when trying to compile a project which makes use of external crates. Here's an example Cargo.toml file:

[package]
name = "foobar"
version = "0.1.0"
edition = "2018"

[dependencies]
pythagoras = "0.1.1"

And here's a basic src/lib.rs we might want to compile & verify:

pub use pythagoras;

#[no_mangle]
pub extern "C" fn calc_hypot(a: u32, b: u32) -> f64 {
    pythagoras::theorem(a, b)
}

We can compile this to bitcode like this: RUSTFLAGS="--emit=llvm-bc" cargo build --release. The issue is that the bitcode for the current crate and the bitcode for its dependencies are generated separately (in target/release/deps/foobar-something.bc and target/release/deps/pythagoras-somethingelse.bc). They're only combined when the actual compiled library is generated.

Is there any way to generate a single bitcode file containing both the current crate and all its dependencies, so that this file can be imported and won't refer to any external names? I realise this is a pretty niche case, so hacky solutions (e.g. compiling to a C static lib, then converting that back to LLVM bitcode somehow) are also completely reasonable.

Riband answered 3/9, 2021 at 8:48 Comment(2)
Try turning on LTO; that way Rust will have to keep the code as LLVM bitcode until after linking. Ragland
Also, LLVM does have a linker for LLVM IR and bitcode (llvm-link) that you could use as well. Ragland

Expanding on Aiden4's comment:

  • Delete the current target directory to prevent any old artifacts from being used: rm -r target/
  • Compile it with RUSTFLAGS="--emit=llvm-bc" cargo build --release
  • Link the bitcode files together with llvm-link target/release/deps/*.bc > withdeps.bc
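The three steps above can be wrapped in a small shell helper. This is only a sketch: link_all_bitcode and the LLVM_LINK override variable are names made up for illustration, and it assumes an llvm-link on your PATH that is recent enough to read the bitcode your rustc emits.

```shell
# Hypothetical helper: link every .bc file in a deps directory into one module.
# LLVM_LINK is an assumed override variable (not defined by cargo or LLVM);
# it defaults to plain `llvm-link`.
link_all_bitcode() {
    deps_dir=$1
    out=$2
    # Expand the glob into the positional parameters.
    set -- "$deps_dir"/*.bc
    # If nothing matched, the pattern is left unexpanded and does not exist.
    if [ ! -e "$1" ]; then
        echo "no .bc files found in $deps_dir" >&2
        return 1
    fi
    "${LLVM_LINK:-llvm-link}" "$@" -o "$out"
}

# Usage (not run here):
#   rm -r target/
#   RUSTFLAGS="--emit=llvm-bc" cargo build --release
#   link_all_bitcode target/release/deps withdeps.bc
```

Failing early on an empty directory avoids handing llvm-link a literal *.bc argument when the build did not emit any bitcode.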

That will get you almost all dependencies. It turns out all Rust programs have an implicit dependency on either core or std though (although you can avoid this with the unstable #![no_core], but good luck actually getting anything to compile that way), so you probably want to get the bitcode for that too.

The easiest way to do that is to compile the standard library from source to bitcode. Cargo has experimental support for building the standard library from source (it needs a nightly toolchain and the rust-src component), so just append -Z build-std --target x86_64-unknown-linux-gnu (and update the target if needed) to your cargo build command. When using --target, which is required by -Z build-std, the build files are put in a target-specific directory, target/x86_64-unknown-linux-gnu/release/deps/ in this case. The targetless directory contains build-dependencies for the standard libraries: we don't want that!

We don't want to link all of the standard libraries. We really only need std and its dependencies: proc_macro isn't needed here since we aren't compiling a proc-macro crate. We also need to link with exactly one of panic_abort or panic_unwind, matching the panic strategy we chose. The default is unwinding, so let's delete the other one, panic_abort. Let's send those libraries to the chopping block: rm target/x86_64-unknown-linux-gnu/release/deps/{panic_abort,proc_macro}-*.bc.

Let's try linking for real this time:

rm -r target/
RUSTFLAGS="--emit=llvm-bc" cargo build --release -Z build-std --target x86_64-unknown-linux-gnu
rm target/x86_64-unknown-linux-gnu/release/deps/{panic_abort,proc_macro}-*.bc
llvm-link target/x86_64-unknown-linux-gnu/release/deps/*.bc > withalldeps.bc
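If you would rather not delete files from target/, the same selection can be done by filtering the list instead. A minimal sketch, assuming the crate-name prefixes panic_abort- and proc_macro- used above; select_bitcode is a made-up name, and the usage line assumes no whitespace in the paths.

```shell
# Hypothetical helper: print every .bc file in a deps directory except the
# panic_abort and proc_macro crates, one path per line.
select_bitcode() {
    for f in "$1"/*.bc; do
        # Skip the unexpanded pattern when the directory has no .bc files.
        [ -e "$f" ] || continue
        case "$(basename "$f")" in
            panic_abort-*|proc_macro-*) ;;  # excluded from the link
            *) printf '%s\n' "$f" ;;
        esac
    done
}

# Usage (not run here; relies on word splitting, so paths must not contain spaces):
#   llvm-link $(select_bitcode target/x86_64-unknown-linux-gnu/release/deps) -o withalldeps.bc
```

If you build with panic=abort instead, you would swap panic_abort for panic_unwind in the case pattern.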

Yay, it worked! Well, except for the calls to undefined functions in there that still managed to slip through. __rust_alloc, __rust_dealloc, __rust_realloc, and __rust_alloc_zeroed are allocator shim functions that rustc normally generates itself when it produces the final linked artifact, so they are not in the per-crate bitcode. The standard library also depends on libpthread and dlsym, which are language-agnostic libraries/functions that are usually implemented in C. You can use clang and a libc implementation that supports being compiled with Clang (GNU libc doesn't, I think musl might work here?) to get that if needed. Also, if you are compiling to an executable, it has trouble finding main from _start.
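To see exactly which names are still unresolved, you can filter the output of llvm-nm. A rough sketch: undefined_symbols is a made-up helper, and it assumes the usual llvm-nm text layout where undefined entries appear as a U followed by the symbol name.

```shell
# Hypothetical helper: read `llvm-nm` output on stdin and print the names of
# undefined symbols, sorted and de-duplicated.
undefined_symbols() {
    # Undefined entries have no address column, so they split into two fields.
    awk 'NF == 2 && $1 == "U" { print $2 }' | sort -u
}

# Usage (not run here):
#   llvm-nm withalldeps.bc | undefined_symbols
```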

Stantonstanway answered 3/9, 2021 at 17:35 Comment(2)
It isn't clear to me how to get __rust_alloc, __rust_dealloc, __rust_realloc, and __rust_alloc_zeroed to resolve. Any chance you can add some more details around that point? Vestryman
If you're trying to link an executable from the bitcode: clang -L/path/to/rust/lib -lstd-6bfb0c73b036f3e5 -lpthread -ldl -o hello hello.bc Effeminate

Expanding on loops's reply.

I learned how to resolve __rust_alloc, __rust_dealloc, __rust_realloc, and __rust_alloc_zeroed from this link.

I tested it and it worked.

Here is the command that I used to compile the Rust program:

RUSTFLAGS="-C save-temps -Zlocation-detail=none -Zfmt-debug=none --emit=llvm-bc" \
cargo +nightly build -Z build-std=std,panic_abort \
-Z build-std-features="optimize_for_size" \
--target x86_64-unknown-linux-gnu 

Then, in your target/x86_64-unknown-linux-gnu/debug/deps, you can find files that are compiled into bitcode. In my case, the important file is called function-*.rcgu.bc. If you run llvm-nm function-*.rcgu.bc, you will find symbols like this:

                 U __rdl_alloc
                 U __rdl_alloc_zeroed
                 U __rdl_dealloc
                 U __rdl_realloc
                 U __rg_oom
---------------- T __rust_alloc
---------------- T __rust_alloc_error_handler
---------------- D __rust_alloc_error_handler_should_panic
---------------- T __rust_alloc_zeroed
---------------- T __rust_dealloc
---------------- D __rust_no_alloc_shim_is_unstable
---------------- T __rust_realloc

The T entries mean that this file defines the __rust_* allocator symbols we need.

So when you run llvm-link *.bc -o withdeps.bc, you should also include this function-*.rcgu.bc file.
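If your file is not named function-*.rcgu.bc, you can search the deps directory for whichever bitcode file defines __rust_alloc. A sketch only: defines_symbol is a made-up helper, and it assumes the llvm-nm layout shown above (an address or dash column, a type letter, then the name).

```shell
# Hypothetical helper: read `llvm-nm` output on stdin and succeed if the
# symbol named in $1 is listed as defined (any type letter other than U).
defines_symbol() {
    awk -v sym="$1" '
        NF == 3 && $2 != "U" && $3 == sym { found = 1 }
        END { exit !found }
    '
}

# Usage (not run here): print every bitcode file that defines __rust_alloc.
#   for f in target/x86_64-unknown-linux-gnu/debug/deps/*.bc; do
#       llvm-nm "$f" | defines_symbol __rust_alloc && echo "$f"
#   done
```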

Saltandpepper answered 26/9 at 21:41 Comment(0)