While working on a Brainfuck interpreter in Rust, I noticed that cargo bench takes an incredibly long time to build a Criterion bench when the benchmark contains a large type.
As part of my Brainfuck implementation, I define a structure with a very large array to act as the interpreter's memory. When I compile with cargo build --release, my crate builds in a handful of seconds, but when I run cargo bench, the build takes upwards of 6 minutes! I eventually realized this has something to do with the very large array in my type definition ([u8; 30_000]). I figured I could make the type smaller via indirection, so I changed the array into a Box<[u8; 30_000]>, which did reduce the build time of the benchmark significantly.
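The following minimal sketch makes the size difference between the two layouts concrete. The names InlineType and BoxedType are hypothetical and exist only for this illustration; each mirrors the data field of MyType from the reproducible example, once stored inline and once behind a Box.

```rust
use std::mem::size_of;

// Hypothetical type: the 30,000-byte array is stored inline, so every move
// of an `InlineType` value moves all 30,000 bytes.
pub struct InlineType {
    data: [u8; 30_000],
}

// Hypothetical type: the array lives on the heap, so the struct itself is
// only one (thin) pointer wide.
pub struct BoxedType {
    data: Box<[u8; 30_000]>,
}

impl InlineType {
    pub fn new() -> Self {
        InlineType { data: [0; 30_000] }
    }
}

impl BoxedType {
    pub fn new() -> Self {
        // `Box::new` takes its argument by value, so the array may be built
        // on the stack before being moved to the heap; at 30 KB that is harmless.
        BoxedType { data: Box::new([0; 30_000]) }
    }
}

fn main() {
    let inline = InlineType::new();
    let boxed = BoxedType::new();

    // Both hold the same amount of interpreter memory...
    assert_eq!(inline.data.len(), boxed.data.len());

    // ...but the values being moved around differ enormously in size.
    println!("inline: {} bytes", size_of::<InlineType>());
    println!("boxed:  {} bytes", size_of::<BoxedType>());
}
```

On a 64-bit target this prints 30000 bytes for the inline layout and 8 bytes for the boxed one. The extra pointer indirection on every memory access is the performance cost that boxing trades for the smaller value.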
My question is: why does cargo bench take so much longer to build than cargo build --release without Box<>? Shouldn't they both be using the same optimization level? Why doesn't cargo build --release take just as long without Box<>?
Is this potentially a bug in Criterion?
I confirmed via cargo bench --no-run --timings that nearly all of the time goes to building the bench itself. I also made a minimal reproducible example:
cargo --version
> cargo 1.64.0 (387270bc7 2022-09-16)
Cargo.toml
[package]
name = "large-type"
version = "0.1.0"
edition = "2021"
[dev-dependencies]
criterion = "0.4.0"
[[bench]]
name = "my_benches"
harness = false
lib.rs
pub struct MyType {
    data: [u8; 30_000],
}

impl MyType {
    pub fn new() -> Self {
        MyType { data: [0; 30_000] }
    }
}
benches/my_benches.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use large_type::MyType;
pub fn my_test(c: &mut Criterion) {
    c.bench_function("my test", |b| b.iter(|| black_box(MyType::new())));
}
criterion_group!(benches, my_test);
criterion_main!(benches);
Clean release build
$ cargo build --release
Compiling large-type v0.1.0 (C:\Users\myuser\large-type)
Finished release [optimized] target(s) in 0.22s
Clean bench build
$ cargo bench --no-run
...
...
Compiling large-type v0.1.0 (C:\Users\myuser\large-type)
Finished bench [optimized] target(s) in 3m 46s
About 15 seconds go to compiling the Criterion dependencies and my library; the remainder goes to Executable benches\my_benches.rs.
Lastly, when I change [u8; 30_000] to Box<[u8; 30_000]>, both cargo build --release and cargo bench --no-run complete in a very reasonable amount of time. Any ideas?
Edit: I guess boxing is an appropriate solution, but I would like to avoid the performance hit if possible, and I want to understand why there is a difference between the two builds.
c.bench_function("my test", |b| b.iter(|| black_box(Box::new(MyType::new()))));
Does that improve compile time? – Gustation