Long build times for Criterion cargo bench but not cargo build --release with large type

While working on a Brainfuck interpreter in Rust, I noticed that cargo bench takes an incredibly long time to build a Criterion bench when the benchmark contains a large type.

As part of my Brainfuck implementation, I define a structure with a very large array to act as the interpreter's memory. When I compile with cargo build --release, my crate builds in a handful of seconds, but when I run cargo bench the build takes upwards of 6 minutes! I eventually realized this has something to do with the very large array in my type definition ([u8; 30_000]). I figured I could make the type smaller via indirection, so I changed the field to a Box<[u8; 30_000]>, which did reduce my build times significantly for the benchmark.

My question is: why does cargo bench take so much longer to build than cargo build --release without Box<>? Shouldn't they both be using the same optimization level? Why doesn't cargo build --release take just as long without Box<>?

Is this potentially a bug with Criterion?

I confirmed via cargo bench --no-run --timings that nearly all of the time goes to building the bench itself. I also made a minimal reproducible example:

cargo --version
> cargo 1.64.0 (387270bc7 2022-09-16)

Cargo.toml

[package]
name = "large-type"
version = "0.1.0"
edition = "2021"

[dev-dependencies]
criterion = "0.4.0"

[[bench]]
name = "my_benches"
harness = false

lib.rs

pub struct MyType {
    data: [u8; 30_000],
}

impl MyType {
    pub fn new() -> Self {
        MyType { data: [0; 30_000] }
    }
}

benches/my_benches.rs

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use large_type::MyType;

pub fn my_test(c: &mut Criterion) {
    c.bench_function("my test", |b| b.iter(|| black_box(MyType::new())));
}

criterion_group!(benches, my_test);
criterion_main!(benches);

Clean release build

$ cargo build --release
   Compiling large-type v0.1.0 (C:\Users\myuser\large-type)
    Finished release [optimized] target(s) in 0.22s

Clean bench build

$ cargo bench --no-run
...
...
Compiling large-type v0.1.0 (C:\Users\myuser\large-type)
    Finished bench [optimized] target(s) in 3m 46s

About 15 seconds go to compiling Criterion's dependencies and my library; the remainder goes to the benchmark executable, benches\my_benches.rs.

Lastly, when I do change [u8; 30_000] to Box<[u8; 30_000]>, both cargo build --release and cargo bench --no-run complete in a very reasonable amount of time. Any ideas?
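
For reference, the boxed variant of lib.rs looks roughly like the following. This is a minimal sketch, assuming the only change is the field type and its initializer; the data() accessor is hypothetical and is included only so the contents can be inspected from outside the type.

```rust
// Boxed variant of MyType: the array lives on the heap, so the struct
// itself is only pointer-sized instead of 30,000 bytes.
pub struct MyType {
    data: Box<[u8; 30_000]>,
}

impl MyType {
    pub fn new() -> Self {
        // Note: Box::new may build the array on the stack before moving it
        // to the heap, depending on optimizations.
        MyType {
            data: Box::new([0; 30_000]),
        }
    }

    // Hypothetical accessor, not part of the original example.
    pub fn data(&self) -> &[u8] {
        &self.data[..]
    }
}
```

The benchmark file stays unchanged, since it only calls MyType::new().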

Edit: I guess boxing is an appropriate solution, but I want to avoid the performance hit if possible and understand why there would be a difference between the two builds.
