I tried to reproduce the example from this famous question. My code looks like this:
#![feature(test)]

extern crate rand;
extern crate test;

use test::Bencher;
use rand::{thread_rng, Rng};

type ItemType = u8;
type SumType = u64;

const TEST_SIZE: usize = 32_768;

#[bench]
fn bench_train(b: &mut Bencher) {
    let numbers = get_random_vec();
    b.iter(|| calc_sum(&numbers));
}

#[bench]
fn bench_train_sort(b: &mut Bencher) {
    let mut numbers = get_random_vec();
    numbers.sort(); // <-- the magic difference
    b.iter(|| calc_sum(&numbers));
}

fn get_random_vec() -> Vec<ItemType> {
    thread_rng().gen_iter().take(TEST_SIZE).collect()
}

fn calc_sum(numbers: &Vec<ItemType>) -> SumType {
    let mut sum = 0;
    for &num in numbers {
        if num < ItemType::max_value() / 2 {
            sum += num.into();
        }
    }
    sum
}
If I benchmark the exact code from above I get reasonable results (like in the linked question):
test bench_train ... bench: 148,611 ns/iter (+/- 8,445)
test bench_train_sort ... bench: 21,064 ns/iter (+/- 1,980)
However, if I change SumType to u8, both versions run equally fast and much faster overall:
test bench_train ... bench: 1,272 ns/iter (+/- 64)
test bench_train_sort ... bench: 1,280 ns/iter (+/- 170)
First off: of course, the sum will overflow all the time, but in release mode Rust's overflow checks are disabled, so we just calculate a wrong result without panicking. Could this be the reason for the surprisingly short time?
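To be clear about what I mean by a wrong result: with SumType = u8 the additions simply wrap around modulo 256. A minimal standalone sketch of that behaviour (not part of the benchmark), using wrapping_add to make the wrapping explicit:

fn main() {
    // In release mode the plain `+=` in calc_sum behaves like this:
    // the u8 sum wraps modulo 256 instead of panicking on overflow.
    let mut sum: u8 = 0;
    for _ in 0..300 {
        sum = sum.wrapping_add(200); // overflows many times, no panic
    }
    println!("{}", sum); // some value in 0..=255, not the mathematical sum
}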
Even stranger: when I change the implementation of calc_sum to something more idiomatic, the results change again. My second implementation:
fn calc_sum(numbers: &Vec<ItemType>) -> SumType {
    numbers.iter()
        .filter(|&&num| num < ItemType::max_value() / 2)
        .fold(0, |acc, &num| acc + (num as SumType))
}
With this implementation, the SumType doesn't matter anymore. With u8 as well as with u64 I get these results:
test bench_train ... bench: 144,411 ns/iter (+/- 12,533)
test bench_train_sort ... bench: 16,966 ns/iter (+/- 1,100)
So we again get the numbers we expect. The question is:
What is the reason for the strange running times?
PS: I tested with cargo bench, which compiles in release mode.
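As an additional sanity check (just a sketch, I have not measured it carefully), the result could be passed through test::black_box so the optimizer cannot drop the computation entirely; I am not sure whether b.iter already guarantees this:

#[bench]
fn bench_train_black_box(b: &mut Bencher) {
    let numbers = get_random_vec();
    // black_box marks the value as used, ruling out the possibility that
    // calc_sum is optimized away completely in the suspiciously fast cases.
    b.iter(|| test::black_box(calc_sum(&numbers)));
}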
PPS: I just noticed that in the first implementation of calc_sum I use into() for casting, whereas I use as in the second example. When also using as in the first example, I get even stranger numbers. With SumType = u64:
test bench_train ... bench: 39,850 ns/iter (+/- 2,355)
test bench_train_sort ... bench: 39,344 ns/iter (+/- 2,581)
With SumType = u8:
test bench_train ... bench: 1,184 ns/iter (+/- 339)
test bench_train_sort ... bench: 1,239 ns/iter (+/- 85)
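For the widening u8 to u64 conversion I would expect into() and as to be equivalent, which is why these numbers surprise me. A minimal standalone sketch of what I mean (not part of the benchmark; as only behaves differently for narrowing casts):

fn main() {
    let num: u8 = 200;

    // Widening u8 -> u64: both forms are lossless and should be equivalent.
    let via_into: u64 = num.into();
    let via_as: u64 = num as u64;
    assert_eq!(via_into, via_as);

    // `as` additionally allows narrowing (truncating) casts, which into() does not:
    let wide: u64 = 300;
    assert_eq!(wide as u8, 44); // 300 % 256 == 44
}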