TL;DR
A functional implementation can be faster than your original procedural implementation, in certain cases.
Why is the functional style so much slower than the imperative style? Is there some problem with the functional implementation which is causing such a huge slowdown?
As Matthieu M. already pointed out, the important thing to note is that the algorithm matters. How that algorithm is expressed (procedural, imperative, object-oriented, functional, declarative) generally doesn't matter.
I see two main issues with the functional code:
Allocating numerous strings over and over is inefficient. In the original functional implementation, this is done via to_string
and format!
.
There's the overhead of using group_by
, which exists to give a nested iterator, which you don't need just to get the counts.
Using more of itertools (batching
, take_while_ref
, format_with
) brings the two implementations much closer:
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
A benchmark of 4MiB of random alphanumeric data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [21.082 ms 21.620 ms 22.211 ms]
encode (fast) time: [26.457 ms 27.104 ms 27.882 ms]
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
If you are interested in creating your own iterator, you can mix-and-match the procedural code with more functional code:
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next(); // See footnote 1
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
1 — thanks to Stargateur for pointing out that eagerly getting the first value helps branch prediction.
A benchmark of 4MiB of random alphanumeric data, compiled with RUSTFLAGS='-C target-cpu=native'
:
encode (procedural) time: [19.888 ms 20.301 ms 20.794 ms]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
encode (tiny) time: [19.150 ms 19.262 ms 19.399 ms]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
I believe this more clearly shows the main fundamental difference between the two implementations: an iterator-based solution is resumable. Every time we call next
, we need to see if there was a previous character that we've read (self.saved
). This adds a branch to the code that isn't there in the procedural code.
On the flip side, the iterator-based solution is more flexible — we can now compose all sorts of transformations on the data, or write directly to a file instead of a String
, etc. The custom iterator can be extended to operate on a generic type instead of char
as well, making it very flexible.
See also:
If I want to write high performance code, should I ever use this functional style?
I would, until benchmarking shows that it's the bottleneck. Then evaluate why it's the bottleneck.
Supporting code
Always got to show your work, right?
benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion}; // 0.2.11
use rle::*;
fn criterion_benchmark(c: &mut Criterion) {
let data = rand_data(4 * 1024 * 1024);
c.bench_function("encode (procedural)", {
let data = data.clone();
move |b| b.iter(|| encode_proc(&data))
});
c.bench_function("encode (functional)", {
let data = data.clone();
move |b| b.iter(|| encode_iter(&data))
});
c.bench_function("encode (fast)", {
let data = data.clone();
move |b| b.iter(|| encode_slim(&data))
});
c.bench_function("encode (tiny)", {
let data = data.clone();
move |b| b.iter(|| encode_tiny(&data))
});
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
lib.rs
use itertools::Itertools; // 0.8.0
use rand; // 0.6.5
pub fn rand_data(len: usize) -> String {
use rand::distributions::{Alphanumeric, Distribution};
let mut rng = rand::thread_rng();
Alphanumeric.sample_iter(&mut rng).take(len).collect()
}
pub fn encode_proc(source: &str) -> String {
let mut retval = String::new();
let firstchar = source.chars().next();
let mut currentchar = match firstchar {
Some(x) => x,
None => return retval,
};
let mut currentcharcount: u32 = 0;
for c in source.chars() {
if c == currentchar {
currentcharcount += 1;
} else {
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
currentchar = c;
currentcharcount = 1;
}
}
if currentcharcount > 1 {
retval.push_str(¤tcharcount.to_string());
}
retval.push(currentchar);
retval
}
pub fn encode_iter(data: &str) -> String {
data.chars()
.group_by(|&c| c)
.into_iter()
.map(|(c, group)| match group.count() {
1 => c.to_string(),
n => format!("{}{}", n, c),
})
.collect()
}
pub fn encode_slim(data: &str) -> String {
data.chars()
.batching(|it| {
it.next()
.map(|v| (v, it.take_while_ref(|&v2| v2 == v).count() + 1))
})
.format_with("", |(c, count), f| match count {
1 => f(&c),
n => f(&format_args!("{}{}", n, c)),
})
.to_string()
}
struct RunLength<I> {
iter: I,
saved: Option<char>,
}
impl<I> RunLength<I>
where
I: Iterator<Item = char>,
{
fn new(mut iter: I) -> Self {
let saved = iter.next();
Self { iter, saved }
}
}
impl<I> Iterator for RunLength<I>
where
I: Iterator<Item = char>,
{
type Item = (char, usize);
fn next(&mut self) -> Option<Self::Item> {
let c = self.saved.take().or_else(|| self.iter.next())?;
let mut count = 1;
while let Some(n) = self.iter.next() {
if n == c {
count += 1
} else {
self.saved = Some(n);
break;
}
}
Some((c, count))
}
}
pub fn encode_tiny(data: &str) -> String {
use std::fmt::Write;
RunLength::new(data.chars()).fold(String::new(), |mut s, (c, count)| {
match count {
1 => s.push(c),
n => write!(&mut s, "{}{}", n, c).unwrap(),
}
s
})
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn all_the_same() {
let data = rand_data(1024);
let a = encode_proc(&data);
let b = encode_iter(&data);
let c = encode_slim(&data);
let d = encode_tiny(&data);
assert_eq!(a, b);
assert_eq!(a, c);
assert_eq!(a, d);
}
}
map
step with aflat_map
step, with a special-purpose iterator implementation taking the character and count and outputting the required stream of bytes. Forward encoding the integer is a bit tricky, but not too bad withcount_leading_zeroes
giving a hint of the magnitude (clz(i) * 77 / 256
gives the log 10). – Nasho