Is there a way to drain parts of a vector based on a predicate?
Asked Answered
H

3

28

I'm trying to remove some elements from a vector, based on a predicate, and collecting the result. Here's a (not working) example with an expected result:

let mut v: Vec<i32> = vec![1, 2, 3, 4, 5, 6];

let drained: Vec<i32> = v.iter().filter(|e| (*e) % 2 == 0).drain(..).collect();

assert_eq!(v, vec![1, 3, 5]);
assert_eq!(drained, vec![2, 4, 6]);

This results in the error

error[E0599]: no method named `drain` found for type `std::iter::Filter<std::slice::Iter<'_, i32>, [closure@src/main.rs:4:45: 4:62]>` in the current scope
 --> src/main.rs:4:64
  |
4 |     let drained: Vec<i32> = v.iter().filter(|e| (*e) % 2 == 0).drain(..).collect();
  |                                                                ^^^^^

There are several alternatives I looked at, none of them seem to be doing what I want:

  • Vec::retain removes the elements from the vector, but doesn't give back ownership of the removed elements.

  • v.drain(..).filter(condition).collect() returns the correct value for drained but empties the whole vector.

Hyperbaton answered 9/10, 2017 at 16:50 Comment(1)
have you tried converting the array with to_vec? let drained: Vec<i32> = v.iter().filter(|e| (*e) % 2 == 0).to_vec().drain(..).collect();Weatherworn
B
25

Not in stable Rust 1.33.0. There's an unstable nightly feature called drain_filter that does exactly what you want:

#![feature(drain_filter)]

fn main() {
    let mut v: Vec<i32> = vec![1, 2, 3, 4, 5, 6];

    let drained: Vec<i32> = v.drain_filter(|&mut e| e % 2 == 0).collect();

    assert_eq!(v, vec![1, 3, 5]);
    assert_eq!(drained, vec![2, 4, 6]);
}

As a stable workaround, you may be able to use Iterator::partition, but it does not reuse the memory:

fn main() {
    let v: Vec<i32> = vec![1, 2, 3, 4, 5, 6];

    let (drained, v): (Vec<_>, Vec<_>) = v.into_iter().partition(|&e| e % 2 == 0);

    assert_eq!(v, vec![1, 3, 5]);
    assert_eq!(drained, vec![2, 4, 6]);
}
Bleat answered 9/10, 2017 at 17:16 Comment(1)
For future reference: the RFC, and the tracking issueHodgkin
C
1

I can suggest few other ways to do this + my benchmarks.

N.B. I compare all methods by few criteria:

  1. Does it support external source of truth (my use-case). E.g. Vec::retain supports that, meaning that you can write code like
// conditions: &[bool]
assert_eq!(conditions.len(), my_vec.len());
let cond = conditions.iter().copied();
my_vec.retain(move|_|cond.next().unwrap());
  1. Is method supported by third-party Vecs, namely ArrayVec, TinyVec, SmallVec, FixedSliceVec and others.
  2. Is it fast.

So, let's begin

Sort slice than split slice

Features:

  1. Can support external source of truth — No ❌. Would call closure O(n log n) times in unspecified order so support only predicates which calculated only directly from values.
  2. Third-party support — Excellent ✅. You can use it on anything convertible to mutable slice.
  3. Is it fast — In generic case, no ❌. It runs for O(n log n) time while other methods run for O(n) time. However, if preserving original relative order is not important for you, you can use sort_unstable_by_key which doesn't allocate memory at all, which can make it fastest in some scenarios.

Implementation:

v.sort_by_key(|x| predicate(x));
let split_pos = v.partition_point(|x| predicate(x));
let (false_slice, true_slice) = v.split_at_mut(split_pos)

Vec::drain_filter

  1. Can support external source of truth — Yes ✅. Visits items in original order exactly one time.
  2. Third-party support — Non existent ❌. Also, you can't it even use in stable Rust and it's tracking issue has been suffering from bikeshedding 5 years now (at 2022-07).
  3. Is it fast — Yes ✅.

Code

let removed_items: Vec<_> = v.drain_filter(|x| predicate(x)).collect();

MaybeUninit trick using unsafe code

Well, I wrote it myself.

Features:

  1. Can support external source of truth — Yes ✅. Visits items in original order exactly one time.
  2. Third-party support — Supported ✅. Note that you must audit their implementation of retain to ensure that it cannot panic itself.
  3. Is it fast — Yes ✅. In my benchmarks it is faster than Vec::drain_filter.

This implementation makes 2 assumptions:

  1. Internal layout of Vec<T> and Vec<MaybeUninit<T>> is same. There is no reason why it is not because memory layout of T and MaybeUninit<T> is same but I also verify this using asserts. Note that asserts would be removed by compiler optimizer because they are always true.
  2. retain doesn't panic itself (it can only propagate panics from element drop or from predicate). This is true for std::vec::Vec but you need to ensure that for third-party crates.

Algorithm is simple:

  1. reinterpret initial vector Vec<T> as Vec<MaybeUninit<T>>;
  2. wrap our predicate into new predicate that moves items which we want to remove into external storage;
  3. let Vec::retain handle removal of items.

Also, only reason of usage of MaybeUninit and unsafe is avoiding double-frees so if your elements implement Copy, this algorithm can be implemented in safe Rust. However, in that case you can just use filter(...).collect() and retain with almost same performance.

So, code is below with all comments why it is safe (note that I didn't test it using sanitizers or Miri so use it on your own risk):

/// Returns removed values.
fn retain_unsafe_generic<T: Sized>(
    v: &mut Vec<T>,
    mut which_to_keep: impl FnMut(&T) -> bool,
) -> Vec<T> {
    use std::mem::{transmute, MaybeUninit};

    /// # Safety
    /// Caller must ensure that if it makes living copies of inner items,
    /// those items is removed from original vec before original reference became usable again.
    unsafe fn as_uninits<T: Sized>(v: &mut Vec<T>) -> &mut Vec<MaybeUninit<T>> {
        let orig_ptr = v.as_ptr();
        let orig_cap = v.capacity();
        let orig_size = v.len();

        let v: &mut Vec<MaybeUninit<T>> = unsafe {
            // Safety: since `MaybeUninit` has same memory layout
            // as wrapped type, we assume that we can treat vec with T
            // as MaybeUninit<T>. This assumption checked by asserts below.
            //
            // Lifetimes of elements must be correctly enforced by caller.
            transmute(v)
        };
        // Check if layout of Vec with different element type remains same.
        assert_eq!(v.len(), orig_size);
        assert_eq!(v.capacity(), orig_cap);
        assert_eq!(v.as_ptr(), orig_ptr.cast());

        v
    }

    let mut res: Vec<T> = Vec::with_capacity(v.len());
    let v = unsafe {
        // Safety: we keep result reference only in `retain` call.
        // We would remove all moved elements using retain.
        as_uninits(v)
    };
    v.retain(
        // Safety: `Vec::retain` would remove all items which values we moved into `res`.
        // It wouldn call `drop::<T>` for removed values
        // because `MaybeUninit` never drops wrapped values.
        |x| unsafe {
            // Safety: it is safe because `Vec::retain` visits elements sequentally
            // so we haven't moved value from `x` yet.
            // https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain
            let val = &*x.as_ptr();
            if which_to_keep(val) {
                return true;
            }
            res.reserve(1);
            // Any panic before this place is safe because
            // 1. We didn't moved value from `x` yet;
            // 2. In case of panic in predicate, `Vec::retain` preserve current value.

            // Here we could probably use `Vec::push`
            // but compiler currently fails to remove capacity check in `Vec::push`
            // so this function became slower than `Vec::drain_filter`
            // https://godbolt.org/z/7fhnnMh46
            // And `Vec::push(x.assume_init_read())` is unsafe operation too anyway.

            let old_len = res.len();
            // Safety: We just allocated memory for this place.
            let dst = res.as_mut_ptr().add(old_len);
            // Safety: since we cannot panic until the end of closure
            // and `Vec::retain` wouldn't panic and would remove `x`,
            // making bitwise copy of `x` is safe.
            x.as_ptr().copy_to_nonoverlapping(dst, 1);
            // Safety: we just wrote additional value.
            res.set_len(old_len + 1);

            false
        },
    );

    res
}

Benchmarks

Code of benchmarks is long so here link to the gist: https://gist.github.com/AngelicosPhosphoros/7ee482316bc1c83945f88308954e0d7e It tries to split away odd numbers from they Vec using all three algorithms I listed.

Results:

algorithm Mixed Odds first Evens first
sort-split 465us 35us 10us
drain_filter 26us 24us 22.5us
retain-uninit 17us 21us 19us

As you see, retain usage won in all cases except when sort-split doesn't actually have anything to do. It is mainly because Vec::retain has been rigorously optimized over the years.

Crucible answered 16/7, 2022 at 15:14 Comment(0)
P
0

The documentation state that Vec.retain will operate in-place, and visit each element in order, exactly once.

fn drain_where<T, Pred : Fn(&T) -> bool>(source: &mut Vec<T>, pred: Pred) -> Vec<T>
  where T : Copy {
  let mut drained: Vec<T> = Vec::new();
  
  source.retain(|item| {
    if pred(item) { drained.push(*item); false } else { true }
  });
  
  drained
}
Padriac answered 18/3, 2022 at 22:12 Comment(1)
It only works with types that are Copy, so not generally usefulLinkoski

© 2022 - 2024 — McMap. All rights reserved.