Memory reclamation with crossbeam::epoch
Asked Answered
H

2

6

I've encountered an issue with memory reclamation in crossbeam. Say you're implementing a simple thread-safe lock-free container that holds a single value. Any thread can obtain a clone of the stored value, and the value can get updated at any point, after which readers begin observing clones of the new value.

Although the typical use case would be to specify something like Arc<X> as T, the implementation cannot rely on T being pointer-sized - for example, X could be a trait, resulting in a fat-pointer Arc<X>. But lock-free access to an arbitrary T seems like a great fit for epoch-based lock-free code. Based on the examples, I came up with this:

extern crate crossbeam;

use std::thread;
use std::sync::atomic::Ordering;

use crossbeam::epoch::{self, Atomic, Owned};

struct Container<T: Clone> {
    current: Atomic<T>,
}

impl<T: Clone> Container<T> {
    fn new(initial: T) -> Container<T> {
        Container { current: Atomic::new(initial) }
    }

    fn set_current(&self, new: T) {
        let guard = epoch::pin();
        let prev = self.current.swap(Some(Owned::new(new)),
                                     Ordering::AcqRel, &guard);
        if let Some(prev) = prev {
            unsafe {
                // once swap has propagated, *PREV will no longer
                // be observable
                //drop(::std::ptr::read(*prev));
                guard.unlinked(prev);
            }
        }
    }

    fn get_current(&self) -> T {
        let guard = epoch::pin();
        // clone the latest visible value
        (*self.current.load(Ordering::Acquire, &guard).unwrap()).clone()
    }
}

When used with a type that doesn't allocate, e.g. with T=u64, it works great - set_current and get_current can be called millions of times without leaking. (The process monitor shows minor memory oscillations due to epoch pseudo-gc, as would be expected, but no long-term growth.) However, when T is a type that allocates, e.g. Box<u64>, one can easily observe leaks. For example:

fn main() {
    use std::sync::Arc;
    let c = Arc::new(Container::new(Box::new(0)));
    const ITERS: u64 = 100_000_000;
    let producer = thread::spawn({
        let c = Arc::clone(&c);
        move || {
            for i in 0..ITERS {
                c.set_current(Box::new(i));
            }
        }
    });
    let consumers: Vec<_> = (0..16).map(|_| {
        let c = Arc::clone(&c);
        thread::spawn(move || {
            let mut last = 0;
            loop {
                let current = c.get_current();
                if *current == ITERS - 1 {
                    break;
                }
                assert!(*current >= last);
                last = *current;
            }
        })}).collect();
    producer.join().unwrap();
    for x in consumers {
        x.join().unwrap();
    }
}

Running this program shows a steady and significant increase in memory usage that ends up consuming the amount of memory proportional to the number of iterations.

According to the blog post introducing it, Crossbeam's epoch reclamation "does not run destructors, but merely deallocates memory". The try_pop in the Treiber stack example uses ptr::read(&(*head).data) to move the value contained in head.data out of the head object destined for deallocation. The ownership of the data object is transferred to the caller which will either move it elsewhere or deallocate it when it goes out of scope.

How would that translate to the code above? Is the setter the proper place for guard.unlinked, or how else does one ensure that the drop is run on the underlying object? Uncommenting the explicit drop(ptr::read(*prev)) results in failed assertion that checks monotonicity, possibly indicating premature deallocation.

Helm answered 17/9, 2017 at 18:44 Comment(0)
P
6

The crux of the problem is (as you already figured out yourself) that guard.unlinked(prev) defers execution of the following piece of code:

drop(Vec::from_raw_parts(prev.as_raw(), 0, 1));

But you want it to defer this instead:

drop(Vec::from_raw_parts(prev.as_raw(), 1, 1));

Or, equivalently:

drop(Box::from_raw(prev.as_raw());

In other words, unlinked simply frees the memory in which the object is stored, but doesn't drop the object itself.

This is currently a known pain point in Crossbeam, but fortunately it will be resolved soon. Crossbeam's epoch-based garbage collector is currently undergoing a redesign and rewrite in order to:

  • allow deferred drops and arbitrary deferred functions
  • incrementally collect garbage in order to minimize pauses
  • avoid overcrowding thread-local garbage bags
  • more eagerly collect large pieces of garbage
  • fix a soundness issue in the API

If you're curious to find out more on the new Crossbeam design, check out the RFCs repository. I suggest starting with the RFC on new Atomic and the RFC on new GC.

I've created an experimental crate, Coco, which has a lot in common with Crossbeam's new design. If you need a solution right now, I'd suggest switching to it. But keep in mind that Coco will be deprecated in favor of Crossbeam as soon as we release a new version (probably this or the next month).

Pecker answered 18/9, 2017 at 10:10 Comment(2)
Hi Stjepan! I am really glad to see you here, especially on this very specific question which has you can see the "usual suspects" didn't manage to answer. If you want to chat or need anything, don't hesitate to pop in the Rust chatroom: chat.stackoverflow.com/rooms/62927/rustSandler
Thanks, coco does exactly what I need - I've now added an answer with the code from question updated for coco. For me it doesn't matter that coco will be eventually deprecated, I'll just switch back to Crossbeam then.Helm
H
1

As Stjepan answered in some detail, it is a known limitation of current Crossbeam that it only supports deallocation and not full dropping of objects that have become unreachable, but are potentially still visible to other threads. This does not affect the lock-free collections supported by Crossbeam, which automatically remove items "observed" by the collection user - no peeking is allowed. This fits the need of a queue or stack, but not of e.g. a lock-free map.

This is addressed by the coco crate, which defines several concurrent collections and serves as a preview of the next generation of Crossbeam design. It supports deferred dropping of values. Here is a rendition of Container using coco:

use std::thread;
use std::sync::atomic::Ordering;

use coco::epoch::{self, Atomic, Owned};

struct Container<T: Clone> {
    current: Atomic<T>,
}

impl<T: Clone> Container<T> {
    fn new(initial: T) -> Container<T> {
        Container { current: Atomic::new(initial) }
    }

    fn set_current(&self, new: T) {
        epoch::pin(|scope| {
            let prev = self.current.swap(Owned::new(new).into_ptr(&scope),
                                         Ordering::AcqRel, &scope);
            unsafe {
                scope.defer_drop(prev);
            }
        })
    }

    fn get_current(&self) -> T {
        epoch::pin(|scope| {
            let obj_ref = unsafe {
                self.current.load(Ordering::Acquire, &scope).as_ref().unwrap()
            };
            obj_ref.clone()
        })
    }
}

When run with the same main() as in the question, it does not leak memory.

One thing to take into account is that, according to the documentation, epoch::pin() comes with the cost of a SeqCst fence and a few atomic ops. (Note that epoch::pin() was not free under Crossbeam either, and was in fact much more expensive.) The delay of 10-15 ns on modern hardware might not be relevant for most uses, but users should be aware of it when writing code that attempts to squeeze every nanosecond out of its lock-free operations.

Helm answered 18/9, 2017 at 20:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.