How can I do operations on Span<T> in parallel?
Asked Answered
G

1

6

I want to do operations on Span<T> in parallel but something like this is not legal:

void DoSomething(Span<int> buffer, int option1, int option2)
{
   ....... 
}

void ParallelDoSomething(Span<int> buffer)
{
    var size = buffer.Length;
    Parallel.Invoke(() => DoSomething(buffer, 0, size / 2),
        () => DoSomething(buffer, size/2, size)); //not legal
}

since compiler complains that: Cannot use ref, out, or in parameter 'buffer' inside an anonymous method, lambda expression, query expression, or local function

How can I execute in parallel methods which take Span<T> as arguments?

Gyno answered 22/3, 2021 at 13:9 Comment(8)
Possible duplicate of What is a working alternative to being unable to pass a Span<T> into lambda expression?Shawana
as a side note: I wonder if the option1/option2 should be removed from DoSomething(); that looks like something that should be applied via slice, instead; as a side note, doing so can improve performance: for(int i = 0 ; i < span.Length; i++) and foreach(var val in span) have JIT optimizations (bounds checks elision), but for (int i = option1; i < option2; i++) does notHangnail
What about parallelizing the caller of ParallelDoSomething insteadAnthesis
@Mark Gravell, sure option1/ option2 doesn't matter here. I was just trying to quickly change a Mergesort implementation to use stack allocated Spans instead of arrays, because MemoryExtensions.Sort doesn't help me.Gyno
@Charlieface, there's nothing to parallelize in caller.Gyno
How is buffer created?Anthesis
@Charlieface, it was something like Span<int> buffer = stackallock int[size].Gyno
So generate a separate Span inside each thread then merge at the end, maybe?Anthesis
H
9

The problem here is that a Span<T> cannot be allowed onto the heap - it is only valid on the stack - which means that it can't be either boxed, or used as a field on a class (or a struct except a ref struct). This rules out both captured variables and most common forms of state parameters.

If you have the luxury of changing the input to be a memory, you can capture the memory, and get the span inside the lambda body:

void ParallelDoSomething(Memory<int> memory)
{
    var size = memory.Length;
    Parallel.Invoke(
        () => DoSomething(memory.Span, 0, size / 2),
        () => DoSomething(memory.Span, size/2, size)
    );
}

If you can't change the input, you can still do it... by cheating. You can pin the existing span, create a memory that covers that pinned data, and use that memory just like you would if it had been passed in as a memory. This isn't entirely trivial, since you need to write your own pointer-based memory manager implementation, but: it works. Here's an example from protobuf-net: https://github.com/protobuf-net/protobuf-net/blob/main/src/protobuf-net.Core/Meta/TypeModel.cs#L767-L789

Or perhaps more conveniently, pin the span and capture the pointer directly, noting that the compiler doesn't normally allow this (to prevent the pointer being used by a delegate later), but since we know the timing semantics, we can make it happy by duplicating the pointer:

unsafe void ParallelDoSomething(Span<int> span)
{
    var size = span.Length;
    fixed (int* ptr = span)
    {
        int* evil = ptr; // make the compiler happy
        Parallel.Invoke(
            () => DoSomething(new Span<int>(evil, size), 0, size / 2),
            () => DoSomething(new Span<int>(evil, size), size / 2, size)
        );
    }
}

or if we wanted to fix slice the span at the point of input:

unsafe void ParallelDoSomething(Span<int> span)
{
    var size = span.Length;
    fixed (int* ptr = span)
    {
        int* evil = ptr; // make the compiler happy
        Parallel.Invoke(
            () => DoSomething(new Span<int>(evil, size / 2));
            () => DoSomething(new Span<int>(evil + (size/2), size - (size/2));
        );
    }
}
Hangnail answered 22/3, 2021 at 13:54 Comment(5)
For flair and style you can even declare a local DoSomething method that handles the spanning and hides the evil (reducing the chance that evil spreads across the land). Less appropriate if you have many DoSomethings of course. (Or many spans.) Worth noting also, perhaps, that in the drive to reduce copying of memory (which is often why Spans are pulled in) this code of course has plenty of hidden allocations, so for the OP it may be worth taking a step back and looking at what we're really trying to accomplish.Birmingham
@JeroenMostert yep, agree entirely; I do quite a lot of this kind of parallelism (it drives some key parts of Stack Overflow), but I don't think I've ever used the Parallel type, precisely because it doesn't have an API surface that is well suited to super-low allocations. I'm a huge fan of the new static modifier on lambdas and local functions!Hangnail
@Jeroen Mostert yes, I'm trying to reduce both memory copy and allocations, so what would you suggest using instead of Parallel?Gyno
Thank you, I think using Memory isn't an option since the buffer is a stack allocated Span. I was afraid that using unsafe magic would be the only way. You confirmed it, thanks.Gyno
@VladRadu: That is probably food for another question and depends a lot on your actual scenario. If you're operating on more than enough data to make parallelism interesting, a decent question would be whether you really need Span at all (as opposed to passing around Memory). Conversely, if the data isn't that much but is used to fuel expensive operations that need to be parallelized, copying it is probably not a significant overhead (spinning up threads isn't cheap either, after all, even with Task). Given his experience, Marc could probably weigh in on a follow-up question too.Birmingham

© 2022 - 2024 — McMap. All rights reserved.