Using C#'s Vector<T>
, how can we most efficiently vectorize the operation of finding the index of a particular element in a set?
As constraints, the set will always be a Span<T>
of an integer primitive, and it will contain at most 1 matching element.
I have come up with a solution that seems alright, but I'm curious if we can do better. Here is the approach:
- Create a
Vector<T>
consisting only of the target element, in each slot. - Use
Vector.Equals()
between the input set vector and the vector from the previous step, to get a mask containing a 1 in the single matching slot (or only 0s if there is no match). - Using a pre-initialized vector containing 1-based indexes (1, 2, 3, 4, ...), call
Vector.Dot()
between that vector and the mask from the previous step. Each index will be multiplied by 0, except the potential matching index, which will be multiplied by 1. What we get back is the sum of those multiplications, which is either 0, or the 1-based index of the matching element. If the result was 0, return -1 for no match. Otherwise, subtract one from the result to make it 0-based, and return that.
// One-time initialized vector containing { 1, 2, 3, 4, ... } Vector<ushort> indexes = MemoryMarshal.Cast<ushort, Vector<ushort>>(Enumerable.Range(1, Vector<ushort>.Count).Select(index => (ushort)index).ToArray())[0]; // The input set and the element to search for Span<ushort> set = stackalloc ushort[]{ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 }; ushort element = 22; // Interpret input set as a sequence of vectors (set is assumed to have length power of two for brevity) var setVectors = MemoryMarshal.Cast<ushort, Vector<ushort>>(set); // Create a vector that contains the target element in each slot var elementVector = new Vector<ushort>(element); // Loop per vector rather than per element foreach (var vector in setVectors) { // Get a mask that has a 1 in the single matching slot, or only 0s var mask = Vector.Equals(vector, elementVector); // Get the dot product of the mask and the indexes // This will multiple each index by 0, or by 1 if it is the matching one, and return their sum, i.e. the matching index or 0 // Note that the indexes are deliberately 1-based, to distinguished from 0 (no match) var index = Vector.Dot(indexes, mask); // Either return 0 for no match, or reduce the index by 1 to get the 0-based index return index == 0 ? -1 : index - 1; }
JsonReaderHelper.IndexOfOrLessThan
: If we regularly expect to have multiple vectors without a match in the first one, we could optimize someVector.Dot
calls away by short-circuiting ifVector<ushort>.Zero.Equals(mask)
. – Keverne