Testing which trits are set in a binary representation

I have a problem in which eight elements can each hold 0, 1, or 2. I can easily represent this in 16 bits (two bits per element), but for SIMD efficiency reasons, I need it to occupy 13 bits (it is not the only thing present in the lane).

Fortunately, 2^13==8192 and 3^8==6561, so the states I want can fit. However, here's where things get interesting. Naively, I would just encode the eight trits as a base-3 number. For example, to represent the tritmask 0t12211012 (I'll use this as an example throughout), I could just write 0t12211012 = 2*3^0+1*3^1+0*3^2+1*3^3+1*3^4+2*3^5+2*3^6+1*3^7 = 4244 = 0b1000010010100.
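To make the encoding concrete, here is a minimal scalar sketch of it. The helper name `encode_trits` and the least-significant-first array layout are assumptions made for illustration only; they are not part of any existing code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack eight trits (each 0, 1, or 2) into one integer.
// trits[0] is the least-significant trit, i.e. the rightmost digit of 0t12211012.
inline uint16_t encode_trits(const std::array<uint8_t, 8>& trits) {
    uint16_t value = 0;
    // Horner's rule, starting from the most-significant trit.
    for (int i = 7; i >= 0; --i) {
        value = static_cast<uint16_t>(value * 3 + trits[i]);
    }
    return value;
}

// 0t12211012 and 0t22222222, listed least-significant trit first.
const std::array<uint8_t, 8> kExample = {2, 1, 0, 1, 1, 2, 2, 1};
const std::array<uint8_t, 8> kAllTwos = {2, 2, 2, 2, 2, 2, 2, 2};

// Checks the worked example and that the largest state fits in 13 bits.
inline void check_encoding() {
    assert(encode_trits(kExample) == 4244);
    assert(encode_trits(kAllTwos) == 6560);
    assert(encode_trits(kAllTwos) < (1u << 13));
}
```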

I have a set of operations I need to support:

Modify trits. This is easy in the default representation. For instance, if I have tritmask 0t12211012 and I wish to place a 2 in the position holding a zero, I can simply add 0t200=18. (Note that the conversion to tritspace is easy, because I only have 8 trits, so I can store the base powers in a register and index it with pshufw).
Find all elements set to a particular value. For example, given the tritmask 0t12211012, I want to be able to extract the bitmask for 0, which is 0b00000100, for 1, which is 0b10011010, and for 2, which is 0b01100001. I have not figured out how to do this, and it is what I would like assistance with. How can I do this in a small number of operations suitable for x86 SIMD?
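For reference, the first operation (modifying a trit) can be sketched in scalar form as below. The table `kPow3` stands in for the register of base powers mentioned above, and `replace_trit` is a hypothetical helper that assumes the caller already knows the trit's current value.

```cpp
#include <cassert>
#include <cstdint>

// Powers of three for trit positions 0..7 (position 0 is least significant).
constexpr uint16_t kPow3[8] = {1, 3, 9, 27, 81, 243, 729, 2187};

// Hypothetical helper: replace the trit at `pos`, given its current value.
// The change is a single add (or subtract) of a multiple of 3^pos.
inline uint16_t replace_trit(uint16_t encoded, unsigned pos,
                             unsigned old_trit, unsigned new_trit) {
    const int delta =
        (static_cast<int>(new_trit) - static_cast<int>(old_trit)) * kPow3[pos];
    return static_cast<uint16_t>(encoded + delta);
}

// Placing a 2 where 0t12211012 holds its zero adds 0t200 == 18.
inline void check_replace() {
    assert(replace_trit(4244, 2, 0, 2) == 4244 + 18);
}
```

Because each trit only changes the sum by a multiple of its own power of three, no carries into neighbouring trits can occur as long as the new value stays in the range 0 to 2.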

Thank you!

Edit 11/18/20: To give an example of an approach I consider too slow: we can iteratively find the value mod 3 and divide by 3 to pull trits off the least-significant end of the representation, then assemble the mask that way. C++ snippet:

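#include <cstdint>

uint32_t trits = <something>;  // encoded value, 0..6560
uint8_t mask0 = 0, mask1 = 0, mask2 = 0;
for (uint8_t shift = 0; shift < 8; ++shift) {
  // Peel off the least-significant trit.
  const uint32_t remainder = trits % 3;
  // Set bit `shift` in the mask that matches this trit's value.
  mask0 |= (remainder == 0) << shift;
  mask1 |= (remainder == 1) << shift;
  mask2 |= (remainder == 2) << shift;
  trits /= 3;  // shift right by one trit
}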

When actually writing this in a SIMD language, we would use the standard multiply-and-shift trick for division by a constant. But you can see it's linear in the number of trits, and has a lot of ops per iteration. We could code-golf this down a bit, but I think it is fundamentally the wrong approach. It should ideally be possible to do something in parallel for each trit... but I don't see it.
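For completeness, this is a scalar sketch of the multiply-and-shift trick for dividing a 16-bit value by 3. The names `div3` and `mod3` are illustrative. In a 16-bit SIMD lane I would expect the same thing to map onto a high-half unsigned multiply (such as `pmulhuw`) followed by a right shift by one, but that mapping is an assumption I have not benchmarked.

```cpp
#include <cassert>
#include <cstdint>

// Divide a 16-bit value by 3 using one multiply and one shift.
// 0xAAAB == ceil(2^17 / 3); the quotient is exact for every x in [0, 65535].
inline uint16_t div3(uint16_t x) {
    return static_cast<uint16_t>((static_cast<uint32_t>(x) * 0xAAABu) >> 17);
}

// Remainder recovered from the quotient: x - 3 * (x / 3).
inline uint16_t mod3(uint16_t x) {
    return static_cast<uint16_t>(x - 3u * div3(x));
}

// Peeling the least-significant trit off the running example 4244.
inline void check_div3() {
    assert(div3(4244) == 1414);
    assert(mod3(4244) == 2);
}
```

Even with the division reduced to a multiply and a shift, the loop still needs one such step per trit, so the cost stays linear in the number of trits.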

Edit 11/20/20: I've made a halfhearted effort to apply Aha to this problem, without success. Maybe an interesting subproblem to solve instead: is there a short sequence of bitwise ops, under the same constraints as above, that acts as a 'ternary bitwise AND'? That is, an op that compares two encoded numbers in tritspace and returns a bitmask that is 1 where the corresponding trits are equal and 0 otherwise. That would be a primitive from which we could build up the ops needed. We have left and right shift in tritspace (just multiply or divide by 3), and we can add or subtract a value. So what we are missing is the ability to test whether trits hold particular values...
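To pin down what that primitive should compute, here is a slow reference version that a candidate bitwise sequence could be checked against. It is only a specification of the desired result, not a proposed solution, and the name `trit_equal_mask` is made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Reference (not optimized) semantics for the hypothetical trit-equality op:
// bit i of the result is 1 when trit i of `a` equals trit i of `b`.
inline uint8_t trit_equal_mask(uint16_t a, uint16_t b) {
    uint8_t mask = 0;
    for (unsigned i = 0; i < 8; ++i) {
        if (a % 3 == b % 3) {
            mask = static_cast<uint8_t>(mask | (1u << i));
        }
        a /= 3;  // move on to the next trit of each operand
        b /= 3;
    }
    return mask;
}

// Comparing 0t12211012 (4244) against all-0, all-1, and all-2 tritmasks
// reproduces the three masks from the question.
inline void check_trit_equal() {
    assert(trit_equal_mask(4244, 0) == 0x04);     // 0b00000100
    assert(trit_equal_mask(4244, 3280) == 0x9A);  // 0b10011010, 3280 == 0t11111111
    assert(trit_equal_mask(4244, 6560) == 0x61);  // 0b01100001, 6560 == 0t22222222
}
```

With such a primitive, finding all elements set to a value v reduces to comparing against the constant whose trits are all v.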
