You can find some applications listed in the paper describing a hardware unit for PDEP/PEXT:
There are many emerging applications, such as cryptography, imaging and biometrics, where more advanced bit manipulation operations are needed. While these can be built from the simpler logical and shift operations, the applications using these advanced bit manipulation operations are significantly sped up if the processor can support more powerful bit manipulation instructions. Such operations include arbitrary bit permutations, performing multiple bit-field extract operations in parallel, and performing multiple bit-field deposit operations in parallel. We call these permutation (perm), parallel extract (pex) or bit gather, and parallel deposit (pdep) or bit scatter operations, respectively.
Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors
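To make the quoted definitions concrete, here is a minimal software model of the two operations. It is only a sketch of the semantics, not how the hardware does it; the names `pext_soft` and `pdep_soft` are made up for this illustration.

```c
#include <stdint.h>

/* Parallel extract (bit gather): take the bits of src at the positions
   where mask is 1 and pack them contiguously into the low bits. */
uint64_t pext_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* lowest set bit of mask */
        if (src & lowest)
            result |= out_bit;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}

/* Parallel deposit (bit scatter): take the low bits of src, one by one,
   and place them at the positions where mask is 1. */
uint64_t pdep_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t in_bit = 1; mask != 0; in_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* lowest set bit of mask */
        if (src & in_bit)
            result |= lowest;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}
```

For example, `pext_soft(0xABCD, 0x0F0F)` gathers the two selected nibbles into `0xBD`, and `pdep_soft(0xBD, 0x0F0F)` scatters them back out to `0x0B0D`.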
Bit permutation is extremely common in bitboards, for example reversing bytes/words or mirroring bit arrays. Bitboard code is full of algorithms that require extensive bit manipulation, and people had to get creative to implement them before the era of PEXT/PDEP. Later, many card game engines adopted the same technique to handle a single game set in just one or a few registers.
PDEP/PEXT are also used to greatly speed up bit interleaving, which is common in algorithms such as Morton code. Some examples on this:
The multiplication technique invented for bitboards is also used by many algorithms in Bit Twiddling Hacks, for example interleaving bits with a 64-bit multiply. This technique is no longer needed when PDEP/PEXT is available.
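As a rough sketch of the interleaving case: a 2D Morton code is just two deposits into alternating bit positions, and decoding is two extracts. The helpers `scatter_bits` and `gather_bits` below are hypothetical portable stand-ins for the instructions so the snippet compiles anywhere; on a BMI2 machine you would presumably replace them with `_pdep_u64` and `_pext_u64`.

```c
#include <stdint.h>

/* Portable stand-in for PDEP: low bits of src go to the set bits of mask. */
static uint64_t scatter_bits(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t in_bit = 1; mask != 0; in_bit <<= 1) {
        if (src & in_bit)
            result |= mask & (~mask + 1);  /* lowest set bit of mask */
        mask &= mask - 1;
    }
    return result;
}

/* Portable stand-in for PEXT: bits of src under mask are packed low. */
static uint64_t gather_bits(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        if (src & mask & (~mask + 1))
            result |= out_bit;
        mask &= mask - 1;
    }
    return result;
}

#define EVEN_BITS 0x5555555555555555ULL  /* bit positions 0, 2, 4, ... */
#define ODD_BITS  0xAAAAAAAAAAAAAAAAULL  /* bit positions 1, 3, 5, ... */

/* x lands on the even bit positions, y on the odd ones. */
uint64_t morton_encode(uint32_t x, uint32_t y)
{
    return scatter_bits(x, EVEN_BITS) | scatter_bits(y, ODD_BITS);
}

uint32_t morton_decode_x(uint64_t code)
{
    return (uint32_t)gather_bits(code, EVEN_BITS);
}

uint32_t morton_decode_y(uint64_t code)
{
    return (uint32_t)gather_bits(code, ODD_BITS);
}
```

With this layout, `morton_encode(5, 3)` gives `0x1B`, and the two decode functions recover 5 and 3 from it.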
You can find more detailed information in Bit permutations and Hacker's Delight.
Another use for PDEP/PEXT is extracting or combining fields whose bits are not in contiguous positions. One example is disassembling RISC-V instructions, where the immediate bits are scattered around the encoding to make the hardware design simpler, which also makes them a bit messier to handle in software without PDEP/PEXT.
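A small sketch of the RISC-V case, using the S-type (store) format, where imm[11:5] sits in instruction bits 31:25 and imm[4:0] in bits 11:7. Because the two pieces are already in ascending order, one extract with the right mask yields the whole 12-bit immediate. `gather_bits` is a hypothetical portable stand-in for PEXT (you would presumably use `_pext_u32` with BMI2), and `riscv_s_type_imm` is a name invented for this example. Formats such as B-type and J-type also reorder the immediate bits, so a single extract would not be enough for them.

```c
#include <stdint.h>

/* Portable stand-in for PEXT: bits of src under mask are packed low. */
static uint32_t gather_bits(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    for (uint32_t out_bit = 1; mask != 0; out_bit <<= 1) {
        if (src & mask & (~mask + 1))  /* test src at lowest set mask bit */
            result |= out_bit;
        mask &= mask - 1;              /* clear that mask bit */
    }
    return result;
}

/* Bits 31:25 (imm[11:5]) and bits 11:7 (imm[4:0]) of an S-type instruction. */
#define S_TYPE_IMM_MASK 0xFE000F80u

/* Return the sign-extended 12-bit immediate of an S-type instruction. */
int32_t riscv_s_type_imm(uint32_t insn)
{
    uint32_t imm12 = gather_bits(insn, S_TYPE_IMM_MASK);
    /* Sign-extend from 12 bits: flip the sign bit, then subtract its weight. */
    return (int32_t)(imm12 ^ 0x800u) - 0x800;
}
```

For instance, `0x00552423` encodes `sw x5, 8(x10)` and yields 8, while `0xFE552E23` encodes `sw x5, -4(x10)` and yields -4.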
Some other applications:
I think the pext / pdep instructions have HUGE implications to 4-coloring problem, 3-SAT, Constraint Solvers, etc. etc. More researchers probably should look into those two instructions.
Just look at Binary Decision Diagrams, and other such combinatorial data structures, and you can definitely see the potential uses of PEXT / PDEP all over the place.
https://news.ycombinator.com/item?id=19137260
How would the compiler know when to use this instruction?
Compilers can recognize common patterns and optimize the instruction sequence, but for advanced operations like this, programmers usually need to call intrinsics explicitly from high-level code.
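A minimal sketch of what calling the intrinsic explicitly can look like, assuming GCC or Clang on x86-64, where `-mbmi2` (or a suitable `-march=`) defines `__BMI2__`. The wrapper name `extract_bits` and the macro `HAVE_BMI2_INTRINSICS` are hypothetical; the fallback loop is only there so the same source still builds when BMI2 is not enabled.

```c
#include <stdint.h>

#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>            /* declares _pext_u64 and _pdep_u64 */
#define HAVE_BMI2_INTRINSICS 1
#else
#define HAVE_BMI2_INTRINSICS 0
#endif

/* Gather the bits of src selected by mask into the low bits of the result. */
uint64_t extract_bits(uint64_t src, uint64_t mask)
{
#if HAVE_BMI2_INTRINSICS
    /* Single PEXT instruction when the build enables BMI2. */
    return _pext_u64(src, mask);
#else
    /* Portable fallback: walk the mask one set bit at a time. */
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        if (src & mask & (~mask + 1))  /* test src at lowest set mask bit */
            result |= out_bit;
        mask &= mask - 1;              /* clear that mask bit */
    }
    return result;
#endif
}
```

Either path returns the same value, e.g. `extract_bits(0xABCD, 0x0F0F)` is `0xBD`. Note that the compile-time check only tells you what the build targets, so a binary built with BMI2 enabled will fault on a CPU that lacks it.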
pext and carry-less multiplication, what did you find? – Gentlemanfarmer
site:stackoverflow.com pext pdep for more. IDK if there are any cases where a compiler would use it for you; normally you'd use it via intrinsics, i.e. _pdep_u64 – Burglary
pdep/pext is that they are sort of a "because we can" instruction. It has various special cases that could be useful (packing/unpacking values scattered across bitfields, interleaving bits for Z-ordering, etc) and so perhaps they just went ahead and implemented a generic version to allow for programmers to be creative. It's one of those things, like popcount or lzcount, that can be done many times more efficiently in hardware with a dedicated instruction than in software using existing instructions. – Prakash