How to read the "Intel Intrinsics Guide"?

I am trying to get started with AVX512 intrinsics by reading the Intel Intrinsics Guide but so far I have found that it does not define the named datatypes or the pseudocode syntax used for explanation. Without such definitions, the so-called guide is not guiding me in the least.

For example, if I look up the function _mm512_slli_epi32 (__m512i a, unsigned int imm8) which takes a vector a of packed 32-bit integers and does something to it, the guide says the result is stored in something called dst (undefined) and the operation is as follows.

FOR j := 0 to 15
    i := j*32
    IF imm8[7:0] > 31
        dst[i+31:i] := 0
    ELSE
        dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
    FI
ENDFOR
dst[MAX:512] := 0

What on earth am I supposed to make out of this without proper documentation? There isn't even a link to documentation on the syntax used.

Kindly help. I am looking for a guide to "Intel Intrinsics Guide". Alternatively, I would also appreciate any other pedagogical introduction to Intel intrinsics. This answer does not help. Thanks!

Fusiform answered 12/6, 2020 at 17:22 Comment(19)
imm8[7:0], dst[i+31:i] and dst[MAX:512] refer to bit ranges within operands. Anything else? – Dartboard
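
To make the bit-range notation concrete, here is a minimal C++ sketch. The helper name `bits` is hypothetical (it is not part of any Intel header); it only illustrates how `x[hi:lo]` can be read as "bits lo through hi of x, inclusive", assuming the value fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an Intel API: returns x[hi:lo], i.e. bits
// lo..hi of x (inclusive), moved down so that bit lo becomes bit 0.
inline std::uint64_t bits(std::uint64_t x, unsigned hi, unsigned lo)
{
    assert(hi < 64 && lo <= hi);  // the range must lie inside a 64-bit value
    const unsigned width = hi - lo + 1;
    const std::uint64_t mask = (width == 64) ? ~0ull : ((1ull << width) - 1);
    return (x >> lo) & mask;
}
```

With this reading, `imm8[7:0]` is `bits(imm8, 7, 0)`, and `dst[i+31:i]` with `i := j*32` is simply the j-th 32-bit element of the vector.
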
A document where I can learn about all these definitions and syntax instead of having to ask about every new undefined term or pseudocode? – Fusiform
Start with §3.7 OPERAND ADDRESSING in Intel 64 and IA-32 Architectures Software Developer's Manual. – Dartboard
It's the same pseudocode syntax they use in their asm manuals, see SDM vol.2 software.intel.com/content/www/us/en/develop/articles/…. ZeroExtend32 should be pretty self-explanatory. – Saga
@MaximEgorushkin: IDK, you tell me. I really don't know Pascal. Whatever it is, I've found it clear enough / self-explanatory, although sometimes it takes some time to think through the more complex code, and AVX512 masking of course adds complication to the code. But the general pattern of looping over SIMD elements, and then having a bit-range for that element with another tmp var, is pretty common. Also, the data types are always integer for integer instructions. – Saga
Okay, thanks. I will look into these documents. I don't know Pascal. – Fusiform
It is definitely not Pascal: that would require begin and end around every block (at least if it is more than one line). This is essentially pseudocode, maybe mostly "BASIC"-style (which has thousands of dialects, though). – Esemplastic
@PeterCordes I am sorry, I am pretty new to this, what does ZeroExtend32 do? You see, that is my point. It seems to be written only for experts. Where does a beginner begin? – Fusiform
@NanashiNoGombe You need to trust the words here, IMO. Extend the operand to 32 bits. If the operand is smaller than 32 bits, fill the top bits with 0 (zero-extend). The alternative is sign-extend: fill the top bits with the sign bit. In this particular case, the operand is 32-bit, not sure why ZeroExtend32 is there. – Dartboard
@MaximEgorushkin: I think they're using ZeroExtend32 to describe truncating the arbitrary-precision shift result to 32-bit. The SDM vol.2 manual only uses ZeroExtend, not ZeroExtend32, e.g. for the pslld entry. The asm manual has a diagram; if you ever find the intrinsics guide not clear, check the asm manual. But this seems clear enough to me, especially given the known fact that this is a SIMD left shift of separate elements (as described by the name and the English text), so bits can't shift between element boundaries. – Saga
@PeterCordes You are quite right, I suspected that Extend also means Truncate for this function. – Dartboard
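
The truncation reading of ZeroExtend32 discussed above can be sketched for a single 32-bit element as follows. This is only a scalar model under that interpretation, not Intel's definition; the function name `shift_lane` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 32-bit lane of the pseudocode line
//   dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
// shift_lane is a hypothetical name used only for this sketch.
inline std::uint32_t shift_lane(std::uint32_t lane, unsigned count)
{
    assert(count <= 31);  // models the ELSE branch only
    // Do the shift in a wider type, standing in for an "unbounded" result...
    const std::uint64_t wide = static_cast<std::uint64_t>(lane) << count;
    // ...then keep only the low 32 bits, so nothing spills into the next lane.
    return static_cast<std::uint32_t>(wide);
}
```

For example, `shift_lane(0x80000001u, 1)` yields `2`: the top bit is shifted out of the element and discarded rather than carried into its neighbour.
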
You may also like agner.org/optimize/optimizing_assembly.pdf – Dartboard
Maybe these explanations are easier for you to understand: officedaytime.com/simd512e. But in the end, there is not a single "guide" to transition from a "beginner" to an "expert" programmer, so I'll vote to close this question as off-topic. – Esemplastic
I found this one quite useful: scc.ustc.edu.cn/zlsc/chinagrid/intel/compiler_c/main_cls/… – Fusiform
@PeterCordes I do not want to make another question for this, so asking you here. Why is the immediate operand indexed as 32*imm8[3:0] when shifting by 32-bit elements and 64*imm8[2:0] when shifting by 64-bit elements and sometimes without any indexing at all? And what is the meaning of imm8[7:0] > 31 in the above instruction? Thanks a ton! Appreciate it. – Fusiform
imm8[7:0] > 31 is checking the whole 8-bit count for being greater than 31. If so, it's like it shifts out all the bits, leaving 0. i.e. it saturates the shift count instead of masking it like scalar integer shifts (like shl). As for 32*imm8[3:0], that's treating the low 4 bits as an integer and multiplying it by 32. Where are you seeing that? It's not in the pseudocode for pslld, but presumably the high bits are irrelevant for whatever operation you found this in. – Saga
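
The difference between a saturating and a masked shift count can be shown with a small scalar sketch. Both function names are hypothetical; the first models the SIMD pseudocode above, the second models how a scalar x86 shl treats the count of a 32-bit operand (only the low 5 bits are used).

```cpp
#include <cstdint>

// SIMD-style: the whole 8-bit count is compared with 31, and any larger
// count shifts every bit out, leaving 0 (the count "saturates").
inline std::uint32_t shl_saturating(std::uint32_t x, unsigned imm8)
{
    const unsigned count = imm8 & 0xFF;  // imm8[7:0]
    return (count > 31) ? 0u : (x << count);
}

// Scalar shl-style: the count is masked to its low 5 bits first,
// so a count of 32 behaves like a count of 0.
inline std::uint32_t shl_masking(std::uint32_t x, unsigned count)
{
    return x << (count & 31);
}
```

With a count of 32, `shl_saturating(1, 32)` gives `0` while `shl_masking(1, 32)` gives `1`; for counts from 0 to 31 the two agree.
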
@PeterCordes Thanks. Isn't imm8 supposed to be 32 bits wide? I got the other index ranges in the functions _mm512_alignr_epi32 and _mm512_alignr_epi64. Taking only a partial range would change the number, no? I am sorry if I am being stupid. – Fusiform
imm8 is 8 bits wide (1 byte). That's what the 8 means; pretty much any time a name ends with a number (like rel8 or imm32), it's a bit width. – Saga
alignr_epi32 shifts in units of 32-bit elements, so you can express it as a bit-shift by count*32. And the number of bits it needs to look at is determined by the number of elements in the full vector. – Saga
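
As a rough illustration of why only imm8[3:0] matters for 32-bit elements, here is a scalar sketch of _mm512_alignr_epi32 as the guide describes it: concatenate a (high half) and b (low half), shift right by whole elements, keep the low 16 elements. The type alias `Vec16` and the function name `alignr_epi32_model` are invented for this sketch, and the code models the documented behaviour rather than calling the real intrinsic.

```cpp
#include <array>
#include <cstdint>

using Vec16 = std::array<std::uint32_t, 16>;  // stand-in for 16 packed 32-bit lanes

// Scalar model (hypothetical name) of _mm512_alignr_epi32(a, b, imm8).
inline Vec16 alignr_epi32_model(const Vec16& a, const Vec16& b, unsigned imm8)
{
    // 16 elements per vector means 4 bits are enough for the element count;
    // this is the imm8[3:0] in the pseudocode. Higher bits are ignored.
    const unsigned count = imm8 & 0xF;
    Vec16 dst{};
    for (unsigned j = 0; j < 16; ++j) {
        // Index into the 32-element concatenation: 0..15 is b, 16..31 is a.
        const unsigned src = j + count;
        dst[j] = (src < 16) ? b[src] : a[src - 16];
    }
    return dst;
}
```

Shifting by `count` elements is the same as shifting the concatenated bits by `32*count`, which is how the pseudocode writes it. A 512-bit vector holds only 8 64-bit elements, so the epi64 variant needs 3 bits, hence `64*imm8[2:0]`.
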

In Intel's pseudocode, dst is the return value of the intrinsic. Overall, that instruction does this:

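#include <array>
#include <cstdint>

// Scalar model of _mm512_slli_epi32: shift each of the 16 lanes left by imm.
// Unsigned lanes avoid undefined behaviour when shifting negative values.
inline std::array<uint32_t, 16> slli( std::array<uint32_t, 16> a, unsigned imm )
{
    for( uint32_t& lane : a )
        lane = ( imm > 31 ) ? 0 : lane << imm;  // counts above 31 zero the lane
    return a;
}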

Here’s my article, which I hope is a good introduction: http://const.me/articles/simd/simd.pdf

Cloddish answered 30/6, 2020 at 17:39 Comment(1)
Very well written article, thanks for sharing. – Hydrology
