How to read the "Intel Intrinsics Guide"? - McMap



Asked 12/6, 2020 at 17:22 Answered 30/6, 2020 at 17:39

Solved intel simd intrinsics


I am trying to get started with AVX512 intrinsics by reading the Intel Intrinsics Guide but so far I have found that it does not define the named datatypes or the pseudocode syntax used for explanation. Without such definitions, the so-called guide is not guiding me in the least.

For example, if I look up the function _mm512_slli_epi32 (__m512i a, unsigned int imm8) which takes a vector a of packed 32-bit integers and does something to it, the guide says the result is stored in something called dst (undefined) and the operation is as follows.

FOR j := 0 to 15
    i := j*32
    IF imm8[7:0] > 31
        dst[i+31:i] := 0
    ELSE
        dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
    FI
ENDFOR
dst[MAX:512] := 0

What on earth am I supposed to make out of this without proper documentation? There isn't even a link to documentation on the syntax used.

Kindly help. I am looking for a guide to "Intel Intrinsics Guide". Alternatively, I would also appreciate any other pedagogical introduction to Intel intrinsics. This answer does not help. Thanks!

Fusiform answered 12/6, 2020 at 17:22 Comment(19)

imm8[7:0], dst[i+31:i] and dst[MAX:512] refer to bit ranges within operands. Anything else? – Dartboard 12/6, 2020 at 17:30
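To make the bit-range notation concrete, here is a minimal sketch in C++. The helper name `bits` is hypothetical (it appears in neither the guide nor the SDM), and it assumes the value fits in 64 bits. `x[hi:lo]` is read as "bits lo through hi of x, inclusive", so `imm8[7:0]` is the low byte of the immediate and `a[i+31:i]` with `i = j*32` is simply 32-bit element `j`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modelling the pseudocode's value[hi:lo]:
// returns bits lo..hi (inclusive) of value, moved down so bit lo becomes bit 0.
inline std::uint64_t bits(std::uint64_t value, unsigned hi, unsigned lo)
{
    assert(lo <= hi && hi < 64);
    const unsigned width = hi - lo + 1;
    // Shifting a 64-bit value by 64 is undefined, so handle the full-width case separately.
    const std::uint64_t mask = (width == 64) ? ~std::uint64_t{0}
                                             : ((std::uint64_t{1} << width) - 1);
    return (value >> lo) & mask;
}
```

For example, `bits(imm, 7, 0)` corresponds to `imm8[7:0]`, and `bits(imm, 3, 0)` to `imm8[3:0]`.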

A document where I can learn about all these definitions and syntax instead of having to ask about every new undefined term or pseudocode? – Fusiform 12/6, 2020 at 17:32

Start with §3.7 OPERAND ADDRESSING in Intel 64 and IA-32 Architectures Software Developer's Manual. – Dartboard 12/6, 2020 at 17:32

It's the same pseudocode syntax they use in their asm manuals, see SDM vol.2 software.intel.com/content/www/us/en/develop/articles/…. ZeroExtend32 should be pretty self-explanatory. – Saga 12/6, 2020 at 17:33

@MaximEgorushkin: IDK, you tell me. I really don't know Pascal. Whatever it is, I've found it clear enough / self-explanatory, although sometimes it takes some time to think through the more complex code, and AVX512 masking of course adds complication to the code. But the general pattern of looping over SIMD elements, and then having a bit-range for that element with another tmp var, is pretty common. Also, the data types are always integer for integer instructions. – Saga 12/6, 2020 at 17:34

Okay, thanks. I will look into these documents. I don't know Pascal. – Fusiform 12/6, 2020 at 17:36

It is definitely not Pascal, there would be begin and end required at every block (at least, if it is more than one line). This is essentially some pseudo code -- maybe mostly "BASIC"-style (which has thousands of dialects, though). – Esemplastic 12/6, 2020 at 17:41

@PeterCordes I am sorry, I am pretty new to this, what does ZeroExtend32 do? You see that is my point. It seems to be written only for experts. Where does a beginner begin? – Fusiform 12/6, 2020 at 17:42

@NanashiNoGombe You need to trust the words here, IMO. Extend the operand to 32 bits. If the operand is smaller than 32 bits, fill the top bits with 0 (zero-extend). The alternative is sign-extend - fill the top bits with the sign bit. In this particular case, the operand is 32-bit, not sure why ZeroExtend32 is there. – Dartboard 12/6, 2020 at 17:46
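The difference between zero-extension and sign-extension can be sketched in a few lines of C++. The function names below are made up for illustration, and an 8-bit source is assumed purely to keep the example small.

```cpp
#include <cstdint>

// Zero-extend: widen to 32 bits and fill the new top bits with 0.
inline std::uint32_t zero_extend32(std::uint8_t x)
{
    return x;
}

// Sign-extend: widen to 32 bits and fill the new top bits with copies of bit 7.
// The xor/subtract form avoids relying on how out-of-range casts behave.
inline std::int32_t sign_extend32(std::uint8_t x)
{
    return static_cast<std::int32_t>((x ^ 0x80) - 0x80);
}
```

With the input byte `0xF0`, zero-extension gives `0x000000F0` while sign-extension gives `0xFFFFFFF0` (that is, -16).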

@MaximEgorushkin: I think they're using ZeroExtend32 to describe truncating the arbitrary-precision shift result to 32-bit. The SDM vol.2 manual only uses ZeroExtend not ZeroExtend32, e.g. for the pslld entry. The asm manual has a diagram; if you ever find the intrinsics guide not clear, check the asm manual. But this seems clear enough to me, especially given the known fact that this is a SIMD left shift of separate elements (as described by the name and the English text), so bits can't shift between element boundaries. – Saga 12/6, 2020 at 17:54

@PeterCordes You are quite right, I suspected that Extend also means Truncate for this function. – Dartboard 12/6, 2020 at 17:54

You may also like agner.org/optimize/optimizing_assembly.pdf – Dartboard 12/6, 2020 at 18:1

Maybe these explanations are easier to understand for you: officedaytime.com/simd512e. But in the end, there is not a single "guide" to transition from a "beginner" to an "expert" programmer -- I'll vote this question as off-topic. – Esemplastic 12/6, 2020 at 18:10

I found this one quite useful: scc.ustc.edu.cn/zlsc/chinagrid/intel/compiler_c/main_cls/… – Fusiform 13/6, 2020 at 13:16

@PeterCordes I do not want to make another question for this, so asking you here. Why is the immediate operand indexed as 32*imm8[3:0] when shifting by 32-bit elements and 64*imm8[2:0] when shifting by 64-bit elements and sometimes without any indexing at all? And what is the meaning of imm8[7:0] > 31 in the above instruction? Thanks a ton! Appreciate it. – Fusiform 15/6, 2020 at 17:26

imm8[7:0] > 31 is checking the whole 8-bit count for being greater than 31. If so, it's like it shifts out all the bits, leaving 0. i.e. it saturates the shift count instead of masking it like scalar integer shifts (like shl). As for 32*imm8[3:0], that's treating the low 4 bits as an integer and multiplying it by 32. Where are you seeing that? It's not in the pseudocode for pslld, but presumably the high bits are irrelevant for whatever operation you found this in. – Saga 15/6, 2020 at 17:36
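The saturating-versus-masking distinction is easy to see in a scalar model. This is only a sketch with hypothetical function names: the first function follows the pseudocode quoted in the question, and the second models how a scalar 32-bit `shl` uses only the low 5 bits of its count.

```cpp
#include <cstdint>

// SIMD-style (pslld / _mm512_slli_epi32): any count above 31 shifts out
// every bit, so the element becomes 0.
inline std::uint32_t shift_left_saturating(std::uint32_t x, std::uint8_t count)
{
    return (count > 31) ? 0u : (x << count);
}

// Scalar-style model: the count is masked to its low 5 bits first,
// so a count of 32 behaves like a count of 0.
inline std::uint32_t shift_left_masked(std::uint32_t x, std::uint8_t count)
{
    return x << (count & 31u);
}
```

The two functions agree for counts 0 to 31 and differ from 32 upward: with `x = 1` and a count of 32, the saturating form returns 0 and the masked form returns 1.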

@PeterCordes Thanks. Isn't imm8 supposed to be 32 bits wide? I got the other index ranges in the functions _mm512_alignr_epi32 and _mm512_alignr_epi64. Taking only a partial range would change the number, no? I am sorry if I am being stupid. – Fusiform 15/6, 2020 at 17:42

imm8 is 8 bits wide (1 byte). That's what the 8 means; pretty much any time a name ends with a number (like rel8 or imm32), it's a bit width. – Saga 15/6, 2020 at 17:52

alignr_epi32 shifts in units of 32-bit elements, so you can express it as a bit-shift by count*32. And the number of bits it needs to look at is determined by the number of elements in the full vector. – Saga 15/6, 2020 at 17:54
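A scalar model may help show why only `imm8[3:0]` matters for `_mm512_alignr_epi32`. The sketch below is written from my reading of the guide's description (concatenate `a` and `b` with `b` in the low half, shift right by `imm8` 32-bit elements, keep the low 512 bits); the type alias and function name are hypothetical. A 512-bit vector holds 16 such elements, so 4 bits of count are enough; for `_mm512_alignr_epi64` there are 8 elements, hence `imm8[2:0]`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<std::uint32_t, 16>; // stand-in for __m512i as 16 x 32-bit

// Scalar model (an assumption, not Intel's code): dst = low 16 elements of
// ((a:b) >> count elements), where b occupies the low half of the concatenation.
inline Vec16 alignr_epi32_model(const Vec16& a, const Vec16& b, unsigned imm8)
{
    const std::size_t count = imm8 & 0xF; // imm8[3:0]: an element count from 0 to 15
    Vec16 dst{};
    for (std::size_t j = 0; j < 16; ++j) {
        const std::size_t src = j + count; // element index into the 32-element concatenation
        dst[j] = (src < 16) ? b[src] : a[src - 16];
    }
    return dst;
}
```

Shifting by `count` elements is the same as a bit shift by `32*count`, which is where the `32*imm8[3:0]` in the pseudocode comes from.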


In Intel's pseudocode, dst is the return value of the intrinsic (the destination operand of the instruction). Overall, that instruction does this:

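#include <array>
#include <cstdint>

// Scalar model of _mm512_slli_epi32: each of the 16 lanes is shifted independently.
// Unsigned lanes are used so the left shift is well defined for every input.
inline std::array<uint32_t, 16> slli( std::array<uint32_t, 16> a, unsigned imm )
{
    for( uint32_t& lane : a )
        lane = ( imm > 31 ) ? 0 : lane << imm;  // a count above 31 clears the lane
    return a;
}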

Here’s my article: http://const.me/articles/simd/simd.pdf I hope it serves as a good introduction.

Cloddish answered 30/6, 2020 at 17:39 Comment(1)

Very well written article, thanks for sharing. – Hydrology 1/2 at 0:49
