Well, I've just been searching for the answer to the same question, also with no success, so I can only guess.
Intel introduced packed instructions as early as their MMX technology. For example, MMX came with the intrinsic
__m64 _mm_add_pi8 (__m64 a, __m64 b)
At that time there was no such thing as "extended packed". The only data type was __m64 and all operations worked on integers.
With SSE came 128-bit registers and operations on floating-point numbers. SSE2 then added a superset of the MMX integer operations, performed in 128-bit registers. For example,
__m128i _mm_add_epi8 (__m128i a, __m128i b)
Here for the first time we see the "ep" ("extended packed") part of the function name. Why was it introduced? I believe it solved the problem of the name _mm_add_pi8 being already taken by the MMX intrinsic listed above. The interface of SSE/AVX is in the C language, which has no function overloading, so one name cannot serve both the __m64 and the __m128i argument types.
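To make the naming point concrete, here is a minimal sketch of the SSE2 variant in use. It assumes an x86-64 compiler with SSE2 available; the helper names add_bytes_sse2 and check_add_bytes are made up for this illustration. The MMX counterpart, _mm_add_pi8, would take and return __m64 instead, which is why it could not share the same C name.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_add_epi8 */
#include <stdint.h>

/* Hypothetical helper: adds two 16-byte arrays element by element.
   _mm_add_epi8 wraps around on overflow, like uint8_t arithmetic. */
static void add_bytes_sse2(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* Hypothetical self-check: compares every element with the scalar sum. */
static void check_add_bytes(void)
{
    uint8_t a[16], b[16], out[16];
    for (int i = 0; i < 16; i++) {
        a[i] = (uint8_t)(i * 17);
        b[i] = (uint8_t)(250 + i);
    }
    add_bytes_sse2(a, b, out);
    for (int i = 0; i < 16; i++)
        assert(out[i] == (uint8_t)(a[i] + b[i]));
}
```

Calling check_add_bytes() should pass silently if the intrinsic behaves as described.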
With AVX and its successors, Intel chose a different strategy and started to put the register length right after the opening "_mm" letters, cf.:
__m256i _mm256_add_epi8 (__m256i a, __m256i b)
__m512i _mm512_add_epi8 (__m512i a, __m512i b)
Why they chose "ep" rather than "p" here is a mystery, and irrelevant for programmers. In practice, they seem to use "p" for operations on floats and doubles and "ep" for integers.
__m128d _mm_add_pd (__m128d a, __m128d b); // "d": function operates on doubles
__m256 _mm256_add_ps (__m256 a, __m256 b); // "s": function operates on floats
Perhaps this goes back to the transition from MMX to SSE, where "ep" was introduced for operations on integers (MMX handled no floats), combined with an attempt to keep the AVX names as close to the SSE ones as possible.
Thus, basically, from the perspective of a programmer, there's no difference between "ep" ("extended packed") and "p" ("packed"), for we are already aware of the register length that we target in our code.
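A small sketch of the suffix convention, restricted to 128-bit registers so that it only needs SSE2. The function names add_three_ways and check_add_three_ways are hypothetical; the intrinsics themselves are the standard ones.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2; also pulls in the SSE float intrinsics */

/* Hypothetical demo: the same "add" action with three element-type suffixes. */
static void add_three_ways(float f_out[4], double d_out[2], int i_out[4])
{
    /* "ps": packed single-precision floats, 4 per register */
    __m128 f = _mm_add_ps(_mm_set1_ps(1.5f), _mm_set1_ps(2.0f));
    /* "pd": packed doubles, 2 per register */
    __m128d d = _mm_add_pd(_mm_set1_pd(1.5), _mm_set1_pd(2.0));
    /* "epi32": packed 32-bit integers, 4 per register */
    __m128i n = _mm_add_epi32(_mm_set1_epi32(40), _mm_set1_epi32(2));

    _mm_storeu_ps(f_out, f);
    _mm_storeu_pd(d_out, d);
    _mm_storeu_si128((__m128i *)i_out, n);
}

/* Hypothetical self-check: every element holds the same sum. */
static void check_add_three_ways(void)
{
    float f[4];
    double d[2];
    int n[4];
    add_three_ways(f, d, n);
    for (int i = 0; i < 4; i++) {
        assert(f[i] == 3.5f);
        assert(n[i] == 42);
    }
    assert(d[0] == 3.5 && d[1] == 3.5);
}
```

In all three calls the operation is a plain element-wise vector add; only the element type differs, which is all the "p" versus "ep" spelling tells you in practice.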
As for the next part of the question, "unpacking" belongs to a completely different category of notions than "scalar" and "packed". This is rather a colloquial term for a particular data rearrangement or shuffle, like rotation or shift.
The reason for using "epi" in the name of intrinsics like _mm256_unpackhi_epi16 is that it is a genuinely vector (not scalar) function on a vector of 16-bit integer elements. Notice that here "unpack" belongs to the part of the function name that describes its action (like mul, add, or permute), whereas "s" / "p" / "ep" (scalar, packed, extended packed) belong to the part describing the operation mode (scalar for "s", vector for "p" or "ep").
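To show what "unpack" actually does, here is a sketch using the 128-bit SSE2 form, _mm_unpackhi_epi16, rather than the 256-bit one named above, so that it runs without AVX2. The 256-bit variant performs the same interleave separately within each 128-bit half. The helper names unpackhi16 and check_unpackhi16 are made up for the example.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_unpackhi_epi16 */
#include <stdint.h>

/* Hypothetical helper: interleaves the upper four 16-bit elements of a and b. */
static void unpackhi16(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_unpackhi_epi16(va, vb));
}

/* Hypothetical self-check: the result is a4, b4, a5, b5, a6, b6, a7, b7. */
static void check_unpackhi16(void)
{
    const int16_t a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    const int16_t b[8] = {100, 101, 102, 103, 104, 105, 106, 107};
    const int16_t expected[8] = {4, 104, 5, 105, 6, 106, 7, 107};
    int16_t out[8];
    unpackhi16(a, b, out);
    for (int i = 0; i < 8; i++)
        assert(out[i] == expected[i]);
}
```

All eight output elements are produced at once, so the "epi16" suffix is appropriate: the action is an interleave, the mode is packed.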
(There are no scalar-integer instructions that operate between two XMM registers, but "si" does appear in the intrinsic name for movd eax, xmm0: _mm_cvtsi128_si32. There are a few similar intrinsics.)
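A quick sketch of that "si" intrinsic and its counterpart, assuming SSE2; the helper names roundtrip_low_lane and low_lane_of are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_cvtsi32_si128, _mm_cvtsi128_si32 */

/* Hypothetical helper: moves an int into the low element and back out. */
static int roundtrip_low_lane(int x)
{
    __m128i v = _mm_cvtsi32_si128(x);  /* x in element 0, upper elements zeroed */
    return _mm_cvtsi128_si32(v);       /* reads element 0 only */
}

/* Hypothetical helper: _mm_set_epi32 lists elements from highest to lowest,
   so e0 is the low element that _mm_cvtsi128_si32 returns. */
static int low_lane_of(int e3, int e2, int e1, int e0)
{
    return _mm_cvtsi128_si32(_mm_set_epi32(e3, e2, e1, e0));
}

/* Hypothetical self-check for both helpers. */
static void check_cvtsi(void)
{
    assert(roundtrip_low_lane(-7) == -7);
    assert(low_lane_of(1, 2, 3, 4) == 4);
}
```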
addps xmm0, xmm1. "unpacked" isn't a normal way to describe a SIMD data format; can you give a specific context where you saw this used? Unpacking is something you can do with data, e.g. widen each element, or in the case of SSE instructions, interleave elements from 2 vectors. IDK why that's called "unpack". – Hicks
pmovsxwd or whatever: "Sign extend packed 16-bit integers in a to packed 32-bit integers". Extend is a verb, the operation being performed, not part of the description of the storage format. A search for "Extended packed" doesn't find any hits in that guide, which is good because it sounds meaningless. – Hicks