What are packed, unpacked, and extended packed data?

I have been going through the Intel intrinsics, and every function works on integers, floats, or doubles that are packed, unpacked, or extended packed.

It seems like this question should be answered somewhere on the internet, but I can't find the answer at all.

What is that packing thing?

Incarnadine answered 29/10, 2020 at 23:21 Comment(5)
packed normally just means that you have 4 floats in one 16-byte vector, like addps xmm0, xmm1. "unpacked" isn't a normal way to describe a SIMD data format; can you give a specific context where you saw this used? Unpacking is something you can do with data, e.g. widen each element, or in the case of SSE instructions, interleave elements from 2 vectors. IDK why that's called "unpack". - Hicks
packed as in "packed together in a single register". extended packed seems to mean "extended to work with packed integers". "unpacked" is, IDK, maybe working with just the scalar or considering the register as a whole. - Midkiff
Here is what I mean by unpacking. I guess it is like what @PeterCordes means. - Incarnadine
thank y'all I understand them now. I guess you should answer the question because I can't seem to find it all on stackoverflow. - Incarnadine
@MargaretBloom: I'm guessing "extended packed" is just a misinterpretation of pmovsxwd or whatever: "Sign extend packed 16-bit integers in a to packed 32-bit integers". Extend is a verb, the operation being performed, not part of the description of the storage format. A search for "Extended packed" doesn't find any hits in that guide, which is good because it sounds meaningless. - Hicks

Well, I've just been searching for the answer to the same question, also with no success. So I can only guess.

Intel introduced packed instructions as early as their MMX technology. For example, MMX came with the intrinsic

__m64 _mm_add_pi8 (__m64 a, __m64 b)

At that time there was no such thing as "extended packed". The only data type was __m64, and all operations worked on integers. SSE brought 128-bit registers and operations on floating-point numbers. SSE2 then added 128-bit versions of the MMX integer operations. For example,

__m128i _mm_add_epi8 (__m128i a, __m128i b)

Here, for the first time, we see the "ep" ("extended packed") part of the function name. Why was it introduced? I believe it was a solution to the problem of the name _mm_add_pi8 already being taken by the MMX intrinsic listed above. The SSE/AVX intrinsics are a C interface, and C has no function overloading.

With AVX, Intel chose a different strategy and started to add the register width right after the opening "_mm" prefix, e.g.:

__m256i _mm256_add_epi8 (__m256i a, __m256i b)
__m512i _mm512_add_epi8 (__m512i a, __m512i b)

Why they chose "ep" rather than "p" here is a mystery, though an irrelevant one for programmers. In practice, they seem to use "p" for operations on floats and doubles and "ep" for integers.

__m128d _mm_add_pd (__m128d a, __m128d b); // "d": function operates on doubles
__m256 _mm256_add_ps (__m256 a, __m256 b); // "s": function operates on floats

Perhaps this goes back to the transition from MMX to SSE, where "ep" was introduced for operations on integers (MMX handled no floats), combined with an attempt to keep the AVX intrinsic names as close to the SSE ones as possible.

Thus, basically, from the perspective of a programmer, there's no difference between "ep" ("extended packed") and "p" ("packed"), for we are already aware of the register length that we target in our code.
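To make "packed" concrete, here is a minimal sketch using only SSE/SSE2 intrinsics. The helper names add16_bytes and add4_floats are made up for this illustration; the intrinsics themselves are the real ones discussed above. One call adds 16 byte-sized integers at once ("epi8"), the other adds 4 floats at once ("ps").

```c
#include <assert.h>     /* assert, for quick checks by the caller */
#include <emmintrin.h>  /* SSE2; also pulls in the SSE (_ps) intrinsics */
#include <stdint.h>

/* Hypothetical helper: adds 16 pairs of 8-bit integers with one packed add. */
void add16_bytes(const int8_t a[16], const int8_t b[16], int8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* Hypothetical helper: adds 4 pairs of floats with one packed add. */
void add4_floats(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

Both functions work on a full 128-bit register; only the element type and count differ, which is exactly what the "epi8" and "ps" suffixes encode.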


As for the next part of the question, "unpacking" belongs to a completely different category of notions than "scalar" and "packed". This is rather a colloquial term for a particular data rearrangement or shuffle, like rotation or shift.

The reason for using "epi" in the name of intrinsics like _mm256_unpackhi_epi16 is that it is a true vector (not scalar) operation on a vector of 16-bit integer elements. Notice that here "unpack" belongs to the part of the function name that describes its action (like mul, add, or permute), whereas "s" / "p" / "ep" (scalar, packed, extended packed) belong to the part describing the operation mode (scalar for "s", vector for "p" or "ep").
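As a rough illustration of what "unpack" does, here is a sketch with the 128-bit SSE2 versions of the intrinsic. The helper name interleave_u16 is made up for this example. The "lo" variant interleaves the lower halves of two vectors and the "hi" variant interleaves the upper halves.

```c
#include <assert.h>     /* assert, for quick checks by the caller */
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: interleaves the 16-bit elements of a and b.
 * lo receives a0,b0,a1,b1,a2,b2,a3,b3
 * hi receives a4,b4,a5,b5,a6,b6,a7,b7 */
void interleave_u16(const uint16_t a[8], const uint16_t b[8],
                    uint16_t lo[8], uint16_t hi[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)lo, _mm_unpacklo_epi16(va, vb));
    _mm_storeu_si128((__m128i *)hi, _mm_unpackhi_epi16(va, vb));
}
```

The 256-bit _mm256_unpackhi_epi16 does the same interleaving, but separately within each 128-bit half of the register. Interleaving with an all-zero vector zero-extends each 16-bit element to 32 bits, which may be where the name "unpack" comes from, though that is a guess.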

(There are no scalar-integer instructions that operate between two XMM registers, but "si" does appear in the intrinsic name for movd eax, xmm0: _mm_cvtsi128_si32. There are a few similar intrinsics.)
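For contrast with the packed forms, here is a small sketch of the scalar mode, again SSE/SSE2 only. The helper names add_low_float and low_int32 are made up for this illustration. _mm_add_ss adds just the lowest float and copies the other three elements from its first operand, and _mm_cvtsi128_si32 pulls the lowest 32-bit integer out of a vector.

```c
#include <assert.h>     /* assert, for quick checks by the caller */
#include <emmintrin.h>  /* SSE2; also pulls in the SSE (_ss, _ps) intrinsics */

/* Hypothetical helper: scalar add. Only element 0 is summed;
 * elements 1..3 of the result are copied from a. */
void add_low_float(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Hypothetical helper: returns the lowest 32-bit integer element of v. */
int low_int32(const int v[4])
{
    return _mm_cvtsi128_si32(_mm_loadu_si128((const __m128i *)v));
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, add_low_float produces {11, 2, 3, 4}, whereas the packed _mm_add_ps would produce {11, 22, 33, 44}.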

Jamilla answered 12/1, 2021 at 13:41 Comment(3)
If you're right that the question's reference to "extended packed" was to intrinsic names, then yes, good explanation of the history, I agree with your conclusions / guesses about Intel's naming choices. (Including the fact that SSE _ps and SSE2 _pd intrinsics weren't "extended" from any other FP vector width.) - Hicks
@PeterCordes Thanks for editing. I'm truly amazed that so many people volunteer to be so helpful. - Jamilla
Intel's own brief docs on the naming "ep" vs. "p": intel.com/content/www/us/en/develop/documentation/… - doesn't actually say anything about "ep" being for SSE integer vs. "p" being for MMX integer, though. Related Q&A: What are the names and meanings of the intrinsic vector element types, like epi64x or pi32? - Hicks
