You are asking about a specific implementation issue (using SSE2) and not about the standard. You've answered your own question "signed integer overflow is undefined in C".
When you are dealing with c intrinsics you aren't even programming in C! These are inserting assembly instructions in line. It is doing it in a some what portable way, but it is no longer true that your data is a signed integer. It is a vector type being passed to an SSE intrinsic. YOU are then casting that to an integer and telling C that you want to see the result of that operation. Whatever bytes happen to be there when you cast is what you will see and has nothing to do with signed arithmetic in the C standard.
Things are a bit different if the compiler inserts SSE instructions (say in a loop). Now the compiler is guaranteeing that the result is the same as a signed 32 bit operation ... UNLESS there is undefined behaviour (e.g. an overflow) in which case it can do whatever it likes.
Note also that undefined doesn't mean unexpected ... whatever behaviour your observe for auto-vectorization might be consistent and repeatable (maybe it does always wrap on your machine ... that might not be true with all cases for surrounding code, or all compilers. Or if the compiler selects different instructions depending on availability of SSSE3, SSE4, or AVX*, possibly not even all processors if it makes different code-gen choices for different instruction-sets that do or don't take advantage of signed overflow being UB).
EDIT:
Okay, well now that we are asking about "the Intel standards" (which don't exist, I think you mean the x86 standards), I can add something to my answer. Things are a little bit convoluted.
Firstly, the intrinsic _mm_add_epi32 is defined by Microsoft to match Intel's intrinsics API definition (https://software.intel.com/sites/landingpage/IntrinsicsGuide/ and the intrinsic notes in Intel's x86 assembly manuals). They cleverly define it as doing to a __m128i
the same thing the x86 PADDD
instruction does to an XMM register, with no more discussion (e.g. is it a compile error on ARM or should it be emulated?).
Secondly, PADDD isn't only a signed addition! It is a 32 bit binary add. x86 uses two's complement for signed integers, and adding them is the same binary operation as unsigned base 2. So yes, paddd
is guaranteed to wrap. There is a good reference for all the x86 instructions here.
So what does that mean: again, the assumption in your question is flawed because there isn't even any overflow. So the output you see should be defined behaviour. Note that it is defined by Microsoft and x86 (not by the C Standard).
Other x86 compilers also implement Intel's intrinsics API the same way, so _mm_add_epi32
is portably guaranteed to just wrap.
__mi128i
is (obviously) an architecture specific concept so the C and C++ standards say nothing about the type or its behaviour or the intrinsic. You need to look at the "vendor" documentation for any guarantees above and beyond what the standards give. – Yukikoyukiopaddd
" (which wraps), I'm not sure how much of a guarantee that's supposed to give though – Bathometer_mm_add_epi32
will never be emulated by simple C wrappers. – Mctyrepaddd
to wrap. – Bathometer_mm_add_epi32()
license to give different results fromPADDD
. – Farnsworth