Is SSE2 signed integer overflow undefined?
Asked Answered
D

2

6

Signed integer overflow is undefined in C and C++. But what about signed integer overflow within the individual fields of an __m128i? In other words, is this behavior defined in the Intel standards?

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>

union SSE2
{
    __m128i m_vector;
    uint32_t m_dwords[sizeof(__m128i) / sizeof(uint32_t)];
};

int main()
{
    union SSE2 reg = {_mm_set_epi32(INT32_MAX, INT32_MAX, INT32_MAX, INT32_MAX)};
    reg.m_vector = _mm_add_epi32(reg.m_vector, _mm_set_epi32(1, 1, 1, 1));

    printf("%08" PRIX32 "\n", (uint32_t) reg.m_dwords[0]);
    return 0;
}
[myria@polaris tests]$ gcc -m64 -msse2 -std=c11 -O3 sse2defined.c -o sse2defined
[myria@polaris tests]$ ./sse2defined
80000000

Note that the 4-byte-sized fields of an SSE2 __m128i are considered signed.

Degraded answered 22/10, 2014 at 21:2 Comment(12)
This is a very good question!Bouchard
In practice these elements behave as expected, i.e. normal 2s complement wraparound etc. I don't know if you'll find anything in the Intel docs to guarantee this though.Indo
An SSE2 __mi128i is (obviously) an architecture specific concept so the C and C++ standards say nothing about the type or its behaviour or the intrinsic. You need to look at the "vendor" documentation for any guarantees above and beyond what the standards give.Yukikoyukio
@PaulR appendix C (intrinsics) lists it as "equivalent to paddd" (which wraps), I'm not sure how much of a guarantee that's supposed to give thoughBathometer
Intel® 64 and IA-32 Architectures Software Developer’s Manual Table 9-2 lists it as Wrap-around, but I can't find any explicit guarantees that _mm_add_epi32 will never be emulated by simple C wrappers.Mctyre
@harold: THere is no appendix C, did you mean annex c (informative, sequence points)?Fluting
@Fluting are you looking in the C standard? Of course there's nothing about SSE in there.Bathometer
@CharlesBailey: Yes; the question is therefore how Intel defined the spec.Degraded
@Degraded Intel defined paddd to wrap.Bathometer
@Myria, please edit your question title accordingly.Sonny
@thatotherguy: I don't know what you think gives a software emulation of _mm_add_epi32() license to give different results from PADDD.Farnsworth
@BenVoigt No license is required unless there's a formal specification somewhere. I do agree that not having it behave identically would be a low blow, though. The responsible way to do it would be to not define this function and provide a similar alternative.Mctyre
D
8
  1. You are asking about a specific implementation issue (using SSE2) and not about the standard. You've answered your own question "signed integer overflow is undefined in C".

  2. When you are dealing with c intrinsics you aren't even programming in C! These are inserting assembly instructions in line. It is doing it in a some what portable way, but it is no longer true that your data is a signed integer. It is a vector type being passed to an SSE intrinsic. YOU are then casting that to an integer and telling C that you want to see the result of that operation. Whatever bytes happen to be there when you cast is what you will see and has nothing to do with signed arithmetic in the C standard.

Things are a bit different if the compiler inserts SSE instructions (say in a loop). Now the compiler is guaranteeing that the result is the same as a signed 32 bit operation ... UNLESS there is undefined behaviour (e.g. an overflow) in which case it can do whatever it likes.

Note also that undefined doesn't mean unexpected ... whatever behaviour your observe for auto-vectorization might be consistent and repeatable (maybe it does always wrap on your machine ... that might not be true with all cases for surrounding code, or all compilers. Or if the compiler selects different instructions depending on availability of SSSE3, SSE4, or AVX*, possibly not even all processors if it makes different code-gen choices for different instruction-sets that do or don't take advantage of signed overflow being UB).

EDIT:

Okay, well now that we are asking about "the Intel standards" (which don't exist, I think you mean the x86 standards), I can add something to my answer. Things are a little bit convoluted.

Firstly, the intrinsic _mm_add_epi32 is defined by Microsoft to match Intel's intrinsics API definition (https://software.intel.com/sites/landingpage/IntrinsicsGuide/ and the intrinsic notes in Intel's x86 assembly manuals). They cleverly define it as doing to a __m128i the same thing the x86 PADDD instruction does to an XMM register, with no more discussion (e.g. is it a compile error on ARM or should it be emulated?).

Secondly, PADDD isn't only a signed addition! It is a 32 bit binary add. x86 uses two's complement for signed integers, and adding them is the same binary operation as unsigned base 2. So yes, paddd is guaranteed to wrap. There is a good reference for all the x86 instructions here.

So what does that mean: again, the assumption in your question is flawed because there isn't even any overflow. So the output you see should be defined behaviour. Note that it is defined by Microsoft and x86 (not by the C Standard).

Other x86 compilers also implement Intel's intrinsics API the same way, so _mm_add_epi32 is portably guaranteed to just wrap.

Disappear answered 22/10, 2014 at 21:42 Comment(4)
Well, yes; the question is how Intel defined the standard, and whether ICC, GCC, Clang, MSVC, etc. follow the Intel standard. It's not literally a C Standard question.Degraded
In that case you might want to edit your question: you are asking about the x86 assembly instruction set and whether SSE overflows are defined.Disappear
@Degraded Intel did not define any standard. They just defined how their CPUs behaveHawley
From Intel's SSE4 manual: "Intel C/C++ Compiler Intrinsic Equivalent PBLENDVB __m128i _mm_blendv_epi8 (__m12 8i v1, __m128i v2, __m128i mask);"Degraded
F
3

This isn't "signed integer overflow within the fields of an __m128i". This is a function call. (Being a compiler intrinsic is just an optimization, much like inlining, and that doesn't interact with the C standard as long as the as-if rule is respected)

Its behavior must follow the contract (preconditions, postconditions) that the function developer documented. Usually intrinsics are documented by the compiler vendor, although they tend to coordinate the naming and contract of intrinsics to aid in porting code.

Farnsworth answered 22/10, 2014 at 23:21 Comment(1)
Re portability: In the case of _mm... intrinsics, they're defined by Intel for ICC ( software.intel.com/sites/landingpage/IntrinsicsGuide), and implemented in a compatible way by MSVC, GCC, clang, and a few other less mainstream x86 compilers so Intel's documentation applies. (A few compilers are sometimes missing some version of an _mm256_setr_m128 or something, or alternate names like _mm_bslli_si128 for byte-shift pslldq, but the intrinsics that map to a single instruction are very portable.)Zaccaria

© 2022 - 2024 — McMap. All rights reserved.