Why does Clang optimize away x * 1.0 but NOT x + 0.0?
Asked Answered
W

2

133

Why does Clang optimize away the loop in this code

#include <time.h>
#include <stdio.h>

static size_t const N = 1 << 27;
static double arr[N] = { /* initialize to zero */ };

int main()
{
    clock_t const start = clock();
    for (int i = 0; i < N; ++i) { arr[i] *= 1.0; }
    printf("%u ms\n", (unsigned)(clock() - start) * 1000 / CLOCKS_PER_SEC);
}

but not the loop in this code?

#include <time.h>
#include <stdio.h>

static size_t const N = 1 << 27;
static double arr[N] = { /* initialize to zero */ };

int main()
{
    clock_t const start = clock();
    for (int i = 0; i < N; ++i) { arr[i] += 0.0; }
    printf("%u ms\n", (unsigned)(clock() - start) * 1000 / CLOCKS_PER_SEC);
}

(Tagging as both C and C++ because I would like to know if the answer is different for each.)

Wightman answered 22/10, 2015 at 3:38 Comment(9)
Which optimization flags are currently active?Alberta
@IwillnotexistIdonotexist: I just used -O3, I don't know how to check what that activates though.Wightman
It would be interesting to see what happens if you add -ffast-math to the command line.Asphyxia
static double arr[N] is not permitted in C; const variables do not count as constant expressions in that languageClout
@M.M: clang 3.6.1 compiles it just fine for me though.Wightman
@Mehrdad maybe a compiler bug; gcc gives an error even with the default optionsClout
[Insert snarky comment about how C is not C++, even though you already called it out.]Marlow
@Mehrdad or compiler extension. The code should be constexpr size_t N = 1 << 27;, with C++11.Pizzeria
Why does MSVS not optimize away +0?Medullary
A
174

The IEEE 754-2008 Standard for Floating-Point Arithmetic and the ISO/IEC 10967 Language Independent Arithmetic (LIA) Standard, Part 1 answer why this is so.

IEEE 754 § 6.3 The sign bit

When either an input or result is NaN, this standard does not interpret the sign of a NaN. Note, however, that operations on bit strings — copy, negate, abs, copySign — specify the sign bit of a NaN result, sometimes based upon the sign bit of a NaN operand. The logical predicate totalOrder is also affected by the sign bit of a NaN operand. For all other operations, this standard does not specify the sign bit of a NaN result, even when there is only one input NaN, or when the NaN is produced from an invalid operation.

When neither the inputs nor result are NaN, the sign of a product or quotient is the exclusive OR of the operands’ signs; the sign of a sum, or of a difference x − y regarded as a sum x + (−y), differs from at most one of the addends’ signs; and the sign of the result of conversions, the quantize operation, the roundTo-Integral operations, and the roundToIntegralExact (see 5.3.1) is the sign of the first or only operand. These rules shall apply even when operands or results are zero or infinite.

When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0 in all rounding-direction attributes except roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be −0. However, x + x = x − (−x) retains the same sign as x even when x is zero.

The Case of Addition

Under the default rounding mode (Round-to-Nearest, Ties-to-Even), we see that x+0.0 produces x, EXCEPT when x is -0.0: In that case we have a sum of two operands with opposite signs whose sum is zero, and §6.3 paragraph 3 rules this addition produces +0.0.

Since +0.0 is not bitwise identical to the original -0.0, and that -0.0 is a legitimate value that may occur as input, the compiler is obliged to put in the code that will transform potential negative zeros to +0.0.

The summary: Under the default rounding mode, in x+0.0, if x

  • is not -0.0, then x itself is an acceptable output value.
  • is -0.0, then the output value must be +0.0, which is not bitwise identical to -0.0.

The Case of Multiplication

Under the default rounding mode, no such problem occurs with x*1.0. If x:

  • is a (sub)normal number, x*1.0 == x always.
  • is +/- infinity, then the result is +/- infinity of the same sign.
  • is NaN, then according to

    IEEE 754 § 6.2.3 NaN Propagation

    An operation that propagates a NaN operand to its result and has a single NaN as an input should produce a NaN with the payload of the input NaN if representable in the destination format.

    which means that the exponent and mantissa (though not the sign) of NaN*1.0 are recommended to be unchanged from the input NaN. The sign is unspecified in accordance with §6.3p1 above, but an implementation may specify it to be identical to the source NaN.

  • is +/- 0.0, then the result is a 0 with its sign bit XORed with the sign bit of 1.0, in agreement with §6.3p2. Since the sign bit of 1.0 is 0, the output value is unchanged from the input. Thus, x*1.0 == x even when x is a (negative) zero.

The Case of Subtraction

Under the default rounding mode, the subtraction x-0.0 is also a no-op, because it is equivalent to x + (-0.0). If x is

  • is NaN, then §6.3p1 and §6.2.3 apply in much the same way as for addition and multiplication.
  • is +/- infinity, then the result is +/- infinity of the same sign.
  • is a (sub)normal number, x-0.0 == x always.
  • is -0.0, then by §6.3p2 we have "[...] the sign of a sum, or of a difference x − y regarded as a sum x + (−y), differs from at most one of the addends’ signs;". This forces us to assign -0.0 as the result of (-0.0) + (-0.0), because -0.0 differs in sign from none of the addends, while +0.0 differs in sign from two of the addends, in violation of this clause.
  • is +0.0, then this reduces to the addition case (+0.0) + (-0.0) considered above in The Case of Addition, which by §6.3p3 is ruled to give +0.0.

Since for all cases the input value is legal as the output, it is permissible to consider x-0.0 a no-op, and x == x-0.0 a tautology.

Value-Changing Optimizations

The IEEE 754-2008 Standard has the following interesting quote:

IEEE 754 § 10.4 Literal meaning and value-changing optimizations

[...]

The following value-changing transformations, among others, preserve the literal meaning of the source code:

  • Applying the identity property 0 + x when x is not zero and is not a signaling NaN and the result has the same exponent as x.
  • Applying the identity property 1 × x when x is not a signaling NaN and the result has the same exponent as x.
  • Changing the payload or sign bit of a quiet NaN.
  • [...]

Since all NaNs and all infinities share the same exponent, and the correctly rounded result of x+0.0 and x*1.0 for finite x has exactly the same magnitude as x, their exponent is the same.

sNaNs

Signaling NaNs are floating-point trap values; They are special NaN values whose use as a floating-point operand results in an invalid operation exception (SIGFPE). If a loop that triggers an exception were optimized out, the software would no longer behave the same.

However, as user2357112 points out in the comments, the C11 Standard explicitly leaves undefined the behaviour of signaling NaNs (sNaN), so the compiler is allowed to assume they do not occur, and thus that the exceptions that they raise also do not occur. The C++11 standard omits describing a behaviour for signaling NaNs, and thus also leaves it undefined.

Rounding Modes

In alternate rounding modes, the permissible optimizations may change. For instance, under Round-to-Negative-Infinity mode, the optimization x+0.0 -> x becomes permissible, but x-0.0 -> x becomes forbidden.

To prevent GCC from assuming default rounding modes and behaviours, the experimental flag -frounding-math can be passed to GCC.

Conclusion

Clang and GCC, even at -O3, remains IEEE-754 compliant. This means it must keep to the above rules of the IEEE-754 standard. x+0.0 is not bit-identical to x for all x under those rules, but x*1.0 may be chosen to be so: Namely, when we

  1. Obey the recommendation to pass unchanged the payload of x when it is a NaN.
  2. Leave the sign bit of a NaN result unchanged by * 1.0.
  3. Obey the order to XOR the sign bit during a quotient/product, when x is not a NaN.

To enable the IEEE-754-unsafe optimization (x+0.0) -> x, the flag -ffast-math needs to be passed to Clang or GCC.

Alberta answered 22/10, 2015 at 4:22 Comment(26)
Caveat: what if it is a signaling NaN? (I actually thought that might have been the reason somehow, but I didn't really know how, so I asked.)Wightman
@Mehrdad Good question. I suspect that, since IEEE requires by default that the invalid-operation-exception be masked, that there is no difference between the processing of a sNaN and qNaN. That being said, Clang must be relying on that fact and possibly also the function calling convention's guarantees on the state of the IEEE-754 control flags. Otherwise it will have to add a runtime check and dynamically decide to do it (exceptions unmasked) or not (exceptions masked).Alberta
@Mehrdad: Annex F, the (optional) part of the C standard that specifies C adherence to IEEE 754, explicitly does not cover signaling NaNs. (C11 F.2.1., first line: "This specification does not define the behavior of signaling NaNs.") Implementations that declare conformance to Annex F remain free to do what they want with signaling NaNs. The C++ standard has its own handling of IEEE 754, but whatever it is (I'm not familiar), I doubt it specifies signaling NaN behavior either.Consequently
@Consequently Excellent find!Alberta
@Mehrdad: sNaN invokes undefined behavior according to the standard (but it's probably well defined by the platform) so the compiler squashing here is allowed.Sharpwitted
@user2357112: I wonder what purpose is served by an expectation that singalling NaNs trap on all forms of usage, rather than merely on operations like comparisons or integer conversions which would otherwise lose their "NaN"-iness?Identity
@supercat: I've seen two proposed use cases for signaling NaNs. The first is as a trap value to fill uninitialized data structures, so if the NaN is used for anything, it's an error. The second is where you're encoding something into the payload and you want meaningful_nan + 1 or meaningful_nan + other_meaningful_nan to trigger your own handling instead of just preserving one of the payloads. In either case, the concern isn't limited to getting a non-NaN from a NaN.Consequently
This answer goes into a lot of detail about exactly why x += 0.0 isn't a NOOP and x *= 1.0 can be treated as one, but since the results of the loop aren't used, the whole thing could still be stripped out anyway. Changes like making arr local might let the optimizer realize the results are unused and remove the loop.Consequently
@user2357112: The possibility of error-trapping as a side-effect for otherwise unused calculations generally interferes with a lot of optimization; if the result of a calculation is sometimes ignored, a compiler might usefully defer the calculation until it can know whether the result will be used, but if the calculation would have produced an important signal, that can be bad.Identity
(I know nothing about these cases beyond what's written in your answer, but) It looks like the optimization would be valid for a roundiing-direction of roundTowardNegative. ​Sanies
@RickyDemer But wouldn't then +0.0 fall afoul of that optimization, for the same reason that -0.0 fails in roundTowardsPositive? (+0.0)+0.0 is an "exact zero sum", but roundTowardsNegative requires -0.0 for it.Alberta
@IwillnotexistIdonotexist : ​ No, since "x + x = x − (−x) retains the same sign as x even when x is zero." ​ ​ ​ ​​ ​ ​ ​ ​Sanies
@RickyDemer Nice catch! Then yes, the optimization appears safe for roundTowardsNegative mode, but that mode is very rare; Almost all applications leave it at round-to-nearest, ties-to-even.Alberta
@IwillnotexistIdonotexist : ​ ​ ​ I now realize the quote I gave only matters if roundTowardsNegative does not cause the string "0.0" to evaluate as "-0.0". ​ (Like I mentioned, I have no idea which way it works.) ​ ​ ​ ​ ​ ​ ​ ​Sanies
"If two or more inputs are NaN, then ..." - why is this relevant? At most one input could be NaN.Marlow
@immibis That was me being dumb and taking the paragraph below the one I wanted to copy-paste. I corrected it.Alberta
Oh look, a question that legitimately applies to both C and C++ that is accurately answered for both languages by a reference to a single standard. Will this make people less likely to complain about questions tagged with both C and C++, even when the question deals with a language commonality? Sadly, I think not.Clarinda
@Mehrdad I found an interesting section on value-changing optimizations in IEEE 754-2008 and edited it in for you.Alberta
I don't know if this was mentioned but x - 0 can be simplified to x. It's just x + 0 that is the problem.Balfore
Okay I think I get it. -0 - 0 is the difference of two values with different sign. "When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0". Another way to say it is -0 - 0 = -0 + -0 which is the sum of two values with the same sign.Balfore
@Zboson From above: the sign of a sum, or of a difference x − y regarded as a sum x + (−y), differs from at most one of the addends’ signs; Assume x + (-0) -> x. Then case analysis gives: NaN unchanged, infinities unchanged, finite non-zeros unchanged. x = +0 in x + (-0) -> x has the result differ from at most one addend in sign when x passes through unchanged. x = -0 in x + (-0) -> x differs from no addend's sign when x passed through unchanged. All conditions satisfied, optimization valid.Alberta
This may interest you https://mcmap.net/q/23972/-gnu-c-native-vectors-how-to-broadcast-a-scalar-like-x86-39-s-_mm_set1_epi16. It took me a few iterations to get it right. First I used x+0 then I used (1 + 0)*x which worked but was unnecessarily complicated. And finally I used x-0. The simple solution.Balfore
@Zboson brilliant use of the trick to splat a scalar to vector!Alberta
When you say that x-0. is the same as x, it would be worth specifying very clearly that this is not true when rounding round, since 0.-0. is -0. in that case (and indeed gcc refuses to simplify if you use -frounding-math).Vania
@MarcGlisse rounding round -> Round to Negative Infinity Mode? Aside from that, the comments here did bring up that the permissibility of these optimizations may vary with rounding mode. I'll point out more clearly that this optimization assumes default rounding mode and that -frounding-math disallows that assumption (although, note that man gcc states this option to be "experimental").Alberta
Oups, I meant to write "rounding down", not "round"... -frounding-math does not work so well, but then neither does -ftrapping-math, which is enabled by default...Vania
C
38

x += 0.0 isn't a NOOP if x is -0.0. The optimizer could strip out the whole loop anyway since the results aren't used, though. In general, it's hard to tell why an optimizer makes the decisions it does.

Consequently answered 22/10, 2015 at 3:44 Comment(7)
I actually posted this after I had just read why x += 0.0 is not a no-op, yet I thought that probably isn't the reason because the entire loop should be optimized out either way. I can buy it, it's just not as entirely convincing as I was hoping...Wightman
Given the propensity for object-oriented languages to produce side-effects, I would imagine that it would be difficult to be sure the optimizer isn't changing actual behavior.Rohde
Could be the reason, since with long long the optimization is in effect (did it with gcc, which behaves the same for double at least)Slavery
@ringø: long long is an integral type, not an IEEE754 type.Expunction
@Expunction Yes, that's why the optimization is in effect (maybe my choice of words was not the best in my previous comment ?!)Slavery
What about x -= 0, is it the same?Denunciate
@ViktorMellgren: x -= 0 can be optimized away, because it's always a no-op, even when for negative zero. And is in practice optimized away by real compilers, as in: In IEEE 754, why does adding negative zero result in a no-op but adding positive zero does not?. (This assumes the rounding mode is the default. Most compilers assume GCC/clang -fno-rounding-math or equivalent.)Rosaline

© 2022 - 2024 — McMap. All rights reserved.