Does Java strictfp modifier have any effect on modern CPUs?

I know the meaning of the strictfp modifier on methods (and on classes), according to the JLS:

JLS 8.4.3.5, strictfp methods:

The effect of the strictfp modifier is to make all float or double expressions within the method body be explicitly FP-strict (§15.4).

JLS 15.4 FP-strict expressions:

Within an FP-strict expression, all intermediate values must be elements of the float value set or the double value set, implying that the results of all FP-strict expressions must be those predicted by IEEE 754 arithmetic on operands represented using single and double formats.

Within an expression that is not FP-strict, some leeway is granted for an implementation to use an extended exponent range to represent intermediate results; the net effect, roughly speaking, is that a calculation might produce "the correct answer" in situations where exclusive use of the float value set or double value set might result in overflow or underflow.

I've been trying to come up with a way to observe an actual difference between an expression in a strictfp method and the same expression in a method that is not strictfp. I've tried this on two laptops, one with an Intel Core i3 CPU and one with an Intel Core i7 CPU, and I can't get any difference.

A lot of posts suggest that native floating point, without strictfp, could use 80-bit floating-point numbers, which have extra representable values below the smallest positive Java double (closest to zero) and above the largest Java double.

I tried the code below with and without the strictfp modifier, and it gives exactly the same results.

public static strictfp void withStrictFp() {
    double v = Double.MAX_VALUE;
    System.out.println(v * 1.0000001 / 1.0000001);
    v = Double.MIN_VALUE;
    System.out.println(v / 2 * 2);
}
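
To make the comparison airtight, one can compare the raw bit patterns of the results rather than their printed representation. The harness below (my own sketch, not from the JLS) duplicates the two expressions in a strictfp and a non-strictfp method and checks the bits:

```java
public class StrictfpProbe {

    // The same expressions as above, evaluated FP-strictly.
    static strictfp double[] strict() {
        double overflowProbe = Double.MAX_VALUE * 1.0000001 / 1.0000001;
        double underflowProbe = Double.MIN_VALUE / 2 * 2;
        return new double[] { overflowProbe, underflowProbe };
    }

    // The same expressions with default (non-strict) semantics.
    static double[] lenient() {
        double overflowProbe = Double.MAX_VALUE * 1.0000001 / 1.0000001;
        double underflowProbe = Double.MIN_VALUE / 2 * 2;
        return new double[] { overflowProbe, underflowProbe };
    }

    public static void main(String[] args) {
        double[] s = strict(), l = lenient();
        for (int i = 0; i < s.length; i++) {
            // Compare exact bit patterns, not printed decimals.
            boolean same = Double.doubleToRawLongBits(s[i])
                        == Double.doubleToRawLongBits(l[i]);
            System.out.println("expression " + i + " bit-identical: " + same);
        }
    }
}
```

On an SSE2-based JVM both methods should produce bit-identical results (Infinity for the first expression, 0.0 for the second); only an implementation that actually uses an extended exponent range for intermediates could print `false`.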

Actually, I assume that any difference would only show up once the code is JIT-compiled to native code, so I am running it with the -Xcomp JVM argument. But there is no difference.

I found another post explaining how to get the assembly code generated by HotSpot (OpenJDK documentation). I'm running my code with java -Xcomp -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly. The first expression (v * 1.0000001 / 1.0000001), with the strictfp modifier and equally without it, is compiled to:

  0x000000010f10a0a9: movsd  -0xb1(%rip),%xmm0        # 0x000000010f10a000
                                                ;   {section_word}
  0x000000010f10a0b1: mulsd  -0xb1(%rip),%xmm0        # 0x000000010f10a008
                                                ;   {section_word}
  0x000000010f10a0b9: divsd  -0xb1(%rip),%xmm0        # 0x000000010f10a010
                                                ;   {section_word}

There is nothing in that code that truncates the result of each step to 64 bits, as I had expected. Looking up the documentation of movsd, mulsd and divsd, they all state that these (SSE2) instructions operate on 64-bit floating-point values, not the 80-bit values I expected. So it seems logical that the double value set these instructions operate on is already the IEEE 754 double value set, and there would then be no difference between having strictfp and not having it.

My questions are:

  1. Is this analysis correct? I don't use Intel assembly very often so I'm not confident of my conclusion.
  2. Is there any (other) modern CPU architecture (that has a JVM) for which there is a difference between operation with and without the strictfp modifier?
Emmaemmalee answered 21/3, 2014 at 15:10 Comment(10)
Note that the classic x87 FPU is still available, and it would use 80-bit precision; the compiler just didn't choose to use it here. Maybe if you tried something fancier that doesn't have a simple SSE counterpart, like sin(), it would use the FPU.Tarsus
The SSE vector units always use 64- or 32-bit precision (AVX(?) adds 16-bit floats). So that's strictfp implicitly. But you might get a difference with code that could use FMA (fused multiply-add) instructions: those avoid intermediate rounding errors, and could thus possibly be disabled under strictfp.Carduaceous
But what Java expression will result in such code? I've searched everything on SO and on the web that I could find, and nowhere is there an example that triggers a difference in evaluation between strictfp and not having it.Emmaemmalee
@ErwinBolwidt: Well, first you have to have a machine capable of FMA (~Sandy Bridge era?), then a compiler that knows about it, and compiler flags that allow the compiler to use it (i.e. target the machine with FMA, not a generic x86-64). Then find documentation on FMA, construct a possible use case (like vectresult[n] = vecta[n]*vectb[n]+vectc[n]) and try looking at whether strictfp makes a difference.Carduaceous
Using the 32 bit java compiler it should be possible to disable SSE support and thus force FPU.Tarsus
@EOF Bulldozer for AMD and Haswell for Intel. The latter of which is still pretty new.Malapropism
@Tarsus I do not think there is any provision in the Java language definition for calling the crappy 387 sin function.Forecastle
@EOF The wording of the paragraph quoted in the question does not allow a multiplication and an addition to be replaced by an FMA. Only an extended exponent range is allowed. With general arguments, the effect of the FMA is not limited to allowing an extended exponent range for the intermediate result; it also provides extended precision. Consequently, the compiler is not allowed to generate an FMA for a general multiplication+addition, even without strictfp.Forecastle
@PascalCuoq my java version happily used FSIN but then it went on to use SSE (because on 64 bit that's the usual thing). I'll try sin(cos(x)) :)Tarsus
@PascalCuoq: Well, in that case, the x87-extended precision floating-point calculations are right out as well: 80-bit extended precision has 15 bits of exponent and 64 bits of mantissa, while a standard 64-bit double has 11 bits of exponent and 53 bits of mantissaCarduaceous
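
The FMA effect discussed in the comments above can be demonstrated from Java directly: since Java 9, Math.fma computes a*b+c with a single rounding. The sketch below (my own construction, not from the thread) shows why fusing changes results, which is exactly why the JIT may not substitute it for a plain a * b + c:

```java
public class FmaDemo {
    public static void main(String[] args) {
        // a is exactly 1 + 2^-27; a*a is exactly 1 + 2^-26 + 2^-54,
        // which needs 55 significand bits and so cannot be a double.
        double a = 1.0 + 0x1p-27;

        double separate = a * a - 1.0;       // the multiply rounds 2^-54 away
        double fused = Math.fma(a, a, -1.0); // one rounding, at the very end

        System.out.println(separate);          // exactly 2^-26
        System.out.println(fused);             // exactly 2^-26 + 2^-54
        System.out.println(separate == fused); // false
    }
}
```

On hardware without an FMA instruction, Math.fma falls back to a slower exact software computation, so the difference is observable regardless of CPU.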

If by “modern” you mean processors supporting the sort of SSE2 instructions that you quote in your question as produced by your compiler (mulsd, …), then the answer is no, strictfp does not make a difference, because the instruction set does not allow an implementation to take advantage of the absence of strictfp. The available instructions are already optimal for computing to the precise specifications of strictfp. In other words, on that kind of modern CPU, you get strictfp semantics all the time for the same price.

If by “modern” you mean the historical 387 FPU, then it is possible to observe a difference whenever an intermediate computation would overflow or underflow in strictfp mode: without strictfp, the intermediate might not overflow, or, on underflow, might keep more precision bits than expected.
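
A minimal sketch of such an intermediate overflow (the non-strict behavior described in the comments is an assumption about an x87-based JVM, which I cannot test here):

```java
public class OverflowProbe {
    // FP-strict: the intermediate product must stay in the double value set,
    // so it overflows to Infinity and the division cannot bring it back.
    static strictfp double strict() {
        return Double.MAX_VALUE * 1.1 / 1.1;
    }

    // Not FP-strict: an x87 JVM could hold the intermediate in the 15-bit
    // extended exponent range and round the final result back to a finite
    // value near Double.MAX_VALUE. On SSE2 this is Infinity as well.
    static double lenient() {
        return Double.MAX_VALUE * 1.1 / 1.1;
    }

    public static void main(String[] args) {
        System.out.println(strict());  // Infinity
        System.out.println(lenient()); // Infinity on SSE2; possibly finite on x87
    }
}
```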

A typical strictfp computation compiled for the 387 will look like the assembly in this answer, with well-placed multiplications by well-chosen powers of two to make underflow behave the same as in IEEE 754 binary64. A round-trip of the result through a 64-bit memory location then takes care of overflows.

The same computation compiled without strictfp would produce one 387 instruction per basic operation, for instance just the multiplication instruction fmulp for a source-level multiplication. (The 387 would have been configured to use the same significand width as binary64, 53 bits, at the beginning of the program.)

Forecastle answered 22/3, 2014 at 0:32 Comment(5)
I saw that SSE was introduced in 1999 and SSE2 in 2001, and my 2009 MacBook Pro has the Core i7. That's what I had in mind when I said "modern". But I'm not sure whether other CPU manufacturers have evolved in the same way.Emmaemmalee
If most CPUs produced in the last five years give the same result (and are just as fast) with or without strictfp, it starts to sound like good Java practice to mark all your classes that do floating-point arithmetic with strictfp, especially if the application is distributed to a wider audience (rather than run on a server), like an open-source library or a desktop app. It reduces the chance of bugs on platforms that you haven't been able to test on, and improves WORA-ness, without penalty on the great majority of computers.Emmaemmalee
@ErwinBolwidt I have spent a significant chunk of time on the floating-point part of a static analyzer for a programming language that isn't as well-defined as Java in that regard, and I cannot tell you how happy it makes me to find someone at last who thinks like this.Forecastle
If I understand this correctly, with modern CPUs (past five years), even without strictfp the intermediate FP calculations in Java are not double-extended-exponent, because the machine code is already optimal and behaves like strictfp by itself? This is like a Gordian knot to me :)Scholastic
Actually, strictfp will be the default again in Java 17, so there will be absolutely no difference anymore.Lahdidah

© 2022 - 2024 — McMap. All rights reserved.