Does Java strictfp modifier have any effect on modern CPUs?

I know the meaning of the strictfp modifier on methods (and on classes), according to the JLS:

JLS 8.4.3.5, strictfp methods:

The effect of the strictfp modifier is to make all float or double expressions within the method body be explicitly FP-strict (§15.4).

JLS 15.4 FP-strict expressions:

Within an FP-strict expression, all intermediate values must be elements of the float value set or the double value set, implying that the results of all FP-strict expressions must be those predicted by IEEE 754 arithmetic on operands represented using single and double formats.

Within an expression that is not FP-strict, some leeway is granted for an implementation to use an extended exponent range to represent intermediate results; the net effect, roughly speaking, is that a calculation might produce "the correct answer" in situations where exclusive use of the float value set or double value set might result in overflow or underflow.

I've been trying to come up with a way to observe an actual difference between an expression in a strictfp method and the same expression in a method that is not strictfp. I've tried this on two laptops, one with an Intel Core i3 CPU and one with an Intel Core i7 CPU, and I can't get any difference.

A lot of posts suggest that native floating point, without strictfp, could use 80-bit floating-point numbers, which have extra representable values below the smallest positive Java double (closest to zero) and above the largest Java double.

I tried the code below with and without the strictfp modifier, and it gives exactly the same results.

public static strictfp void withStrictFp() {
    double v = Double.MAX_VALUE;
    System.out.println(v * 1.0000001 / 1.0000001);
    v = Double.MIN_VALUE;
    System.out.println(v / 2 * 2);
}
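
To make the comparison airtight, one can compare the raw bit patterns of the results rather than their printed representation. The harness below (my own sketch, not from the JLS) duplicates the two expressions in a strictfp and a non-strictfp method and checks the bits:

```java
public class StrictfpProbe {

    // The same expressions as above, evaluated FP-strictly.
    static strictfp double[] strict() {
        double overflowProbe = Double.MAX_VALUE * 1.0000001 / 1.0000001;
        double underflowProbe = Double.MIN_VALUE / 2 * 2;
        return new double[] { overflowProbe, underflowProbe };
    }

    // The same expressions with default (non-strict) semantics.
    static double[] lenient() {
        double overflowProbe = Double.MAX_VALUE * 1.0000001 / 1.0000001;
        double underflowProbe = Double.MIN_VALUE / 2 * 2;
        return new double[] { overflowProbe, underflowProbe };
    }

    public static void main(String[] args) {
        double[] s = strict(), l = lenient();
        for (int i = 0; i < s.length; i++) {
            // Compare exact bit patterns, not printed decimals.
            boolean same = Double.doubleToRawLongBits(s[i])
                        == Double.doubleToRawLongBits(l[i]);
            System.out.println("expression " + i + " bit-identical: " + same);
        }
    }
}
```

On an SSE2-based JVM both methods should produce bit-identical results (Infinity for the first expression, 0.0 for the second); only an implementation that actually uses an extended exponent range for intermediates could print `false`.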

Actually, I assume that any difference would only show up once the code is JIT-compiled to native code, so I am running it with the -Xcomp JVM argument. But there is no difference.

I found another post explaining how to get the assembly code generated by HotSpot (OpenJDK documentation). I'm running my code with java -Xcomp -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly. The first expression (v * 1.0000001 / 1.0000001), with the strictfp modifier and equally without it, is compiled to:

  0x000000010f10a0a9: movsd  -0xb1(%rip),%xmm0        # 0x000000010f10a000
                                                ;   {section_word}
  0x000000010f10a0b1: mulsd  -0xb1(%rip),%xmm0        # 0x000000010f10a008
                                                ;   {section_word}
  0x000000010f10a0b9: divsd  -0xb1(%rip),%xmm0        # 0x000000010f10a010
                                                ;   {section_word}

There is nothing in that code that truncates the result of each step to 64 bits, as I had expected. Looking up the documentation of movsd, mulsd and divsd, they all state that these (SSE2) instructions operate on 64-bit floating-point values, not the 80-bit values I expected. So it seems logical that the double value set these instructions operate on is already the IEEE 754 double value set, and there would then be no difference between having strictfp and not having it.

My questions are:

  1. Is this analysis correct? I don't use Intel assembly very often so I'm not confident of my conclusion.
  2. Is there any (other) modern CPU architecture (that has a JVM) for which there is a difference between operation with and without the strictfp modifier?
Emmaemmalee answered 21/3, 2014 at 15:10 Comment(10)
Note that the classic x87 FPU is still available, and it would use 80-bit precision; the compiler just didn't choose to use it here. Maybe if you tried something fancier that doesn't have a simple SSE counterpart, like sin(), it would use the FPU.Tarsus
The SSE vector units always use 64- or 32-bit precision (AVX(?) adds 16-bit floats). So that's strictfp implicitly. But you might get a difference with code that could use FMA (fused multiply-add) instructions: those avoid intermediate rounding errors, and could thus possibly be disabled under strictfp.Carduaceous
But what Java expression will result in such code? I've searched everything on SO and on the web that I could find, and nowhere is there an example that triggers a difference in evaluation between strictfp and not having it.Emmaemmalee
@ErwinBolwidt: Well, first you have to have a machine capable of FMA (~Sandy Bridge era?), then a compiler that knows about it, and compiler flags that allow the compiler to use it (i.e. target the machine with FMA, not a generic x86-64). Then find documentation on FMA, construct a possible use case (like vectresult[n] = vecta[n]*vectb[n]+vectc[n]) and try looking at whether strictfp makes a difference.Carduaceous
Using the 32 bit java compiler it should be possible to disable SSE support and thus force FPU.Tarsus
@EOF Bulldozer for AMD and Haswell for Intel. The latter of which is still pretty new.Malapropism
@Tarsus I do not think there is any provision in the Java language definition for calling the crappy 387 sin function.Forecastle
@EOF The wording of the paragraph quoted in the question does not allow a multiplication and an addition to be replaced by an FMA. Only an extended exponent range is allowed. With general arguments, the effect of the FMA is not limited to allowing an extended exponent range for the intermediate result; it also provides extended precision. Consequently, the compiler is not allowed to generate an FMA for a general multiplication+addition, even without strictfp.Forecastle
@PascalCuoq my java version happily used FSIN but then it went on to use SSE (because on 64 bit that's the usual thing). I'll try sin(cos(x)) :)Tarsus
@PascalCuoq: Well, in that case, the x87-extended precision floating-point calculations are right out as well: 80-bit extended precision has 15 bits of exponent and 64 bits of mantissa, while a standard 64-bit double has 11 bits of exponent and 53 bits of mantissaCarduaceous
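
The FMA effect discussed in the comments above can be demonstrated from Java directly: since Java 9, Math.fma computes a*b+c with a single rounding. The sketch below (my own construction, not from the thread) shows why fusing changes results, which is exactly why the JIT may not substitute it for a plain a * b + c:

```java
public class FmaDemo {
    public static void main(String[] args) {
        // a is exactly 1 + 2^-27; a*a is exactly 1 + 2^-26 + 2^-54,
        // which needs 55 significand bits and so cannot be a double.
        double a = 1.0 + 0x1p-27;

        double separate = a * a - 1.0;       // the multiply rounds 2^-54 away
        double fused = Math.fma(a, a, -1.0); // one rounding, at the very end

        System.out.println(separate);          // exactly 2^-26
        System.out.println(fused);             // exactly 2^-26 + 2^-54
        System.out.println(separate == fused); // false
    }
}
```

On hardware without an FMA instruction, Math.fma falls back to a slower exact software computation, so the difference is observable regardless of CPU.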

If by “modern” you mean processors supporting the sort of SSE2 instructions that you quote in your question as produced by your compiler (mulsd, …), then the answer is no, strictfp does not make a difference, because the instruction set does not allow an implementation to take advantage of the absence of strictfp. The available instructions are already optimal for computing to the precise specifications of strictfp. In other words, on that kind of modern CPU, you get strictfp semantics all the time for the same price.

If by “modern” you mean the historical 387 FPU, then it is possible to observe a difference whenever an intermediate computation would overflow or underflow in strictfp mode: without strictfp, the intermediate might not overflow, or, on underflow, might keep more precision bits than expected.
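
A minimal sketch of such an intermediate overflow (the non-strict behavior described in the comments is an assumption about an x87-based JVM, which I cannot test here):

```java
public class OverflowProbe {
    // FP-strict: the intermediate product must stay in the double value set,
    // so it overflows to Infinity and the division cannot bring it back.
    static strictfp double strict() {
        return Double.MAX_VALUE * 1.1 / 1.1;
    }

    // Not FP-strict: an x87 JVM could hold the intermediate in the 15-bit
    // extended exponent range and round the final result back to a finite
    // value near Double.MAX_VALUE. On SSE2 this is Infinity as well.
    static double lenient() {
        return Double.MAX_VALUE * 1.1 / 1.1;
    }

    public static void main(String[] args) {
        System.out.println(strict());  // Infinity
        System.out.println(lenient()); // Infinity on SSE2; possibly finite on x87
    }
}
```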

A typical strictfp computation compiled for the 387 will look like the assembly in this answer, with well-placed multiplications by well-chosen powers of two to make underflow behave the same as in IEEE 754 binary64. A round-trip of the result through a 64-bit memory location then takes care of overflows.

The same computation compiled without strictfp would produce one 387 instruction per basic operation, for instance just the multiplication instruction fmulp for a source-level multiplication. (The 387 would have been configured to use the same significand width as binary64, 53 bits, at the beginning of the program.)

Forecastle answered 22/3, 2014 at 0:32 Comment(5)
I saw that SSE was introduced in 1999 and SSE2 in 2001, and my 2009 MacBook Pro has the Core i7. That's what I had in mind when I said "modern". But I'm not sure whether other CPU manufacturers have evolved in the same way.Emmaemmalee
If most CPUs produced in the last five years give the same result (and are just as fast) with or without strictfp, it starts to sound like good Java practice to mark all your classes that do floating-point arithmetic with strictfp, especially if the application is distributed to a wider audience (rather than run on a server), like an open-source library or a desktop app. It reduces the chance of bugs on platforms that you haven't been able to test on, and improves WORA-ness, without penalty on the great majority of computers.Emmaemmalee
@ErwinBolwidt I have spent a significant chunk of time on the floating-point part of a static analyzer for a programming language that isn't as well-defined as Java in that regard, and I cannot tell you how happy it makes me to find someone at last who thinks like this.Forecastle
If I understand this correctly, with modern CPUs (past five years), even without strictfp the intermediate FP calculations in Java are not double-extended-exponent, because the machine code is already optimal and behaves like strictfp by itself? This is like a Gordian knot to me :)Scholastic
Actually, strictfp will be the default again in Java 17, so there will be absolutely no difference anymore.Lahdidah

© 2022 - 2024 — McMap. All rights reserved.