Why (n mod const) is faster than (const mod n)?

Asked 6/9, 2016 at 19:3 Answered 6/9, 2016 at 19:25

While playing with jmh I came across a weird thing I cannot explain.

@BenchmarkMode(Mode.SingleShotTime)
@Measurement(iterations = 10, batchSize = Integer.MAX_VALUE)
@Warmup(iterations = 5, batchSize = Integer.MAX_VALUE)
@State(Scope.Thread)
public class Tests {
    private int value;

    @Setup(Level.Iteration)
    public void setUp() {
        value = 1230;
    }

    @Benchmark
    public boolean testConstModN() {
        return 12345 % value == 0;
    }

    @Benchmark
    public boolean testNModConst() {
       return value % 12345 == 0;
   }
}

The results are below

Benchmark                  Mode  Cnt   Score   Error  Units
Tests.testConstModN          ss   10  10.789 ± 0.305   s/op
Tests.testNModConst          ss   10   7.550 ± 0.067   s/op

I am running on JDK 1.8.0_101, VM 25.101-b13, Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (family: 0x6, model: 0x3c, stepping: 0x3)

If I set the const equal to the value or if I set the value to 0xffffffff, nothing changes.

Emulous answered 6/9, 2016 at 19:3 Comment(6)

Try to run a CONST % number while they are equals and show me the results please :) – Apulia 6/9, 2016 at 19:10

What happens if you set value to 12345 and use 1230 as your constant? – Appetizing 6/9, 2016 at 19:12

I added the additional information to the question. It does not matter what the values of const and value are. – Emulous 6/9, 2016 at 19:23

If you're on Linux, try using -prof perfasm to dump the actual assembly generated for the two functions. I suspect that testNModConst has been optimized to replace the division with a multiplication. – Constructivism 6/9, 2016 at 19:27

If you do your tests in the other order, does that change? Also, see the Constant Folding section in java-performance.info/jmh: value is the same every time, and the JVM may notice this and opportunistically do a if (value == 12345) return the_answer as the first operation, or something like that. That doc suggests techniques to avoid this. (IDK, I haven't used JMH, I'm seeing this question because of the x86-64 tag. If you can show the actual JIT-compiled asm that runs for each version, I can tell you how many cycles it should take on your Haswell CPU.) – Gigantic 6/9, 2016 at 19:33

A related question: why can I work out (n mod 1000000) faster than I can work out (1000000 mod n)? This doesn't exactly address your question, but perhaps it shows that you might not necessarily expect an operation to still be as easy if you switch the operands. – Gilbye 6/9, 2016 at 19:37

The CPU's DIV and MOD instructions are very expensive, costing 50 cycles or more. When you use a variable divisor, using these instructions is unavoidable. However, when you use a constant divisor, this can be compiled into a short sequence of much cheaper instructions such as MUL, ADD, and SHR.

Hacker's delight, chapter 10 covers several algorithms that solve this problem.

Pomology answered 6/9, 2016 at 19:25 Comment(6)

Specifically the following strength reduction is probably at play: gmplib.org/~tege/divcnst-pldi94.pdf – Constructivism 6/9, 2016 at 19:26

More likely a modular multiplicative inverse (multiply by a big magic constant, and use the upper half) – Gigantic 6/9, 2016 at 19:28

This can be resolved by inspecting the actual machine code emitted from the JIT compiler. I did it once before, maybe I can find that answer... – Pomology 6/9, 2016 at 19:29

Here it is: #31373065 – Pomology 6/9, 2016 at 19:32

My favourite treatment is ridiculousfish.com/blog/posts/… – Constructivism 6/9, 2016 at 19:38

You can see what a C compiler does with the two cases for the OP's constant of 12345 on the Godbolt compiler explorer. (clang3.8.1 -O3 -mtune=haswell). gcc does essentially the same thing with the same multiplicative modular inverse, but uses the implicit-operand form of (imul r32) to produce the high half of the product in edx, instead of imul r64, r64, imm32 + shift to get the high half of the product in eax. – Gigantic 6/9, 2016 at 19:50

Beware this answer it's only intuition

It's because of the nature of the % operator

In testNModConst() 1230 is less than 12345, so modulus only needs to internally check their size and realize that 12345 is bigger (one operation)

In testConstModN() 12345 is greater than 1230, so modulus will have to do a series of operations (subtraction) to calculate the answer.

It depends on the size of the integer relative to the constant

Lawsuit answered 6/9, 2016 at 19:11 Comment(3)

Good explanation. Would be interesting to see the results in a native C or assembly environment. – Bifoliolate 6/9, 2016 at 19:14

Good guess. Unfortunately, it does not depend on the values of n and const. – Emulous 6/9, 2016 at 19:25

This answer does not really reflect how modulo operations are performed on a hardware level. – Constructivism 6/9, 2016 at 19:31

Recommended topics

Hot tags