Why is the short primitive type significantly slower than long or int?

I tried to optimize the RAM usage of an Android game by changing int primitives to shorts. Before doing this, I wanted to check how the primitive types perform in Java.

So I created this little test benchmark using the Caliper library.

import java.util.Random;

import com.google.caliper.Benchmark;
import com.google.caliper.Param;

// Caliper calls each timeXxx(int reps) method repeatedly and reports the average time per rep.
public class BenchmarkTypes extends Benchmark {

    @Param("10") private long testLong;
    @Param("10") private int testInt;
    @Param("10") private short testShort;


    @Param("5000") private long resultLong = 5000;
    @Param("5000") private int resultInt = 5000;
    @Param("5000") private short resultShort = 5000;

    @Override
    protected void setUp() throws Exception {
        Random rand = new Random();

        testShort = (short) rand.nextInt(1000);
        testInt = (int) testShort;
        testLong = (long) testShort;
    }

    public long timeLong(int reps){
        for(int i = 0; i < reps; i++){
            resultLong += testLong;
            resultLong -= testLong;         
        }
        return resultLong;
    }

    public int timeInt(int reps){
        for(int i = 0; i < reps; i++){
            resultInt += testInt;
            resultInt -= testInt;           
        }
        return resultInt;
    }

    public short timeShort(int reps){
        for(int i = 0; i < reps; i++){
            resultShort += testShort;
            resultShort -= testShort;
        }
        return resultShort;
    }
}

The results of the test surprised me.

Test circumstances

Benchmark run under the Caliper library.

Test results

https://microbenchmarks.appspot.com/runs/0c9bd212-feeb-4f8f-896c-e027b85dfe3b

Int    2.365 ns
Long   2.436 ns
Short  8.156 ns

Test conclusion?

The short primitive type appears to be significantly slower (roughly 3-4 times) than the long and int primitive types?

Question

  1. Why is the short primitive significantly slower than int or long? I would expect the int primitive type to be the fastest on a 32-bit VM, and the long and short to take the same time, or the short to be even faster.

  2. Is this also the case on Android phones? Android phones generally run in a 32-bit environment, although nowadays more and more phones ship with 64-bit processors.

Demodena answered 19/6, 2014 at 10:31 Comment(14)
You haven't warmed the JIT. You haven't done enough iterations. This isn't how you microbench Java.Uis
It is (most likely) caused by Java converting short (every time) to int (or long) for arithmetic operations.Orvas
@GermannArlington - No. The real explanation for the 1000x difference in the timings is that the benchmark is incorrectly written. See the linked Q&A.Simba
True, no real benchmark here! But 1000 times slower? Would creating a good benchmark really make such a great difference?Weightlessness
Rolf - read the first comment on the Answer you linked to. The person who wrote that answer probably made the same mistake that you did.Simba
Rolf - It certainly could do. Try rewriting your benchmark following the recommendations and see what you get.Simba
Since your benchmark is flawed, your "result" has no validity, and there is not really much point trying to explain it ... as if it was a real result. (And it is not just slightly invalid. A 1000x slow down is simply implausible.) While this Question is not strictly a duplicate of the other Question, the other Question's answers are the best response to this one.Simba
If you updated the Question with a valid benchmark and valid results, it might be worth reopening ...Simba
Here's proof (ish) - https://mcmap.net/q/153221/-in-java-is-it-more-efficient-to-use-byte-or-short-instead-of-int-and-float-instead-of-double. The problem is not the environment. It is that your benchmark doesn't resemble what a real application would do ... and what a typical JVM is designed to run efficiently. What is most likely happening is that the "slow" part of your benchmark is distorted by JIT compilation. You are actually measuring the time taken to JIT compile and optimize the code!Simba
Updated the question with new test results using the java caliper library.Weightlessness
Note that what you're doing is a no-op and a smart enough JIT could blow it all away. This is easy to fix (I'd suggest using += alone or combined with ^=). @StephenC The current close reason no longer applies, so I'd suggest reopening (though it might get closed as a duplicate of the question you linked).Scoutmaster
@RolfSmit It's better to show that you've tried to ensure that your benchmark makes sense than to ask for proof that it doesn't. Explaining the behavior can sometimes take hours, and we don't want that time to be wasted. What the benchmarking harnesses do should be close to what happens in a real application, assuming the app runs long enough (which is what Java was meant for). Indeed, on Android it may be different.Scoutmaster
I agree. I'm trying to optimize RAM usage, and using shorts would improve it, but I don't want the performance to go down. What should I do? The objects using the shorts are used a lot!Weightlessness
Reopened - this question now has a sensible benchmark with a more plausible set of numbers. I'm fairly sure, though, that the question is entirely answered by the one linked to by @StephenC. It's unlikely that using short will reduce your memory footprint.Uis

Java bytecode does not support basic operations (+, -, *, /, >>, >>>, <<, %) on primitive types smaller than int. There are simply no bytecodes allocated for such operations in the instruction set. Thus the VM needs to convert the short(s) to int(s), perform the operation, then truncate the int back to short and store that in the result.

Check out the generated bytecode with javap to see the difference between your short and int tests.
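
For example, compiling a small throwaway class like the one below and disassembling it makes the difference visible (a sketch; the class and method names are just for illustration, not from the original post):

public class ShortVsInt {

    static int addInt(int a, int b) {
        return a + b;           // disassembles to iload, iload, iadd, ireturn
    }

    static short addShort(short a, short b) {
        return (short) (a + b); // same instructions, plus an extra i2s before ireturn
    }
}

Compile with javac ShortVsInt.java and run javap -c ShortVsInt to see both listings.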

The VM/JIT optimizations are apparently heavily biased towards int/long operations, which makes sense since they are the most common.

Types smaller than int have their uses, primarily for saving memory in arrays. They are less well suited as simple class members (of course you still use them when it's the appropriate type for the data). Smaller members may not even reduce an object's size: current VMs are (again) mainly tailored for execution speed, so the VM may align fields to native machine word boundaries to improve access performance at the expense of memory.
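
For scale (a rough sketch, not from the original answer): in an array the per-element saving is real, since each short element occupies 2 bytes versus 4 bytes for an int, ignoring the fixed array header:

short[] asShorts = new short[1_000_000]; // ~2 MB of element data
int[]   asInts   = new int[1_000_000];   // ~4 MB of element data

As individual fields, however, that two-byte difference can disappear entirely into alignment padding.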

Faucal answered 19/6, 2014 at 17:2 Comment(0)

This is possibly due to the way Java/Android handles integer arithmetic for primitives smaller than an int.

When two primitives of a datatype smaller than int are added in Java, they are automatically promoted to the int datatype. A cast is normally required to convert the result back into the original datatype.
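
For example (a sketch, not taken from the original answer), even when both operands are shorts, the addition itself produces an int, so a method like this needs an explicit narrowing cast:

static short addShorts(short a, short b) {
    // return a + b;         // does not compile: a + b has type int
    return (short) (a + b);  // explicit cast back to short
}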

The trick comes with shorthand operations like +=, -= and so on, where the cast happens implicitly, so that the operation:

resultShort += testShort;

actually resembles something like this:

resultShort = (short)((int) resultShort + (int) testShort);

If we look at the disassembled bytecode of a method:

public static int test(int a, int b){
    a += b;
    return a;
}

we see:

public static int test(int, int);
    Code:
       0: iload_0       
       1: iload_1       
       2: iadd          
       3: istore_0      
       4: iload_0       
       5: ireturn   

Comparing this to the identical method with the datatype replaced by short, we get:

public static short test(short, short);
    Code:
       0: iload_0       
       1: iload_1       
       2: iadd          
       3: i2s           
       4: istore_0      
       5: iload_0
       6: ireturn

Notice the additional instruction i2s (integer to short). This is the likely culprit of the loss of performance. Another thing you can notice is that all instructions are integer-based, denoted by the prefix i (e.g. iadd meaning integer add), which means that somewhere during the iload phase the shorts are promoted to integers, which is likely to cause performance degradation as well.

If you can take my word for it, the bytecode for long arithmetic is identical to the integer one, with the exception that the instructions are long-specific (e.g. ladd instead of iadd).
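
For reference, the long version of the same method would disassemble to something like this (a sketch, not taken from the original post; a long occupies two local-variable slots, which is why the second parameter is loaded with lload_2):

public static long test(long, long);
    Code:
       0: lload_0
       1: lload_2
       2: ladd
       3: lstore_0
       4: lload_0
       5: lreturn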

Overtone answered 19/6, 2014 at 17:6 Comment(2)
This answer is on the right track. However, it is important to remember that the JVM does not directly execute bytecodes (once the code has been JIT-compiled). Hence the bytecodes do not directly explain the difference. For instance, if the native instruction set had direct support for 16-bit arithmetic, and the JIT compiler was smart enough to use it, then you might expect short arithmetic to be faster than long arithmetic.Simba
But the reality is that instruction sets for PC, server and even smartphone devices are tuned for 32-bit / 64-bit operations rather than 16-bit operations. Hence, doing 16-bit arithmetic typically requires more native instructions and more clock cycles, which makes it slower than 32- and 64-bit ... as the OP's benchmarking shows. But this is highly dependent on the target platform hardware ... and potentially on the JIT compiler.Simba
