Somewhat related question, and a year old: Do any JVM's JIT compilers generate code that uses vectorized floating point instructions?
Preface: I am trying to do this in pure java (no JNI to C++, no GPGPU work, etc...). I have profiled and the bulk of the processing time is coming from the math operations in this method (which is probably 95% floating point math and 5% integer math). I've already reduced all Math.xxx() calls to an approximation that's good enough so most of the math is now floating point multiplies with a few adds.
I have some code that deals with audio processing. I've been making tweaks and have already come across great gains. Now I'm looking into manual loop unrolling to see if there's any benefit (at least with a manual unroll of 2, I am seeing approximately a 25% improvement). While trying my hand at a manual unroll of 4 (which is starting to get very complicated since I am unrolling both loops of a nested loop) I am wondering if there's anything I can do to hint to the jvm that at runtime it can use vector operations (e.g. SSE2, AVX, etc...). Each sample of the audio can be calculated completely independently of other samples, which is why I've been able to see a 25% improvement already (reducing the amount of dependencies on floating point calculations).
For example, I have 4 floats, one for each of the 4 unrolls of the loop to hold a partially computed value. Does how I declare and use these floats matter? If I make it a float[4] does that hint to the jvm that they are unrelated to each other vs having float,float,float,float or even a class of 4 public floats? Is there something I can do without meaning to that will kill my chance at code being vectorized?
I've come across articles online about writing code "normally" because the compiler/jvm knows the common patterns and how to optimize them and deviating from the patterns can mean less optimization. At least in this case however, I wouldn't have expected unrolling the loops by 2 to have improved performance by as much as it did so I'm wondering if there's anything else I can do (or at least not do) to help my chances. I know that the compiler/jvm are only going to get better so I also want to be wary of doing things that will hurt me in the future.
Edit for the curious: unrolling by 4 increased performance by another ~25% over unrolling by 2, so I really think vector operations would help in my case if the jvm supported it (or perhaps already is using them).
Thanks!