I was curious if we finally get a fast datetime library with JDK8. Nearly all LocalDate
computations use toEpochDay
so I looked at the source and the large number of divisions and branches made me curious if I could do better.
I eliminated some branching and all but one division, however the speedup is worse than expected. So my first question how is it possible that an algorithm using multiple division takes only about 30 cycles (throughput). Holger's comments seem to have answered this: Division by a small constant gets JIT-ed to multiplication. I did it manually and now I'm consistently beating the original implementation by a factor of 2.
The benchmark is pretty straightforward, just iterate though an array of random LocalDate
s and convert each of them toEpochDay
. Despite the randomness, the results are pretty consistent. The size of the array is a parameter and my main question is where the big slowdown between 2000 and 30000 comes from. There's should be some slowdown as the data no more fit in the L1 cache, but the memory accesses of both algorithms are exactly the same (i.e., none but fetching the date
from the array).
The still open question is: How it comes that the behaviour of two simple memory-access-free implementations of the same function change when iterating over an array? The original algorithm suffers a much bigger slowdown than mine.
My algorithm is probably not worth copying here, it's undocumented and about as cryptic as the original, and there's only a very rudimentary test.