I was just migrating a module from the old Java date classes to the new java.time API, and noticed a huge drop in performance. It boiled down to the parsing of dates with a time zone (I parse millions of them at a time).
Parsing a date string without a time zone (yyyy/MM/dd HH:mm:ss) is fast: about 2 times faster than with the old Java date API, about 1.5M operations per second on my PC.
However, when the pattern contains a time zone (yyyy/MM/dd HH:mm:ss z), the performance drops about 15 times with the new java.time API, while with the old API it is about as fast as without a time zone. See the performance benchmark below.
Does anyone have an idea how I can parse these strings quickly using the new java.time API? At the moment, as a workaround, I am using the old API for parsing and then converting the Date to an Instant, which is not particularly nice.
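For reference, the workaround looks roughly like the minimal sketch below. The class and method names (LegacyZoneParse, parseToInstant) are made up for illustration, and pinning Locale.ENGLISH is my assumption (the benchmark uses the default locale). Since SimpleDateFormat is not thread-safe, the sketch keeps one instance per thread.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Locale;

final class LegacyZoneParse {
    // SimpleDateFormat is not thread-safe, so each thread gets its own instance.
    // Locale.ENGLISH is an assumption; the benchmark relies on the default locale.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(
                    () -> new SimpleDateFormat("yyyy/MM/dd HH:mm:ss z", Locale.ENGLISH));

    // Parses with the old API, then bridges to java.time via Date.toInstant().
    static Instant parseToInstant(String text) {
        try {
            return FORMAT.get().parse(text).toInstant();
        } catch (ParseException e) {
            // Rethrown unchecked only to keep the call sites of this sketch short.
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseToInstant("2000/12/12 12:12:12 CET"));
    }
}
```

The only java.time type involved is the resulting Instant; all of the text parsing still goes through the old API.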
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(1)
@Fork(1)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
@State(Scope.Thread)
public class DateParsingBenchmark {

    // Each benchmark invocation parses this many strings,
    // so the reported scores are per 100K parses.
    private final int iterations = 100000;

    @Benchmark
    public void oldFormat_noZone(Blackhole bh, DateParsingBenchmark st) throws ParseException {
        SimpleDateFormat simpleDateFormat =
                new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
        for (int i = 0; i < iterations; i++) {
            bh.consume(simpleDateFormat.parse("2000/12/12 12:12:12"));
        }
    }

    @Benchmark
    public void oldFormat_withZone(Blackhole bh, DateParsingBenchmark st) throws ParseException {
        SimpleDateFormat simpleDateFormat =
                new SimpleDateFormat("yyyy/MM/dd HH:mm:ss z");
        for (int i = 0; i < iterations; i++) {
            bh.consume(simpleDateFormat.parse("2000/12/12 12:12:12 CET"));
        }
    }

    @Benchmark
    public void newFormat_noZone(Blackhole bh, DateParsingBenchmark st) {
        DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder()
                .appendPattern("yyyy/MM/dd HH:mm:ss").toFormatter();
        for (int i = 0; i < iterations; i++) {
            bh.consume(dateTimeFormatter.parse("2000/12/12 12:12:12"));
        }
    }

    @Benchmark
    public void newFormat_withZone(Blackhole bh, DateParsingBenchmark st) {
        DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder()
                .appendPattern("yyyy/MM/dd HH:mm:ss z").toFormatter();
        for (int i = 0; i < iterations; i++) {
            bh.consume(dateTimeFormatter.parse("2000/12/12 12:12:12 CET"));
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(DateParsingBenchmark.class.getSimpleName()).build();
        new Runner(opt).run();
    }
}
And the results for 100K operations:
Benchmark                                Mode  Cnt     Score     Error  Units
DateParsingBenchmark.newFormat_noZone    avgt    5    61.165 ±  11.173  ms/op
DateParsingBenchmark.newFormat_withZone  avgt    5  1662.370 ± 191.013  ms/op
DateParsingBenchmark.oldFormat_noZone    avgt    5    93.317 ±  29.307  ms/op
DateParsingBenchmark.oldFormat_withZone  avgt    5   107.247 ±  24.322  ms/op
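To read the table: each score is the average time of one benchmark invocation, and one invocation parses 100,000 strings. The small sketch below only does that arithmetic on the scores above; the class and method names (BenchmarkMath, parsesPerSecond, slowdown) are illustrative and not part of the benchmark.

```java
final class BenchmarkMath {
    // One benchmark invocation parses this many strings (see `iterations`).
    static final int PARSES_PER_INVOCATION = 100_000;

    // Converts a score in ms per invocation into single parses per second.
    static double parsesPerSecond(double msPerInvocation) {
        return PARSES_PER_INVOCATION / (msPerInvocation / 1000.0);
    }

    // How many times slower `slowMs` is compared to `fastMs`.
    static double slowdown(double slowMs, double fastMs) {
        return slowMs / fastMs;
    }

    public static void main(String[] args) {
        double newNoZone = 61.165;
        double newWithZone = 1662.370;
        double oldWithZone = 107.247;

        // Roughly 1.6 million parses per second without a zone.
        System.out.printf("new, no zone:   %.0f parses/s%n", parsesPerSecond(newNoZone));
        // Roughly 60 thousand parses per second with a zone.
        System.out.printf("new, with zone: %.0f parses/s%n", parsesPerSecond(newWithZone));
        // About 27x slower than java.time without a zone.
        System.out.printf("vs new no zone:   %.1fx%n", slowdown(newWithZone, newNoZone));
        // About 15.5x slower than the old API with a zone.
        System.out.printf("vs old with zone: %.1fx%n", slowdown(newWithZone, oldWithZone));
    }
}
```

So the "about 15 times" figure corresponds to java.time with a zone versus the old API with a zone; compared with java.time without a zone the gap is larger.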
UPDATE:
I just did some profiling of the java.time classes, and indeed, the time zone parser seems to be implemented quite inefficiently. Just parsing a standalone timezone is responsible for all the slowness.
@Benchmark
public void newFormat_zoneOnly(Blackhole bh, DateParsingBenchmark st) {
    DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder()
            .appendPattern("z").toFormatter();
    for (int i = 0; i < iterations; i++) {
        bh.consume(dateTimeFormatter.parse("CET"));
    }
}
There is a class called ZoneTextPrinterParser in the java.time package, which internally makes a copy of the set of all available time zones in every parse() call (via ZoneRulesProvider.getAvailableZoneIds()), and this accounts for 99% of the time spent in zone parsing.
Well, one answer might then be to write my own zone parser, which would not be too nice either, because then I could not build the DateTimeFormatter via appendPattern().
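A hand-rolled variant could look something like the sketch below. It is not a drop-in replacement for the z pattern: it assumes the zone is always the last space-separated token, and it resolves that token with ZoneId.of(id, ZoneId.SHORT_IDS), which accepts region IDs, offsets and a handful of legacy abbreviations rather than the localized zone names that z understands. The class name CachedZoneParse and its cache are my own illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CachedZoneParse {
    // Parses only the date-time part; the zone token is handled separately.
    private static final DateTimeFormatter LOCAL_PART =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

    // Each distinct zone string is resolved once and then served from this cache.
    private static final Map<String, ZoneId> ZONES = new ConcurrentHashMap<>();

    static Instant parseToInstant(String text) {
        // Assumption: the zone is the last space-separated token of the input.
        int split = text.lastIndexOf(' ');
        if (split < 0) {
            throw new IllegalArgumentException("No zone token in: " + text);
        }
        LocalDateTime local = LocalDateTime.parse(text.substring(0, split), LOCAL_PART);
        // SHORT_IDS maps a few legacy abbreviations (for example "PST");
        // anything else falls through to the regular ZoneId.of(String) lookup.
        ZoneId zone = ZONES.computeIfAbsent(text.substring(split + 1),
                id -> ZoneId.of(id, ZoneId.SHORT_IDS));
        return local.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        System.out.println(parseToInstant("2000/12/12 12:12:12 CET"));
    }
}
```

This avoids the zone text parser entirely, at the cost of splitting the input by hand, which is exactly the part that is not too nice.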
ZoneTextPrinterParser line 3718 calls the static method ZoneRulesProvider.getAvailableZoneIds(), which is return new HashSet<>(ZONES.keySet());. However, this is called only once every parse invocation for a time zone. Creating the set seems to be time consuming as it contains hundreds of objects. – Mcdermott
[...] DateTimeFormatterBuilder, though I'm not sure what and what effect it has on the behavior of the invocation. – Mcdermott
[...] ZoneIdPrinterParser instead, but it appears to be the same issue. – Prude
[...] getTree of both ZoneIdPrinterParser and ZoneTextPrinterParser. Same issue, good find. I doubt it will be fixed before JDK 9, though. – Mcdermott
[...] ZoneRulesProvider which would allow you to override this method as well, so you can try to do what you suggested. Oracle will "have to" figure this out by themselves. – Mcdermott