Why is parsing a String into a Date in Java slow? Can we accelerate it?
V

3

12

I am reading a text file containing dates, and I want to parse the Strings representing the dates into Date objects in Java. I notice that the operation is slow. Why? Is there any way to accelerate it? My file looks like:

2012-05-02 12:08:06:950, secondColumn, thirdColumn
2012-05-02 12:08:07:530, secondColumn, thirdColumn
2012-05-02 12:08:08:610, secondColumn, thirdColumn

I am reading the file line by line, extracting the date String from each line, and parsing it into a Date object using a SimpleDateFormat as follows:

DataInputStream in = new DataInputStream(myFileInputStream);
BufferedReader  br = new BufferedReader(new InputStreamReader(in));
String strLine;

SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
while ((strLine = br.readLine()) != null)
{
    ....Do things....
    Date myDateTime = (Date)formatter.parse(myDateString);
    ...Do things....
}
Volteface answered 3/8, 2012 at 14:17 Comment(16)
Did you try using the same SimpleDateFormat instance throughout the entire file parse operation? - Pluperfect
How have you determined that it is slow? - Rascon
@Micheal, I just commented out the operations related to the parse; the reading loop (line by line) is much quicker then. - Volteface
@jtqhlborn Yes, the SimpleDateFormat is outside the reading loop; it is shared across the whole file. - Volteface
The posted code is not enough to tell how you are handling the situation. How many lines do you have in your file, and how long is it taking? - Interstitial
If you are creating a new SimpleDateFormat instance in the loop every time, your code will be slow. Creating a SimpleDateFormat is expensive; try to define it outside the loop and reuse it. A nice article on SimpleDateFormat performance: thedwick.com/2008/04/simpledateformat-performance-pig - Pyrochemical
Define slow. How slow is slow? - Grooms
@BheshGurung I just edited my code... my files contain about 3000 lines each. - Volteface
Take a look at the code of SimpleDateFormat::parse(String) to see that it's not an easy task. The error handling in particular is quite a bit of work. If your dates always look the same, you could parse them from the line yourself and fill the date instance accordingly. Whether that is faster, I wouldn't dare to say beforehand, though. - Pavyer
@Pyrochemical I am actually defining my SimpleDateFormat outside the loop; I just edited my code. - Volteface
@Pavyer Yes, maybe that is my only solution then... Thank you for this suggestion. - Volteface
Have you measured just the parsing of the date? Or is it possible that the "Do things" parts are the real bottleneck? - Pavyer
@Pavyer Yes, 'Do things' is just dummy operations like incrementing an integer counter. I completely removed 'Do things'. Without the date parsing, reading the file line by line is just a matter of seconds, but with the parsing operation it takes several minutes for one file. - Volteface
If you have control over the creation of the file you want to read in, you could do this: add the date as a long when creating the file, read the long instead of parsing the above string, and use the Date(long date) constructor. - Pavyer
I really wish people would stop mixing DataInputStream with BufferedReader. Whoever started this meme ..... grrr. - Poss
FYI, the troublesome old date-time classes such as java.util.Date, java.util.Calendar, and java.text.SimpleDateFormat are now legacy, supplanted by the java.time classes built into Java 8 & Java 9. See the Tutorial by Oracle. - Anitaanitra
P
8

Converting dates and time zones is expensive. If you can assume your date/times are close to each other, you can convert the date and hours/minutes (or only the date if you use GMT) whenever the minute changes and generate the seconds yourself.

This will call parse once per minute. Depending on your assumptions, you could make it once per hour or once per day.

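String pattern = "yyyy-MM-dd HH:mm";
SimpleDateFormat formatter = new SimpleDateFormat(pattern);
String lastTime = null; // minute prefix of the most recently parsed line
long lastDate = 0;      // epoch millis at the start of that minute
while ((strLine = br.readLine()) != null) {
    String myDateString = strLine.split(", ")[0];
    // Re-parse only when the "yyyy-MM-dd HH:mm" prefix changes.
    // (An empty initial lastTime would match every line and never trigger a parse.)
    if (lastTime == null || !myDateString.startsWith(lastTime)) {
        lastTime = myDateString.substring(0, pattern.length());
        lastDate = formatter.parse(lastTime).getTime();
    }
    // "ss:SSS" with the colon removed reads as seconds * 1000 + millis.
    Date date = new Date(lastDate + Integer.parseInt(myDateString.substring(pattern.length() + 1).replace(":", "")));
}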
Poss answered 3/8, 2012 at 15:2 Comment(1)
+1 for a code sample. OP - can we assume the file is in date/time order? - Niggard
A
4

tl;dr

  • Use java.time rather than legacy classes.
  • In my test runs, each parse of a String to a LocalDateTime with DateTimeFormatter takes less than 1,500 nanoseconds (0.0000015 seconds).

java.time

You are using troublesome old date-time classes that are now legacy, supplanted by the java.time classes.

Let's do a bit of micro-benchmarking to see just how slow or fast parsing a date-time string is in java.time.

ISO 8601

The ISO 8601 standard defines sensible practical formats for textually representing date-time values. The java.time classes use these standard formats by default when parsing/generating strings.

Use these standard formats rather than inventing your own, as was done in the Question.
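To illustrate the default ISO 8601 parsing mentioned above, here is a minimal sketch. It assumes the file stored its timestamps in ISO 8601 form (a `T` between date and time, a period before the fractional second), which is not what the Question's file contains. The class name `IsoParseExample` and the sample value are made up for illustration.

```java
import java.time.LocalDateTime;

public class IsoParseExample {
    // Parses an ISO 8601 local date-time; no DateTimeFormatter is needed
    // because LocalDateTime.parse(CharSequence) uses ISO_LOCAL_DATE_TIME.
    static LocalDateTime parseIso(String input) {
        return LocalDateTime.parse(input);
    }

    public static void main(String[] args) {
        // Hypothetical input: the Question's first timestamp rewritten in ISO 8601 form.
        LocalDateTime ldt = parseIso("2012-05-02T12:08:06.950");
        System.out.println(ldt); // toString() also generates ISO 8601 text
    }
}
```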

DateTimeFormatter

Define a formatting pattern to match your inputs.

DateTimeFormatter f = DateTimeFormatter.ofPattern( "uuuu-MM-dd HH:mm:ss:SSS" );

We will parse each such input as a LocalDateTime because your input lacks an indicator of time zone or offset-from-UTC. Keep in mind that such values do not represent a moment, are not a point on the timeline. To be an actual moment requires the context of a zone/offset.

String inputInitial = "2012-05-02 12:08:06:950" ;
LocalDateTime ldtInitial = LocalDateTime.parse( inputInitial , f );
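If you do know the zone in which the file was written, you can turn each parsed LocalDateTime into an actual moment. The sketch below assumes, purely for illustration, that the timestamps were recorded in UTC; the Question does not say, so substitute the real zone if it differs. The class name `ToMomentExample` and the method `toInstant` are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class ToMomentExample {
    // Same pattern as in the answer; the formatter is immutable and reusable.
    static final DateTimeFormatter F =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss:SSS");

    // Parses the text as a LocalDateTime, then places it in the given zone
    // to obtain a point on the timeline.
    static Instant toInstant(String input, ZoneId zone) {
        LocalDateTime ldt = LocalDateTime.parse(input, F);
        return ldt.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        // Assumption: the timestamps are in UTC.
        Instant instant = toInstant("2012-05-02 12:08:06:950", ZoneOffset.UTC);
        System.out.println(instant);
        System.out.println(instant.toEpochMilli());
    }
}
```

If you need a legacy object for older APIs, `java.util.Date.from(instant)` converts the result.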

Let's make a bunch of such inputs.

int count = 1_000_000;
List < String > inputs = new ArrayList <>( count );

for ( int i = 0 ; i < count ; i++ )
{
    String s = ldtInitial.plusSeconds( i ).format( f );
    inputs.add( s );
}

Test harness.

long start = System.nanoTime();
for ( String input : inputs )
{
    LocalDateTime ldt = LocalDateTime.parse( input , f );
}
long stop = System.nanoTime();
long elapsed = ( stop - start );
long nanosPerParse = (elapsed / count ) ;
Duration d = Duration.ofNanos( elapsed );

Dump to console.

System.out.println( "Parsing " + count + " strings to LocalDateTime took: " + d  + ". About " + nanosPerParse + " nanos each.");

Parsing 1000000 strings to LocalDateTime took: PT1.320778647S. About 1320 nanos each.

Too slow?

So it takes roughly one to one and a half seconds to parse a million such inputs on a MacBook Pro laptop with a quad-core Intel i7 CPU. In my test runs, each parse takes about 1,000 to 1,500 nanoseconds.

To my mind, that is not a performance problem.


About java.time

The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.

The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.

You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes.

Where to obtain the java.time classes?

The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.

Anitaanitra answered 5/3, 2018 at 0:29 Comment(2)
By far the best answer!! Great!! - Tetrameter
Interestingly, "LocalDateTime.parse(..).toEpochSecond(ZoneOffset.UTC)" is actually slower than "SimpleDateFormat#parse(..).getTime()" using identical pattern ("yyyy-MM-dd'T'HH:mm:ss"). By about 10-15% for me. I didn't expect that. - Hass
G
2

I would suggest writing a custom parser, which is going to be faster. Something like:

Date parseYYYYMMDDHHMM(String strDate) {
   String yearString = strDate.substring(0, 4);
   int year = Integer.parseInt(yearString);
   ...

Another way is to use a precomputed HashMap from date-time (without millis) to Unix timestamp. This works if there are not many distinct dates (or you can recompute it once the date flips over).
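A minimal sketch combining both ideas, under two assumptions that the Question does not confirm: every timestamp is well-formed `yyyy-MM-dd HH:mm:ss:SSS` text, and the values are in UTC. The class `FastTimestampParser` and its method names are hypothetical. It reads the digits by position without any validation and caches the epoch milliseconds of each distinct `yyyy-MM-dd HH:mm:ss` prefix in a HashMap. Whether this beats a reused formatter on your data is something to measure, not assume.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class FastTimestampParser {
    // Cache: "yyyy-MM-dd HH:mm:ss" prefix -> epoch millis of that second (UTC assumed).
    private final Map<String, Long> secondCache = new HashMap<>();

    // Reads the ASCII digits in s[from, to) as a non-negative int.
    private static int digits(String s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            value = value * 10 + (s.charAt(i) - '0');
        }
        return value;
    }

    // Converts a "yyyy-MM-dd HH:mm:ss" prefix to epoch millis by field position.
    // No validation: malformed input gives a wrong value or an exception.
    static long parseSecondPrefix(String s) {
        LocalDateTime ldt = LocalDateTime.of(
                digits(s, 0, 4), digits(s, 5, 7), digits(s, 8, 10),
                digits(s, 11, 13), digits(s, 14, 16), digits(s, 17, 19));
        return ldt.toEpochSecond(ZoneOffset.UTC) * 1000L;
    }

    // Looks up (or computes once) the second, then adds the millis digits.
    Date parse(String timestamp) {
        String prefix = timestamp.substring(0, 19);
        long base = secondCache.computeIfAbsent(prefix, FastTimestampParser::parseSecondPrefix);
        return new Date(base + digits(timestamp, 20, 23));
    }

    public static void main(String[] args) {
        FastTimestampParser parser = new FastTimestampParser();
        System.out.println(parser.parse("2012-05-02 12:08:06:950").getTime());
    }
}
```

The cache only pays off when many lines share the same second; for a file with one line per second, the positional parsing alone does the work and the map could be dropped.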

Goldofpleasure answered 9/6, 2016 at 11:32 Comment(0)
