Java DateFormat.parse thinks "100 112TH AVE NE" is a date

I'm using the code included here to determine whether given values are valid dates. Under one specific case, it's evaluating the following street address:

100 112TH AVE NE

Obviously not a date, but Java interprets it as:

Sun Jan 12 00:00:00 EST 100

The code in question:

String DATE_FORMAT = "yyyyMMdd";
try {
    DateFormat dfyyyyMMdd = new SimpleDateFormat(DATE_FORMAT);
    dfyyyyMMdd.setLenient(false);
    Date formattedDate;
    formattedDate = dfyyyyMMdd.parse(aValue);
    console.debug(String.format("%s = %s","formattedDate",formattedDate));
} catch (ParseException e) {
    // Not a date
}

The console returns:

11:41:40.063 DEBUG TestValues | formattedDate = Sun Jan 12 00:00:00 EST 100

Any idea what's going on here?

Parchment answered 15/4, 2014 at 17:7 Comment(7)
So, why are you passing an address to the date formatter?? – Popp
(Looks to me like it parses year = 100, month = 1, day = 12, which is about the best one could hope for.) – Popp
this is funny :-) [sorry] – Cyndie
setLenient seems not to be working here – Cyndie
I always thought setLenient was something about forcing the date to be correct, and it is somehow - docs.oracle.com/javase/7/docs/api/java/util/… but it's not in a way someone would expect from a Formatter class I guess – Cyndie
@Bobby, I think you'll have to parse and then format again, and then check if both are the same, to be sure it's parsing correctly. It's a shame IMO. – Cyndie
@HotLicks, picking flea-s$!t out of pepper, as the saying goes. I've got a bajillion values coming out of a file, and I need to separate the dates from the not-dates. – Parchment

The parse method does not verify that the entire string was consumed when parsing; you can have random garbage after a valid date and everything works. In this case, it's a little surprising that 100 112 can be successfully parsed as a date, but it can.

You can supply a ParsePosition to verify that the entire string was consumed when parsing.

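ParsePosition pos = new ParsePosition(0);
// This overload returns null on failure instead of throwing ParseException
Date parsed = dfyyyyMMdd.parse(aValue, pos);
if (parsed == null) {
    // not a date at all
} else if (pos.getIndex() != aValue.length()) {
    // there's garbage at the end
}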
Columelliform answered 15/4, 2014 at 17:25 Comment(3)
Of course, one should probably make sure that the "excess" isn't simply whitespace. – Popp
I also had to check that each string was not empty. When pos==0 and string is empty, pos.getIndex==aValue.length()==0. – Parchment
@Joni, I ran a test with this, and ParsePosition seems much more efficient. The test parsed 3 million values, some valid dates, some errors. Results in nanoseconds: joda.parseDateTime == 16,480,274,565; dateTime.parse == 18,584,338,145; dateTime.parse+pos == 2,292,225,286. I couldn't find an equivalent to ParsePosition in joda. – Parchment
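
Pulling the answer and the comments above together, here is a minimal sketch of a validity check. It is only an illustration: the class name StrictDateCheck and the method isDate are hypothetical, and trimming whitespace, fixing the locale to Locale.ROOT and fixing the time zone to UTC are assumptions made so the result does not depend on the machine. Because this thread reports that a short string such as 100 112 can be parsed by the yyyyMMdd pattern, the sketch also formats the parsed date back and compares it with the input, as one of the comments suggests.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class StrictDateCheck {
    private static final String DATE_FORMAT = "yyyyMMdd";

    // Hypothetical helper (not from the original posts): true only if the
    // whole trimmed value is a valid date in yyyyMMdd form.
    public static boolean isDate(String aValue) {
        if (aValue == null) {
            return false;
        }
        String text = aValue.trim();            // assumption: surrounding whitespace is harmless
        if (text.isEmpty()) {
            return false;                       // the empty-string case raised in the comments
        }
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call
        SimpleDateFormat df = new SimpleDateFormat(DATE_FORMAT, Locale.ROOT);
        df.setTimeZone(TimeZone.getTimeZone("UTC"));
        df.setLenient(false);

        ParsePosition pos = new ParsePosition(0);
        Date parsed = df.parse(text, pos);      // returns null on failure, does not throw
        if (parsed == null || pos.getIndex() != text.length()) {
            return false;                       // unparseable, or trailing garbage
        }
        // Round trip: the formatted date must reproduce the input exactly
        return df.format(parsed).equals(text);
    }

    public static void main(String[] args) {
        String[] samples = {"20140415", "100 112TH AVE NE", "100 112", "10000514blabla", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isDate(s));
        }
    }
}
```

The round trip costs one extra format call per value, so it may give back some of the speed-up measured in the last comment; whether that matters depends on the volume of data.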

As per the documentation, the parse method may NOT use the entire text of the string -

   public Date parse(String source)
               throws ParseException

    Parses text from the beginning of the given string to produce a date. 
    The method may not use the entire text of the given string. 

I checked the source code for SimpleDateFormat and found that it parses the string only as far as the compiled pattern requires; anything after that is ignored.

Thus, strings of the form:

yyyyMMdd(followed by anything)

will be parsed without any errors.

For example, it also parses:

"10000514blabla" --> Tue May 14 00:00:00 EST 1000
"100 112"        --> Sun Jan 12 00:00:00 EST 100
"1 112xyz"       --> Wed Jan 12 00:00:00 EST 1
Lisp answered 15/4, 2014 at 17:31 Comment(0)
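
To see how much of the input the answer above is talking about, the parse(String, ParsePosition) overload can report where parsing stopped. This is a small sketch for experimenting, not part of the original answer; PrefixParseDemo and consumedLength are hypothetical names, and Locale.ROOT is an assumption to keep the behaviour independent of the default locale.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class PrefixParseDemo {
    // Hypothetical helper: number of characters consumed by the yyyyMMdd
    // pattern, or -1 if the text could not be parsed at all.
    public static int consumedLength(String text) {
        SimpleDateFormat df = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
        df.setLenient(false);
        ParsePosition pos = new ParsePosition(0);
        Date parsed = df.parse(text, pos);
        return parsed == null ? -1 : pos.getIndex();
    }

    public static void main(String[] args) {
        String[] samples = {"10000514blabla", "100 112TH AVE NE"};
        for (String s : samples) {
            // Anything between the consumed length and s.length() was silently ignored
            System.out.println("\"" + s + "\": consumed " + consumedLength(s)
                    + " of " + s.length() + " characters");
        }
    }
}
```

For "10000514blabla" the three fields take 4 + 2 + 2 characters, so 8 of the 14 characters are consumed and "blabla" is never looked at.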

java.time

The legacy date-time API (the java.util date-time types and their formatting API, SimpleDateFormat) is outdated and error-prone. It is recommended to stop using it completely and switch to java.time, the modern date-time API*.

The format yyyyMMdd is actually wrong for the string 100 112TH AVE NE. The correct format would be y Mdd or y MMd; you can use yyy instead of a single y in either of these two formats. However, instead of throwing the relevant exception, SimpleDateFormat silently parsed the string into an erroneous date.

Apart from this, note the following line from the documentation of SimpleDateFormat:

Parses text from the beginning of the given string to produce a date. The method may not use the entire text of the given string.

Using the modern date-time API:

Unlike SimpleDateFormat, the modern date-time API is strict about the format and, in most cases, does not make assumptions on your behalf, which makes it much cleaner. To avoid ambiguity, it requires you to state your intent explicitly; for example, you pass a ParsePosition when you want to parse only the leading part of a date-time string, so it is clear what is happening.

Demo:

import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class Main {
    public static void main(String[] args) {
        System.out.println(
                LocalDate.from(DateTimeFormatter.ofPattern("u Mdd").parse("100 112TH AVE NE", new ParsePosition(0))));
        
        System.out.println(
                LocalDate.from(DateTimeFormatter.ofPattern("u MMd").parse("100 112TH AVE NE", new ParsePosition(0))));
    }
}

Output:

0100-01-12
0100-11-02
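
For the original goal of separating dates from non-dates, the overloads without a ParsePosition, such as LocalDate.parse(CharSequence, DateTimeFormatter), require the whole text to match and throw DateTimeParseException otherwise. The following is a hedged sketch of such a check, not code from the question: StrictLocalDateCheck and isDate are hypothetical names, and the choices of ResolverStyle.STRICT (to reject impossible dates such as February 30), Locale.ROOT and trimming the input are assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Locale;

public class StrictLocalDateCheck {
    // 'u' (year) is used rather than 'y' (year-of-era) so that STRICT
    // resolution does not also require an era field
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("uuuuMMdd", Locale.ROOT)
                    .withResolverStyle(ResolverStyle.STRICT);

    // Hypothetical helper: true only if the whole trimmed text is a valid yyyyMMdd date
    public static boolean isDate(String text) {
        if (text == null) {
            return false;
        }
        try {
            LocalDate.parse(text.trim(), FORMATTER); // must consume the entire text
            return true;
        } catch (DateTimeParseException e) {
            return false;                            // not a date
        }
    }

    public static void main(String[] args) {
        String[] samples = {"20140415", "100 112TH AVE NE", "10000514blabla", "20140230"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isDate(s));
        }
    }
}
```

DateTimeFormatter is immutable and thread-safe, so a single shared instance can be reused, unlike SimpleDateFormat.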

Learn more about the modern date-time API* from Trail: Date Time.


* If for any reason you have to stick to Java 6 or Java 7, you can use ThreeTen-Backport, which backports most of the java.time functionality to Java 6 and 7. If you are working on an Android project and your Android API level is still not compliant with Java 8, check Java 8+ APIs available through desugaring and How to use ThreeTenABP in Android Project.

Basically answered 1/5, 2021 at 11:32 Comment(0)
