NumberFormat.parse("3.14") returning 314 when used with German Locale
When I try to read the string "3.14" as a float using the German locale, I expect one of two things to happen:

(1) an error is thrown, because that is not a valid way to write 3.14 in German
(2) the parser falls back to a more standard decimal notation and reads the number as 3.14, because that is what any German would read in that number

But instead I am getting 314.

import java.text.NumberFormat;
import java.util.Locale;


public class MyClass {
    public static void main(String[] args) throws Exception {
        System.out.println(
            NumberFormat.getNumberInstance(Locale.GERMANY).parse("3.14")
        ); // prints 314
    }
}

The Oracle documentation for parse states:

Number parse(String source)
Parses text from the beginning of the given string to produce a number.

This does not really explain what I am seeing here, as it doesn't specify any non-happy path. What is Java's understanding of a German decimal number, and how can I fail fast and safely convert Strings to numbers assuming German decimal notation?

Megacycle answered 14/2 at 16:22 Comment(3)
I think it takes . as the thousands separator (and so it is just ignored). Parsing numbers (and dates) is still not a solved problem (all or most locale formats are designed for formatting in the other direction, so they fail to handle all but one of the common input formats). Hoyt
Are you expecting/desiring 3,14? Never
@Never I was expecting the number 3.14f or an error. Neither is the case; see my answer and the other one. Megacycle

The basic assumption that NumberFormat validates its input is wrong. A modern developer might expect validation, especially because the method declares ParseException as a checked exception, but thanks to open source I can look at the code and see that I was very wrong: this Java 1.1 era code was written with different design principles than I am used to.

The critical code section in the concrete class used here (in one implementation) is in openjdk > DecimalFormat.java > int subparseNumber, where the input string is converted into a "DigitList". The digit list for "3.14" with a German locale is indeed [3, 1, 4], because the thousands separator is ignored, as @GiacomoCatenazzi pointed out in his comment (see footnote 1), so subsequent code has to interpret it as 314. Also, when an invalid character is encountered, parsing simply stops, so for example "0x134" -> 0 with no error.
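To make the lenient behaviour visible, here is a small sketch that uses the parse(String, ParsePosition) overload to report where parsing stopped. The class name LenientParseDemo and the helper describe are made up for this illustration.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class LenientParseDemo {
    // Hypothetical helper: parses leniently and reports the value
    // together with the index at which the parser stopped.
    static String describe(String text) {
        NumberFormat format = NumberFormat.getNumberInstance(Locale.GERMANY);
        ParsePosition pos = new ParsePosition(0);
        Number value = format.parse(text, pos);
        return value + " (stopped at index " + pos.getIndex() + " of " + text.length() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("3.14"));  // 314 (stopped at index 4 of 4)
        System.out.println(describe("0x134")); // 0 (stopped at index 1 of 5)
        System.out.println(describe("3,14"));  // 3.14 (stopped at index 4 of 4)
    }
}
```

Comparing the stop index with the input length catches trailing garbage such as "0x134", but it does not catch "3.14", because the grouping separator is consumed as a legitimate part of the number.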

There is more to learn from the source code: NumberFormat is not thread-safe, so you must not use the same instance from multiple threads at the same time. The modern assumption that a call like format.parse(input) -> obj is trivially safe because input and format are only read does not hold: parsing changes internal state of the NumberFormat instance. You can only reuse the instance after parse completes.
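One common way to live with this is to give every thread its own instance. The following is a minimal sketch using ThreadLocal; the class name PerThreadFormat and the helper parseGerman are assumptions for the example, and creating a fresh NumberFormat per call would work just as well.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class PerThreadFormat {
    // One NumberFormat per thread, so no instance is ever shared between threads.
    private static final ThreadLocal<NumberFormat> GERMAN =
            ThreadLocal.withInitial(() -> NumberFormat.getNumberInstance(Locale.GERMANY));

    // Hypothetical helper; still lenient, it only addresses thread safety.
    static Number parseGerman(String text) {
        try {
            return GERMAN.get().parse(text);
        } catch (ParseException e) {
            throw new NumberFormatException(e.getMessage());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () ->
                System.out.println(Thread.currentThread().getName() + ": " + parseGerman("3,14"));
        Thread a = new Thread(task, "worker-a");
        Thread b = new Thread(task, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```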


So how do I make a fail-fast conversion of Strings to numbers in Java?

(1) If you know the target type and the number is in the standard decimal format, this works:

Float.valueOf("3,14"); // NumberFormatException
Float.valueOf("3.14"); // 3.14f

Note that NumberFormat.getNumberInstance().parse("3,14") returns 314, not an error, when the default locale uses "," as its grouping separator (en-US, for example), so this lack of validation is in no way exclusive to the German locale.

(2) If I have to read numbers from German-locale strings, I must check that the input string matches my expectations. NumberFormat does not provide a way to do that, nor does there seem to be a satisfying fail-fast (non-GIGO) answer to this 12-year-old question about the problem: Convert String with Dot or Comma to Float Number

The best idea I have is to validate the input myself and restrict it that way. Here is a solution that is stricter than necessary, since it bans thousands separators completely, but for my use case this is fine:

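if (inputString.contains(".")) {
    // Reject grouping separators outright instead of silently dropping them.
    throw new NumberFormatException("Unexpected '.' in German number: " + inputString);
}
return Float.valueOf(inputString.replace(',', '.'));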
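If correctly placed thousands separators should be accepted after all, the same idea can be extended with a regular expression. This is only a sketch under my own assumptions: the class name StrictGermanNumber, the pattern, and the choice of BigDecimal (to avoid float rounding) are not from the JDK or from the linked question.

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

public class StrictGermanNumber {
    // Optional sign, then either plain digits or groups of three separated by '.',
    // then an optional ',' followed by fraction digits.
    private static final Pattern GERMAN_DECIMAL =
            Pattern.compile("[+-]?(\\d{1,3}(\\.\\d{3})+|\\d+)(,\\d+)?");

    static BigDecimal parse(String input) {
        if (!GERMAN_DECIMAL.matcher(input).matches()) {
            throw new NumberFormatException("Not a German decimal number: " + input);
        }
        // Drop the validated grouping dots, then switch to the standard decimal point.
        return new BigDecimal(input.replace(".", "").replace(',', '.'));
    }

    public static void main(String[] args) {
        System.out.println(parse("3,14"));    // 3.14
        System.out.println(parse("1.234,5")); // 1234.5
        try {
            parse("3.14");
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Here "3.14" is rejected because the dot is not followed by exactly three digits, which is the check I had expected NumberFormat to perform.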

1 You can actually call format.setGroupingUsed(false), and then "3.14" parses as 3 instead of 314, so it is not entirely true that grouping separators are always ignored. But no code uses the grouping character to judge the correctness of the input String, even though DecimalFormat has setGroupingSize (and a matching getter), which controls how many digits are grouped together.
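Building on this footnote, one possible workaround is to combine setGroupingUsed(false) with the parse(String, ParsePosition) overload and reject any input that is not consumed completely. The class and method names below are hypothetical, and like the manual check above this bans thousands separators entirely. The parser stays lenient in other respects, so treat it as a sketch rather than a complete validator.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class NoGroupingParse {
    static Number parseGermanNoGrouping(String text) {
        NumberFormat format = NumberFormat.getNumberInstance(Locale.GERMANY);
        format.setGroupingUsed(false); // '.' is no longer skipped, so parsing stops there
        ParsePosition pos = new ParsePosition(0);
        Number value = format.parse(text, pos);
        // Fail unless a number was found and the whole input was consumed.
        if (value == null || pos.getIndex() != text.length()) {
            throw new NumberFormatException("Cannot parse as German number: " + text);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseGermanNoGrouping("3,14")); // 3.14
        try {
            parseGermanNoGrouping("3.14");
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```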

Megacycle answered 15/2 at 9:47 Comment(0)

The problem you're facing arises because . is the grouping separator in the German locale:

[Image: excerpt from the DecimalFormat.java class]
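The separators can also be checked at runtime through DecimalFormatSymbols. A small sketch follows; the class name SeparatorDemo and the helper separators are made up for the example.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class SeparatorDemo {
    // Returns the grouping and decimal separators the JDK uses for a locale.
    static String separators(Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        return "grouping='" + symbols.getGroupingSeparator()
                + "' decimal='" + symbols.getDecimalSeparator() + "'";
    }

    public static void main(String[] args) {
        System.out.println("de-DE: " + separators(Locale.GERMANY)); // grouping='.' decimal=','
        System.out.println("en-US: " + separators(Locale.US));      // grouping=',' decimal='.'
    }
}
```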

After that, when the parser encounters the grouping character, it simply ignores it:

} else if (!isExponent && ch == grouping && isGroupingUsed()) {
    if (sawDecimal) {
        break;
    }
    // Ignore grouping characters, if we are using them, but
    // require that they be followed by a digit.  Otherwise
    // we backup and reprocess them.
    backup = position;
}

Before you ask: sawDecimal is false, and backup starts at -1 and is reset to -1 as soon as the next digit (1) is found. So backup = position; has no lasting effect here.

Edva answered 15/2 at 9:19 Comment(1)
Thanks. I began writing my answer before yours was there; it looks like we found the same stdlib function as the culprit. I know that "." is the German thousands separator, sorry that wasn't clear from my question. I would have expected the input string to be invalid because the thousands separator is not in the thousands position. Megacycle