How can I change NumberFormatter::parseCurrency() behavior of accepting white space and non breaking space?
I'm trying to parse localized currency strings into a currency code and a float value.

Everything worked well for a while, but now we are experiencing some problems. It seems that NumberFormatter::parseCurrency expects an additional invisible character:

Test code:

<?php
$formatter = new NumberFormatter("de_DE", NumberFormatter::CURRENCY);
var_dump(array(
    $formatter->parseCurrency("88,22 €", $curr), // taken from output of $formatter->format(88.22)
    $formatter->parseCurrency("88,22 €", $curr), // input with keyboard
    $formatter->parseCurrency("88,22 \xE2\x82\xAc", $curr), // just a test
    $formatter->format(88.22),
    "88,22 €" // keyboard input
));

Output:

array(5) {
  [0]=> float(88,22)
  [1]=> bool(false)
  [2]=> bool(false)
  [3]=> string(10) "88,22 €" // this as input works
  [4]=> string(9) "88,22 €" // this not...
}

As you can see, there is a difference in string length of output 3 and 4.

I get the same results in PHP 5.3 (Ubuntu with mbstring enabled) and 5.4 (Zend Server on Mac OS X).

The main problem is that the input values from my form (a ZF1 application) are equal to the output at index 4.

Any suggestions? Thanks in advance.

Edit1:

hexdump of working value:

00000000  38 38 2c 32 32 c2 a0 e2  82 ac 0a                 |88,22......|
0000000b

hexdump of non working value:

00000000  38 38 2c 32 32 20 e2 82  ac 0a                    |88,22 ....|
0000000a
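
The same byte-level comparison can be made from PHP itself, without an external hexdump tool. This is a minimal sketch using bin2hex() and strlen(). The two literals are reconstructed from the dumps above (minus the trailing newline byte 0a), and the variable names are only illustrative:

```php
<?php
// Formatter-style string: digits, NO-BREAK SPACE (UTF-8: C2 A0), euro sign (E2 82 AC).
$working = "88,22\xC2\xA0\xE2\x82\xAC";
// Keyboard-style string: same text, but with a plain ASCII space (20) as separator.
$failing = "88,22\x20\xE2\x82\xAC";

// strlen() counts bytes, so the two-byte NO-BREAK SPACE makes the first string longer.
var_dump(strlen($working), strlen($failing)); // int(10), int(9)

// bin2hex() exposes the differing separator bytes directly.
echo bin2hex($working), "\n"; // 38382c3232c2a0e282ac
echo bin2hex($failing), "\n"; // 38382c323220e282ac
```

The one-byte difference matches the string(10) versus string(9) lengths reported by var_dump() in the test output.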

Edit2:

It seems to be a problem with the whitespace used. c2 a0 is NO-BREAK SPACE and is (maybe?) required by NumberFormatter::parseCurrency(), but 0x20 is the regular space (which is what gets entered in the input form). My current workaround is to replace each regular space with a NO-BREAK SPACE: $value = str_replace("\x20", "\xC2\xA0", $value);
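
A slightly more defensive version of this workaround might look like the sketch below. The function name normalizeCurrencySpacing() is hypothetical (it is not part of intl or Zend Framework), and the choice of which separators to convert (space, tab, NARROW NO-BREAK SPACE) is an assumption about what users might type or paste:

```php
<?php
/**
 * Replace whitespace-like separators with NO-BREAK SPACE (U+00A0).
 * Hypothetical helper, not part of intl or Zend Framework.
 */
function normalizeCurrencySpacing($value)
{
    // Trim first so leading/trailing ASCII whitespace is not turned into U+00A0.
    $value = trim($value);
    // \x{202F} is NARROW NO-BREAK SPACE; the /u flag makes PCRE treat the input as UTF-8.
    $normalized = preg_replace('/[ \t\x{202F}]+/u', "\xC2\xA0", $value);
    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back to the input.
    return $normalized === null ? $value : $normalized;
}

// Show the normalized bytes for the keyboard-style input from the test code.
var_dump(bin2hex(normalizeCurrencySpacing("88,22 \xE2\x82\xAC")));
```

Like the str_replace() one-liner, this converts every matching separator in the string, so it would also affect locales that use a space as the grouping separator.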

Edit3:

On another System (Mac OS X with Zend Server 5.6, mbstring enabled, PHP 5.3.14) everything works as expected:

array(5) {
  [0]=> float(88,22)
  [1]=> float(88,22)
  [2]=> float(88,22)
  [3]=> string(9) "88,22 €"
  [4]=> string(9) "88,22 €"
}

Edit4:

The main difference between the configuration that accepts a regular space and the one that only accepts a non-breaking space is the ICU version:

working version:

intl

Internationalization support => enabled
version => 1.1.0
ICU version => 3.8.1

Directive => Local Value => Master Value
intl.default_locale => no value => no value
intl.error_level => 0 => 0

not working version:

intl

Internationalization support => enabled
version => 1.1.0
ICU version => 4.8.1.1
ICU Data version => 4.8.1

Directive => Local Value => Master Value
intl.default_locale => no value => no value
intl.error_level => 0 => 0
Dysgraphia answered 8/5, 2013 at 10:24 Comment(4)
Just an idea: Is the € sign from the formatter UTF-8 encoded (0x20AC) and the one from keyboard Latin-1 (0x80)? As far as I know the strlen() function is not aware of Unicode characters. If it is internally used by var_dump(), that would explain the additional character. – Amrita
My terminal app (iTerm2) uses Unicode (UTF-8) as terminal emulation. Also, this error/behavior happens with input data from the browser via HTML form text input fields. I added the hexdump output for clarification. – Dysgraphia
Is the file saved as UTF-8? – Beware
Yes it is, on all tested systems. – Dysgraphia

NumberFormatter::parseCurrency is a thin wrapper around the ICU library function unum_parseDoubleCurrency (see source).

The ICU library function is restrictive in that it will only parse strings that could have been produced by its counterpart, unum_formatDoubleCurrency. The format is driven by the Unicode locale data, which specifies a non-breaking space between the currency symbol and the numeric value. Evidently the earlier version of the library accepted other whitespace characters.

In short, you can't make NumberFormatter::parseCurrency accept spaces. However, Zend_Currency should also output non-breaking spaces by default:

$currency = new Zend_Currency(array(
     'currency' => 'EUR',
     'value'    => 88.22,
), 'de_DE');

var_dump(
    strval($currency),             // 88,22 €
    strpos($currency, "\x20"),     // false
    strpos($currency, "\xc2\xa0")  // 5
);

The question is which part of your application is outputting a space and how you address it. You mention it's part of your form, so maybe you could look at having the form return the currency and the value as separate fields, so that you don't have to worry about parsing the number. If the user is entering the string "88,22 €" themselves, you could potentially run into more problems than just the whitespace issue. Having said that, the workaround you mention (replacing \x20 with \xc2\xa0) is the only way to address that if you want to use NumberFormatter.
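
If you stay with NumberFormatter, the workaround can be wrapped in one place so the rest of the application never sees the raw input. This is only a sketch under stated assumptions: parseUserCurrency() is a hypothetical helper name, the intl extension must be loaded, and only NumberFormatter::parseCurrency() is real intl API here:

```php
<?php
/**
 * Parse user-entered currency text; returns array(value, currency) or null.
 * Hypothetical helper; only NumberFormatter::parseCurrency() is real intl API.
 */
function parseUserCurrency(NumberFormatter $formatter, $input)
{
    // Swap ASCII spaces for NO-BREAK SPACE, the workaround from the question.
    $normalized = str_replace("\x20", "\xC2\xA0", trim($input));

    $currency = null; // filled by reference with the ISO 4217 code on success
    $value = $formatter->parseCurrency($normalized, $currency);

    // parseCurrency() returns false on failure; report that as null.
    return $value === false ? null : array($value, $currency);
}

$formatter = new NumberFormatter('de_DE', NumberFormatter::CURRENCY);
var_dump(parseUserCurrency($formatter, "88,22 \xE2\x82\xAC")); // keyboard-style input
var_dump(parseUserCurrency($formatter, 'not a price'));        // unparseable input
```

Returning null instead of false keeps the failure case distinct from a legitimately parsed value of 0.0 when the caller uses loose comparisons.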

Gregarious answered 12/5, 2013 at 16:7 Comment(1)
Thanks for the explanation! Zend_Currency returns the correct value. But my form allows direct user input of a float in local format with a currency symbol. A full solution with Zend Framework is a custom filter added to the element, because this isn't a real issue in NumberFormatter or pecl-intl. – Dysgraphia
