JavaScript can't convert Hindi/Arabic numbers to real numeric variables
K

4

28

I'm trying to use a DOM coming from an external source, and it contains some numeric values written in Hindi/Arabic digits, like "۱۶۶۰". When I try to convert one into a numeric value, I get NaN.

What's wrong here?

A small code snippet to be tried:

alert(Number("۱۶۶۰") + ' - ' + Number("1660"));
Kulp answered 10/6, 2013 at 13:23 Comment(3)
I tried your code in Chrome's console, and I got TypeError: Object function Number() { [native code] } has no method 'parseLocale'. No luck searching Mozilla's documentation either. Is that an IE only thing?Yoshi
No need to use parseLocale method.Zapata
JavaScript doesn't yet support parsing such numbers.Coincident
K
56

Well, the Number function only accepts the ASCII digits 0 to 9 and does not handle Arabic or Persian ones.

You will need to take care of that yourself:

function parseArabic(str) {
  return Number(str
    .replace(/[٠١٢٣٤٥٦٧٨٩]/g, d => d.charCodeAt(0) - 1632) // convert Arabic digits
    .replace(/[۰۱۲۳۴۵۶۷۸۹]/g, d => d.charCodeAt(0) - 1776) // convert Persian digits
  );
}
// usage example:
console.log( parseArabic("۱۶۶۰") )
Kolodgie answered 10/6, 2013 at 13:44 Comment(4)
The first number set is Arabic, the second one is Persian.Enlargement
Because I couldn't find any answer for the other way around, here it is (Arabic only) function eurToArab(num) { return num.toString().replace(/\d/g, function(d) { return String.fromCharCode(parseInt(d[0], 10) + 1632); }); }Huckleberry
Are there any examples of number systems that do not use Base 10 that we may need to handle?Greengage
@RichardCorfield None that would matter, at least not in this questionKolodgie
S
15

I would suggest you handle it at a lower level: replace the Arabic digits with the corresponding ASCII digits and then convert.

For example:

>a='\u0661\u0666\u0666\u0660'
"١٦٦٠"
>b='\u06f1\u06f6\u06f6\u06f0'
"۱۶۶۰"
>r=/[\u0660-\u0669\u06F0-\u06F9]/g;
/[\u0660-\u0669\u06F0-\u06F9]/g
>a.replace(r,function(c) { return '0123456789'[c.charCodeAt(0)&0xf]; } )
"1660"
>b.replace(r,function(c) { return '0123456789'[c.charCodeAt(0)&0xf]; } )
"1660"
Saurel answered 10/6, 2013 at 13:33 Comment(0)
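The console session in the answer above can be folded into a small helper. This is only a sketch: the name toAsciiDigits is made up for illustration and is not part of any library, and it covers just the two digit blocks used in that answer.

```javascript
// Hypothetical helper: maps Arabic-Indic (U+0660-U+0669) and
// Persian (U+06F0-U+06F9) digits to ASCII digits.
const EASTERN_DIGITS = /[\u0660-\u0669\u06F0-\u06F9]/g;

function toAsciiDigits(str) {
  // Both blocks start at a code point whose last hex digit is 0,
  // so the low four bits of the code unit are the digit's value.
  return str.replace(EASTERN_DIGITS, c => String(c.charCodeAt(0) & 0xf));
}

console.log(Number(toAsciiDigits('\u0661\u0666\u0666\u0660'))); // Arabic-Indic input, prints 1660
console.log(Number(toAsciiDigits('\u06f1\u06f6\u06f6\u06f0'))); // Persian input, prints 1660
```

Characters outside the two ranges pass through untouched, so Number still returns NaN for input that contains anything else it cannot parse.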
W
0

Here is a function called parseNumber that converts a string representing a number into an actual JS number. It also accepts number strings with fractions (decimal numbers) and Arabic/Persian/English thousands separators. I don't know whether this solution is the best, performance-wise.

function parseNumber(numberText: string) {
    return Number(
        // Convert Persian (and Arabic) digits to Latin digits
        normalizeDigits(numberText)
        // Convert Persian/Arabic decimal separator to English decimal separator (dot)
        .replace(/٫/g, ".")
        // Remove other characters such as thousands separators
        .replace(/[^\d.]/g, "")
    );
}

const persianDigitsRegex = [/۰/g, /۱/g, /۲/g, /۳/g, /۴/g, /۵/g, /۶/g, /۷/g, /۸/g, /۹/g];
const arabicDigitsRegex = [/٠/g, /١/g, /٢/g, /٣/g, /٤/g, /٥/g, /٦/g, /٧/g, /٨/g, /٩/g];

function normalizeDigits(text: string) {
    for (let i = 0; i < 10; i++) {
        text = text
                .replace(persianDigitsRegex[i], i.toString())
                .replace(arabicDigitsRegex[i], i.toString());
    }
    return text;
}

Note that the parse function is quite forgiving and the number string can be a combination of Persian/Arabic/Latin numerals and separators.
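To see how forgiving this kind of parser is, here is a condensed plain-JavaScript sketch of the same idea, not the exact code above. The name parseNumberSketch is hypothetical. The last call shows a side effect of stripping everything except digits and dots: a leading minus sign is dropped too.

```javascript
// Hypothetical condensed variant of the approach above.
function parseNumberSketch(text) {
  return Number(
    text
      // Arabic-Indic digits U+0660-U+0669
      .replace(/[\u0660-\u0669]/g, d => String(d.charCodeAt(0) - 0x0660))
      // Persian digits U+06F0-U+06F9
      .replace(/[\u06F0-\u06F9]/g, d => String(d.charCodeAt(0) - 0x06F0))
      // Arabic decimal separator U+066B becomes a dot
      .replace(/\u066B/g, '.')
      // Drop everything else, including thousands separators and signs
      .replace(/[^\d.]/g, '')
  );
}

// Persian digits with U+066C thousands separator and U+066B decimal separator
console.log(parseNumberSketch('\u06F1\u066C\u06F2\u06F3\u06F4\u066B\u06F5')); // 1234.5
console.log(parseNumberSketch('1,234.5'));                                    // 1234.5
console.log(parseNumberSketch('-\u06F1\u06F2'));                              // 12, the sign is lost
```

If negative values matter for your input, the sign would have to be detected before the final cleanup step.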

Side note

After getting a Number you can format it back however you want with Number.toLocaleString function:

let numberString = "۱۲۳۴.5678";
let number = parseNumber(numberString);
const formatted1 = number.toLocaleString("fa"); // OR "fa-IR" for IRAN
const formatted2 = number.toLocaleString("en"); // OR "en-US" for USA
const formatted3 = number.toLocaleString("ar-EG"); // OR "ar" which uses western numerals

For more information about formatting numbers, refer to this answer.
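The digit set can also be requested explicitly, independent of the language, through the Unicode "nu" locale extension of Intl.NumberFormat. The sketch below formats one value with the latn, arab and arabext numbering systems and then parses the results back. The exact output depends on the locale data shipped with your runtime, and backToNumber is a hypothetical helper written just for this round trip.

```javascript
const value = 1660;

// latn = ASCII digits, arab = Arabic-Indic digits, arabext = Persian digits
const systems = ['latn', 'arab', 'arabext'];

const formatted = systems.map(nu =>
  // useGrouping: false avoids thousands separators in the output
  new Intl.NumberFormat(`en-u-nu-${nu}`, { useGrouping: false }).format(value)
);
console.log(formatted);

// Hypothetical helper: map Arabic-Indic and Persian digits back to ASCII, then convert.
const backToNumber = s =>
  Number(s.replace(/[\u0660-\u0669\u06F0-\u06F9]/g, c => String(c.charCodeAt(0) & 0xf)));

console.log(formatted.map(backToNumber));
```

A runtime without the relevant locale data may fall back to ASCII digits for all three entries; the round trip still yields the original number in that case.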

Warsle answered 24/12, 2021 at 14:58 Comment(0)
X
0

This function converts Unicode decimal digits to ASCII decimal digits. Unicode has a general category Nd (Decimal_Number) for digits, and we can search for it using the /\p{Nd}/u regex. The category has roughly 650 characters (about 65 sets of ten digits, and the count grows with new Unicode versions), which makes it impractical to hardcode the replace calls if you want to support them all.

We can take advantage of the fact that almost all digit blocks either start at U+...0 or end at U+...F, as you can verify in this table: https://www.compart.com/en/unicode/category/Nd For those blocks we can simply take the last hexadecimal digit of the code point (using & 0xf) and subtract 6 for the blocks that end at U+...F. The two cases are told apart by checking whether code | 0xf is also in Nd, using the same regex we are searching with. Two caveats: Nd does contain characters above U+FFFF (for example the Osmanya digits at U+104A0 and the mathematical digits U+1D7CE to U+1D7FF), so the code point should be read with codePointAt rather than charCodeAt; and the mathematical digits do not follow the alignment rule, so this shortcut maps them incorrectly.

function normalizeDigits(str) {
  // find all characters which are DecimalNumber (property Nd), except for ASCII 0-9
  return str.replace(/(?![0-9])\p{Nd}/ug, g => {
    // heuristic: a block of ten digits either starts at 0x...0 or runs from 0x...6 to 0x...F
    // if it starts at 0x...0, the ASCII decimal number is (code & 0xf)
    // if it ends at 0x...F, the ASCII decimal number is (code & 0xf) - 6
    // we recognize the 2 cases by testing if code | 0xf == 0x...F is still a decimal number
    // blocks that are not aligned this way (e.g. mathematical digits U+1D7CE-U+1D7FF) are NOT handled
    // codePointAt reads the whole code point; the /u regex matches whole code points, not UTF-16 units
    const code = g.codePointAt(0)
    const endsAtF = /\p{Nd}/u.test(String.fromCodePoint(code | 0xf))
    return String((code & 0xf) - (endsAtF ? 6 : 0))
  })
}
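A possible alternative that does not depend on how a block is aligned is to measure how far a digit sits from the start of its contiguous run of Nd characters. This is a sketch under one assumption, stated here and in the code: every such run is made of back-to-back sets of ten digits in ascending order, each set starting at zero. The names isNd, digitValue and normalizeDigitsByRun are hypothetical.

```javascript
// True if the code point has general category Nd (Decimal_Number).
const isNd = cp => /^\p{Nd}$/u.test(String.fromCodePoint(cp));

function digitValue(cp) {
  // Walk back to the first code point of the contiguous Nd run.
  let start = cp;
  while (start > 0 && isNd(start - 1)) start--;
  // ASSUMPTION: the run consists of consecutive 0-9 sets, so the
  // offset from the run start modulo 10 is the digit's value.
  return (cp - start) % 10;
}

function normalizeDigitsByRun(str) {
  // The /u flag makes the callback receive whole code points,
  // so codePointAt(0) also works for digits above U+FFFF.
  return str.replace(/(?![0-9])\p{Nd}/gu, ch => String(digitValue(ch.codePointAt(0))));
}

console.log(normalizeDigitsByRun('\u0661\u0666\u0666\u0660'));             // Arabic-Indic, prints 1660
console.log(normalizeDigitsByRun('\u0967\u096C\u096C\u0966'));             // Devanagari, prints 1660
console.log(normalizeDigitsByRun('\u{1D7CF}\u{1D7D4}\u{1D7D4}\u{1D7CE}')); // mathematical bold, prints 1660
```

The walk back costs a few extra regex tests per digit, which is the price for not relying on the last hex digit of the code point.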
Xeres answered 20/2, 2024 at 13:12 Comment(0)
