BreakIterator in Android counts characters incorrectly

I am using BreakIterator to count the number of visible characters in a String. This works perfectly for English text, but for Hindi it doesn't work as expected.

The string below has a length of 3 but is rendered as a single visible character:

ज्य
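To make the "length of 3" concrete, here is a minimal sketch that lists the code points in that string. The class name `CodePointDump` and the helper `codePoints` are made up for this illustration; only standard `String` methods are used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class CodePointDump {
    // Returns each code point of the text formatted as U+XXXX.
    static List<String> codePoints(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        // The string from the question: JA, VIRAMA, YA.
        String text = "\u091C\u094D\u092F";
        System.out.println(text.length());    // prints 3
        System.out.println(codePoints(text)); // prints [U+091C, U+094D, U+092F]
    }
}
```

The middle code point, U+094D (virama), is what joins the two consonants into one conjunct on screen.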

When I use BreakIterator, I expect it to treat this as a single unit, but it treats it as 2 units. Here is my code:

    final String text = "ज्य";
    final Locale locale = new Locale("hi", "IN");
    final BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
    breaker.setText(text);
    int start = breaker.first();
    // Each (start, end) pair should delimit one visible character.
    for (int end = breaker.next();
         end != BreakIterator.DONE;
         start = end, end = breaker.next()) {

        final String substring = text.substring(start, end);
    }

Ideally, the for loop should execute once, with start=0 and end=3. But for the string above it executes twice (start=0, end=2 and start=2, end=3).

How can I get BreakIterator to work correctly?

UPDATE:

The code above works perfectly when run as a plain Java program. It misbehaves only on Android.

Since this happens only on Android, I have reported an Android bug: https://code.google.com/p/android/issues/detail?id=230832

Degrease answered 21/12, 2016 at 19:1 Comment(2)
It's been a year and Google still hasn't found time to fix this. Sad baby always cries :( - Ararat
Have you solved this issue? I am also stuck with this behaviour on Android. - Slender

I think you need to work with the Unicode code points directly.

See the Oracle documentation on character boundaries.

    // "ज्य" written as Unicode escapes: JA (U+091C), VIRAMA (U+094D), YA (U+092F)
    final String text = "\u091C\u094D\u092F";
    final Locale locale = new Locale("hi", "IN");
    final BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
    breaker.setText(text);
    int start = breaker.first();
    for (int end = breaker.next();
         end != BreakIterator.DONE;
         start = end, end = breaker.next()) {

        // Print each segment the iterator reports.
        final String substring = text.substring(start, end);
        System.out.println(substring);
    }
Lustrous answered 22/12, 2016 at 7:3 Comment(1)
Thanks, SujitKumar, but look at the update in my question. The code works perfectly in Java; it misbehaves only when I use it on Android. - Degrease
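Since the split reported in the question falls right after the virama (end=2), one possible workaround is to post-process the boundaries and skip any boundary that sits between a virama and a following Devanagari consonant. This is only a sketch under assumptions, not a fix for the Android behaviour: the class `ConjunctAwareSplitter` and its methods are hypothetical names, the consonant ranges are a rough approximation, and only Devanagari is handled.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical workaround class, for illustration only.
class ConjunctAwareSplitter {
    private static final char VIRAMA = '\u094D'; // Devanagari sign virama (halant)

    // Rough check (assumption): consonants in U+0915..U+0939 and U+0958..U+095F.
    static boolean isDevanagariConsonant(char c) {
        return (c >= '\u0915' && c <= '\u0939') || (c >= '\u0958' && c <= '\u095F');
    }

    // Splits text at BreakIterator boundaries, but ignores a boundary that
    // falls directly between a virama and a following consonant.
    static List<String> split(String text, Locale locale) {
        BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
        breaker.setText(text);
        List<String> units = new ArrayList<>();
        int start = breaker.first();
        for (int end = breaker.next(); end != BreakIterator.DONE; end = breaker.next()) {
            boolean joinsNext = end < text.length()
                    && text.charAt(end - 1) == VIRAMA
                    && isDevanagariConsonant(text.charAt(end));
            if (joinsNext) {
                continue; // keep extending the current unit
            }
            units.add(text.substring(start, end));
            start = end;
        }
        return units;
    }

    public static void main(String[] args) {
        Locale hindi = new Locale("hi", "IN");
        System.out.println(split("\u091C\u094D\u092F", hindi).size()); // prints 1
    }
}
```

Because the merge step only removes boundaries, it gives the same result whether the underlying iterator reports the conjunct as one unit or as two. A virama at the end of the text, or one followed by something other than a consonant, is left as the iterator reported it.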
