Compare strings ignoring accented characters [duplicate]

I would like to know if there is a method that compares two strings while ignoring accents, so that "noção" is treated as equal to "nocao". It would be something like string1.methodCompareIgnoreAccent(string2);

Pettway answered 3/3, 2015 at 14:4 Comment(3)
Have you looked at Collator? - Intended
You can also have a look at #1009302. - Incorrigible
I have written a class for searching through Arabic texts while ignoring diacritics (NOT removing them). Maybe you can get the idea or use it in some way. gist.github.com/mehdok/e6cd1dfccab0c75ac7a9536c6afac8ff - Jeopardous

You can use a Java Collator to compare the texts while ignoring accents and case. Here is a simple example:

import java.text.Collator;

/**
 * @author Kennedy
 */
public class SimpleTest
{

  public static void main(String[] args)
  {
    String a = "nocao";
    String b = "noção";

    final Collator instance = Collator.getInstance();

    // PRIMARY strength ignores differences in both accents and case
    instance.setStrength(Collator.PRIMARY);

    // Prints 0 because the two strings are considered equal
    System.out.println(instance.compare(a, b));
  }
}

Documentation: JavaDoc

Be aware that this collator also ignores differences in case, i.e. it also treats "NOCAO" as equal to "noção". To create a collator that ignores accent differences but distinguishes case, you might be able to use a RuleBasedCollator.

Do not confuse Collator.setStrength() with Collator.setDecomposition(). The Collator constants PRIMARY, SECONDARY, TERTIARY and IDENTICAL must only be used with setStrength(), while the constants NO_DECOMPOSITION, CANONICAL_DECOMPOSITION and FULL_DECOMPOSITION must only be used with setDecomposition(). (A previous version of this code mixed this up and only worked because NO_DECOMPOSITION and PRIMARY happen to have the same integer value.)
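The effect of the strength levels can be sketched as follows. This is only an illustration, not part of the original answer: the class name StrengthDemo and the helper compareAt are made up for the example, and Locale.ENGLISH is passed explicitly as an assumption, because Collator.getInstance() without arguments uses the JVM's default locale and the result may then differ between machines. The accented word is written with Unicode escapes so the source file encoding does not matter.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; the name is not from the original answer.
class StrengthDemo {

    // "nocao" with c-cedilla and a-tilde, written with escapes
    static final String ACCENTED = "no\u00e7\u00e3o";

    // Compares two strings with a collator pinned to Locale.ENGLISH (an assumption)
    // at the given strength (one of the Collator strength constants).
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // PRIMARY: accent and case differences are both ignored
        System.out.println(compareAt(Collator.PRIMARY, "nocao", ACCENTED) == 0);
        System.out.println(compareAt(Collator.PRIMARY, "NOCAO", ACCENTED) == 0);

        // SECONDARY: accents are significant, case still is not
        System.out.println(compareAt(Collator.SECONDARY, "nocao", ACCENTED) == 0);
        System.out.println(compareAt(Collator.SECONDARY, "NOCAO", "nocao") == 0);

        // TERTIARY: case is significant as well
        System.out.println(compareAt(Collator.TERTIARY, "NOCAO", "nocao") == 0);
    }
}
```

Pinning the locale is a design choice for reproducibility; if the comparison should follow the user's language rules, passing the user's own Locale is probably the better fit.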

Nub answered 3/3, 2015 at 15:3 Comment(5)
Thank you. I didn't know about Collator. - Pettway
This doesn't work; it won't print 0. Sometimes it prints -1, other times 1. - Pettway
It works; use the code from before weston's edit. - Nub
Your first answer worked. Now I want to edit the answer so that it shows your answer. Do you know how I can do it? - Pettway
Did it! You can accept this answer again if you want to. - Nub

There is no built-in String method to do this, so you have to build your own:

Part of this solution is from here: it first splits each accented character into its unaccented base character followed by its combining diacritics. Then you simply remove all combining diacritics. Also see https://mcmap.net/q/100874/-converting-symbols-accent-letters-to-english-alphabet
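To make the decomposition step concrete, here is a small sketch that prints the code points a character is split into under NFD. The class name DecompositionDemo and the helper decomposedCodePoints are invented for this illustration and do not come from the linked solution.

```java
import java.text.Normalizer;

// Hypothetical demo class; the name is not from the original answer.
class DecompositionDemo {

    // Returns the code points of the NFD form of the input, separated by spaces.
    static String decomposedCodePoints(String text) {
        String nfd = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        nfd.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // c-cedilla (U+00E7) becomes 'c' followed by COMBINING CEDILLA
        System.out.println(decomposedCodePoints("\u00e7")); // U+0063 U+0327
        // a-tilde (U+00E3) becomes 'a' followed by COMBINING TILDE
        System.out.println(decomposedCodePoints("\u00e3")); // U+0061 U+0303
    }
}
```

Both U+0327 and U+0303 fall in the Combining Diacritical Marks block (U+0300 to U+036F), which is why a pattern matching that block removes them and leaves only the base letters.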

And then your equals method will look like this:

import java.text.Normalizer;
import java.text.Normalizer.Form;

public boolean equals(Object o) {
    // Code omitted
    // Strip accents from both sides so that "noção" matches "nocao"
    if (removeAccents(yourField).equals(removeAccents(anotherField))) {
        return true;
    }
    return false;
}

public static String removeAccents(String text) {
    return text == null ? null : Normalizer.normalize(text, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
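
For a version that can be run on its own, the same idea might be wrapped in a small helper like the one below. This is a sketch under assumptions: the class name AccentInsensitive and the method equalsIgnoringAccents are hypothetical, and using Objects.equals for null handling is a choice made for this example. Unlike a Collator at PRIMARY strength, this comparison stays case-sensitive.

```java
import java.text.Normalizer;
import java.util.Objects;

// Hypothetical helper class; the names are not from the original answer.
class AccentInsensitive {

    // Decomposes the text (NFD) and drops the combining diacritical marks.
    static String removeAccents(String text) {
        return text == null ? null
                : Normalizer.normalize(text, Normalizer.Form.NFD)
                        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Null-safe comparison that ignores accents but still distinguishes case.
    static boolean equalsIgnoringAccents(String a, String b) {
        return Objects.equals(removeAccents(a), removeAccents(b));
    }

    public static void main(String[] args) {
        // "nocao" with c-cedilla and a-tilde, written with escapes
        String accented = "no\u00e7\u00e3o";
        System.out.println(equalsIgnoringAccents(accented, "nocao")); // true
        System.out.println(equalsIgnoringAccents(accented, "NOCAO")); // false
        System.out.println(equalsIgnoringAccents(null, null));        // true
    }
}
```

If case should be ignored as well, calling equalsIgnoreCase on the two stripped strings is one option, at the cost of handling null explicitly.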
Anima answered 3/3, 2015 at 14:48 Comment(0)
