Unicode categorizes characters as belonging to a script, such as the Latin script.
How do I test whether a particular character (code point) is in a particular script?
Java represents the various Unicode scripts in the Character.UnicodeScript enum, including for example Character.UnicodeScript.LATIN. These match the Unicode Script Properties.
You can test a character by submitting its code point integer number to the of method on that enum.
int codePoint = "a".codePointAt( 0 ) ;
Character.UnicodeScript script = Character.UnicodeScript.of( codePoint ) ;
if( Character.UnicodeScript.LATIN.equals( script ) ) { … }
Alternatively:
boolean isLatinScript =
        Character.UnicodeScript.LATIN
                .equals(
                        Character.UnicodeScript.of( codePoint )
                )
;
Example usage.
System.out.println(
        Character.UnicodeScript.LATIN              // Constant defined on the enum.
                .equals(                           // `java.lang.Enum.equals()` comparing two constants defined on the enum.
                        Character.UnicodeScript.of(        // Determine which Unicode script for this character.
                                "😷".codePointAt( 0 )      // Get the code point integer number of the first (and only) character in this string.
                        )                          // Returns a `Character.UnicodeScript` enum object.
                )                                  // Returns `boolean`.
);
When run (for example, at IdeOne.com), this code prints:
false
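If you make this test often, it may be convenient to wrap it in a small helper. The following is only a sketch: the class name ScriptCheck and the methods isInScript and isAllInScript are hypothetical names invented for illustration, not part of the JDK. Only Character.UnicodeScript.of, String.codePointAt, and String.codePoints are real library calls.

```java
class ScriptCheck {

    // Hypothetical helper: true if the code point belongs to the given script.
    // Enum constants are singletons, so `==` is a safe comparison here.
    static boolean isInScript( int codePoint , Character.UnicodeScript script ) {
        return Character.UnicodeScript.of( codePoint ) == script ;
    }

    // Hypothetical helper: true if every code point in the text belongs to the given script.
    // Uses `codePoints()` rather than `chars()` so characters outside the BMP are handled whole.
    static boolean isAllInScript( String text , Character.UnicodeScript script ) {
        return text.codePoints().allMatch( cp -> isInScript( cp , script ) ) ;
    }

    public static void main( String[] args ) {
        System.out.println( isInScript( "a".codePointAt( 0 ) , Character.UnicodeScript.LATIN ) ) ;     // true
        System.out.println( isInScript( "я".codePointAt( 0 ) , Character.UnicodeScript.CYRILLIC ) ) ;  // true
        System.out.println( Character.UnicodeScript.of( "😷".codePointAt( 0 ) ) ) ;                    // COMMON
        System.out.println( isAllInScript( "hello" , Character.UnicodeScript.LATIN ) ) ;               // true
        System.out.println( isAllInScript( "hello world" , Character.UnicodeScript.LATIN ) ) ;         // false
    }
}
```

The last line prints false because the space character belongs to the COMMON script rather than LATIN. Punctuation, digits, and emoji behave the same way, so a whole-string test like this usually needs to allow COMMON as well.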
FYI, the Character class lets you ask if a code point represents a character that isDigit, isLetter, isLetterOrDigit, isLowerCase, and more.
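As a rough illustration of those Character tests, here is a minimal sketch that uses the overloads taking an int code point. The class name CharacterChecks and the describe method are hypothetical, made up for this example. The Character methods it calls are the real ones.

```java
class CharacterChecks {

    // Hypothetical helper: summarizes a few `Character` tests for one code point.
    static String describe( int codePoint ) {
        return new String( Character.toChars( codePoint ) )   // Rebuild the character as text.
                + " digit=" + Character.isDigit( codePoint )
                + " letter=" + Character.isLetter( codePoint )
                + " letterOrDigit=" + Character.isLetterOrDigit( codePoint )
                + " lowerCase=" + Character.isLowerCase( codePoint ) ;
    }

    public static void main( String[] args ) {
        System.out.println( describe( "a".codePointAt( 0 ) ) ) ;   // a digit=false letter=true letterOrDigit=true lowerCase=true
        System.out.println( describe( "7".codePointAt( 0 ) ) ) ;   // 7 digit=true letter=false letterOrDigit=true lowerCase=false
        System.out.println( describe( "😷".codePointAt( 0 ) ) ) ;  // 😷 digit=false letter=false letterOrDigit=false lowerCase=false
    }
}
```

Passing the int code point, rather than a char, matters for characters such as 😷 that lie outside the Basic Multilingual Plane and cannot fit in a single char.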