You simply loop over the content and use the Character features to test it. I use real code points, so it supports the supplementary characters of Unicode.
When dealing with code points, the index cannot simply be incremented by one, since some code points occupy two chars (aka code units). This is why I use the while loop and Character.charCount(int cp).
    /** Method counts and prints number of lower/uppercase codepoints. */
    static void countCharacterClasses(String input) {
        int upper = 0;
        int lower = 0;
        int other = 0;
        // index counts from 0 till end of string length
        int index = 0;
        while (index < input.length()) {
            // we get the unicode code point at index
            // this is the character at index-th position (but fits only in an int)
            int cp = input.codePointAt(index);
            // we increment index by 1 or 2, depending if cp fits in single char
            index += Character.charCount(cp);
            // the type of the codepoint is the character class
            int type = Character.getType(cp);
            // we care only about the character class for lower & uppercase letters
            switch (type) {
                case Character.UPPERCASE_LETTER:
                    upper++;
                    break;
                case Character.LOWERCASE_LETTER:
                    lower++;
                    break;
                default:
                    other++;
            }
        }
        System.out.printf("Input has %d upper, %d lower and %d other codepoints%n",
                upper, lower, other);
    }
For this sample the result will be:
    // test with plain letters, numbers and international chars:
    countCharacterClasses("AABBÄäoßabc0\uD801\uDC00");
    // U+10400 "DESERET CAPITAL LETTER LONG I" is 2 char UTF16: D801 DC00

    Input has 6 upper, 6 lower and 1 other codepoints
It counts the German sharp s (ß) as lowercase (ß is a lowercase letter in its own right; the capital form, U+1E9E, is a separate code point) and the supplementary code point (which is two code units/chars long) as uppercase. The digit is counted as "other".
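To make the code unit versus code point difference concrete, here is a small sketch that inspects the same sample string. The class and method names (CodeUnitsVsCodePoints, codeUnits, codePoints) are made up for illustration; the calls themselves are plain java.lang.String and java.lang.Character methods.

```java
final class CodeUnitsVsCodePoints {
    // Same string as in the sample above, written with escapes so the source
    // file encoding does not matter:
    // \u00C4 = Ä, \u00E4 = ä, \u00DF = ß, \uD801\uDC00 = U+10400
    static final String SAMPLE = "AABB\u00C4\u00E4o\u00DFabc0\uD801\uDC00";

    /** Number of UTF-16 code units (Java chars) in the string. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(SAMPLE));   // 14
        System.out.println(codePoints(SAMPLE));  // 13

        // The last code point starts at char index 12 and spans two chars.
        int deseret = SAMPLE.codePointAt(12);
        System.out.println(Integer.toHexString(deseret));  // 10400
        System.out.println(Character.charCount(deseret));  // 2

        // ß (U+00DF) is classified as a lowercase letter.
        System.out.println(Character.getType(0x00DF) == Character.LOWERCASE_LETTER);  // true
    }
}
```

The 13 code points match the 6 + 6 + 1 reported by countCharacterClasses, while length() reports 14 because U+10400 needs a surrogate pair.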
Using Character.getType(int cp) instead of Character.isUpperCase() has the advantage that the code point is classified only once, no matter how many character classes you test for. The same approach can also be used to count every class (letters, whitespace, control characters and all the other fancy Unicode classes such as TITLECASE_LETTER).
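As a rough sketch of that idea, the following counts every general category in one pass. CategoryHistogram and countByType are hypothetical names chosen for this example; the map keys are the int values returned by Character.getType, which you can compare against constants like Character.UPPERCASE_LETTER.

```java
import java.util.Map;
import java.util.TreeMap;

final class CategoryHistogram {
    /** Maps each Character.getType(...) value to how many code points have it. */
    static Map<Integer, Integer> countByType(String input) {
        Map<Integer, Integer> counts = new TreeMap<>();
        // codePoints() already steps over surrogate pairs correctly,
        // so no manual index arithmetic is needed here.
        input.codePoints()
             .forEach(cp -> counts.merge(Character.getType(cp), 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        // Same sample as above, written with escapes (Ä, ä, ß, U+10400).
        Map<Integer, Integer> counts =
                countByType("AABB\u00C4\u00E4o\u00DFabc0\uD801\uDC00");
        System.out.println(counts.get((int) Character.UPPERCASE_LETTER));      // 6
        System.out.println(counts.get((int) Character.LOWERCASE_LETTER));      // 6
        System.out.println(counts.get((int) Character.DECIMAL_DIGIT_NUMBER));  // 1
    }
}
```

This uses String.codePoints() (Java 8+) instead of the explicit while loop; both visit the same code points.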
For a good background read on why you need to care about code points and code units, check out: http://www.joelonsoftware.com/articles/Unicode.html
There is a Character.isUpperCase(char ch) static method as well as a Character.isLowerCase(char ch) static method that can help you out. – Chartism
CharAt. If you look at the documentation for String (which you should have done before starting this endeavor) you will see that there is an instance method called charAt. Instance method means that it must be qualified by an "instance" -- an object of the class. (In this case the class is, of course, String, and you have an object of that class in your code.) – Idem
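For comparison, the char-based approach from these comments (charAt plus Character.isUpperCase(char)) could look roughly like the sketch below, next to a code point based count. The class and method names are invented for illustration. On strings without supplementary characters both give the same answer; on the sample from the answer they differ, because each half of a surrogate pair is tested on its own.

```java
final class CharVersusCodePoint {
    /** Counts uppercase chars one UTF-16 code unit at a time. */
    static int countUpperByChar(String s) {
        int upper = 0;
        for (int i = 0; i < s.length(); i++) {
            // charAt is an instance method: it is called on the String object s
            if (Character.isUpperCase(s.charAt(i))) {
                upper++;
            }
        }
        return upper;
    }

    /** Counts uppercase code points, so surrogate pairs are seen as one unit. */
    static int countUpperByCodePoint(String s) {
        return (int) s.codePoints()
                      .filter(cp -> Character.isUpperCase(cp))
                      .count();
    }

    public static void main(String[] args) {
        // Same sample as in the answer, written with escapes (Ä, ä, ß, U+10400).
        String sample = "AABB\u00C4\u00E4o\u00DFabc0\uD801\uDC00";
        System.out.println(countUpperByChar(sample));       // 5
        System.out.println(countUpperByCodePoint(sample));  // 6
    }
}
```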