POSIX character equivalents in Java regular expressions

I would like to use a regular expression like this in Java: [[=a=][=e=][=i=]].

But Java doesn't support POSIX equivalence classes such as [=a=] and [=e=].

How can I do this? More precisely, is there a way to match characters outside US-ASCII?

Lois answered 7/7, 2011 at 15:12 Comment(1)
Please do not add third-party stat trackers to your posts. Thanks. - Tucci
D
15

Java does support POSIX character classes; only the syntax is different. For instance:

\p{Lower}
\p{Upper}
\p{ASCII}
\p{Alpha}
\p{Digit}
\p{Alnum}
\p{Punct}
\p{Graph}
\p{Print}
\p{Blank}
\p{Cntrl}
\p{XDigit}
\p{Space}
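
A small sketch of how these classes behave in practice. The class and method names are illustrative and not part of the answer above. The second method assumes Java 7 or later, where the Pattern.UNICODE_CHARACTER_CLASS flag exists; without that flag, \p{Lower} only matches a-z.

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // Default behaviour: \p{Lower} is limited to the ASCII range a-z.
    static boolean asciiLower(String s) {
        return Pattern.matches("\\p{Lower}+", s);
    }

    // Assumes Java 7+: the flag makes the POSIX classes follow Unicode properties.
    static boolean unicodeLower(String s) {
        return Pattern.compile("\\p{Lower}+", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(s)
                      .matches();
    }

    public static void main(String[] args) {
        String ete = "\u00e9t\u00e9"; // "été", written with escapes to avoid source-encoding issues
        System.out.println(asciiLower("abc"));  // true
        System.out.println(asciiLower(ete));    // false, é is outside a-z
        System.out.println(unicodeLower(ete));  // true
    }
}
```

Note that this widens the classes to Unicode but still does not give locale-aware equivalence classes like [=a=].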
Destitution answered 7/7, 2011 at 15:16 Comment(2)
US-ASCII only. Is there a way to use a locale? - Lois
@Stephan, unfortunately there is no way that I know of. You can always match Unicode characters manually, though, to create your own character groups. - Disbar
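
One way to build such character groups by hand, as a rough sketch: decompose the input with java.text.Normalizer (NFD) and match a base letter followed by any combining marks. The class name, method name and pattern are assumptions made for illustration. This only approximates [[=a=][=e=][=i=]]: it is not locale-aware, and letters that have no canonical decomposition (for example ø) are not covered.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class EquivalenceSketch {
    // A base letter a, e or i, followed by zero or more combining marks (\p{M}).
    private static final Pattern AEI = Pattern.compile("(?:[aei]\\p{M}*)+");

    // True if the whole input consists of a/e/i or their accented variants.
    static boolean onlyAeiEquivalents(String input) {
        // NFD splits "é" into "e" + U+0301 (combining acute accent).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return AEI.matcher(decomposed).matches();
    }

    public static void main(String[] args) {
        System.out.println(onlyAeiEquivalents("\u00e9\u00e0\u00ee")); // "éàî" -> true
        System.out.println(onlyAeiEquivalents("aei"));                // true
        System.out.println(onlyAeiEquivalents("\u00e9o"));            // "éo" -> false
    }
}
```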
S
6

Quoting from http://download.oracle.com/javase/1.6.0/docs/api/java/util/regex/Pattern.html

POSIX character classes (US-ASCII only)

\p{Lower}   A lower-case alphabetic character: [a-z]
\p{Upper}   An upper-case alphabetic character: [A-Z]
\p{ASCII}   All ASCII: [\x00-\x7F]
\p{Alpha}   An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}   A decimal digit: [0-9]
\p{Alnum}   An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}   Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}   A visible character: [\p{Alnum}\p{Punct}]
\p{Print}   A printable character: [\p{Graph}\x20]
\p{Blank}   A space or a tab: [ \t]
\p{Cntrl}   A control character: [\x00-\x1F\x7F]
\p{XDigit}  A hexadecimal digit: [0-9a-fA-F]
\p{Space}   A whitespace character: [ \t\n\x0B\f\r]
Scrimpy answered 7/7, 2011 at 15:15 Comment(3)
I think POSIX also allows only ASCII, am I wrong? That should be a side note for users expecting POSIX to handle Unicode. - Scrimpy
On Oracle, they have implemented their regex flavor by following the POSIX spec. They accept the special class [= =]. I didn't verify whether the class adapts to the various locales Oracle supports, though. - Lois
The POSIX specification does support different locales, with collation equivalence classes described under point seven of the POSIX Specification for Regular Expressions: pubs.opengroup.org/onlinepubs/009695399/basedefs/… - Hauteur
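
For the locale side of this, Java's closest standard tool is probably java.text.Collator rather than the regex engine. The sketch below is an assumption about how one might approximate a collation equivalence test: at PRIMARY strength a collator ignores accent and case differences. The class and method names are hypothetical, and this compares strings; it does not plug into java.util.regex.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorEquivalence {
    // True if a and b differ only by accents or case under the given locale's rules.
    static boolean samePrimary(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        // Decompose precomposed characters such as U+00E9 before comparing.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(samePrimary("e", "\u00e9", Locale.FRENCH)); // "e" vs "é"
        System.out.println(samePrimary("e", "a", Locale.FRENCH));      // different base letters
    }
}
```

To use this for matching, one could test each candidate character against the base letters of interest, which is slower than a regex but follows the locale's collation rules.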
C
2

Copied from here

Java does not support POSIX bracket expressions, but does support POSIX character classes using the \p operator. Though the \p syntax is borrowed from the syntax for Unicode properties, the POSIX classes in Java only match ASCII characters. The class names are case sensitive. Unlike the POSIX syntax, which can only be used inside a bracket expression, Java's \p can be used inside and outside bracket expressions.
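
To illustrate the last point, here is a minimal sketch with \p used once outside and once inside a bracket expression. The class name and the two patterns are made up for the example.

```java
import java.util.regex.Pattern;

class BracketUsage {
    // \p{Digit} used on its own, outside any bracket expression.
    private static final Pattern NUMBER = Pattern.compile("\\p{Digit}+");

    // \p{Alpha} and \p{Alnum} combined with '_' inside bracket expressions.
    private static final Pattern IDENTIFIER = Pattern.compile("[\\p{Alpha}_][\\p{Alnum}_]*");

    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumber("2011"));      // true
        System.out.println(isIdentifier("_tmp1")); // true
        System.out.println(isIdentifier("1tmp"));  // false, cannot start with a digit
    }
}
```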

Combe answered 7/7, 2011 at 15:15 Comment(1)
Thanks for the prompt reply, but is there a way to use a locale? - Lois
