I'm trying to catch a section of Hebrew text (the origin is comments on a news site) using the following regex:
[\u0590-\u05FF \\p{Graph} \\s]+
It works for most comments but some comments are missed.
I've tried to debug this and it seems there's a Hebrew letter that doesn't match the pattern.
When I extract this letter and print its integer value, it seems to be correct, but the regex still doesn't catch it...
Ideas?
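A minimal sketch of how the offending characters could be listed one code point at a time, so that each can be checked against the ranges in the pattern. The class name `UnmatchedCodePoints`, the method `findUnmatched`, and the sample string are hypothetical and only serve the illustration. Two things worth checking with it: by default Java's `\p{Graph}` and `\s` cover ASCII only, and the range U+0590 to U+05FF does not include the Hebrew presentation forms (U+FB1D to U+FB4F) or directional marks such as U+200F, so a character from one of those areas may look like an ordinary Hebrew letter and still fall outside the class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class UnmatchedCodePoints {
    // The pattern from the question, written as a Java string literal.
    static final Pattern HEBREW = Pattern.compile("[\u0590-\u05FF \\p{Graph} \\s]+");

    // Returns one "U+XXXX NAME" entry for every code point in the text
    // that the pattern does not accept when tested on its own.
    static List<String> findUnmatched(String text) {
        List<String> report = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String single = new String(Character.toChars(cp));
            if (!HEBREW.matcher(single).matches()) {
                // Character.getName may return null for unassigned code points.
                report.add(String.format("U+%04X %s", cp, Character.getName(cp)));
            }
        });
        return report;
    }

    public static void main(String[] args) {
        // Hypothetical comment text: regular Hebrew letters followed by a
        // presentation-form letter (U+FB2A) and a right-to-left mark (U+200F).
        String comment = "\u05E9\u05DC\u05D5\u05DD \uFB2A\u200F";
        for (String line : findUnmatched(comment)) {
            System.out.println(line);
        }
    }
}
```

If the reported code points turn out to lie outside U+0590 to U+05FF, widening the character class (or compiling with `Pattern.UNICODE_CHARACTER_CLASS` so that `\p{Graph}` and `\s` use their Unicode definitions) would be the next thing to try.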
Pattern.UNICODE_CASE inside your Pattern.compile method? – Stephine
Pattern p = Pattern.compile("YOUR_REGEX", Pattern.UNICODE_CASE); – Stephine