I'm trying to write a regex matcher for Roman numerals. In sed, which uses POSIX regular expressions (the closest thing to a 'standard' for regex that I know of), when several options are separated by the alternation operator, the longest option that matches is chosen. For example, with sed -E the pattern "I|II|III|IV" matches "IV" in "IV" and "III" in "III".
In Java, the same pattern matches "I" in both "IV" and "III". It turns out Java tries the alternatives left to right and takes the first one that succeeds: because "I" appears before "III" in the regex, "I" wins. If I change the regex to "IV|III|II|I", the behavior is corrected, but hand-ordering the alternatives obviously isn't a general solution.
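A minimal sketch of the left-to-right behavior described above. The class name and the firstMatch helper (which returns the text of the first find() hit, or null) are hypothetical names for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrder {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Java tries alternatives left to right and stops at the first one that succeeds.
        System.out.println(firstMatch("I|II|III|IV", "III")); // I
        System.out.println(firstMatch("I|II|III|IV", "IV"));  // I
        // Listing longer alternatives first changes which one wins.
        System.out.println(firstMatch("IV|III|II|I", "III")); // III
        System.out.println(firstMatch("IV|III|II|I", "IV"));  // IV
    }
}
```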
Is there a way to make Java choose the longest match out of an alternation group, instead of choosing the 'first'?
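One possible workaround, sketched under assumptions: keep each alternative as its own Pattern, scan start positions left to right, and at the first position where any alternative matches, keep the longest of those matches. The class and the findLongest helper are hypothetical names, List.of needs Java 9 or later, and this only picks the longest among the top-level alternatives. Inside each alternative, Java's usual first-match rules still apply, so this is not full POSIX leftmost-longest semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestAlternative {
    // Hypothetical helper: finds the leftmost position where any alternative matches,
    // and among the alternatives matching there, returns the longest match (or null).
    static String findLongest(List<String> alternatives, String input) {
        List<Matcher> matchers = new ArrayList<>();
        for (String alt : alternatives) {
            Matcher m = Pattern.compile(alt).matcher(input);
            // Let \b, lookbehind and ^ see the text outside the region.
            m.useTransparentBounds(true);
            m.useAnchoringBounds(false);
            matchers.add(m);
        }
        for (int start = 0; start <= input.length(); start++) {
            String best = null;
            for (Matcher m : matchers) {
                // Restrict matching to input[start..]; lookingAt() requires the match to begin at start.
                m.region(start, input.length());
                if (m.lookingAt() && (best == null || m.group().length() > best.length())) {
                    best = m.group();
                }
            }
            if (best != null) {
                return best;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findLongest(List.of("six", "sixty"), "The year was nineteen sixty five.")); // sixty
        System.out.println(findLongest(List.of("I", "II", "III", "IV"), "IV"));  // IV
        System.out.println(findLongest(List.of("I", "II", "III", "IV"), "III")); // III
    }
}
```

The cost is one lookingAt() call per alternative per start position, which is fine for short inputs but slow for long ones.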
A code sample for clarity:
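```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main
{
    public static void main(String[] args)
    {
        // "six" is listed before "sixty", so Java's left-to-right alternation picks it first.
        Pattern p = Pattern.compile("six|sixty");
        Matcher m = p.matcher("The year was nineteen sixty five.");
        if (m.find())
        {
            System.out.println(m.group());
        }
        else
        {
            System.out.println("wtf?");
        }
    }
}
```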
This prints "six", not "sixty".
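For this particular example, a context constraint can make Java reject the shorter alternative. Wrapping the group in word boundaries means "six" fails inside "sixty" (there is no boundary between 'x' and 't'), so the engine backtracks into the alternation and tries "sixty". This is a sketch that relies on the surrounding text, not a longest-match rule, so it only helps when the alternatives are whole words. The class and the firstMatch helper are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundary {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \b after the group rejects "six" inside "sixty", forcing a retry with "sixty".
        System.out.println(firstMatch("\\b(?:six|sixty)\\b", "The year was nineteen sixty five.")); // sixty
        // Same idea for the Roman numerals: "I" fails because 'V' follows with no boundary.
        System.out.println(firstMatch("\\b(?:I|II|III|IV)\\b", "Chapter IV")); // IV
    }
}
```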