I am conducting some research on identifying DOIs in free-format text.
I am using Java 8 and regular expressions.
I have found these regexes, which are supposed to fulfil my requirements:
/^10.\d{4,9}/[-._;()/:A-Z0-9]+$/i
/^10.1002/[^\s]+$/i
/^10.\d{4}/\d+-\d+X?(\d+)\d+<[\d\w]+:[\d\w]*>\d+.\d+.\w+;\d$/i
/^10.1021/\w\w\d++$/i
/^10.1207/[\w\d]+\&\d+_\d+$/i
The code I am trying is:
private static final Pattern pattern_one = Pattern.compile("/^10.\\d{4,9}/[-._;()/:A-Z0-9]+$/i", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern_one.matcher("http://journals.ametsoc.org/doi/full/10.1175/JPO3002.1");
while (matcher.find()) {
    System.out.print("Start index: " + matcher.start());
    System.out.print(" End index: " + matcher.end() + " ");
    System.out.println(matcher.group());
}
However, the matcher doesn't find anything.
Where have I gone wrong?
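For reference, here is a minimal sketch of what may be going wrong, under two assumptions. First, the leading `/` and trailing `/i` are JavaScript-style regex delimiters, which Java treats as literal characters to match. Second, the `^` and `$` anchors require the whole input to be a DOI, whereas here the DOI is embedded in a URL. The class name `DoiFinder` and the method `findDois` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoiFinder {
    // No leading "/" or trailing "/i": Java has no regex literal delimiters,
    // and case-insensitivity is already requested via the flag.
    // No ^ and $: the DOI may appear anywhere in the text.
    // The dot after "10" is escaped so that it matches a literal dot only.
    private static final Pattern DOI = Pattern.compile(
            "\\b10\\.\\d{4,9}/[-._;()/:A-Z0-9]+", Pattern.CASE_INSENSITIVE);

    // Collects every substring of the text that looks like a DOI.
    static List<String> findDois(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = DOI.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [10.1175/JPO3002.1]
        System.out.println(findDois("http://journals.ametsoc.org/doi/full/10.1175/JPO3002.1"));
    }
}
```

Dropping the anchors is a design choice for free-format text; if each input string were expected to be exactly one DOI, keeping `^` and `$` (or calling `matches()`) would be the stricter option.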
UPDATE
I have encountered a valid DOI that none of my regexes match.
Here's an example DOI: 10.1175/1520-0485(2002)032<0870:CT>2.0.CO;2
Why doesn't this pattern work?
/^10.\d{4}/\d+-\d+X?(\d+)\d+<[\d\w]+:[\d\w]*>\d+.\d+.\w+;\d$/i
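A small sketch comparing two variants of this pattern, assuming the `/…/i` delimiters have already been removed for Java. In the pattern as posted, `(\d+)` is a capturing group, so it matches digits only and never the literal `(2002)`; escaping the parentheses (and the dots) appears to be enough for the example DOI. The class name `SiciDoiCheck` is hypothetical.

```java
import java.util.regex.Pattern;

public class SiciDoiCheck {
    // As posted (minus the /.../i delimiters): (\d+) is a capturing group,
    // so it consumes digits and cannot match the literal "(" and ")".
    static final Pattern UNESCAPED = Pattern.compile(
            "^10.\\d{4}/\\d+-\\d+X?(\\d+)\\d+<[\\d\\w]+:[\\d\\w]*>\\d+.\\d+.\\w+;\\d$",
            Pattern.CASE_INSENSITIVE);

    // Parentheses and dots escaped so that they match literally.
    static final Pattern ESCAPED = Pattern.compile(
            "^10\\.\\d{4}/\\d+-\\d+X?\\(\\d+\\)\\d+<[\\d\\w]+:[\\d\\w]*>\\d+\\.\\d+\\.\\w+;\\d$",
            Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String doi = "10.1175/1520-0485(2002)032<0870:CT>2.0.CO;2";
        System.out.println(UNESCAPED.matcher(doi).matches()); // false
        System.out.println(ESCAPED.matcher(doi).matches());   // true
    }
}
```

Note that the general pattern `[-._;()/:A-Z0-9]+` does not include `<` or `>` in its character class, which would explain why it cannot match this DOI in full either.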