DFA Based Regular Expression Engines for Java with Capture
Asked Answered
T

5

11

Are there any (free) regular expression engines for Java, that can compile a regular expression to a DFA, and do group capturing while matching the DFA ?

I've found dk.brics.automaton and jrexx, which both compile to DFA, but neither seems to be able to do group capture. While the other engines I've found seem to compile to NFA.

Thagard answered 26/12, 2009 at 18:21 Comment(5)
For performance optimization.Thagard
I'm asking because usually those performances benefits arise from the inability of DFA engines to backtrack. If that's the case, perhaps you could achieve the same using atomic grouping/possessive quantifiers. Maybe you could post some examples of what you want to achieve?Sternick
How about some examples so we can see if your problem can be solved in a different way - since nobody seems to know a Java DFA regex engine with group capture... if it really is a problem and not premature optimization :)Sternick
Thank you for you interest, but there really isn't a problem that needs to be solved in a different way. I was just wanting to see, what performance effect a DFA based engine would have had for my application.Thagard
There shouldn't be any appreciable difference between a DFA and NFA engine if the NFA doesn't use backtracking. If you find a good linear space-time NFA engine you should use it. Even a naive implementation will still be constant-space/linear-time in the absence of groups.Crocidolite
L
3

try this one (probably not DFA but faster than java.util) http://jregex.sourceforge.net/gstarted-advanced.html#ngroups, or this one: http://userguide.icu-project.org

according to that test: http://tusker.org/regex/regex_benchmark.html, both are fast (we all know that the benchmarks only tests what the creator of the benchmark wanted to test).

When I needed really fast DFA regex I have spawned a process that used grep ;-) (For a 6GB log file it cut my times from 10minutes to a few seconds).

Laurielaurier answered 12/5, 2011 at 9:25 Comment(1)
I doubt that it’s faster than java.util.regex. These small libraries come and go, java.util.regex gets optimized year after year. If you’re not using a better algorithm, java.util.regex will beat you eventually. See my answer for a regular expression engine that is quite fundamentally different from java.util.regex, DFA-based, and therefore faster.Sunnysunproof
S
1

I recently wrote one: tree-regex.

Sunnysunproof answered 28/9, 2011 at 13:39 Comment(0)
F
0

For C there is TRE and Google's RE2 libraries. TRE uses DFA, RE2 uses NFA (as far as I understand), both could subgroup matching. But I didn't see such a library for Java.

Fodder answered 29/7, 2010 at 9:1 Comment(2)
RE2 is actually REALLY fast. It is worth pointing to it when people ask for regex and speed.Sunnysunproof
You've got it mixed up. TRE uses NFA, RE2 uses both NFAs and DFAs. Specifically, RE2 uses a DFA if there's at most one capture group, otherwise an NFA.Sunnysunproof
N
-2

you can try Pat regular expressions library @ http://www.javaregex.com/ .

Novosibirsk answered 7/2, 2010 at 3:19 Comment(2)
From the website it's at least not obvious at all, that this engine would be DFA based nor that it would support group capture. If it is, and does, could you please confirm.Thagard
That lib (Stevesoft Pat) does support capturing groups, but it's definitely not DFA-based.Bridgeboard
H
-2

dk.brics.automaton is DFA does appear to do capturing groups. I expect that feature is new in the two years since this question. Check out class AutomatonMatcher.

See http://www.brics.dk/automaton/doc/dk/brics/automaton/AutomatonMatcher.html#group(int)

Hugh answered 22/7, 2011 at 22:31 Comment(3)
It doesn't actually support group matching.Sunnysunproof
Updated with link to group-capture APIHugh
Yes, but did you read that link? "does not support capturing groups the only valid group is 0 (the entire match)."Sunnysunproof

© 2022 - 2024 — McMap. All rights reserved.