I recently had to explain Atomic Groups to someone else and I thought I'd tweak and share the example here.
Consider /the (big|small|biggest) (cat|dog|bird)/
All four lines match:
- the big dog
- the small bird
- the biggest dog
- the small cat
For the first line, a regex engine would find `the`.
It would then proceed on to our adjectives (`big`, `small`, `biggest`) and find `big`.
Having matched `big`, it proceeds and finds the space.
It then looks at our pets (`cat`, `dog`, `bird`), tries `cat`, skips it, and finds `dog`.
For the second line, our regex would find `the`.
It would proceed and look at `big`, skip it, then look at and find `small`.
It finds the space, skips `cat` and `dog` because they don't match, and finds `bird`.
For the third line, our regex would find `the`.
It continues on and finds `big`, which matches the immediate requirement, and proceeds.
It can't find the space, so it backtracks (rewinds the position to the last choice it made).
It skips `big`, skips `small`, and finds `biggest`, which also matches the immediate requirement.
It then finds the space.
It skips `cat` and matches `dog`.
For the fourth line, our regex would find `the`.
It would proceed to look at `big`, skip it, then look at and find `small`.
It then finds the space.
It looks at and matches `cat`.
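The walkthrough above can be checked with a short script. This is a minimal sketch, assuming Ruby (the variable names `plain`, `lines`, and `plain_results` are made up for illustration); it records which adjective and pet each line ends up capturing with the ordinary backtracking group.

```ruby
# Plain (backtracking) capture group on the adjectives.
plain = /the (big|small|biggest) (cat|dog|bird)/

lines = ["the big dog", "the small bird", "the biggest dog", "the small cat"]

# For each line, keep the captured adjective and pet (nil if there is no match).
plain_results = lines.map do |line|
  m = plain.match(line)
  m && [m[1], m[2]]
end

plain_results.each_with_index do |captures, i|
  puts "#{lines[i]}: #{captures.inspect}"
end
```

On the third line the adjective capture ends up as `biggest`, not `big`: the engine was allowed to go back and pick a different alternative once the space failed to match.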
Consider /the (?>big|small|biggest) (cat|dog|bird)/
Note the `?>` atomic group on the adjectives.
Every line except the third matches:
- the big dog
- the small bird
- the biggest dog
- the small cat
For the first line, second line, and fourth line, we'll get the same result.
For the third line, our regex would find `the`.
It continues on and finds `big`, which matches the immediate requirement, and proceeds.
It can't find the space, but the atomic group, being the last choice the engine made, won't allow that choice to be re-examined (it prohibits backtracking).
Because our simple expression offers no other choices, the engine can't make a new one, and the match has to fail.
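The same four lines can be run against the atomic version. Again this is only a sketch, assuming Ruby, with illustrative variable names. The `reordered` pattern is not part of the example above; it is added here as an assumption-free consequence of the same rule: if the longer alternative comes first, the atomic group commits to `biggest` before it ever tries `big`.

```ruby
lines = ["the big dog", "the small bird", "the biggest dog", "the small cat"]

# Atomic group on the adjectives: once an alternative matches,
# the engine may not come back and try a different one.
atomic = /the (?>big|small|biggest) (cat|dog|bird)/
atomic_results = lines.map { |line| atomic.match?(line) }

# Same atomic group, with the longest alternative listed first.
reordered = /the (?>biggest|big|small) (cat|dog|bird)/
reordered_results = lines.map { |line| reordered.match?(line) }

lines.each_with_index do |line, i|
  puts "#{line}: atomic=#{atomic_results[i]} reordered=#{reordered_results[i]}"
end
```

Only the third line differs between the two patterns, which shows that the failure comes from the order of the alternatives combined with the ban on backtracking, not from the atomic group alone.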
This is only a basic summary. An engine wouldn't need to look at the entirety of `cat` to know that it doesn't match `dog`; merely looking at the `c` is enough. When trying to match `bird`, the `c` in `cat` and the `d` in `dog` are enough to tell the engine to examine other options.
However, if you had `...((cat|snake)|dog|bird)`, the engine would also, of course, need to examine `snake` before it dropped to the previous group and examined `dog` and `bird`.
There are also plenty of choices an engine can't decide without going past what may not seem like a match, which is what results in backtracking. If you have `((red)?cat|dog|bird)`, the engine will look at `r`, back out, notice the `?` quantifier, ignore the subgroup `(red)`, and look for a match.
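The nested and optional groups mentioned above can be poked at the same way. This sketch assumes Ruby and tests the two groups on their own, without whatever pattern would precede them; the inputs and variable names are chosen purely for illustration.

```ruby
nested   = /((cat|snake)|dog|bird)/
optional = /((red)?cat|dog|bird)/

# "snake" is only reachable through the inner group, so the engine has to
# try it before falling back to the outer alternatives "dog" and "bird".
snake = nested.match("snake")
dog   = nested.match("dog")

# For "cat", the optional (red) subgroup is tried, fails on the first
# character, and is skipped; for "redcat" it takes part in the match.
plain_cat = optional.match("cat")
red_cat   = optional.match("redcat")

puts "snake:  outer=#{snake[1].inspect} inner=#{snake[2].inspect}"
puts "dog:    outer=#{dog[1].inspect} inner=#{dog[2].inspect}"
puts "cat:    outer=#{plain_cat[1].inspect} red=#{plain_cat[2].inspect}"
puts "redcat: outer=#{red_cat[1].inspect} red=#{red_cat[2].inspect}"
```

A capture that prints as `nil` marks a subgroup the engine tried and then abandoned (or never needed) on the way to the final match.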
named_captures docs that you linked to? – Importune
grouping I have, then how and why do we need such atomic grouping, and what can it do that general grouping can't? Could you help me get that basic understanding cleared up? – Lilililia