Need information on Grok patterns that use non-capturing groups (?: )

I understand the concept of writing regular expressions using capturing and non-capturing groups.

Ex:

a(b|c) would match and capture ab and ac

a(?:b|c) would match ab and ac but capture a

But how is this useful when I write a new custom grok pattern, and what does it mean to use a non-capturing group there?

Looking at a few existing grok patterns like the one below for HOUR:

HOUR (?:2[0123]|[01]?[0-9])

Here we can match the hour format using (2[0123]|[01]?[0-9]) as well. Why does the grok pattern use the non-capturing expression here? On what basis should I decide to use (?:subex)?

Chaqueta answered 8/7, 2016 at 16:34 Comment(2)
I don't know what language you are using, but I think that's completely wrong. a(b|c) usually captures b or c (depending on whether the pattern matched ab or ac), and a(?:b|c) captures nothing at all. The difference is one of performance; why capture something when you don't need to? - Six
Re "Here we can match the hour format using (2[0123]|[01]?[0-9]) as well": no surprise there; capturing doesn't change what a pattern matches. - Six

The difference between a pattern with a capturing group or without in Grok is whether you need to create a field or not.

The (?:2[0123]|[01]?[0-9]) pattern contains a non-capturing group, which is used only to group the alternatives. The (2[0123]|[01]?[0-9]) regex contains a numbered capturing group, which matches the text and also captures it, i.e. stores the value in a buffer whose ID is the position of the group in the pattern. Mind that there are also named capture groups, like (?<field>2[0123]|[01]?[0-9]), which assign the captured value to a name instead of a number.
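The three kinds of group can be compared side by side. The following is a minimal sketch in Python's `re` module rather than in Grok itself, so it only illustrates the regex behaviour. One assumption to note: Python spells a named group `(?P<field>...)`, whereas the Grok pattern above uses `(?<field>...)`.

```python
import re

# Numbered capturing group: the alternative that matched is stored as group 1.
numbered = re.fullmatch(r"a(b|c)", "ab")
print(numbered.group(0), numbered.groups())          # ab ('b',)

# Non-capturing group: the same text matches, but nothing is stored.
non_capturing = re.fullmatch(r"a(?:b|c)", "ab")
print(non_capturing.group(0), non_capturing.groups())  # ab ()

# Named capturing group (Python spelling of the (?<field>...) form).
named = re.fullmatch(r"a(?P<field>b|c)", "ac")
print(named.group(0), named.groupdict())             # ac {'field': 'c'}
```

In all three cases the overall match is identical; only what gets stored alongside it differs.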

With the named_captures_only parameter set to false, the a(b|c) regex will match ab or ac and assign b or c to a separate field. When you use a non-capturing group, as in a(?:b|c), no field is ever created; the text is only matched.

Since the default value of named_captures_only is true, there is no practical difference between a numbered capturing group and a non-capturing group in Grok patterns by default. So, by default, only named captures (like a(?<myfield>b|c)) can be used to create fields.
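The effect of the setting can be imitated outside Logstash. This is a rough sketch, not Logstash's implementation: `extract_fields` is a hypothetical helper, and using the group index as the field name for numbered captures is purely an illustration choice, not the name Logstash would assign.

```python
import re

def extract_fields(pattern, text, named_captures_only=True):
    """Hypothetical helper: turn regex captures into a dict of fields."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # Named groups always become fields.
    fields = dict(m.groupdict())
    if not named_captures_only:
        # Also keep numbered groups, skipping those that already have a name.
        named_indexes = set(m.re.groupindex.values())
        for index, value in enumerate(m.groups(), start=1):
            if index not in named_indexes:
                fields[str(index)] = value  # assumed naming, for illustration
    return fields

print(extract_fields(r"a(b|c)", "ab"))                               # {}
print(extract_fields(r"a(b|c)", "ab", named_captures_only=False))    # {'1': 'b'}
print(extract_fields(r"a(?:b|c)", "ab", named_captures_only=False))  # {}
print(extract_fields(r"a(?P<myfield>b|c)", "ac"))                    # {'myfield': 'c'}
```

With the default, the numbered and non-capturing versions produce the same (empty) result; they differ only once numbered captures are kept.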

I think the preference is given to non-capturing groups in common Grok patterns in order not to depend on the named_captures_only parameter setting.
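A further practical point, offered as a sketch rather than as the pattern authors' stated reasoning: HOUR contains an alternation, so it needs some kind of group to stay intact when it is embedded in a larger expression, and a non-capturing group provides that without adding a numbered capture. The example uses Python's `re`; the `MINUTE` definition is an assumption added for illustration.

```python
import re

HOUR = r"(?:2[0123]|[01]?[0-9])"
MINUTE = r"(?:[0-5][0-9])"  # assumed companion pattern, for illustration

# Grouped: the alternation stays inside HOUR when it is embedded.
grouped = re.compile(rf"^{HOUR}:{MINUTE}$")

# Ungrouped: "|" has the lowest precedence, so this reads as
# "^2[0123]" OR "[01]?[0-9]:[0-5][0-9]$".
ungrouped = re.compile(r"^2[0123]|[01]?[0-9]:[0-5][0-9]$")

print(bool(grouped.search("23:59")))    # True
print(bool(grouped.search("23:99")))    # False
print(bool(ungrouped.search("23:99")))  # True, only "^2[0123]" had to match
print(grouped.groups)                   # 0, no numbered captures were added
```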

Trichite answered 8/7, 2016 at 16:51 Comment(3)
"The difference between a pattern with a capturing group or without in Grok is whether you need to create a field or not." This helped answer my question, and now I can apply it in my matches. - Chaqueta
I would also like to know the significance of named_captures_only. If I define a set of grok patterns, would it match only the ones that are defined in the pattern set? Is that what it means? - Chaqueta
That means only named captures will be taken; numbered ones will be ignored. - Donela
