How do I refer to a regex group inside a custom grok pattern?

I want to add fields for specific URI params in my log lines.

Here is an example log line:

2017-03-12 21:34:36 W3SVC1 webserver 1.1.1.1 GET /webpage.html param1=11111&param2=22222&param3=&param4=4444444 80 - 2.2.2.2 HTTP/1.1 Java/1.8.0_121 - - balh.com 200 0 0 311 244 247 - -

I want to add fields for param1, param2, param3 and param4.

I am using this grok filter:

  grok {
    match => [ "message", "(?<param1>param1=(.*?)&)"]
  }

This regex uses a capture group to get the text between "param1=" and "&". But grok ignores the capture group and returns "param1=11111&". I just want to capture the "11111".

How can I tell grok to use capture group 1, i.e. my regex capture group?

Edit: This almost works:

  grok {
    match => [ "message", "(?<param1>param1=(?<param1>.*?)&)"]
  }

So I guess what I'm doing here is using two named groups but with the same name. The problem is that the "param1" field has two entries in it for each group. One for "param1=11111&" and one for "11111". How do I just get that second group?

Demean answered 12/3, 2017 at 23:8 Comment(0)

How can I say use capture group 1 or tell grok to use my regex capture group?

By default, grok only considers named capturing groups; numbered capturing groups do not create fields. If you want to override this behavior, set named_captures_only to false:

named_captures_only
- Value type is boolean
- Default value is true
If true, only store named captures from grok.

However, there is nothing wrong in using a named capturing group (and I'd use a negated character class [^&]* instead of a lazy matching dot with a consuming & after it):

\bparam1=(?<param1>[^&]*)

[^&]* matches zero or more characters other than &, so it also matches an empty parameter (which you can avoid by changing * to +, or control with the keep_empty_captures option) and a parameter at the end of the string, where no & follows.
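
To pull all four parameters from the sample line, the same idea could be repeated once per parameter. The following is only a sketch, not a tested config. It assumes the parameters may appear in any order, so each one gets its own pattern. It also adds \s to the negated class (my addition, not part of the pattern above), because in the sample line the query string is followed by a space rather than the end of the string, so [^&]* alone would run past the value of param4 into the rest of the line. break_on_match => false makes grok try every pattern instead of stopping at the first match.

```
filter {
  grok {
    # One pattern per parameter, so their order in the query string does not matter.
    # [^&\s]* stops at the next "&" or at whitespace (assumption: values contain no spaces).
    match => {
      "message" => [
        "\bparam1=(?<param1>[^&\s]*)",
        "\bparam2=(?<param2>[^&\s]*)",
        "\bparam3=(?<param3>[^&\s]*)",
        "\bparam4=(?<param4>[^&\s]*)"
      ]
    }
    # Try all patterns rather than finishing after the first successful match.
    break_on_match => false
  }
}
```

With the default keep_empty_captures => false, the empty param3 in the sample line should not produce a field; set it to true if you want the empty value kept.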

Mycah answered 13/3, 2017 at 8:1 Comment(2)
Is doing it this way faster or less resource intensive than the way I am doing it? – Demean
A negated character class with a greedy quantifier matches much faster than a lazily quantified dot. I do not believe there is a big performance difference in practice, since the input is not a very long string. However, it is best practice to use the appropriate tool (here, pattern) for each situation, and in regex that means using a character class when you need to match chars from a defined range/set, or a negated one when you need chars other than those defined. – Chromaticity

This works:

  grok {
    match => [ "message", "(?:param1=(?<param1>.*?)&)"]
  }

So I guess what I'm doing here is using a non-capturing group with a named capturing group nested inside it. The parent group's match is discarded, and the nested named match is the only thing that is returned.

Is this doing what I think it's doing, or is this wrong and it's dumb luck that it does what I want?

Demean answered 13/3, 2017 at 6:26 Comment(1)
Just FYI: the non-capturing group (?:...) does nothing here; if you remove it, the pattern will work the same way. A non-capturing group is only necessary when it contains alternation or when it is quantified (to match n to m occurrences, one or more, zero or more, or zero or one occurrence). – Chromaticity
