Grok - parsing optional fields

2

7

I have data coming from Kafka that I want to send to Elasticsearch. The log lines are made of tags, like this:

<TOTO><ID_APPLICATION>APPLI_A|PRF|ENV_1|00</ID_APPLICATION><TN>3</TN></TOTO>

I'm parsing it with grok, testing the pattern in the grok debugger:

\<ID_APPLICATION\>%{WORD:APPLICATION}\|%{WORD:PROFIL}\|%{WORD:ENV}\|%{WORD:CODE}\</ID_APPLICATION\>\<TN\>%{NUMBER:TN}\</TN\>

It works, but sometimes the log has a new field like this (the one with the tag <TP>):

<TOTO><ID_APPLICATION>APPLI_A|PRF|ENV_1|00</ID_APPLICATION><TN>3</TN><TP>new</TP></TOTO>

I'd like one pattern that parses both kinds of line: those with the TP tag (extracting its value) and those without it. How can I do that?

Civics answered 12/1, 2016 at 15:14 Comment(4)
Are you using grokdebug.herokuapp.com as a debugger? - Severini
It looks like you can use an optional group:
<ID_APPLICATION>%{WORD:APPLICATION}\|%{WORD:PROFIL}\|%{WORD:ENV}\|%{WORD:CODE}</ID_APPLICATION><TN>%{NUMBER:TN}</TN>(?:<TP>%{WORD:TP}</TP>)?
Please try it and let me know whether it works for you. - Lawanda
It works! Thanks a lot! - Civics
Please consider accepting my answer below. - Lawanda
F
12

If you have an optional field, you can match it by wrapping it in an optional non-capturing group, with the named capture %{WORD:TP} inside it:

(?:<TP>%{WORD:TP}</TP>)?
^^^                   ^^

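The non-capturing group (?:...) is used for grouping only and does not store a submatch of its own, and the ? quantifier matches it 1 or 0 times, which makes it optional. When the tag is present, %{WORD:TP} creates a TP field holding the matched word. When the tag is absent, the grok debugger shows TP as null; Logstash's grok filter drops empty captures by default (keep_empty_captures is false), so the field is simply not added to the event.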

So, the whole pattern will look like:

<ID_APPLICATION>%{WORD:APPLICATION}\|%{WORD:PROFIL}\|%{WORD:ENV}\|%{WORD:CODE}</ID_APPLICATION><TN>%{NUMBER:TN}</TN>(?:<TP>%{WORD:TP}</TP>)?
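
To see how the optional group behaves on both kinds of line, here is a rough Python sketch. It is only an approximation: grok runs on the Oniguruma regex engine, not Python's re, and the WORD and NUMBER definitions below are simplified stand-ins I am assuming for illustration, not the exact grok definitions. The parse helper is hypothetical as well.

```python
import re

# Simplified stand-ins for the grok patterns (assumptions for this sketch,
# not the exact definitions shipped with grok).
WORD = r"\b\w+\b"
NUMBER = r"[+-]?\d+(?:\.\d+)?"

# Same structure as the grok pattern above, with %{NAME:field} rewritten
# as Python named groups. The trailing (?:...)? makes the TP tag optional.
PATTERN = re.compile(
    rf"<ID_APPLICATION>(?P<APPLICATION>{WORD})\|(?P<PROFIL>{WORD})\|"
    rf"(?P<ENV>{WORD})\|(?P<CODE>{WORD})</ID_APPLICATION>"
    rf"<TN>(?P<TN>{NUMBER})</TN>"
    rf"(?:<TP>(?P<TP>{WORD})</TP>)?"
)


def parse(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


without_tp = "<TOTO><ID_APPLICATION>APPLI_A|PRF|ENV_1|00</ID_APPLICATION><TN>3</TN></TOTO>"
with_tp = "<TOTO><ID_APPLICATION>APPLI_A|PRF|ENV_1|00</ID_APPLICATION><TN>3</TN><TP>new</TP></TOTO>"

fields_without = parse(without_tp)  # TP is None: the optional group was skipped
fields_with = parse(with_tp)        # TP is "new"
```

Both lines match; only the value of TP differs. In Python the skipped group shows up as None, which mirrors the null shown by the grok debugger.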
Francklin answered 12/1, 2016 at 16:11 Comment(0)
S
0

This is the filter I tested in the Heroku app (grokdebug.herokuapp.com), after reading the documentation on how to use grok operators.

I added my own named capture group, called "content", that retrieves whatever is inside your TP tags.

\<ID_APPLICATION\>%{WORD:APPLICATION}\|%{WORD:PROFIL}\|%{WORD:ENV}\|%{WORD:CODE}\<\/ID_APPLICATION\>\<TN>%{NUMBER:TN}\<\/TN\>(\<TP\>(?<content>(.)*)\<\/TP\>)?

Basically, I just added an optional tag to your pattern.

(<TP> ... </TP>)? 

To retrieve the content, which I assume can be anything, I added the following inside the optional tags.

(?<content>(.)*)
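
The practical difference from a word-only capture shows up when the TP value contains spaces or punctuation. Here is a rough Python sketch of that contrast. It makes several assumptions: Python's re stands in for grok's Oniguruma engine, \b\w+\b stands in for %{WORD}, \d+ stands in for %{NUMBER}, and the sample line with the value "new value-2" is made up for illustration. It also uses the lazy .*? rather than the greedy (.)* shown above, so that the capture stops at the first closing tag.

```python
import re

# Word-only capture: the TP value must be a single run of word characters.
TP_WORD = re.compile(r"<TN>\d+</TN>(?:<TP>(?P<TP>\b\w+\b)</TP>)?")

# "Anything" capture: the TP value may contain spaces and punctuation.
# Lazy .*? stops at the first </TP> instead of the last one on the line.
TP_ANY = re.compile(r"<TN>\d+</TN>(?:<TP>(?P<content>.*?)</TP>)?")

# Hypothetical sample line whose TP value is not a single word.
line = "<TN>3</TN><TP>new value-2</TP></TOTO>"

word_match = TP_WORD.search(line)  # matches, but the optional group is skipped
any_match = TP_ANY.search(line)    # matches and captures the full value

# A line without the TP tag still matches; the capture is simply empty.
absent_match = TP_ANY.search("<TN>3</TN></TOTO>")
```

Note that the word-only pattern does not fail on such a line: because the group is optional, the match succeeds and the TP value is silently left out, which can be harder to notice than an outright parse failure.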
Severini answered 12/1, 2016 at 16:12 Comment(0)
