How do I match a newline in grok/logstash?

Asked 20/10, 2014 at 21:04 Answered 30/4, 2020 at 10:13

I have a remote machine that combines multiline events and sends them over the lumberjack protocol.

What comes in is something that looks like this:

{
     "message" => "2014-10-20T20:52:56.133+0000 host 2014-10-20 15:52:56,036 [ERROR   ][app.logic     ] Failed to turn message into JSON\nTraceback (most recent call last):\n  File \"somefile.py\", line 249, in _get_values\n    return r.json()\n  File \"/path/to/env/lib/python3.4/site-packages/requests/models.py\", line 793, in json\n    return json.loads(self.text, **kwargs)\n  File \"/usr/local/lib/python3.4/json/__init__.py\", line 318, in loads\n    return _default_decoder.decode(s)\n  File \"/usr/local/lib/python3.4/json/decoder.py\", line 343, in decode\n    obj, end = self.raw_decode(s, idx=_w(s, 0).end())\n  File \"/usr/local/lib/python3.4/json/decoder.py\", line 361, in raw_decode\n    raise ValueError(errmsg(\"Expecting value\", s, err.value)) from None\nValueError: Expecting value: line 1 column 1 (char 0), Failed to turn message into JSON"
}

When I try to match the message with

grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:loglevel}%{SPACE}\]\[%{NOTSPACE:module}%{SPACE}\]%{GREEDYDATA:message}" ]
}

the GREEDYDATA capture is not nearly as greedy as I would like: it stops at the first newline.

So then I tried to use gsub:

mutate {
    gsub => ["message", "\n", "LINE_BREAK"]
}
# Grok goes here
mutate {
    gsub => ["message", "LINE_BREAK", "\n"]
}

but that didn't work either. Rather than

The Quick brown fox
jumps over the lazy
groks

I got

The Quick brown fox\njumps over the lazy\ngroks

So...

How do I either add the newline back to my data, make the GREEDYDATA match my newlines, or in some other way grab the relevant portion of my message?

Certificate asked 20/10, 2014 at 21:04

Looks like a duplicate of #24308465. – Specialistic 21/10, 2014 at 5:40

@MagnusBäck basically yes, though that question doesn't care about newlines but I do require the newlines to exist in the resulting message. – Certificate 21/10, 2014 at 12:53

GREEDYDATA is just .*, but . doesn't match newlines, so you can replace %{GREEDYDATA:message} with (?<message>(.|\r|\n)*) and get it to be truly greedy.
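
A minimal sketch of the difference in plain Ruby, whose regex engine uses the same Oniguruma-style syntax as grok patterns (assuming the two engines agree on these simple constructs). The sample text is a shortened, hypothetical excerpt of the question's message, and %{GREEDYDATA} is written out by hand as .*:

```ruby
# Shortened excerpt of the question's multiline message (trimmed for brevity).
text = "Failed to turn message into JSON\n" \
       "Traceback (most recent call last):\n" \
       "  File \"somefile.py\", line 249, in _get_values"

# %{GREEDYDATA} is .*, and . does not match "\n" by default,
# so this capture stops at the end of the first line.
greedy = text.match(/(?<message>.*)/)[:message]

# The alternation from this answer also accepts "\r" and "\n",
# so this capture runs to the end of the string.
truly_greedy = text.match(/(?<message>(.|\r|\n)*)/)[:message]

puts greedy                    # prints: Failed to turn message into JSON
puts truly_greedy.lines.count  # prints: 3
```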

Liquidize answered 20/10, 2014 at 21:33

(?<message>(.|\r|\n)*) did it! Had 20 tabs open and here I find it in a not so highly upvoted answer. Thank you very much. – Wellstacked 14/4, 2015 at 4:50

(.|\r|\n)* is one of the most unfortunate patterns there is: it is a real performance killer. To make . match any character, just use the appropriate modifier: in Oniguruma it is (?m); in PCRE and PCRE-related flavors, use (?s); in JS, use [^] or [\s\S] instead of a dot. – Nealneala 19/10, 2016 at 13:02
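
A quick equivalence check in Ruby, assumed here to behave like grok's Oniguruma flavor: the alternation, the (?m) modifier and the [\s\S] class all capture the same CRLF-terminated sample (a made-up string). It only checks that the results agree; it does not measure the performance cost this comment warns about.

```ruby
# Made-up sample with Windows-style line endings, so both "\r" and "\n" appear.
text = "line one\r\nline two\r\nline three"

alternation = text[/(.|\r|\n)*/]  # the answer's pattern: any char, CR, or LF
dotall      = text[/(?m).*/]      # Oniguruma/Ruby: (?m) lets . match newlines
char_class  = text[/[\s\S]*/]     # the JS-style trick also works in Ruby

puts [alternation, dotall, char_class].uniq.size  # prints: 1
```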

Adding the (?m) flag to the beginning of the pattern lets . (and therefore GREEDYDATA) match newlines:

match => [ "message", "(?m)%{TIMESTA...
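
To see the flag at work on the question's pattern, here is a Ruby sketch with the grok macros expanded by hand. The expansions are simplified, hypothetical stand-ins (for example [A-Z]+ for %{LOGLEVEL}), not the real pattern definitions, and Ruby's engine is assumed to treat (?m) the way grok does, letting . match newlines.

```ruby
# Shortened sample in the question's format (the traceback is trimmed).
line = "2014-10-20T20:52:56.133+0000 host 2014-10-20 15:52:56,036 " \
       "[ERROR   ][app.logic     ] Failed to turn message into JSON\n" \
       "Traceback (most recent call last):\n" \
       "ValueError: Expecting value: line 1 column 1 (char 0)"

# Hypothetical stand-ins: TIMESTAMP_ISO8601 -> date/time regex, LOGLEVEL -> [A-Z]+,
# SPACE -> \s*, NOTSPACE -> \S+, GREEDYDATA -> .*
# The leading (?m) is what lets the final .* run past the first "\n".
PATTERN = /(?m)(?<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?) \[(?<loglevel>[A-Z]+)\s*\]\[(?<module>\S+)\s*\](?<message>.*)/

m = PATTERN.match(line)
puts m[:timestamp]            # prints: 2014-10-20 15:52:56,036
puts m[:loglevel]             # prints: ERROR
puts m[:module]               # prints: app.logic
puts m[:message].lines.count  # prints: 3
```

The captured message keeps the leading space after the closing bracket, because the question's pattern has no space between \] and %{GREEDYDATA:message}.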

Certificate answered 20/10, 2014 at 21:45

Thanks. This also works for gsub, not just grok. E.g., to extract the first line from a Message field (sent from Active Directory):

Input: "Message" => "The computer attempted to validate the credentials for an account.\r\n\r\nAuthentication Package:\tMICROSOFT_AUTHENTICATION_PACKAGE_V1_0\r\n

Code: gsub => [ "Message", "^(?m)([^\r]*).*", "\1" ]

Output: "Message" => "The computer attempted to validate the credentials for an account." – Casement 26/1, 2016 at 0:40
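
The same substitution in Ruby, whose String#gsub is assumed to behave like the mutate filter's gsub for this pattern. The input is only the portion of the Message value quoted in this comment; nothing beyond the last \r\n is assumed.

```ruby
# The quoted start of the Active Directory "Message" value.
message = "The computer attempted to validate the credentials for an account.\r\n\r\n" \
          "Authentication Package:\tMICROSOFT_AUTHENTICATION_PACKAGE_V1_0\r\n"

# ([^\r]*) captures everything up to the first "\r"; under (?m) the trailing .*
# also consumes the line breaks, so the whole value is replaced by group 1.
# Single quotes keep '\1' as a backreference rather than an escape sequence.
first_line = message.gsub(/^(?m)([^\r]*).*/, '\1')

puts first_line  # prints: The computer attempted to validate the credentials for an account.
```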

My final grok for Vertica logs, using (?m) and [^\n]+:

match => ["message","(?m)%{TIMESTAMP_ISO8601:ClientTimestamp}%{SPACE}(%{DATA:Action}:)?(%{DATA:ThreadID} )?(\[%{DATA:Module}\] )?(\<%{DATA:Level}\> )?(\[%{DATA:SubAction}\] )?(@%{DATA:Nodename}:)?( (?<Session>(\{.*?\} )?.*?/.*?): )?(?<message>[^\n]+)((\n)?(\t)?(?<StackTrace>[^\n]+))?"]

Thanks to asperla

https://github.com/elastic/logstash/issues/2282
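
A reduced Ruby sketch of the tail of this pattern, with a hypothetical three-line event and a simplified timestamp stand-in in place of %{TIMESTAMP_ISO8601} and the optional groups in between. It shows that (?m) does not override [^\n]+: message stays on the first line, StackTrace picks up only the second line, and anything after that is left out of both fields.

```ruby
# Hypothetical event: one message line followed by two indented stack-trace lines.
event = "2020-04-30 10:13:00.000 Something failed\n\tat frame_one()\n\tat frame_two()"

# Simplified stand-in: hand-written timestamp, %{SPACE} as \s*, then the
# pattern's own message/StackTrace tail copied unchanged.
TAIL = /(?m)(?<ClientTimestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s*(?<message>[^\n]+)((\n)?(\t)?(?<StackTrace>[^\n]+))?/

m = TAIL.match(event)
puts m[:message]     # prints: Something failed
puts m[:StackTrace]  # prints: at frame_one()
```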

Finnougrian answered 30/4, 2020 at 10:13
