Logstash grok filter for custom logs

Asked 7/8, 2015 at 18:07; answered 29/9, 2016 at 7:51

I have two related questions. The first is how best to grok logs that have "messy" spacing and so on. The second, which I'll ask separately, is how to deal with logs that have arbitrary attribute-value pairs (see: logstash grok filter for logs with arbitrary attribute-value pairs).

So for the first question, I have a log line that looks like this:

14:46:16.603 [http-nio-8080-exec-4] INFO  METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92

Using http://grokdebug.herokuapp.com/, I eventually came up with the following grok pattern, which works for this line:

%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}

With the following config file:

input {
        file {
                path => "/home/robyn/testlogs/trimmed_logs.txt"
                start_position => "beginning"
                sincedb_path => "/dev/null" # for testing; allows reparsing
        }
}
filter {
        grok {
                match => {"message" => "%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}" }
        }
}
output {
        file {
                path => "/home/robyn/filteredlogs/trimmed_logs.out.txt"
        }
}

I get the following output:

{"message":"14:46:16.603 [http-nio-8080-exec-4] INFO  METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92","@version":"1","@timestamp":"2015-08-07T17:55:16.529Z","host":"hlt-dev","path":"/home/robyn/testlogs/trimmed_logs.txt","timestamp":"14:46:16.603","http":"[http-nio-8080-exec-4]","loglevel":"INFO","logtype":"METERING","msg":"93e6dd5e-c009-46b3-b9eb-f753ee3b889a","action":"CREATE_JOB","job":"a820018e-7ad7-481a-97b0-bd705c3280ad","data":"71b1652e-16c8-4b33-9a57-f5fcb3d5de92"}

That's pretty much what I want, but the pattern feels kludgy, particularly because I have to use %{SPACE} and %{NOTSPACE} so often. That suggests I'm not doing this the best possible way. Should I be creating a more specific pattern for the hex ids? I think I need %{SPACE} between loglevel and logtype because of the extra space between INFO and METERING in the log, but that also feels like a workaround.
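To check that reasoning in isolation, here is a minimal Python sketch of just the level/type part of the line. It hand-expands WORD and SPACE using what I take to be their stock grok definitions (\b\w+\b and \s*; an assumption, and Logstash uses its own regex engine rather than Python's re). The variable names are made up for this illustration.

```python
import re

# Assumed stock grok definitions: WORD = \b\w+\b, SPACE = \s*
WORD = r"\b\w+\b"
SPACE = r"\s*"

# The level/type fragment from the sample line, with its two spaces.
FRAGMENT = "INFO  METERING"

# Equivalent of "%{WORD:loglevel} %{WORD:logtype}" (one literal space).
single_space = rf"(?P<loglevel>{WORD}) (?P<logtype>{WORD})"
# Equivalent of "%{WORD:loglevel}%{SPACE}%{WORD:logtype}", as in the question.
with_space = rf"(?P<loglevel>{WORD}){SPACE}(?P<logtype>{WORD})"

print(re.fullmatch(single_space, FRAGMENT))  # None: the second space is left unmatched
m = re.fullmatch(with_space, FRAGMENT)
print(m.group("loglevel"), m.group("logtype"))  # INFO METERING
```

So the %{SPACE} there is doing real work. If the logger pads the level name to a fixed width (a guess about this log format, not something the question confirms), the number of spaces would also vary from level to level, which a single literal space could not absorb.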

Also, how do I get the log's own timestamp to replace @timestamp, which seems to be the time Logstash ingested the log? We don't want or need the ingestion time.

Obviously I'm just getting started with ELK and grok, so pointers to useful resources are also appreciated.

Spiel asked 7/8, 2015 at 18:07

There is an existing pattern you can use instead of NOTSPACE for the ids: UUID. Also, where there's only a single space, there's no need for the SPACE pattern; you can leave it out and write a literal space. I'm also using the USERNAME pattern (its name doesn't really fit here) just to capture the http field without its square brackets.
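The gain from UUID is strictness: NOTSPACE accepts any run of non-whitespace characters, while UUID only accepts the 8-4-4-4-12 hex layout. A quick Python comparison, assuming the stock definitions shown in the comments (check them against the grok-patterns file your Logstash version ships):

```python
import re

# Assumed stock grok definitions (verify against your grok-patterns file).
NOTSPACE = r"\S+"
UUID = r"[A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}"

tokens = [
    "93e6dd5e-c009-46b3-b9eb-f753ee3b889a",  # msg id from the sample line
    "not-a-uuid",                            # hypothetical malformed id
]

# For each token: (matches NOTSPACE?, matches UUID?)
results = {
    t: (re.fullmatch(NOTSPACE, t) is not None, re.fullmatch(UUID, t) is not None)
    for t in tokens
}

for token, (loose, strict) in results.items():
    print(f"{token:40} NOTSPACE={loose} UUID={strict}")
```

In Logstash, a line whose id doesn't fit the UUID layout fails to match and is tagged _grokparsefailure by default, instead of the bad value silently landing in the field.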

Putting that together, it would go like this, with only a single SPACE pattern left to capture the multiple spaces.

Sample log line:

14:46:16.603 [http-nio-8080-exec-4] INFO  METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92

Grok pattern:

%{TIME:timestamp} \[%{USERNAME:http}\] %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{UUID:msg} %{WORD:action} job=%{UUID:job} data=%{UUID:data}

Grok will spit this out:

{
  "timestamp": [
    [
      "14:46:16.603"
    ]
  ],
  "HOUR": [
    [
      "14"
    ]
  ],
  "MINUTE": [
    [
      "46"
    ]
  ],
  "SECOND": [
    [
      "16.603"
    ]
  ],
  "http": [
    [
      "http-nio-8080-exec-4"
    ]
  ],
  "loglevel": [
    [
      "INFO"
    ]
  ],
  "SPACE": [
    [
      "  "
    ]
  ],
  "logtype": [
    [
      "METERING"
    ]
  ],
  "msg": [
    [
      "93e6dd5e-c009-46b3-b9eb-f753ee3b889a"
    ]
  ],
  "action": [
    [
      "CREATE_JOB"
    ]
  ],
  "job": [
    [
      "a820018e-7ad7-481a-97b0-bd705c3280ad"
    ]
  ],
  "data": [
    [
      "71b1652e-16c8-4b33-9a57-f5fcb3d5de92"
    ]
  ]
}
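For a side-by-side view of what changed, here is a rough Python approximation of grok: each %{NAME:field} reference is expanded into a named regex group, and both the question's pattern and this one are run against the sample line. The grok_to_regex helper is hypothetical, and the pattern bodies are simplified transcriptions of the stock definitions (an assumption; Logstash itself uses the Oniguruma engine and the full pattern library).

```python
import re

# Simplified transcriptions of stock grok patterns (assumption: close enough
# to the real definitions for this log line).
GROK = {
    "TIME": r"(?:2[0123]|[01]?[0-9]):[0-5][0-9]"
            r"(?::(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)(?![0-9])",
    "NOTSPACE": r"\S+",
    "USERNAME": r"[a-zA-Z0-9._-]+",
    "WORD": r"\b\w+\b",
    "SPACE": r"\s*",
    "UUID": r"[A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}",
}


def grok_to_regex(pattern):
    """Hypothetical helper: expand %{NAME} / %{NAME:field} into a Python regex.

    Only references that carry a field name become (named) capture groups.
    """
    def expand(m):
        name, field = m.group(1), m.group(2)
        body = GROK[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, pattern)


LINE = ("14:46:16.603 [http-nio-8080-exec-4] INFO  METERING - "
        "msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB "
        "job=a820018e-7ad7-481a-97b0-bd705c3280ad "
        "data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92")

# The question's pattern (NOTSPACE everywhere) and the answer's (UUID, USERNAME).
QUESTION = (r"%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}"
            r"%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}"
            r"%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}")
ANSWER = (r"%{TIME:timestamp} \[%{USERNAME:http}\] %{WORD:loglevel}%{SPACE}"
          r"%{WORD:logtype} - msg=%{UUID:msg} %{WORD:action} job=%{UUID:job}"
          r" data=%{UUID:data}")

before = re.match(grok_to_regex(QUESTION), LINE).groupdict()
after = re.match(grok_to_regex(ANSWER), LINE).groupdict()

# Print the answer's fields, flagging any that differ from the question's.
for name in after:
    marker = "" if before[name] == after[name] else "   <- differs"
    print(f"{name:10} {after[name]}{marker}")
```

Only http differs: NOTSPACE swallows the square brackets, while \[%{USERNAME:http}\] matches them as literals outside the capture. The sketch only turns named references into groups, so HOUR, MINUTE, SECOND and SPACE don't show up the way they do in the debugger output above; Logstash's grok filter behaves similarly by default (named_captures_only => true).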

Hulking answered 21/8, 2015 at 5:11

Can you help me write a grok filter for this log line: [2016-10-28T12:13:20,388][INFO ][o.e.p.PluginsService ] [hTYKFFt] loaded module [ingest-common] I tried the following: { [%{TIME:TIMESTAMP}]%{SPACE}%[%{WORD:loglevel}]%{SPACE}%[%{WORD:data}%{SPACE}%{WORD:data} %{data:message} %[{WORD:message}] } Can you help me with this grok? – Naman 2/11, 2016 at 13:17

@SoundaryaThiagarajan You should create a new question with this. – Hulking 2/11, 2016 at 13:20

You can also use \s* instead of the SPACE pattern.
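As far as I know, the stock SPACE pattern is defined as \s*, so inlining \s* should make no difference to what gets captured. A small Python comparison of both spellings on the level/type fragment (the regexes are hand expansions, an assumption rather than Logstash's own output):

```python
import re

# Assumption: stock grok definitions WORD = \b\w+\b and SPACE = \s*.
WORD = r"\b\w+\b"

# How "%{WORD:loglevel}%{SPACE}%{WORD:logtype}" would expand...
via_space = rf"(?P<loglevel>{WORD})(?:\s*)(?P<logtype>{WORD})"
# ...versus writing \s* directly in the grok expression.
inline = rf"(?P<loglevel>{WORD})\s*(?P<logtype>{WORD})"

# A single space, the double space from the sample line, and a tab.
fragments = ["INFO METERING", "INFO  METERING", "INFO\tMETERING"]

for text in fragments:
    a = re.fullmatch(via_space, text).groupdict()
    b = re.fullmatch(inline, text).groupdict()
    print(repr(text), a == b, b)
```

Either way the separator accepts any mix of spaces and tabs, and also zero whitespace, so it's the surrounding patterns that decide where one field ends and the next begins.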

To delete fields, you can use the mutate filter, which has an option called "remove_field": https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-remove_field

If you delete this field, you will have to add a new index pattern in Kibana, because Kibana sorts events by the @timestamp field unless another time field is chosen.

Putrid answered 29/9, 2016 at 7:51
