I have two related questions. The first is how best to grok logs that have "messy" spacing and the like; the second, which I'll ask separately, is how to deal with logs that have arbitrary attribute-value pairs. (See: logstash grok filter for logs with arbitrary attribute-value pairs)
For the first question, I have a log line that looks like this:
14:46:16.603 [http-nio-8080-exec-4] INFO METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92
Using http://grokdebug.herokuapp.com/ I eventually came up with the following grok pattern, which works for this line:
%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}
With the following config file:
input {
  file {
    path => "/home/robyn/testlogs/trimmed_logs.txt"
    start_position => beginning
    sincedb_path => "/dev/null" # for testing; allows reparsing
  }
}
filter {
  grok {
    match => {"message" => "%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}" }
  }
}
output {
  file {
    path => "/home/robyn/filteredlogs/trimmed_logs.out.txt"
  }
}
I get the following output:
{"message":"14:46:16.603 [http-nio-8080-exec-4] INFO METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92","@version":"1","@timestamp":"2015-08-07T17:55:16.529Z","host":"hlt-dev","path":"/home/robyn/testlogs/trimmed_logs.txt","timestamp":"14:46:16.603","http":"[http-nio-8080-exec-4]","loglevel":"INFO","logtype":"METERING","msg":"93e6dd5e-c009-46b3-b9eb-f753ee3b889a","action":"CREATE_JOB","job":"a820018e-7ad7-481a-97b0-bd705c3280ad","data":"71b1652e-16c8-4b33-9a57-f5fcb3d5de92"}
That's pretty much what I want, but the pattern feels really kludgy, particularly because it leans so heavily on %{SPACE} and %{NOTSPACE}. This suggests to me that I'm not doing this the best possible way. Should I be creating a more specific pattern for the hex IDs? I think I need the %{SPACE} between loglevel and logtype because of the extra space between INFO and METERING in the log, but that also feels kludgy.
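To make the question concrete, here is a sketch of the kind of tighter pattern I have in mind. It assumes that the stock UUID and LOGLEVEL grok patterns fit these fields and that \s+ can replace %{SPACE} wherever at least one space is guaranteed. It is only a guess at a cleaner approach, not something I have verified against the full log:

```
filter {
  grok {
    # Assumption: UUID and LOGLEVEL are the stock grok patterns;
    # \s+ absorbs the single or doubled spaces between fields.
    match => { "message" => "%{TIME:timestamp}\s+%{NOTSPACE:http}\s+%{LOGLEVEL:loglevel}\s+%{WORD:logtype}\s+-\s+msg=%{UUID:msg}\s+%{WORD:action}\s+job=%{UUID:job}\s+data=%{UUID:data}" }
  }
}
```

The field names are the same as in my working pattern, so the output should have the same keys if this matches.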
Also, how do I get the log's own timestamp to replace @timestamp? Right now @timestamp seems to be the time Logstash ingested the line, which we don't want or need.
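From what I can tell, the date filter is meant for this. Below is a minimal sketch of what I would try, assuming the timestamp field captured by my grok pattern and a Joda-style format string. My log lines carry only a time of day and no date, so I am not sure what the date portion of @timestamp would end up being:

```
filter {
  date {
    # "timestamp" is the field captured by the grok pattern in my config
    match => [ "timestamp", "HH:mm:ss.SSS" ]
    # target defaults to "@timestamp"; spelled out here for clarity
    target => "@timestamp"
  }
}
```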
Obviously I'm just getting started with ELK and grok, so pointers to useful resources are also appreciated.