I have been trying to parse my python traceback logs using logstash. My logs look like this:
[pid: 26422|app: 0|req: 73/73] 192.168.1.1 () {34 vars in 592 bytes} [Wed Feb 18 13:35:55 2015] GET /data => generated 2538923 bytes in 4078 msecs (HTTP/1.1 200) 2 headers in 85 bytes (1 switches on core 0)
Traceback (most recent call last):
File "/var/www/analytics/parser.py", line 257, in parselogfile
parselogline(basedir, lne)
File "/var/www/analytics/parser.py", line 157, in parselogline
pval = understandpost(parts[3])
File "/var/www/analytics/parser.py", line 98, in understandpost
val = json.loads(dct["events"])
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 355 (char 354)
So far I've been able to parse the logs except the last line i.e.
ValueError: Unterminated string starting at: line 1 column 355 (char 354)
I'm using the multiline filter to do so. My logstash configuration looks something like this:
filter {
multiline {
pattern => "^Traceback"
what => "previous"
}
multiline {
pattern => "^ "
what => "previous"
}
grok {
match => [
"message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)%{GREEDYDATA:traceback}"
]
}
if "_grokparsefailure" in [tags] {
multiline {
pattern => "^.*$"
what => "previous"
negate => "true"
}
}
if "_grokparsefailure" in [tags] {
grok {
match => [
"message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)%{GREEDYDATA:traceback}"
]
remove_tag => ["_grokparsefailure"]
}
}
}
But my last line isn't parsing. Instead it is still giving me an error and has also increased the processing time exponentially. Any advice on how to parse the last line of the traceback?
[
for my log identification and other lines are to be appended using the multiline filter. I will post the solution here in case someone else needs to parse python logs. – Ellanellard