Can fluent-bit parse multiple types of log lines from one file?

I have a fairly simple Apache deployment in k8s using fluent-bit v1.5 as the log forwarder. My setup is nearly identical to the one in the repo below. I'm running AWS EKS and outputting the logs to AWS ElasticSearch Service.

https://github.com/fluent/fluent-bit-kubernetes-logging

The ConfigMap is here: https://github.com/fluent/fluent-bit-kubernetes-logging/blob/master/output/elasticsearch/fluent-bit-configmap.yaml

The Apache access (-> /dev/stdout) and error (-> /dev/stderr) log lines both end up in the same container log file on the node. The problem I'm having is that fluent-bit doesn't seem to autodetect which Parser to use (I'm not sure if it's supposed to), and only one parser can be specified in the deployment's annotation section; I've specified apache. So in the end, the error log lines, which are written to the same file but come from stderr, are not parsed. Should I be sending the logs from fluent-bit to fluentd to handle the error lines (assuming fluentd can handle this), or should I somehow pump only the error lines back into fluent-bit for parsing?
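For reference, the single parser I can set via the Pod annotation looks roughly like this (a sketch; only the annotation key and value are from the Fluent Bit Kubernetes filter docs, the surrounding Deployment nesting is assumed):

spec:
  template:
    metadata:
      annotations:
        # Tells the Fluent Bit kubernetes filter which parser to apply to this Pod's logs
        fluentbit.io/parser: apache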

Am I missing something?

Thanks!

Sanmicheli answered 8/10, 2020 at 19:54 Comment(0)

I was able to apply a second (and third) parser to the logs by using a Fluent Bit FILTER section with the parser plugin (Name parser), like below.

Documented here: https://docs.fluentbit.io/manual/pipeline/filters/parser

[FILTER]
    Name            parser
    Match           kube.*
    Parser          apache_error_custom
    Parser          apache_error
    Preserve_Key    On
    Reserve_Data    On
    Key_Name        log
Sanmicheli answered 10/10, 2020 at 17:2 Comment(1)
but does this apply both parsers every time, or does the second parser only activate if the first fails?Carsoncarstensz

Fluent Bit is able to run multiple parsers on an input.

If you add multiple parsers to your Parser filter as separate Parser lines (this applies to non-multiline parsing; the multiline parser supports a comma-separated list), e.g.

[Filter]
    Name Parser
    Match *
    Parser parse_common_fields
    Parser json
    Key_Name log

The 1st parser parse_common_fields will attempt to parse the log, and only if it fails will the 2nd parser json attempt to parse these logs.

If you want to parse a log and then parse it again (for example, when only part of your log is JSON), you'll want to chain two Parser filters after each other, like:

[Filter]
    Name Parser
    Match *
    Parser parse_common_fields
    Key_Name log

[Filter]
    Name Parser
    Match *
    Parser json
    # This is the key produced by the parse_common_fields regex whose value we expect to be JSON
    Key_Name log

Here is an example you can run to test this out:

Example

The goal is to parse log lines where part of the line is sometimes JSON and sometimes plain text.

Example log lines

2022-07-28T22:03:44.585+0000 [http-nio-8080-exec-3] [2a166faa-dbba-4210-a328-774861e3fdef][0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f] INFO  SomeService:000 - Using decorator records threshold: 0
2022-07-29T11:36:59.236+0000 [http-nio-8080-exec-3] [][] INFO  CompleteOperationLogger:25 - {"action":"Complete","operation":"healthcheck","result":{"outcome":"Succeeded"},"metrics":{"delayBeforeExecution":0,"duration":0},"user":{},"tracking":{}}

parser.conf

[PARSER]
    Name   parse_common_fields
    Format regex
    Regex ^(?<timestamp>[^ ]+)\..+ \[(?<log_type>[^ \[\]]+)\] \[(?<transaction_id>[^ \[\]]*)\]\[(?<transaction_id2>[^ \[\]]*)\] (?<level>[^ ]*)\s+(?<service_id>[^ ]+) - (?<log>.+)$
    Time_Format %Y-%m-%dT%H:%M:%S
    Time_Key    timestamp

[PARSER]
    Name   json
    Format json

fluentbit.conf

[SERVICE]
    Flush     1
    Log_Level info
    Parsers_File parser.conf

[INPUT]
    NAME   dummy
    Dummy  {"log": "2022-07-28T22:03:44.585+0000 [http-nio-8080-exec-3] [2a166faa-dbba-4210-a328-774861e3fdef][0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f] INFO  AnonymityService:245 - Using decorator records threshold: 0"}
    Tag    testing.deanm.non-json

[INPUT]
    NAME   dummy
    Dummy  {"log": "2022-07-29T11:36:59.236+0000 [http-nio-8080-exec-3] [][] INFO  CompleteOperationLogger:25 - {\"action\":\"Complete\",\"operation\":\"healthcheck\",\"result\":{\"outcome\":\"Succeeded\"},\"metrics\":{\"delayBeforeExecution\":0,\"duration\":0},\"user\":{},\"tracking\":{}}"}
    Tag    testing.deanm.json

[Filter]
    Name Parser
    Match *
    Parser parse_common_fields
    Key_Name log

[Filter]
    Name Parser
    Match *
    Parser json
    Key_Name log

[OUTPUT]
    Name  stdout
    Match *

Results

After the parse_common_fields filter runs on the log lines, it successfully parses the common fields, and the remaining log field is either a plain string or an escaped JSON string:

First Pass

[0] testing.deanm.non-json: [1659045824.000000000, {"log_type"=>"http-nio-8080-exec-3", "transaction_id"=>"2a166faa-dbba-4210-a328-774861e3fdef", "transaction_id2"=>"0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f", "level"=>"INFO", "service_id"=>"AnonymityService:245", "log"=>"Using decorator records threshold: 0"}]
[0] testing.deanm.json: [1659094619.000000000, {"log_type"=>"http-nio-8080-exec-3", "level"=>"INFO", "service_id"=>"CompleteOperationLogger:25", "log"=>"{"action":"Complete","operation":"healthcheck","result":{"outcome":"Succeeded"},"metrics":{"delayBeforeExecution":0,"duration":0},"user":{},"tracking":{}}"}]

Once the json Parser filter runs, the JSON is also parsed correctly:

Second Pass

[0] testing.deanm.non-json: [1659045824.000000000, {"log_type"=>"http-nio-8080-exec-3", "transaction_id"=>"2a166faa-dbba-4210-a328-774861e3fdef", "transaction_id2"=>"0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f", "level"=>"INFO", "service_id"=>"AnonymityService:245", "log"=>"Using decorator records threshold: 0"}]
[0] testing.deanm.json: [1659094619.000000000, {"action"=>"Complete", "operation"=>"healthcheck", "result"=>{"outcome"=>"Succeeded"}, "metrics"=>{"delayBeforeExecution"=>0, "duration"=>0}, "user"=>{}, "tracking"=>{}}]

Note: the difference between the first and second pass above is that after the first pass the JSON string is still inside the log field, while the second pass parses the JSON into its own keys, e.g.:

Pass1:

[1659094619.000000000, {"log"=>"{"action": {"Complete", ...

Pass2:

[1659094619.000000000, {"action"=>"Complete", ...
Unlade answered 29/7, 2022 at 13:52 Comment(2)
are the logs actually in that json format or is that just how fluentbit reads them? most application logs are not in json format, so wondering. also wondering what kind of output would one use for application logs? stdout?Foin
This is off-topic, but most logging libs have the option to output as JSON, or you can instrument your application to use JSON. It's the recommended way to log from an observability standpoint. The answer to your question is that it depends on what your goals are. If you have the log output User X performed action Y, then you might want to extract only X and Y and send those to your SIEM instead of the full log line (or store both), to make it easier to process later on and to reduce the cost of storing and/or processing the saved logs.Unlade

Didn't see this for FluentBit, but for Fluentd:

Note that format none as the last option means the log line is kept as-is, e.g. plaintext, if nothing else worked.
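A rough sketch of what that looks like in Fluentd, assuming the fluent-plugin-multi-format-parser plugin is installed (the match tag and parser choices here are illustrative):

<filter kube.**>
  @type parser
  key_name log
  <parse>
    @type multi_format
    <pattern>
      format apache2
    </pattern>
    <pattern>
      format json
    </pattern>
    <pattern>
      # Fallback: keep the line as-is if nothing else matched
      format none
    </pattern>
  </parse>
</filter>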

You can also use Fluent Bit as a pure log collector, and then have a separate Deployment with Fluentd that receives the stream from Fluent Bit, parses it, and does all the outputs. Use the forward output in Fluent Bit in this case, and a source with @type forward in Fluentd. Docs: https://docs.fluentbit.io/manual/pipeline/outputs/forward
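A minimal sketch of that wiring, with the Fluentd Service name and port assumed:

Fluent Bit side:

[OUTPUT]
    # Forward the raw records to Fluentd instead of parsing locally
    Name   forward
    Match  *
    # Assumed Service name of the Fluentd Deployment
    Host   fluentd
    Port   24224

Fluentd side:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>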

Cavalla answered 9/10, 2020 at 14:50 Comment(0)
