Fluent Bit - Splitting a JSON log into structured fields in Elasticsearch

I am trying to find a way in the Fluent Bit config to tell/enforce ES to store plain JSON-formatted logs (the log bit below, which comes from Docker stdout/stderr) in a structured way - please see the image at the bottom for a better explanation. For example, apart from (or along with) storing the log as a plain JSON entry under the log field, I would like to store each property individually, as shown in red.

The documentation for Filters and Parsers is really poor and not clear. On top of that, the forward input doesn't have a "parser" option. I tried the json/docker/regex parsers but no luck. My regex is here if I have to use regex. I am currently using ES (7.1), Fluent Bit (1.1.3) and Kibana (7.1) - not Kubernetes.

If anyone can direct me to an example or provide one, it would be much appreciated.

Thanks

{
  "_index": "hello",
  "_type": "logs",
  "_id": "T631e2sBChSKEuJw-HO4",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2019-06-21T21:34:02.000Z",
    "tag": "php",
    "container_id": "53154cf4d4e8d7ecf31bdb6bc4a25fdf2f37156edc6b859ba0ddfa9c0ab1715b",
    "container_name": "/hello_php_1",
    "source": "stderr",
    "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\",\"client_ip\":\"-\",\"remote_addr\":\"192.168.192.3\",\"remote_user\":\"\",\"request\":\"GET / HTTP/1.1\",\"status\":\"200\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.001\",\"http_referrer\":\"-\",\"http_user_agent\":\"curl/7.38.0\",\"request_id\":\"91835d61520d289952b7e9b8f658e64f\"}"
  },
  "fields": {
    "@timestamp": [
      "2019-06-21T21:34:02.000Z"
    ]
  },
  "sort": [
    1561152842000
  ]
}


My config:

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

[OUTPUT]
    Name  es
    Match hello_*
    Host  elasticsearch
    Port  9200
    Index hello
    Type  logs
    Include_Tag_Key On
    Tag_Key tag


Rostand answered 1/7, 2019 at 20:0 Comment(0)

The solution is as follows.

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name         forward
    storage.type filesystem
    Listen       my_fluent_bit_service
    Port         24224

[FILTER]
    Name         parser
    Parser       docker
    Match        hello_*
    Key_Name     log
    Reserve_Data On
    Preserve_Key On

[OUTPUT]
    Name            es
    Host            my_elasticsearch_service
    Port            9200
    Match           hello_*
    Index           hello
    Type            logs
    Include_Tag_Key On
    Tag_Key         tag

And in parsers.conf (Fluent Bit requires [PARSER] sections to live in the file referenced by Parsers_File, not in the main config):

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command          | Decoder      | Field | Optional Action
    # =================|==============|=======|================
    Decode_Field_As   escaped_utf8    log    do_next
    Decode_Field_As   json       log
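
With the filter and decoders in place, the JSON string under log is exploded into individual fields; Reserve_Data On keeps the other original fields (the container metadata) and Preserve_Key On keeps the original log string. A sketch of the resulting Elasticsearch _source, based on the sample log above (abridged; the exact metadata fields depend on your setup):

{
  "@timestamp": "2019-06-21T21:34:02.000Z",
  "tag": "php",
  "source": "stderr",
  "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\", ... }",
  "time_local": "2019-06-21T21:34:02+0000",
  "remote_addr": "192.168.192.3",
  "request": "GET / HTTP/1.1",
  "status": "200",
  "request_id": "91835d61520d289952b7e9b8f658e64f"
}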
Rostand answered 22/1, 2020 at 14:50 Comment(10)
Thank you very much for this answer. The documentation is simply horrendous. One question: how is your log entry ultimately decoded? I get a line of key=value pairs (such as name=john age=27 city=paris) and not a decoded structure (it is not a JSON string anymore, but not a structure visible to Kibana either). - Plott
Not sure if I understand what exactly you mean, but my application logs are in JSON format by default. So your example would be {"name":"john","age":"27","city":"paris"} if it were my application. Afterwards this whole string would also look the same in Kibana under the log key, as shown above in the image. I hope it helps. Also have a look at this for a much more detailed example. - Rostand
Sorry for not having been clear. I used to have {"name":"john","age":"27","city":"paris"} as the message entry in my log, displayed as such by Kibana. I was hoping that this entry could be decoded by Fluent Bit so that it goes to Elasticsearch as a true JSON entry, and so that I have the keys name, age and city as fields (at the same level as your tag or source entries). - Plott
(cont'd) What I have is still a message entry, which is now name=john age=27 city=paris (instead of the JSON string representation before). I was wondering if this is the expected behaviour (which makes the decoder useless, because I cannot search on the key city, for instance). - Plott
In other words, the entry under message has been rewritten from the string {"name":"john","age":"27","city":"paris"} into the string name=john age=27 city=paris, which is not the parsing I expected (→ to "explode" the JSON string into actual fields for Kibana). - Plott
Also thanks for your link - I see that what the author got at the very end is exactly what I am looking for, so this must be something on my side. I get a JSON → key/value pairs translation instead of the expected parsing. - Plott
If your app log is a JSON-formatted string, you should have a log field in Kibana that contains the original JSON as is. On top of that, you should also have name, age and city as individual fields. All of this depends on the parser, so if you used the very last parser in that blog (same as the one I have above), it should work. Pay attention to the filter section in the fluent-bit.conf file as well. - Rostand
Hey @BentCoder, what if I have to parse a log field which is not JSON but a plain one-line string, like "2020-07-11 10:55:38,022 - INFO kv : 1" (1 if they are strictly aligned, 0 if not)? What changes do we need to make on the filter side? - Dunning
@AmanKumarSoni, you need to use the regex format (with the named-capture feature) for that: docs.fluentbit.io/manual/pipeline/parsers/regular-expression - Burkholder
How do you set Key_Name if the key is nested? In my case it's log_processed['message']. I tried log_processed['message'], log_processed.message and log_processed_message; none of these work. - Any
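
For the regex question in the comments above, a parser with named captures might look like this (a sketch; the time, level and message field names are my assumptions):

[PARSER]
    Name        myapp_regex
    Format      regex
    # Named captures become record fields; the milliseconds are matched but dropped
    Regex       ^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ - (?<level>\w+) (?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S

It would then be referenced from a parser [FILTER] via Parser myapp_regex, just like the docker parser above.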

Based on a log file with JSON objects separated by newlines, I was able to get it working with this config. No filters/decoders necessary.

The key point was to create a JSON parser, and set the parser name in the INPUT section.

Note that the tail docs say you should set up the DB option (see the sketch after the config below); this is just a minimal config to get things working.

fluent-bit.conf:

[SERVICE]
  Parsers_File parsers.conf

[INPUT]
  Name tail
  Parser myparser
  Path /json_objs_separated_by_newlines.log

[OUTPUT]
  Name es
  Host elasticsearch
  # Required for Elasticsearch 8+
  Suppress_Type_Name On

parsers.conf:

[PARSER]
  Name myparser
  Format json
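
For the DB option mentioned above, which lets tail remember its file offsets across restarts, the INPUT section grows by one line (the database path here is just an example):

[INPUT]
  Name   tail
  Parser myparser
  Path   /json_objs_separated_by_newlines.log
  # Tracks file offsets so restarts don't re-read or miss lines
  DB     /var/lib/fluent-bit/tail.db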
Problem answered 20/12, 2023 at 9:41 Comment(0)

Answering for a more general use case: using FireLens with the aws-for-fluent-bit image, where the message ends up under a top-level log key, like this:

{
    "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\",\"client_ip\":\"-\",\"remote_addr\":\"192.168.192.3\",\"remote_user\":\"\",\"request\":\"GET / HTTP/1.1\",\"status\":\"200\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.001\",\"http_referrer\":\"-\",\"http_user_agent\":\"curl/7.38.0\",\"request_id\":\"91835d61520d289952b7e9b8f658e64f\"}"
}

Following this official AWS example, note that a JSON parsing config already ships in the image and can be used like so:

"firelensConfiguration": {
    "type": "fluentbit",
    "options": {
        "config-file-type": "file",
        "config-file-value": "/fluent-bit/configs/parse-json.conf"
    }
}

Or it can be enabled via an environment variable:

"environment": [
    {
        "name": "aws_fluent_bit_init_file_1",
        "value": "/fluent-bit/configs/parse-json.conf"
    }
]

Result:

{
    "time_local": "2019-06-21T21:34:02+0000",
    "client_ip": "-",
    "remote_addr": "192.168.192.3",
    "remote_user": "",
    "request": "GET / HTTP/1.1",
    "status": "200",
    "body_bytes_sent": "0",
    "request_time": "0.001",
    "http_referrer": "-",
    "http_user_agent": "curl/7.38.0",
    "request_id": "91835d61520d289952b7e9b8f658e64f"
}
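
For reference, the bundled parse-json.conf is essentially a parser filter applied to the log key. A rough equivalent (a sketch, not the literal file contents):

[FILTER]
    Name         parser
    Match        *
    Key_Name     log
    Parser       json
    # Keep the other fields on the record alongside the parsed ones
    Reserve_Data True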
Tiffanietiffanle answered 1/11, 2023 at 17:2 Comment(0)

You can use the Fluent Bit Nest filter for that purpose; please refer to the following documentation:

https://docs.fluentbit.io/manual/filter/nest
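
Note that Nest only rearranges keys that already exist as a nested map; it cannot parse the JSON string under log by itself. If a parser filter has already placed the decoded fields under a key, the lift operation can flatten them to the top level (a sketch; log_processed is an assumed key name):

[FILTER]
    Name         nest
    Match        hello_*
    # lift moves the keys found under Nested_under up to the top level
    Operation    lift
    Nested_under log_processed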

Sometime answered 26/7, 2019 at 5:9 Comment(3)
OP here - "The documentation for Filters and Parsers is really poor and not clear." I've spent more than enough time with the docs, which is exactly why I ended up asking this question. - Rostand
The documentation is EXTREMELY lacking. - Mealtime
Nest is actually the opposite of what's required here, as we want to un-nest. - Diplomatic
