multiline fluentd logs in kubernetes

I am new to fluentd. I have configured the basic fluentd setup I need and deployed it to my Kubernetes cluster as a DaemonSet, and I'm seeing logs shipped to my third-party logging solution. However, I now want to deal with some logs that are coming in as multiple entries when they really should be one. The logs from the node are JSON and look like this:

{"log":"2019-09-23 18:54:42,102 [INFO] some message \n","stream":"stderr","time":"2019-09-23T18:54:42.102Z"}
{"log": "another message \n","stream":"stderr","time":"2019-09-23T18:54:42.102Z"}

I have a ConfigMap that looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-config-map
  namespace: logging
  labels:
    k8s-app: fluentd-logzio
data:
  fluent.conf: |-
    @include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
    @include kubernetes.conf
    @include conf.d/*.conf

    <match fluent.**>
      # this tells fluentd to not output its log on stdout
      @type null
    </match>

    # here we read the logs from Docker's containers and parse them
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag raw.kubernetes.*
      format json
      read_from_head true
    </source>

    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>

    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @id filter_kubernetes_metadata
      @type kubernetes_metadata
    </filter>

    <match kubernetes.**>
      @type logzio_buffered
      @id out_logzio
      endpoint_url "https://listener-ca.logz.io?token=####"
      output_include_time true
      output_include_tags true
      <buffer>
        # Set the buffer type to file to improve the reliability and reduce the memory consumption
        @type file
        path /var/log/fluentd-buffers/stackdriver.buffer
        # Set queue_full action to block because we want to pause gracefully
        # in case of the off-the-limits load instead of throwing an exception
        overflow_action block
        # Set the chunk limit conservatively to avoid exceeding the GCL limit
        # of 10MiB per write request.
        chunk_limit_size 2M
        # Cap the combined memory usage of this buffer and the one below to
        # 2MiB/chunk * (6 + 2) chunks = 16 MiB
        queue_limit_length 6
        # Never wait more than 5 seconds before flushing logs in the non-error case.
        flush_interval 5s
        # Never wait longer than 30 seconds between retries.
        retry_max_interval 30
        # Disable the limit on the number of retries (retry forever).
        retry_forever true
        # Use multiple threads for processing.
        flush_thread_count 2
      </buffer>
    </match>

My question is: how do I get these log messages shipped as a single entry instead of several?

Educated answered 23/9, 2019 at 20:5

There are at least two ways:

multiline plugin

Thanks to @rickerp, who suggested the multiline plugin.

The multiline parser plugin parses multiline logs. It is the multiline version of the regexp parser.

The multiline parser parses logs using the formatN and format_firstline parameters. format_firstline detects the start line of a multiline log. formatN, where N's range is [1..20], is the list of Regexp formats for the multiline log.

Unlike other parser plugins, this plugin needs special code in the input plugin, e.g. to handle format_firstline. So, currently, the in_tail plugin works with multiline, but other input plugins do not.
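
For the log format in the question, a minimal sketch of an in_tail source using the multiline parser could look like the one below. The regexes and field names are assumptions to adjust: it treats any line beginning with a "2019-09-23 18:54:42,102 [INFO]"-style prefix as the start of a new entry. Note it assumes the file contains the raw application lines; with Docker's json-file driver each line is JSON-wrapped (as in the question), so the concat filter in the next section is usually the better fit there.

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  tag raw.kubernetes.*
  read_from_head true
  <parse>
    @type multiline
    # Lines matching this regexp start a new entry; anything else is
    # folded into the previous one.
    format_firstline /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[\w+\]/
    # Assumed layout: "2019-09-23 18:54:42,102 [INFO] some message"
    format1 /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(?<level>\w+)\] (?<message>.*)$/
    time_format %Y-%m-%d %H:%M:%S,%L
  </parse>
</source>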

fluent-plugin-concat plugin

As per the fluentd documentation, fluent-plugin-concat solves this:

Concatenate multi-line log messages

The application log is stored in the "log" field of the records. You can concatenate these logs by using the fluent-plugin-concat filter before sending them to destinations.

<filter docker.**>
  @type concat
  key log
  stream_identity_key container_id
  multiline_start_regexp /^-e:2:in `\/'/
  multiline_end_regexp /^-e:4:in/
</filter>

Original events:

2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky","source":"stdout","log":"-e:2:in `/'"}
2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"source":"stdout","log":"-e:2:in `do_division_by_zero'","container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky"}
2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"source":"stdout","log":"-e:4:in `<main>'","container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky"}

Filtered events:

2016-04-13 14:45:55 +0900 docker.28cf38e21204: {"container_id":"28cf38e212042225f5f80a56fac08f34c8f0b235e738900c4e0abcf39253a702","container_name":"/romantic_dubinsky","source":"stdout","log":"-e:2:in `/'\n-e:2:in `do_division_by_zero'\n-e:4:in `<main>'"}

With this plugin, you'll need to adapt the regexes to your own log format; a sketch adapted to the logs in the question follows.
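
For the setup in the question, a minimal sketch might look like the following. The start regexp is an assumption based on the "2019-09-23 18:54:42,102" timestamps shown above, and the filter should sit before the detect_exceptions match so exception detection runs on already-joined records:

<filter raw.kubernetes.**>
  @type concat
  key log
  # Assumed: every new record begins with a "YYYY-MM-DD HH:MM:SS,mmm"
  # timestamp; non-matching lines are appended to the pending record.
  multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/
  # The Docker "log" field already ends with \n, so join without an extra separator.
  separator ""
  # Flush a pending record if no continuation line arrives within 5 seconds.
  flush_interval 5
</filter>

Note that records flushed by the timeout are emitted as error events unless timeout_label is set, so you may want to route them back to the normal pipeline with a <label> section.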

Lever answered 24/9, 2019 at 6:51 Comment(2)
This worked perfectly. Actually, a little too perfectly: it's now combining some logs that shouldn't be combined, but I can tweak my regex a bit and probably get that to work. Thanks for the tip. – Educated
I think now you can use the multiline @type: docs.fluentd.org/parser/multiline – Partible
