Docker apps logging with Filebeat and Logstash

I have a set of dockerized applications scattered across multiple servers, and I'm trying to set up production-level centralized logging with ELK. I'm OK with the ELK part itself, but I'm a little confused about how to forward the logs to my Logstash instances. I'm trying to use Filebeat because of its load-balancing feature. I'd also like to avoid packing Filebeat (or anything else) into all my containers, and keep it separate, dockerized or not.

How can I proceed?

I've been trying the following. My containers log to stdout, so with a non-dockerized Filebeat configured to read from stdin I do:

docker logs -f mycontainer | ./filebeat -e -c filebeat.yml
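For reference, the filebeat.yml behind that command would be something along these lines; this is just a sketch in Filebeat 1.x-style syntax, and the Logstash hosts and port are placeholders:

filebeat:
  prospectors:
    -
      input_type: stdin
output:
  logstash:
    # placeholder hosts; loadbalance distributes events across them
    hosts: ["logstash1:5044", "logstash2:5044"]
    loadbalance: true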

That appears to work at the beginning. The first logs are forwarded to my Logstash (the cached ones, I guess), but at some point it gets stuck and keeps sending the same event.

Is that just a bug, or am I headed in the wrong direction? What solution have you set up?

Gennagennaro answered 30/10, 2015 at 9:47 Comment(2)
I've just tried the same thing with the old logstash-forwarder: docker logs -f mycontainer | ./logstash-forwarder_linux_amd64 -config forwarder.conf and it works. I suspect a bug in Filebeat. The only remaining problem is that it connects to a random Logstash, with no load balancing.Gennagennaro
Which version of filebeat are you using? This looks like a potential bug. Feel free to open an issue here so we can look deeper into the problem. For reference, some additional discussion of the docker implementation can be found here: github.com/elastic/libbeat/issues/37Auxochrome

Docker allows you to specify the log driver to use. This answer does not care about Filebeat or load balancing.

In a presentation I used syslog to forward the logs to a Logstash (ELK) instance listening on port 5000. The following command constantly sends messages through syslog to Logstash:

docker run -t -d --log-driver=syslog --log-opt syslog-address=tcp://127.0.0.1:5000 ubuntu /bin/bash -c 'while true; do echo "Hello $(date)"; sleep 1; done'
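On the receiving side, a minimal Logstash pipeline sketch that accepts these syslog messages on port 5000 and indexes them into Elasticsearch could look like this (the Elasticsearch address is a placeholder):

input {
  # the syslog input listens on both TCP and UDP on the given port
  syslog {
    port => 5000
  }
}
output {
  elasticsearch {
    hosts => ["ES_HOST:9200"]
  }
}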
Pensive answered 30/10, 2015 at 10:0 Comment(5)
I had a look at the log driver option, but what about high availability? Will I need a TCP load balancer to route requests to my Logstash cluster?Gennagennaro
I am not sure about the scale of your system. With ~200 log producers (the command in my answer) I didn't notice any problems. However, I did not think about high availability or load balancing/clustering.Pensive
I'm not that concerned about the quantity of logs; it's the availability of Logstash. I need at least 2 or 3 of them to ensure good fault tolerance, and then some mechanism to switch from one to the other.Gennagennaro
Well, I can't help you with that, but I'll leave up my solution to help others. Maybe you want to point out the load balancing in your question.Pensive
Using Docker Swarm or Kubernetes should provide solutions to your load-balancing issues. In Docker Swarm you specify which service should receive a message, and Docker forwards it to one of the replicas (round robin or other strategies).Bittern

Here's one way to forward docker logs to the ELK stack (requires docker >= 1.8 for the gelf log driver):

  1. Start a Logstash container with the gelf input plugin, which reads gelf messages and outputs them to an Elasticsearch host (ES_HOST:PORT):

    docker run --rm -p 12201:12201/udp logstash \
        logstash -e 'input { gelf { } } output { elasticsearch { hosts => ["ES_HOST:PORT"] } }'
    
  2. Now start a Docker container and use the gelf Docker logging driver. Here's a dumb example:

    docker run --log-driver=gelf --log-opt gelf-address=udp://localhost:12201 busybox \
        /bin/sh -c 'while true; do echo "Hello $(date)"; sleep 1; done'
    
  3. Load up Kibana and things that would've landed in docker logs are now visible. The gelf source code shows that some handy fields are generated for you (hat-tip: Christophe Labouisse): _container_id, _container_name, _image_id, _image_name, _command, _tag, _created.

If you use docker-compose (make sure it is docker-compose >= 1.5), add the appropriate settings to docker-compose.yml after starting the Logstash container:

log_driver: "gelf"
log_opt:
  gelf-address: "udp://localhost:12201"
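With the version 2+ Compose file format, the same settings are nested under a logging key per service; a sketch with a placeholder service name:

version: "2"
services:
  myapp:
    image: busybox
    logging:
      driver: gelf
      options:
        gelf-address: "udp://localhost:12201"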
Woodworking answered 20/11, 2015 at 0:1 Comment(3)
I think the problem with gelf is that it uses UDP and might drop log events silently.Debutant
Good point, @urso. The syslog logging driver can be used in a similar manner to deliver logs via TCP; here's an example. The Graylog Extended Log Format (GELF) docs mention potential problems with TCP, in contrast to UDP silently dropping log events.Woodworking
This article explains the issues with gelf (with both UDP and TCP): claudiokuenzler.com/blog/845/…Herminahermine

Using filebeat you can just pipe docker logs output as you've described. The behavior you are seeing definitely sounds like a bug, but it could also be the partial-line-read configuration hitting you (partial lines are resent until a newline symbol is found).

A problem I see with piping is possible back pressure in case no logstash is available. If filebeat cannot send any events, it will buffer events internally and at some point stop reading from stdin. I have no idea how/if docker protects against stdout becoming unresponsive. Another problem with piping might be the restart behavior of filebeat + docker if you are using docker-compose. docker-compose by default reuses images + image state, so when you restart you will ship all the old logs again (given the underlying log file has not been rotated yet).

Instead of piping, you can try to read the log files written by docker to the host system. The default docker log driver is the json log driver. You can and should configure the json log driver to do log rotation and keep some old files (for buffering up on disk); see the max-size and max-file options. The json driver puts one line of 'json' data for every line to be logged. On the docker host system the log files are written to /var/lib/docker/containers/container_id/container_id-json.log.

These files can be forwarded by filebeat to logstash. If logstash or the network becomes unavailable, or filebeat is restarted, it continues forwarding log lines where it left off (given the files have not been deleted due to log rotation), so no events will be lost. In logstash you can use the json_lines codec or the json filter to parse the json lines, and a grok filter to gain some more information from your logs.
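As a sketch of those rotation settings, the json-file driver can be tuned per container via --log-opt (the size and file count here are arbitrary example values):

docker run --log-driver=json-file --log-opt max-size=10m --log-opt max-file=5 myimage

On newer Docker versions the same options can also be set host-wide in the daemon configuration instead of per container.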

There has been some discussion about using libbeat (used by filebeat for shipping log files) to add a new log driver to docker. Maybe it will be possible to collect logs via dockerbeat in the future by using the docker logs API (I'm not aware of any plans to utilise the logs API, though).

Using syslog is also an option. Maybe you can get some syslog relay on your docker host to load balance log events, or have syslog write log files and use filebeat to forward them. I think rsyslog has at least some failover mode: you can use the logstash syslog input plugin and rsyslog to forward logs to logstash, with failover support in case the active logstash instance becomes unavailable.
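As a sketch of that rsyslog failover idea (legacy rsyslog directives; hostnames and port are placeholders, and the logstash side would run a matching syslog or tcp input):

# send everything to the primary logstash over TCP
*.* @@logstash-primary:5514
# use the secondary only while the previous (primary) action is suspended
$ActionExecOnlyWhenPreviousIsSuspended on
& @@logstash-secondary:5514
$ActionExecOnlyWhenPreviousIsSuspended off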

Debutant answered 20/11, 2015 at 13:59 Comment(1)
Re json-file, github.com/moby/moby/issues/17763 indicates that the docker json-files are considered internal data and not meant to be consumed by other processes.Fulfil

I created my own Docker image that uses the Docker API to collect the logs of the containers running on the machine and ships them to Logstash thanks to Filebeat. There is no need to install or configure anything on the host.

Check it out and tell me if it suits your needs: https://hub.docker.com/r/bargenson/filebeat/.

The code is available here: https://github.com/bargenson/docker-filebeat
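A hypothetical invocation might look like the following; the environment variable names are assumptions for illustration only, so check the linked README for the actual options. Mounting the Docker socket is what gives the container access to the Docker API:

docker run -d \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e LOGSTASH_HOST=logstash.example.com \
  -e LOGSTASH_PORT=5044 \
  bargenson/filebeat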

Gamma answered 7/2, 2016 at 6:17 Comment(0)

Just to help others that need to do this: you can simply use Filebeat to ship the logs. I would use the container by @brice-argenson, but I needed SSL support, so I went with a locally installed Filebeat instance.

The Filebeat prospector configuration is (repeat for more containers):

- input_type: log
  paths:
    - /var/lib/docker/containers/<guid>/*.log
  document_type: docker_log
  fields:
    dockercontainer: container_name

It sucks a bit that you need to know the GUIDs as they could change on updates.

On the logstash server, set up the usual filebeat input source for logstash, and use a filter like this:

filter {
  if [type] == "docker_log" {
    json {
      source => "message"
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    mutate {
      rename => { "log" => "message" }
    }
    date {
      match => [ "time", "ISO8601" ]
    }
  }
}

This will parse the JSON from the Docker logs, and set the timestamp to the one reported by Docker.

If you are reading logs from the nginx Docker image, you can add this filter as well:

filter {
  if [fields][dockercontainer] == "nginx" {
    grok {
      match => { "message" => "(?m)%{IPORHOST:targethost} %{COMBINEDAPACHELOG}" }
    }
    mutate {
      convert => { "[bytes]" => "integer" }
      convert => { "[response]" => "integer" }
    }
    mutate {
      rename => { "bytes" => "http_streamlen" }
      rename => { "response" => "http_statuscode" }
    }
  }
}

The converts/renames are optional, but they fix an oversight in the COMBINEDAPACHELOG expression, which does not cast these values to integers, leaving them unavailable for aggregation in Kibana.

Intenerate answered 25/11, 2016 at 9:25 Comment(2)
Thanks for this! Regarding your hint about the GUIDs, I agree; however, you probably would not want to make configuration like this by hand, but rather use something like Ansible. Then just run docker ps | grep container_name | awk '{print $1}', template the result into the config, and restart filebeat.World
According to the docs, you should be able to use a pattern like this in your prospectors.paths: /var/lib/docker/containers/*/*.logEdulcorate

I verified what erewok wrote above in a comment:

According to the docs, you should be able to use a pattern like this in your prospectors.paths: /var/lib/docker/containers/*/*.log – erewok Apr 18 at 21:03

The docker container GUIDs, represented as the first '*', are correctly resolved when filebeat starts up. I do not know what happens as containers are added.
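For completeness, a prospector sketch using that glob, in the same style as the configuration shown in the earlier answer:

filebeat:
  prospectors:
    - input_type: log
      paths:
        - /var/lib/docker/containers/*/*.log
      document_type: docker_log

Filebeat rescans the configured paths periodically (scan_frequency, 10 seconds by default), so log files of containers created later should be picked up as they appear.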

Titanate answered 13/9, 2017 at 22:14 Comment(0)
