Fluentd vs Kafka

The use case is this: I have several Java applications running, each of which has to interact with its own specific set of Elasticsearch indices. For instance, application A queries and updates Elasticsearch indices A, B and C, while application B uses indices A, C and D (say).

Some common interface is required that can manage all these data streams. I'm currently evaluating Kafka and Fluentd for this purpose. Can someone explain which is better suited to this situation? I've looked at the features of both Kafka and Fluentd, and I don't really understand what difference the choice would make here. Thanks a lot.

Cameroncameroon answered 2/2, 2016 at 4:4 Comment(0)

Kafka provides publish/subscribe messaging as a distributed commit log. Usually you install Kafka on each host where you need to produce data to be forwarded somewhere else, and all those hosts together form a cluster. The good thing here is that if network connectivity becomes unstable or goes down for some reason, your application can continue to produce data/logs and they won't be lost. By contrast, if your application sends logs directly to a remote centralized logging host, you might lose some logs while the network is down.
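To make the "commit log" idea concrete, here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: the `CommitLog` class, its `publish`/`poll` methods and the subscriber names are all made up for illustration. It only shows that records are appended once and that each subscriber reads them at its own pace by tracking its own offset.

```python
from collections import defaultdict


class CommitLog:
    """Toy stand-in for a single topic: an append-only list of records."""

    def __init__(self):
        self._records = []                # records are only ever appended
        self._offsets = defaultdict(int)  # next offset to read, per subscriber

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, subscriber, max_records=10):
        """Return unread records for this subscriber and advance its offset."""
        start = self._offsets[subscriber]
        batch = self._records[start:start + max_records]
        self._offsets[subscriber] = start + len(batch)
        return batch


log = CommitLog()
log.publish("app-A: updated index A")
log.publish("app-A: queried index B")

# The "indexer" subscriber reads what has been published so far.
first = log.poll("indexer")

# The "auditor" subscriber was unavailable until now; the log kept everything.
log.publish("app-B: updated index D")
second = log.poll("auditor")
```

Because the log retains records independently of who has read them, a subscriber that was unreachable for a while simply resumes from its last offset.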

Fluentd is a centralized log collector that is commonly installed on one host (or more, if you need horizontal scaling). It connects to remote data sources, applies filtering, and sends unified log data to remote data sinks.
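The collect, filter and forward flow can be sketched in a few lines. This is a conceptual toy, not Fluentd's plugin API: the `collect` function, the event fields (`tag`, `message`, `level`) and the `index.A`/`index.B` tags are assumptions chosen to mirror the question's use case, and plain lists stand in for remote sources and sinks.

```python
def collect(sources, keep, sinks):
    """Pull events from every source, filter them, and route them to sinks by tag."""
    for source in sources:
        for event in source:
            if not keep(event):
                continue  # filtered out, never reaches a sink
            # "Unify" the event: keep a fixed set of fields and tidy the message.
            unified = {"tag": event["tag"], "message": event["message"].strip()}
            for sink in sinks.get(unified["tag"], []):
                sink.append(unified)


# Hypothetical events emitted by two applications.
app_a = [
    {"tag": "index.A", "message": " user created ", "level": "info"},
    {"tag": "index.B", "message": "debug noise", "level": "debug"},
]
app_b = [{"tag": "index.A", "message": "user deleted", "level": "info"}]

# Lists standing in for two Elasticsearch indices.
index_a, index_b = [], []

collect(
    sources=[app_a, app_b],
    keep=lambda event: event["level"] != "debug",
    sinks={"index.A": [index_a], "index.B": [index_b]},
)
```

The applications only emit tagged events; which index each tag ends up in is decided in one place, the collector's routing table.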

From the Fluentd docs, you can see that Fluentd can consume data from Kafka and produce data to Kafka as well. This alone should hint that Fluentd and Kafka sit on different layers, since the former uses the latter.

It would actually be more logical to compare Fluentd with Logstash. As far as Fluentd is concerned, Kafka is just another data source and/or data sink; the two are different beasts altogether.

If you want the best of both worlds, use Kafka as the input/output data pipe from/to your apps, and Fluentd (or Logstash) as your centralized logging system reading from those Kafka topics.
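A rough sketch of that split, under loud assumptions: the `topics` dictionary, `produce` and `drain` are hypothetical stand-ins, not real Kafka or Fluentd calls, and removing a record once it is forwarded is a simplification (a real Kafka topic retains records and tracks consumer offsets instead). The point is only that the apps keep writing to their topic while the central sink is unreachable, and the collector catches up later.

```python
from collections import deque

# Stand-ins for Kafka topics, one per application.
topics = {"app-a": deque(), "app-b": deque()}


def produce(topic, record):
    """Called by an application; it never talks to the central sink directly."""
    topics[topic].append(record)


def drain(sink, sink_available):
    """Collector step: forward buffered records, or leave them queued if the sink is down."""
    if not sink_available:
        return 0
    forwarded = 0
    for topic, buffer in topics.items():
        while buffer:
            sink.append((topic, buffer.popleft()))
            forwarded += 1
    return forwarded


central_store = []  # stand-in for the centralized logging backend

produce("app-a", "updated index A")
produce("app-b", "updated index D")

during_outage = drain(central_store, sink_available=False)  # nothing forwarded, nothing lost
after_recovery = drain(central_store, sink_available=True)  # backlog is delivered
```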

If you want to read more on the topic, you can read how Fluentd and Kafka complement each other very well; in short, they are not competing against each other.

Slaw answered 2/2, 2016 at 4:38 Comment(9)
You, Sir, are a legend :) Thanks a lot. - Cameroncameroon
Thank you, I'm always happy to help ;) - Slaw
For the benefit of everyone, may the person who downvoted the question and the answer provide some comment on why they did it? Please enlighten us ;-) - Slaw
I didn't downvote, but I think you should double-check before writing that Fluentd is only a centralised log aggregator ;) It does not provide pub/sub functionality, but it most definitely works in a high-availability, decentralised manner. - Distil
@Distil I didn't say "only" ;-) - Slaw
Maybe something to compare in your discussion: could there be more of a parallel between Flume and Kafka? I ask because I'm working on a pipeline with Flafka and want to know whether this would work in a distributed logging scenario. - Bunyan
Thank you so much for this detailed explanation of the difference between them. - Sizing
@Slaw In your opinion, does it make more sense to use Kafka as a producer for Fluentd or as a consumer (assuming I'm using Fluent Bit as the collector at the source)? Also, can you think of any pros/cons for either approach? - Calutron
@Calutron You should probably ask a new question explaining your needs/questions in greater detail. - Slaw

From: The Life Blood Of Your Data Pipeline

Kafka is primarily related to holding log data rather than moving log data. Thus, Kafka producers need to write the code to put data in Kafka, and Kafka consumers need to write the code to pull data out of Kafka.

Fluentd has both input and output plugins for Kafka so that data engineers can write less code to get data in and out of Kafka. We have many users that use Fluentd as a Kafka producer and/or consumer.

Vauban answered 19/4, 2018 at 20:2 Comment(0)
