Fluentd vs Kafka

Asked 2/2, 2016 at 4:4 Answered 19/4, 2018 at 20:2

Solved elasticsearch apache-kafka fluentd

The use case is this: I have several Java applications running, each of which has to interact with its own specific set of Elasticsearch indices. For instance, application A queries and updates indices A, B and C, while application B uses indices A, C and D (say).

Some common interface is required that can manage all these data streams. I'm currently evaluating Kafka and Fluentd for this purpose. Can someone explain which would be better suited to this situation? I've looked at the features of both Kafka and Fluentd, and I don't really understand what difference the choice would make here. Thanks a lot.

Cameroncameroon answered 2/2, 2016 at 4:4 Comment(0)

Kafka provides publish/subscribe messaging as a distributed commit log. Usually you install Kafka on each host where you need to produce data to be forwarded somewhere else, and all those hosts together form a cluster. The good thing here is that if network connectivity becomes unstable or goes down, your application can continue to produce data/logs and they won't be lost. If your application instead sends logs directly to a remote centralized logging host, you might lose some logs while the network is down.
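To make the "commit log" idea concrete, here is a minimal in-memory sketch in Python. It is not Kafka's API: the `CommitLog` class, its `publish` and `poll` methods, and the subscriber names are all hypothetical, chosen only to show that records are appended once and each subscriber reads from its own offset, so a subscriber that was unreachable for a while can catch up without losing anything.

```python
from collections import defaultdict


class CommitLog:
    """Toy in-memory stand-in for a single topic (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._records = []                 # records are only ever appended
        self._offsets = defaultdict(int)   # next offset to read, per subscriber

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, subscriber, max_records=10):
        """Return unread records for this subscriber and advance its offset."""
        start = self._offsets[subscriber]
        batch = self._records[start:start + max_records]
        self._offsets[subscriber] = start + len(batch)
        return batch


log = CommitLog()
log.publish("app A: started")
log.publish("app A: query on index B")

# The "indexer" subscriber reads right away; "archiver" is unreachable for now.
first_batch = log.poll("indexer")

# The application keeps producing while "archiver" is still down.
log.publish("app A: update on index C")

# Once "archiver" is back, it reads everything starting from its own offset 0.
archiver_batch = log.poll("archiver")
indexer_rest = log.poll("indexer")
print(first_batch, archiver_batch, indexer_rest)
```

A real Kafka cluster adds persistence, partitioning and replication on top of this idea; the sketch only covers the append-and-read-by-offset behaviour.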

Fluentd is a centralized log collector that is commonly installed on one host (or more if you need horizontal scaling). It connects to remote data sources, applies filtering, and sends unified log data to remote data sinks.
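The collect, filter and forward flow described above can be sketched in plain Python. This is only an illustration of the idea, not how Fluentd is configured or implemented: the function names, the tags, the "drop debug records" rule and the list-backed sinks named after indices are all assumptions made for the example.

```python
def collect(sources):
    """Merge events from every source into one stream of (tag, record) pairs."""
    for tag, records in sources.items():
        for record in records:
            yield tag, record


def keep(record):
    """Filter step: drop debug noise (an arbitrary rule chosen for this sketch)."""
    return record.get("level") != "debug"


def route(events, routes, sinks):
    """Send each surviving record to every sink listed for its tag."""
    for tag, record in events:
        if not keep(record):
            continue
        unified = {"tag": tag, **record}   # one common shape for all sinks
        for sink_name in routes.get(tag, []):
            sinks[sink_name].append(unified)


# Hypothetical sources and sinks; plain lists stand in for remote systems.
sources = {
    "app_a": [{"level": "info", "msg": "updated doc 1"},
              {"level": "debug", "msg": "cache hit"}],
    "app_b": [{"level": "warn", "msg": "slow query"}],
}
routes = {"app_a": ["index_a", "index_b"], "app_b": ["index_a"]}
sinks = {"index_a": [], "index_b": []}

route(collect(sources), routes, sinks)
print(sinks)
```

In Fluentd itself the same three roles are played by input, filter and output plugins declared in its configuration file rather than by hand-written functions.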

From the Fluentd docs, you can see that Fluentd can consume data from Kafka and produce data to Kafka as well. This alone should hint that Fluentd and Kafka sit on different layers, since the former uses the latter.

It would actually be more logical to compare Fluentd and Logstash. As far as Fluentd is concerned, Kafka is just another data source and/or data sink; the two are different beasts altogether.

If you want the best of both worlds, use Kafka as the input/output data pipes from/to your apps, and Fluentd (or Logstash) as your centralized logging system reading from those Kafka topics.

If you want to read more on the topic, you can read how Fluentd and Kafka complement each other very well; they are not competing against each other.

Slaw answered 2/2, 2016 at 4:38 Comment(9)

You Sir, are a legend :) Thanks a lot. – Cameroncameroon 2/2, 2016 at 5:27

Thank you, I'm always happy to help ;) – Slaw 2/2, 2016 at 6:11

For the benefit of everyone, may the person who downvoted the question and the answer provide some comment on why he did it. Please enlighten us ;-) – Slaw 2/2, 2016 at 7:41

I didn't downvote but I think that you should double check before writing that fluentd is only a centralised logs aggregator ;) It does not provide pub/sub functionality, but most definitely works in a High-Availability, decentralised manner. – Distil 28/6, 2016 at 17:31

@Distil I didn't say "only" ;-) – Slaw 28/6, 2016 at 17:59

Maybe something to compare to your discussion: could there be more of a parallel between Flume and Kafka? I say this because I'm working on a pipeline with Flafka and want to know more about whether this would work in a distributed logging scenario. – Bunyan 1/8, 2016 at 17:42

thank you so much for this detailed explanation of the difference between them – Sizing 16/11, 2018 at 3:3

@Slaw According to you, does it make more sense to use Kafka as a producer for Fluentd or as a consumer (assuming I'm using Fluent Bit as the collector at the source)? Also, can you think of any pros/cons for either approach? – Calutron 11/2, 2019 at 13:29

@Calutron you should probably ask a new question explaining your needs/questions in greater detail – Slaw 11/2, 2019 at 13:30

From: The Life Blood Of Your Data Pipeline

Kafka is primarily related to holding log data rather than moving log data. Thus, Kafka producers need to write the code to put data in Kafka, and Kafka consumers need to write the code to pull data out of Kafka.

Fluentd has both input and output plugins for Kafka so that data engineers can write less code to get data in and out of Kafka. We have many users that use Fluentd as a Kafka producer and/or consumer.

Vauban answered 19/4, 2018 at 20:2 Comment(0)
