Are SQS and Kafka the same?
G

5

51

Are Kafka and SQS the same? I see that both are message queue systems and are event-based. Do they serve the same purpose? If not, how are they different?

Gridley answered 21/11, 2019 at 8:2 Comment(2)
Kafka is an Apache product and SQS is an Amazon product. At a high level, both are used to store data for a defined time. Marion
Kinesis is more commonly compared to Kafka than SQS is, but MSK would be used instead if you actually want Kafka. Additament
C
54

Apache Kafka and Amazon SQS are both used for message streaming but are not the same.

Apache Kafka follows the publish-subscribe model: a producer sends an event/message to a topic, and one or more consumers subscribed to that topic receive it. A topic is split into partitions for parallel streaming. There is also the concept of a consumer group. When a message is read from a partition of a topic, its offset is committed to mark it as already read by that consumer group, which avoids inconsistent reads when consumers run concurrently. However, other consumer groups can still read that message from the partition.
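The consumer-group behaviour described above can be sketched with a small in-memory model. This is only a toy under stated assumptions: `ToyTopic`, `produce`, and `poll` are hypothetical names invented for illustration and are not the Kafka client API, and the offset is committed immediately on read, which real consumers may do differently.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for one topic; NOT the Kafka client API."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list of messages.
        self.partitions = [[] for _ in range(num_partitions)]
        # Committed offset per (group, partition); missing keys start at 0.
        self.committed = defaultdict(int)

    def produce(self, partition, message):
        self.partitions[partition].append(message)

    def poll(self, group, partition):
        """Return messages this group has not read yet and commit the new offset."""
        start = self.committed[(group, partition)]
        batch = self.partitions[partition][start:]
        self.committed[(group, partition)] = start + len(batch)
        return batch


topic = ToyTopic(num_partitions=2)
topic.produce(0, "order-1")
topic.produce(0, "order-2")

first_read = topic.poll("billing", 0)     # billing reads both messages
second_read = topic.poll("billing", 0)    # nothing new for billing
other_group = topic.poll("analytics", 0)  # analytics has its own offset
print(first_read, second_read, other_group)
```

The point of the sketch is that messages stay in the partition after being read; only the per-group offset moves.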

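Amazon SQS, by contrast, follows the queue model, and a queue can be created in any AWS region where SQS is available. You push messages to a queue and consumers pull messages from it, which is why SQS is pull-based. Only one consumer is meant to process each message: several consumers can poll the same queue, but a message received by one is hidden from the others during the visibility timeout and is gone once it is deleted. SQS queues are of two types: FIFO and Standard.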

There is another service in AWS, Amazon SNS, which is publish-subscribe based like Kafka, but SNS has no message retention policy. It is meant for instant notifications such as email, SMS, etc. It can only push messages to subscribers when the subscribers are available; otherwise, the message will be lost. However, combining SQS with SNS overcomes this drawback. Amazon SNS with SQS is called the fanout pattern. In this pattern, a message published to an SNS topic is distributed to multiple SQS queues in parallel, and the SQS queues provide persistence because SQS has a retention policy: it can keep a message for up to 14 days (4 days by default). Amazon SQS with SNS can achieve high-throughput parallel streaming and can replace Apache Kafka for some use cases.
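A rough illustration of the fanout pattern, written as an in-memory model rather than with the AWS SDK. The classes `ToySnsTopic` and `ToyQueue` and the queue names are hypothetical and exist only for this sketch.

```python
from collections import deque


class ToyQueue:
    """Stand-in for an SQS queue: holds messages until a consumer pulls them."""

    def __init__(self, name):
        self.name = name
        self.messages = deque()

    def receive(self):
        # Pull the oldest message, or None if the queue is empty.
        return self.messages.popleft() if self.messages else None


class ToySnsTopic:
    """Stand-in for an SNS topic: copies each published message to every queue."""

    def __init__(self):
        self.queues = []

    def subscribe(self, queue):
        self.queues.append(queue)

    def publish(self, message):
        for queue in self.queues:
            queue.messages.append(message)


topic = ToySnsTopic()
email_queue = ToyQueue("email")
audit_queue = ToyQueue("audit")
topic.subscribe(email_queue)
topic.subscribe(audit_queue)

topic.publish("user-signed-up")

# Each queue keeps its own copy, so consumers can pull at their own pace.
email_msg = email_queue.receive()
audit_msg = audit_queue.receive()
print(email_msg, audit_msg)
```

Because each subscriber has its own queue, a slow or offline consumer does not lose the message; it simply waits in that queue.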

Collop answered 7/11, 2021 at 7:18 Comment(3)
"only one consumer can subscribe to each Queue", really? I read differently here: saturncloud.io/blog/… Nazarite
@Kanagavelu Sugumar I guess the author meant that once a message is read by one consumer, it's removed from the queue and can't be consumed by other subscribers. Emilio
P.S. Well, actually this IS possible due to the distributed nature of SQS, but it goes against the way it was intended and is considered a disadvantage that should be taken into account when designing applications that use SQS. Emilio
S
14

Yes, they are two messaging systems, but there are differences:

Kafka

Kafka is a highly scalable system and is well suited to heavy workloads where you want to send messages in batches (to get good message throughput).

A Kafka topic consists of a number of partitions that can be read fully in parallel by different consumers in one consumer group, which gives very good performance.
For example, if you need to build a high-load streaming system, Kafka is a very good fit.
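To make the parallelism concrete, here is a simplified sketch of how partitions might be spread over the consumers of one group. The function `assign_partitions` is a hypothetical helper using plain round-robin; Kafka ships several assignment strategies and this is not a reimplementation of any of them.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partition numbers over consumers in round-robin order."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions, three consumers: each consumer reads two partitions.
balanced = assign_partitions(6, ["c1", "c2", "c3"])

# Two partitions, three consumers: one consumer gets nothing to read.
too_many = assign_partitions(2, ["c1", "c2", "c3"])
print(balanced, too_many)
```

Within a group, a partition is read by at most one consumer, so the partition count caps the useful number of consumers in that group.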

SQS

SQS is an Amazon managed service (so you do not have to maintain the infrastructure yourself).

SQS is better for eventing, when some client needs to pick up a message (event) and the message is then removed from the queue once that client has processed and deleted it.

In my experience, SQS is not as fast as Kafka and is not a good fit for heavy workloads; it is much more suitable for eventing where the number of events per second is modest.

For example, if you want to react to an S3 file upload (to start some processing of the file), SQS is very good.

Siddons answered 21/11, 2019 at 8:28 Comment(1)
I appreciate the text but what is that big difference?...Paleobotany
D
14

SQS and Kafka are both messaging systems. The primary differences are:

  • Ordering at scale. Kafka: produced messages are always consumed in order within a partition, irrespective of the number of messages in it. SQS: "A FIFO queue looks through the first 20k messages to determine available message groups. This means that if you have a backlog of messages in a single message group, you can't consume messages from other message groups that were sent to the queue at a later time until you successfully consume the messages from the backlog."
  • Limit on the number of groups/topics/partitions. Kafka: the limit is quite high, but the number of topics/partitions is usually on the order of thousands (and can grow with the cluster size). SQS: "There is no quota to the number of message groups within a FIFO queue."
  • Deduplication. Kafka does not deduplicate data when the same data is produced multiple times. SQS FIFO queues try to deduplicate messages based on the deduplication ID and the deduplication interval: "Assuming that the producer receives at least one acknowledgement before the deduplication interval expires, multiple retries neither affect the ordering of messages nor introduce duplicates."
  • Partition management. Kafka: partitions are created, added, and managed by the user. SQS controls the number of partitions itself and can increase or decrease it depending on load and usage pattern.
  • Dead-letter queue. Kafka does not have the concept of a dead-letter queue (though one can be explicitly created and maintained by the user). SQS supports dead-letter queues natively.
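The deduplication point above can be illustrated with a toy model. It assumes a 5-minute (300 second) deduplication interval, as used by SQS FIFO queues; `ToyFifoQueue` and its `send` signature are hypothetical. The toy returns `False` for a dropped duplicate purely to make the effect visible, whereas the real service accepts the duplicate send but does not deliver it again.

```python
DEDUP_INTERVAL_SECONDS = 300  # assumption: 5-minute interval, as in SQS FIFO


class ToyFifoQueue:
    """In-memory model of ID-based deduplication; not the SQS API."""

    def __init__(self):
        self.messages = []
        self.seen = {}  # deduplication id -> time the id was first accepted

    def send(self, body, dedup_id, now):
        first_seen = self.seen.get(dedup_id)
        if first_seen is not None and now - first_seen < DEDUP_INTERVAL_SECONDS:
            return False  # duplicate inside the interval: not enqueued
        self.seen[dedup_id] = now
        self.messages.append(body)
        return True


queue = ToyFifoQueue()
accepted = queue.send("pay-42", "id-42", now=0)      # first send is stored
retried = queue.send("pay-42", "id-42", now=10)      # retry is dropped
much_later = queue.send("pay-42", "id-42", now=400)  # interval has expired
print(accepted, retried, much_later, queue.messages)
```

With Kafka, an equivalent check would have to live in the consumer or in a downstream store keyed by a message ID.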

Overall, to summarise the points above: SQS is meant for offloading background tasks to an async pipeline, while Kafka is much more scalable and should be used as a stream processing pipeline.

Discovery answered 14/2, 2022 at 7:44 Comment(0)
H
2

SQS is a queue. You have a list of messages that would need to be processed by some other part of the application. A message would ideally be processed once by one processor and would be marked as processed and removed from the queue. The purpose of the queue is to coordinate and distribute the processing of messages among the different processors.

Kafka is more similar to Kinesis, which is used mainly for data streaming. Messages are stored in topics for other components to read. Any component can listen to topics and/or read all messages at any time. The main purpose is to allow the efficient delivery of messages to any number of recipients and to allow the continuous streaming of data between components in a dynamic and elastic way.

Hecto answered 21/11, 2021 at 23:10 Comment(0)
H
-2

From a bird's-eye view, there is one main difference:

  1. Kafka follows the pub-sub model. If a producer sends a single message and the Kafka topic has 2 consumers (in different consumer groups), both consumers will receive the message.
  2. SQS is more like the competing-consumers pattern. If a producer sends a message and the SQS queue has 2 consumers, only one consumer will receive the message. The other one won't get it if the 1st consumer processed it successfully. The 2nd consumer has a chance to receive the message only if the visibility timeout expires, i.e., the 1st consumer was not able to process and delete the message within the visibility timeout.
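The visibility-timeout behaviour in point 2 can be sketched with a small in-memory queue. Everything here (`ToyQueue`, the 30 second timeout, passing `now` explicitly) is a hypothetical simplification for illustration and not the SQS API.

```python
class ToyQueue:
    """In-memory model of competing consumers with a visibility timeout."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}  # message id -> [body, time it becomes visible again]
        self.next_id = 0

    def send(self, body):
        self.messages[self.next_id] = [body, 0]
        self.next_id += 1

    def receive(self, now):
        """Hand out the first visible message and hide it from other consumers."""
        for msg_id, entry in self.messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        # Called by a consumer after it has processed the message.
        self.messages.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30)
queue.send("resize-image")

first = queue.receive(now=0)    # consumer 1 gets the message
second = queue.receive(now=5)   # consumer 2 sees nothing: message is hidden
retry = queue.receive(now=31)   # consumer 1 never deleted it, so it reappears
queue.delete(retry[0])          # consumer 2 finishes and deletes it
after_delete = queue.receive(now=100)
print(first, second, retry, after_delete)
```

The message is only gone for good once a consumer deletes it; until then a timeout makes it available to the next poller.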
Hedger answered 19/11, 2022 at 12:55 Comment(1)
Kafka has consumer groups: you can register a consumer group with 10 or more instances of a single consumer, and then you will see round-robin behavior. Cantankerous
