Difference between stream processing and message processing
Asked Answered

7

120

What is the basic difference between stream processing and traditional message processing? People say that Kafka is a good choice for stream processing, but Kafka is essentially a messaging framework similar to ActiveMQ, RabbitMQ, etc.

Why do we generally not say that ActiveMQ is good for stream processing as well?

Is it the speed at which messages are consumed by the consumer that determines whether it is a stream?

Shugart answered 19/1, 2017 at 14:39 Comment(1)
Kafka very much is NOT "a messaging framework similar to ActiveMQ, RabbitMQ etc", as described in this post: azure.microsoft.com/en-us/blog/… – Kovrov
146

In traditional message processing, you apply simple computations on the messages -- in most cases individually per message.

In stream processing, you apply complex operations on multiple input streams and multiple records (ie, messages) at the same time (like aggregations and joins).

Furthermore, traditional messaging systems cannot go "back in time" -- ie, they automatically delete messages after they have been delivered to all subscribed consumers. In contrast, Kafka keeps messages for a configurable amount of time, since it uses a pull-based model (ie, consumers pull data out of Kafka). This allows consumers to "rewind" and consume messages multiple times -- or, if you add a new consumer, it can read the complete history. This makes stream processing possible, because it allows for more complex applications. Furthermore, stream processing is not necessarily about real-time processing -- it's about processing infinite input streams (in contrast to batch processing, which is applied to finite inputs).
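To make the "rewind" idea concrete, here is a toy in-memory sketch (plain Python, not the real Kafka client API): the broker retains the log, and each consumer only tracks its own offset, so it can seek back or read the full history when it joins late.

```python
# Toy in-memory "log" illustrating Kafka-style retention and rewind.
# Conceptual sketch only -- not the real Kafka client API.

class Log:
    def __init__(self):
        self.records = []          # records are retained, never deleted on read

    def append(self, record):
        self.records.append(record)

class Consumer:
    def __init__(self, log):
        self.log = log
        self.offset = 0            # each consumer owns its position

    def poll(self):
        batch = self.log.records[self.offset:]
        self.offset = len(self.log.records)
        return batch

    def seek(self, offset):
        self.offset = offset       # "go back in time"

log = Log()
for r in ["order-1", "order-2", "order-3"]:
    log.append(r)

c = Consumer(log)
first_read = c.poll()              # ["order-1", "order-2", "order-3"]
c.seek(0)                          # rewind
replay = c.poll()                  # the same records again

late = Consumer(log)               # a brand-new consumer...
history = late.poll()              # ...still sees the complete history
```

A traditional queue cannot offer this: once a message is acknowledged, it is gone, so neither `seek` nor a late-joining consumer reading history is possible.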

And Kafka offers the Kafka Connect and Streams APIs -- so it is a stream-processing platform and not just a messaging/pub-sub system (even if it uses one at its core).

Henka answered 19/1, 2017 at 18:44 Comment(0)
14

If you like splitting hairs: messaging is communication between two or more processes or components, whereas streaming is the passing of an event log as events occur. Messages carry raw data, whereas events contain information about the occurrence of an activity, such as an order. So Kafka does both messaging and streaming. A topic in Kafka can hold raw messages or an event log that is normally retained for hours or days. Events can further be aggregated into more complex events.

Expellee answered 24/8, 2020 at 16:17 Comment(0)
4

Although Rabbit supports streaming, it was actually not built for it (see Rabbit's web site). Rabbit is a message broker and Kafka is an event-streaming platform.

Kafka can handle a huge number of 'messages' compared to Rabbit. Kafka is a log while Rabbit is a queue, which means that once consumed, Rabbit's messages are no longer there in case you need them.

However, Rabbit can assign message priorities, while Kafka doesn't support them.
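As a conceptual sketch of what priority delivery means (plain Python with a heap, not the AMQP API): higher-priority messages are delivered first regardless of arrival order, which Kafka's strictly ordered per-partition log cannot do.

```python
import heapq

# Sketch of queue-style priority delivery (conceptual, not RabbitMQ's API):
# higher-priority messages are delivered first, regardless of arrival order.
pq = []
for priority, msg in [(1, "low: batch report"),
                      (9, "high: payment failed"),
                      (5, "mid: user signup")]:
    heapq.heappush(pq, (-priority, msg))   # negate: heapq is a min-heap

delivery_order = [heapq.heappop(pq)[1] for _ in range(len(pq))]
# delivered highest-priority first, not in arrival order
```

In Kafka, by contrast, records within a partition are always consumed in the order they were appended; anything like priorities has to be modeled at the application level (e.g. separate topics).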

It depends on your needs.

Tagliatelle answered 23/12, 2020 at 14:41 Comment(0)
3

Basically, Kafka is a messaging framework similar to ActiveMQ or RabbitMQ. There has been some effort to take Kafka towards streaming:

https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/

Then why does Kafka come into the picture when talking about stream processing?

A stream-processing framework differs in its input of data. In batch processing, you have files stored in a file system that you want to process continuously and store in some database. In stream processing, frameworks like Spark, Storm, etc. get continuous input from sources such as sensor devices and API feeds, and Kafka is used there to feed the streaming engine.
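The batch-versus-stream distinction can be sketched in a few lines of plain Python (a conceptual illustration, not any framework's API): batch computes over a finite, complete input, while a stream job must emit incremental results because its input never ends.

```python
import itertools

# Conceptual sketch: batch processing computes over a finite, complete
# input; stream processing maintains results incrementally over an
# input that may never end.

def batch_average(finite_input):
    # Batch: the whole dataset is available up front.
    return sum(finite_input) / len(finite_input)

def streaming_averages(unbounded_input):
    # Stream: emit a running result after every record; the input
    # may be infinite, so we can never "wait for the end".
    total, count = 0, 0
    for value in unbounded_input:
        total += value
        count += 1
        yield total / count

def sensor():                      # hypothetical endless sensor feed
    n = 0
    while True:
        n += 1
        yield n

batch_result = batch_average([1, 2, 3, 4])                        # 2.5
stream_results = list(itertools.islice(streaming_averages(sensor()), 4))
# [1.0, 1.5, 2.0, 2.5] -- one result per record, no "end" required
```

In the streaming role described above, Kafka sits between the sources (sensors, API feeds) and the engine (Spark, Storm), buffering the unbounded input.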

Norford answered 19/1, 2017 at 14:56 Comment(0)
3

Message processing implies operations on and/or using individual messages. Stream processing encompasses operations on and/or using individual messages, as well as operations on collections of messages as they flow into the system. For example, say transactions are coming in for a payment instrument -- stream processing can be used to continuously compute the hourly average spend. In this case, a sliding window can be imposed on the stream which picks up messages within the hour and computes an average of the amounts. Such figures can then be used as inputs to fraud detection systems.

Faction answered 1/9, 2020 at 12:0 Comment(0)
3

Apologies for the long answer, but I think a short answer would not do justice to the question.

Consider a queue system, like MQ, for:

  • Exactly-once delivery, and participating in two-phase commit transactions
  • Asynchronous request/reply communication: the semantics of the communication are for one component to ask a second component to do something with its data. This is a command pattern with a delay on the response.
  • Recall that messages in a queue are kept only until the consumer(s) have received them.

Consider a streaming system, like Kafka, as a pub/sub and persistence system for:

  • Publishing events as immutable facts of what happened in an application
  • Getting continuous visibility of the data streams
  • Keeping data once consumed, for future consumers and replayability
  • Scaling message consumption horizontally
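The last bullet, horizontal scaling, works in Kafka by splitting a topic into partitions and giving each consumer in a group a disjoint subset of them. A toy sketch of the idea (conceptual, not the real Kafka assignment protocol):

```python
# Toy sketch of partition-based scaling (conceptual, not the Kafka API):
# records are routed to partitions by key, and each consumer in a group
# owns a disjoint subset of the partitions.

NUM_PARTITIONS = 4

def partition_for(key):
    # Kafka hashes the record key to pick a partition; a stable stand-in:
    return sum(key.encode()) % NUM_PARTITIONS

def assign(partitions, consumers):
    # Round-robin assignment: each partition goes to exactly one consumer.
    return {p: consumers[i % len(consumers)] for i, p in enumerate(partitions)}

assignment = assign(range(NUM_PARTITIONS), ["consumer-a", "consumer-b"])
owners = set(assignment.values())
# Every partition has exactly one owner, so adding consumers splits the
# work without two consumers ever processing the same record.
```

Because ownership is per partition, adding consumers (up to the partition count) spreads the load, while records with the same key still land on the same partition and keep their relative order.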

What are Events and Messages

There is a long history of messaging in IT systems. You can easily see an event-driven solution and events in the context of messaging systems and messages. However, there are different characteristics that are worth considering:

Messaging: Messages transport a payload and messages are persisted until consumed. Message consumers are typically directly targeted and related to the producer who cares that the message has been delivered and processed.

Events: Events are persisted as a replayable stream history. Event consumers are not tied to the producer. An event is a record of something that has happened and so can't be changed. (You can't change history.)


Now, messaging versus event streaming:

Messaging is meant to support:

  • Transient data: data is only stored until a consumer has processed the message, or it expires.
  • Request/reply, most of the time.
  • Targeted reliable delivery: targeted to the entity that will process the request or receive the response. Reliable, with transaction support.
  • Time-coupled producers and consumers: consumers can subscribe to a queue, but messages can be removed after a certain time or once all subscribers have received them. The coupling is still loose at the data model level and interface definition level.

Events are meant to support:

  • Stream history: consumers are interested in historic events, not just the most recent.
  • Scalable consumption: a single event is consumed by many consumers with limited impact as the number of consumers grows.
  • Immutable data
  • Loosely coupled / decoupled producers and consumers: strong time decoupling, as consumers may join at any time. There is some coupling at the message definition level, but schema management best practices and a schema registry reduce friction.

Hope this answer helps!

Nassi answered 21/9, 2022 at 16:21 Comment(0)
1

Recently, I came across a very good document that describes the usage of "stream processing" and "message processing":

https://developer.ibm.com/articles/difference-between-events-and-messages/

Taking asynchronous processing into context:

Messaging:
Consider it when there is a "request for processing", i.e. a client makes a request for a server to process.

Event streaming:
Consider it when "accessing enterprise data", i.e. components within the enterprise can emit data that describes their current state. This data does not normally contain a direct instruction for another system to complete an action. Instead, components allow other systems to gain insight into their data and status.

To facilitate this evaluation, here are the key selection criteria to consider when choosing the right technology for your solution:

  • Event history - Kafka
  • Fine-grained subscriptions - MQ
  • Scalable consumption - Kafka
  • Transactional behavior - MQ
Azine answered 21/6, 2021 at 6:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.