Does NATS JetStream provide message ordering by a key?
I am new to NATS JetStream and have been reading the official documentation to understand its concepts and compare it with Kafka. One of my major use cases is message/event ordering based on a particular id (like a partition key in the Kafka world).

For example, there are several update events coming in for an Order entity, and my system needs to consume the events for a particular Order in the same order they were produced. In Kafka, I would use the order-id as the partition key while publishing to the topic. How do I accomplish this in JetStream?

I have come across a de-duplication key (Nats-Msg-Id) in JetStream, but I think this feature is closer to topic compaction in Kafka. Am I right?

Nevertheless, I have written the following code in Golang for publishing:

order := Order{
    OrderId: orderId,
    Status:  status,
}
orderJson, err := json.Marshal(order)
if err != nil {
    log.Fatal(err) // don't silently discard the marshalling error
}
// Nats-Msg-Id is a de-duplication id, not a partition/ordering key.
dedupKey := nats.MsgId(order.OrderId)
if _, err := js.Publish(subjectName, orderJson, dedupKey); err != nil {
    log.Fatal(err)
}

Am I doing this right? Will all events for a particular orderId go to the same consumer within a consumer group in the JetStream world, hence maintaining the sequence?

Edit 1

This is what I get from @tbeets' suggestion. For example, I have 10 predefined stream subjects: ORDER.1, ORDER.2, ORDER.3, ..., ORDER.10

On the publishing side, I can compute order-id % 10 + 1 to find the exact stream subject to publish to. This guarantees that all update events for the same orderId go to the same stream subject every time.
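A minimal sketch of this bucketing in plain Go (the function and constant names are mine, and the bucket count is the 10 subjects assumed above):

```go
package main

import "fmt"

const numBuckets = 10 // matches the 10 predefined subjects ORDER.1 .. ORDER.10

// bucketSubject maps an order id onto one of the predefined stream
// subjects, so every event for the same order lands on the same subject.
func bucketSubject(orderID int) string {
	return fmt.Sprintf("ORDER.%d", orderID%numBuckets+1)
}

func main() {
	fmt.Println(bucketSubject(111)) // 111 % 10 + 1 = 2 -> ORDER.2
	fmt.Println(bucketSubject(20))  // 20 % 10 + 1 = 1 -> ORDER.1
}
```

Note that with this exact formula, order-id 111 lands on ORDER.2, not ORDER.1.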

Now, on the subscriber side, I have 10 consumer groups (with 10 consumers within each group), each consuming from a particular stream subject: consumerGroup-1 consumes from ORDER.1, consumerGroup-2 consumes from ORDER.2, and so on.

Say 2 order update events arrive for order-id 111, which maps to the ORDER.2 stream subject (111 % 10 + 1 = 2), so consumerGroup-2 will consume these 2 events. But within this consumer group, the 2 update events can go to different consumers, and if one of those consumers is busy or slow, the order update events may be consumed out of order overall.

Kafka solves this using the partition key: each partition is assigned to exactly one consumer within a consumer group, so all events for the same orderId are consumed by the same consumer, maintaining the sequence of order update events. How do I solve this issue in JetStream?

Dedrick answered 30/8, 2021 at 13:22 Comment(1)
Looks like there is no solution for the Edit-1 part of my question. The best one can do right now is to have only one consumer per consumerGroup (for every consumerGroup: consumerGroup-1, consumerGroup-2, ...). Check this out: github.com/nats-io/nats-architecture-and-design/pull/36/files JetStream plans to solve this use case in the future. – Dedrick

In NATS, your publish subject can contain multiple delimited tokens. So, for instance, your Order event could be published to ORDER.{store}.{orderid}, where the last two tokens are specific to each event and provide whatever slice-and-dice dimensions your use case needs.

You then define a JetStream stream for ORDER.> (i.e. all of the events). Any number of consumers (ephemeral or durable) can be created on the stream, each with an optional subject filter suited to your use case (e.g. ORDER.Store24.>) on the underlying stream's messages. JetStream guarantees that messages (filtered or unfiltered) are delivered in the order they were received.
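To build intuition for how those subject filters behave, here is a toy matcher mimicking NATS wildcard semantics (`*` matches exactly one token, `>` matches one or more trailing tokens); it is a sketch for illustration, not the library's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// matchSubject mimics NATS subject filtering: '*' matches exactly one
// token, '>' matches one or more trailing tokens.
func matchSubject(filter, subject string) bool {
	f := strings.Split(filter, ".")
	s := strings.Split(subject, ".")
	for i, tok := range f {
		if tok == ">" {
			return len(s) > i // '>' needs at least one remaining token
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(f) == len(s)
}

func main() {
	fmt.Println(matchSubject("ORDER.Store24.>", "ORDER.Store24.1234")) // true
	fmt.Println(matchSubject("ORDER.Store24.>", "ORDER.Store7.1234"))  // false
}
```

A filtered consumer on ORDER.Store24.> thus receives only that store's events, still in stream order.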

Homs answered 30/8, 2021 at 17:57 Comment(4)
orderid is the unique key on which I want to consume messages, so you can remove the store concept. Now, if I create multiple subjects on the fly like ORDER.{orderid}, then I will need thousands of consumer groups (each subscribing to a particular stream subject like ORDER.1234, ORDER.1245, etc.). So there will be a consumer/consumer group for every stream subject. Is that what you're saying? – Dedrick
Not realistically. If you have a natural subject dimension (like orders from a particular store) that you want to process together, you can use that. If you don't, then you can create a synthetic subject dimension with a hash or other algorithm (in your publishing client) that splits your orders into N "buckets", each of which you can consume independently from the stream later. This is akin to Kafka persisting a topic into N static partitions by a hash of each message key, but unlike Kafka it is completely dynamic for your application, and you can change your scheme later as needed. – Homs
Yes, as in the linked feature-discuss note from R.I., your subscribing application coordinates its client instances so that there is only one active instance per JS Consumer (i.e. consumerGroup-1, consumerGroup-2, ...). Further, although you could manage this in your application code, setting max ack pending to 1 on each JS Consumer is additional insurance that each instance receives and acks a message before moving on to the next ordered message in the JS Consumer. – Homs
How do I rebalance this scheme? – Whisenhunt
