RabbitMQ: Scaling queues with the consistent hash exchange
(Picking up from a Github Issue)

We use RabbitMQ's consistent hash exchange, which is useful for sharding routing keys from an exchange across multiple queues.

We use the queues to dish out work to workers. Ideally we would like to dynamically scale these workers but this presents issues.

To scale up, you need to add a new queue and binding. On its own this is not a huge deal, except for the fact that a sharded key may now start going to a different queue.

So Worker A may have been managing Thing1, but as we add the new queue, Worker B may end up getting messages for Thing1. It's important that Worker A has finished all of its Thing1 processing before Worker B starts getting Thing1 messages.

Are there any approaches or plugins that can ease this issue?
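A minimal Python sketch of the remapping problem (the `shard` helper here is hypothetical; the real plugin maps keys onto a hash ring weighted by binding weight, but the effect of adding a binding is the same: some keys move):

```python
import hashlib

def shard(routing_key: str, queues: list) -> str:
    # Simplified stand-in for the consistent-hash-exchange plugin:
    # map a deterministic hash of the routing key onto one bound queue.
    h = int(hashlib.md5(routing_key.encode()).hexdigest(), 16)
    return queues[h % len(queues)]

queues = ["queue1", "queue2"]
before = shard("Thing1", queues)

# Scale up: add a third queue/binding...
queues.append("queue3")
after = shard("Thing1", queues)

# ...and "Thing1" may now resolve to a different queue, so Worker B
# can start receiving Thing1 messages before Worker A has drained.
print(before, after)
```

The sketch only illustrates why adding a binding can re-route an existing key; it does not model the plugin's actual ring.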

Entice answered 10/8, 2016 at 12:5 Comment(4)
My first question is: why can't you use multiple workers for the same queue? You only have to restrict one worker to one queue when you have to ensure a certain order.Discretionary
I did have to read it multiple times. Message order matters to you, so you need to check that all queues are empty before you add a new worker. The plugin states that there is a race condition when you add a new queue, and in rarer cases you can end up with duplicated messages. I wouldn't use this plugin at all in such a special case. I would create one input queue where one dispatcher worker listens, and make this process responsible for distributing all messages to the right queue.Discretionary
@Discretionary Workers are managing state in memory, so only one worker may be managing state for Thing1 at a time.Entice
In this case I would suggest you use a dispatcher worker instead of the hash exchange. A dispatcher worker is more flexible: it can keep track of which messages are already processed and which are not before using a new queue for the next request. I think the risk of message duplication has too many impacts on your app.Discretionary
In this case I would use a worker which dispatches the messages instead of the hash exchange.

producer1 ... producern => topic-exchange => queue => dispatcher worker => queue1 ... queuen => worker1 ... workern

This way the dispatcher worker can keep track of all messages. For this you can simply check how many messages are left within the queue, or you can check the worker's confirmation message, or you can use RabbitMQ's RPC features.
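A hedged in-memory sketch of the dispatcher idea above, in Python (the `Dispatcher` class and its method names are hypothetical, not a RabbitMQ API): it pins a routing key to its current queue while that key has unacknowledged messages, and only re-shards the key once the old worker has drained it.

```python
import hashlib
from collections import defaultdict

def stable_hash(key: str) -> int:
    # Deterministic hash (Python's built-in str hash is salted per process).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Dispatcher:
    """Hypothetical sketch of the dispatcher worker described above.
    A real implementation would sit between the inbound queue and the
    per-worker queues, learning about completed work via consumer acks
    or the RPC pattern from the RabbitMQ tutorials."""

    def __init__(self, queues):
        self.queues = list(queues)
        self.assignment = {}               # routing key -> assigned queue
        self.in_flight = defaultdict(int)  # routing key -> unacked messages

    def dispatch(self, routing_key: str) -> str:
        preferred = self.queues[stable_hash(routing_key) % len(self.queues)]
        # Only move the key when nothing is outstanding on the old queue.
        if routing_key not in self.assignment or self.in_flight[routing_key] == 0:
            self.assignment[routing_key] = preferred
        self.in_flight[routing_key] += 1
        return self.assignment[routing_key]

    def ack(self, routing_key: str) -> None:
        # Called when a worker confirms it finished a message for this key.
        self.in_flight[routing_key] -= 1

    def add_queue(self, name: str) -> None:
        # Scaling up never re-routes a key that still has work in flight.
        self.queues.append(name)
```

Usage: after `add_queue`, a key with unacked messages keeps going to its old queue; once its in-flight count reaches zero, the next `dispatch` may move it to the new shard.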

Discretionary answered 10/8, 2016 at 14:53 Comment(7)
So I effectively implement the consistent hash exchange (plus a few features) in a separate worker?Entice
In this special case, yes. It's more work, but it gives you the fine-grained control you want. One more thing I can think of is a kind of controller worker that the actual worker can "ask", via an RPC implementation, whether it is valid to process the message at the moment.Discretionary
What I didn't ask: maybe it's possible that you wait inside your producer until the last message from this producer is processed. Then you only have to change your producer.Discretionary
I can change anything I like, just wondering what the best approach is and hoping that some rabbitmq plugin solves all my problems :). Yes I could add a synchronisation step - but this ultimately will slow down the system. Really what I'd like is for the hash exchange to have memory of where it sent what routing key.Entice
RabbitMQ already solves a huge number of problems. What you're looking for sounds like a feature request for the consistent hash exchange. I don't know if it's easy, because this feature also needs a kind of producer tracking: you need to know which producer sent which message. Furthermore, you need to know when it is safe to switch a producer to a new queue.Discretionary
IMO there are at least 2 reasons why this solution is not optimal: 1) Having a single dispatcher consumer, with its single queue, defeats the purpose of sharding your load across multiple queues. 2) In order to effectively implement this, you need some sort of ad hoc communication between the dispatcher and the workers. How will the dispatcher know which workers and queues are waiting for messages? How will the dispatcher know if a consumer stopped unexpectedly? This is all resolved gracefully by the exchange, whereas with the dispatcher you will have to pay quite a lot of complexity and overhead.Caffrey
1) Yes, you're right. Scaling the dispatcher needs a bit more effort and logic. 2) It is possible to check the queue size for simple situations. For more complex situations the dispatcher can check whether a message was acknowledged; RabbitMQ provides this feature. Or you can implement RPC as you find it in the RabbitMQ docs. The check for ack or reject is also valid when the worker stops unexpectedly. If you can fulfil all requirements within an exchange, great! But at the moment the exchange does not cover all the features asked for in the question.Discretionary
