Point-to-point messaging in a scalable application?

After googling how messages are sent/received in a chat messenger like WhatsApp, I came across the fact that they use a queue-based messaging system. I am just trying to figure out what a high-level design of this feature could look like.

HLD per my understanding: say Friend 1 and Friend 2 are online. Friend 1 has established an HTTP web connection to web server 1 and Friend 2 has established an HTTP web connection to web server 2. Friend 1 sends a message to Friend 2.

Now, as soon as the message reaches web server 1, I need to convey it to web server 2 so that the message can be pushed to Friend 2 through the already established web connection.

I believe distributed custom Java queues can be used here to propagate the message from one server to another. As soon as a message reaches one server, it will push it to a distributed queue (distributed for load balancing and high availability) with the message content, fromUserId and toUserId. There will be a listener on the queue which will look at the destination userId of the just-arrived message and find out on which web server that user is active. If the user is active, the listener pops the message off the queue and pushes it to the client; otherwise it stores it in the DB so that it can be pulled once the user gets online. To see which user is active on which server, we can maintain a TreeMap with userId as key and serverName as value for efficient lookup.
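To make the idea concrete, here is a rough sketch of that routing step (the class names are made up, and an in-process BlockingQueue plus a ConcurrentHashMap stand in for the distributed queue and the userId-to-server registry):

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Message envelope carrying the routing metadata described above.
final class ChatMessage {
    final String fromUserId;
    final String toUserId;
    final String body;
    ChatMessage(String fromUserId, String toUserId, String body) {
        this.fromUserId = fromUserId;
        this.toUserId = toUserId;
        this.body = body;
    }
}

// Listener that drains the (stand-in) distributed queue and routes each message
// either to the web server holding the receiver's connection or to persistent
// storage when the receiver is offline.
final class MessageRouter implements Runnable {
    private final BlockingQueue<ChatMessage> distributedQueue = new LinkedBlockingQueue<>();
    // userId -> serverName registry; the question suggests a TreeMap,
    // a ConcurrentHashMap is used here only for thread-safe lookups.
    private final Map<String, String> activeUserToServer = new ConcurrentHashMap<>();

    void publish(ChatMessage msg) { distributedQueue.offer(msg); }
    void markOnline(String userId, String serverName) { activeUserToServer.put(userId, serverName); }
    void markOffline(String userId) { activeUserToServer.remove(userId); }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                ChatMessage msg = distributedQueue.take();          // pop the next message
                String server = activeUserToServer.get(msg.toUserId);
                if (server != null) {
                    pushToServer(server, msg);                      // receiver online: push via its server
                } else {
                    storeForLaterDelivery(msg);                     // receiver offline: persist for later pull
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void pushToServer(String serverName, ChatMessage msg) {
        // Placeholder: in a real system this would be an RPC/HTTP call to the
        // web server that holds the receiver's open connection.
        System.out.printf("push %s -> %s via %s%n", msg.fromUserId, msg.toUserId, serverName);
    }

    private void storeForLaterDelivery(ChatMessage msg) {
        // Placeholder: write to a database so the receiver can pull it once online.
        System.out.printf("store offline message for %s%n", msg.toUserId);
    }
}
```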

The actual design is probably more complex and scalable than this brief outline. I would like to know if this is the right direction for a scalable chat messenger.

Also, I believe we need multiple distributed queues instead of one for such a scalable application. But if we have multiple distributed queues, how will the system ensure FIFO message delivery across them?

Roulette answered 28/1, 2017 at 12:53 Comment(0)

Would like to know if this is the right direction for a scalable chat messenger?

Designing this application using message queues has the following benefits:

  • Decoupling of client and server, and a reduced failure blast radius: queues can gracefully handle traffic peaks by temporarily growing in size, and they shrink back to normal once traffic normalises (or any transient failures have been fixed).
  • In a messaging application, clients (mobiles) can be offline for long periods. As a result, a synchronous design would not work, since the clients might not be accessible for message delivery. However, with an asynchronous design as with message queues, the responsibility of message delivery is on the client side. As a result, the client can poll for new messages as soon as it gets online.

So, yes this design could be quite scalable in terms of performance and usability. The only thing to have in mind is that this design would require a separate queue for each user, so the number of queues would scale linearly with the number of the application's users (which could be a significant financial & scalability issue).
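To illustrate the one-queue-per-user idea, here is a minimal in-process sketch (class and method names are made up; in a real deployment each BlockingQueue would be a managed queue, e.g. one SQS queue per account, rather than an in-memory structure):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// One inbox queue per user account, created lazily on first use.
final class PerUserInboxes {
    private final ConcurrentHashMap<String, BlockingQueue<String>> inboxes = new ConcurrentHashMap<>();

    // Sender side: append the message to the recipient's own queue.
    void deliver(String toUserId, String message) {
        inboxes.computeIfAbsent(toUserId, id -> new LinkedBlockingQueue<>()).offer(message);
    }

    // Client side: when the device comes online it polls (drains) its queue.
    List<String> pollAll(String userId) {
        List<String> pending = new ArrayList<>();
        BlockingQueue<String> inbox = inboxes.get(userId);
        if (inbox != null) {
            inbox.drainTo(pending);
        }
        return pending;
    }
}
```

The point about linear growth is visible here: the inboxes map gains one entry (one queue) per user of the application.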

But if we have multiple distributed queues, how will the system ensure FIFO message delivery across them?

Many queues, either open-source (RabbitMQ, ActiveMQ) or commercial (AWS SQS), support FIFO ordering. However, the FIFO guarantee inside the queue is not enough, since the messages sent by a single client could arrive at the queue in a different order due to asynchronicity in the network (unless you are using a single, non-distributed queue over TCP, which guarantees ordered delivery).

However, you could implement FIFO ordering on the client side. Following this approach, the messages would include a timestamp, which would be used by each client to sort the messages when receiving them. The only side effect is that a client could see a message without having seen all the previous messages first. However, when the previous messages arrive, they will be shown in the correct order in the client's UI, so eventually the user sees all the messages in the correct order.
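A rough sketch of this client-side ordering, assuming each message carries the sender's timestamp (class names are made up):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// A received message carrying the sender's timestamp used for ordering.
final class ReceivedMessage {
    final String text;
    final long senderTimestampMillis;
    ReceivedMessage(String text, long senderTimestampMillis) {
        this.text = text;
        this.senderTimestampMillis = senderTimestampMillis;
    }
}

// Client-side conversation view: messages may arrive out of order, but the
// displayed list is kept sorted by the sender's timestamp, so an earlier
// message that arrives late is slotted into its correct position.
final class ConversationView {
    private final List<ReceivedMessage> messages = new ArrayList<>();

    void onMessageReceived(ReceivedMessage msg) {
        int idx = Collections.binarySearch(messages, msg,
                Comparator.comparingLong((ReceivedMessage m) -> m.senderTimestampMillis));
        if (idx < 0) {
            idx = -idx - 1;     // insertion point for a timestamp not seen before
        }
        messages.add(idx, msg); // keep the list ordered as messages trickle in
    }

    List<ReceivedMessage> inDisplayOrder() {
        return Collections.unmodifiableList(messages);
    }
}
```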

Nitpicking answered 31/1, 2017 at 22:41 Comment(22)
Thanks Domos. A couple of follow-up questions on the dedicated queue for each user. However, with an asynchronous design as with message queues, the responsibility of message delivery is on the client side. As a result, the client can poll for new messages as soon as it gets online - I believe we need to persist the message in a DB if the receiver is offline, otherwise the queue size will continue to grow in memory.Roulette
The only thing to have in mind is that this design would require a separate queue for each user, so the number of queues would scale linearly with the number of the application's users (which could be a significant financial & scalability issue). - Do you mean that as soon as a message comes in, we will create a queue for each user (I believe here you mean the receiver, not the sender. Right?) Will this queue exist forever, or will it die if the user goes offline? Also, I think this receive queue can be created once any user comes online or a sender sends a message for the receiver. ...Roulette
While coming online, each user will check if a queue has already been created for him; if yes, simply subscribe to that queue, otherwise create the queue and subscribe. Right?Roulette
What about if instead of creating a queue for each user you requeue those messages that can't be delivered because of the user being offline?Crustacean
1. Not necessarily, for example Amazon SQS has no limit for the queue size. It can grow infinitely. 2. I mean one queue per account of your application. This is because even in group chats, you want a message to be delivered as many times as there are users participating in the chat. So, in this case, each message will be delivered to the queue of each participant of the chat. In a normal chat, the message will be delivered to the single recipient.Nitpicking
3. The queue can be created once, when the user creates the account. This queue will be used for delivering all the messages for this user (for all the chats he participates in). 4. There are surely other feasible designs. The single-queue-per-user approach was proposed because queue messages are read only once. For instance, if you have a queue per chat, only the first user will read a message (instead of all the participants).Nitpicking
@Nitpicking 1. Not necessarily, for example Amazon SQS has no limit for the queue size. It can grow infinitely. - Will a queue implementation like Amazon SQS (or any other) be faster than DB stores like Oracle etc.? My understanding: probably the queue-based implementation will be faster because it's an in-memory store, whereas in Oracle the data has to be fetched from hard disk, though in both cases the call has to go over the network to fetch it. Is that correct? ....Roulette
Once the message is delivered, I believe we need to store those messages in the DB as well, to handle the case where the user wants to read some past messages (after scrolling up). Isn't it? Or can the queue solve this requirement too? I think not.Roulette
After the message is delivered, it should be stored on the mobile device. However, what you are mentioning is applicable in case you want the user to be able to retrieve the messages even after uninstalling and re-installing the application.Nitpicking
However, what you are mentioning is applicable in case you want the user to be able to retrieve the messages even after uninstalling and re-installing the application - that's correct. In that case we need to store the messages in the DB as well. Probably, once a message is read by the receiver, it can be put into a separate queue and a batch job can save it to the DB from there?Roulette
Also, do you agree with the analysis that a queue-based implementation will be faster because it's an in-memory store, whereas in Oracle the data has to be fetched from hard disk, though in both cases the call has to go over the network to fetch it? A second factor that makes the queue solution more scalable is that we can push the message to the client, but in the case of a DB the client has to pull it, which means too many calls to the server at regular intervals.Roulette
@Nitpicking Can you please reply to my last two comments?Roulette
Probably, once a message is read by the receiver, it can be put into a separate queue and a batch job can save it to the DB from there - yes, that's right, one more queue will be needed. I would advise not to use queues that store messages only in memory. Most queues now persist messages to disk as well. The superiority over databases comes from the access patterns (not memory vs disk). The better scalability is mainly because queues can be distributed much more easily than databases and they are much more elastic. Check youtu.be/zwLC5xmCZUs?t=1494 to see what I'm talking about.Nitpicking
When you say Most queues now persist messages to disk as well. The superiority over databases comes from the access patterns (not memory vs disk), do you mean queues nowadays can keep the data in both places at the same time, i.e. in memory and in the DB? I believe those queues must be putting the data in the DB asynchronously.Roulette
Not necessarily. Two of the most widely used message queue brokers (ActiveMQ, RabbitMQ) provide synchronous persistence mechanisms. Check rabbitmq.com/confirms.html and activemq.apache.org/async-sends.htmlNitpicking
@Nitpicking I read both links but have a question. I think you are talking about durable queues. My question is: will a durable queue keep the message in memory first and persist it to the DB after some configurable time? Or will it keep it in both memory and the DB as soon as the message arrives?Roulette
It depends on the mechanism you use. That's the difference between sync calls and async calls. Sync calls persist the message to disk before responding to the client, while async calls reply immediately, before persisting to disk. You will have to read the documentation thoroughly for more details about specific queue products.Nitpicking
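As a rough illustration of the sync vs async distinction discussed above, using the RabbitMQ Java client (the queue name and the localhost broker are assumptions made for this sketch):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

import java.nio.charset.StandardCharsets;

public class DurablePublishDemo {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                       // assumption: local broker for the sketch
        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // Durable queue: survives broker restarts (messages must also be persistent).
            channel.queueDeclare("user-42-inbox", true, false, false, null);

            // "Async" style: basicPublish returns immediately; the broker may not
            // have written the message to disk yet when this call comes back.
            channel.basicPublish("", "user-42-inbox",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "hello (fire-and-forget)".getBytes(StandardCharsets.UTF_8));

            // "Sync" style: enable publisher confirms and block until the broker
            // acknowledges that it has taken responsibility for the message.
            channel.confirmSelect();
            channel.basicPublish("", "user-42-inbox",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "hello (confirmed)".getBytes(StandardCharsets.UTF_8));
            channel.waitForConfirmsOrDie(5_000);            // throws if not confirmed in time
        }
    }
}
```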
@Nitpicking Thanks. One last question. Are you aware whether WhatsApp or any other major chat app really uses the model of creating a dedicated in-memory queue for each user when the user account is created? I mean, WhatsApp has more than 0.6 billion users, so I am just wondering if they really created 0.6 billion queues?Roulette
I am not aware of specific applications. But I imagine that they are not dedicating one queue to each user (or even per chat group). They have probably made more flexible design decisions. You can skim their engineering blogs and find out; they will have already revealed some of their design decisions there.Nitpicking
@Nitpicking Do you have any links to their engineering blog? I tried googling for many days but did not find a good link about their design decisions, except one high-level resource, i.e. quora.com/What-is-WhatsApps-server-architectureRoulette
blog.whatsapp.com You can also have a look at this blog opensourcery.co.za/2009/04/19/…Nitpicking
@Nitpicking Can you share your thoughts on #42189093 too when you get some time?Roulette
Would like to know if this is the right direction for a scalable chat messenger?

I would probably prefer a slightly different approach. Your ideas are correct, but I would like to add a bit more to them. I happened to build such a chat messenger a few years ago, and it was supposed to be quite similar to WhatsApp. I am sure that when you googled, you would have come across XMPP (Extensible Messaging and Presence Protocol). We were using Openfire as the server that maintains connections. The concept that you explained, where

Say Friend 1 and Friend 2 are online. Friend 1 has established an HTTP web connection to web server 1 and Friend 2 has established an HTTP web connection to web server 2. Friend 1 sends a message to Friend 2.

is called federation, and Openfire can be run in a federated mode. After reading through your comments, I came across the one-queue-per-user point. I am sure that you already know that this approach is not scalable, as it is very resource intensive. A good approach would be to use an actor framework such as Akka. Each actor is like a lightweight thread in Java, and each actor has an inbox, so messaging is taken care of in this case.

So your scenario transforms into: Friend 1 opens a connection to the Openfire XMPP server and a Friend 1 actor is initialized. When he types a message, it is transferred to the Friend 1 actor's inbox (each actor in Akka has an in-memory inbox). This is communicated to the XMPP server. The server has a database of its own, and since it is federated with other XMPP servers, it will try to find out whether Friend 2 is online. The XMPP server will keep the message in its DB until Friend 2 comes online. Once Friend 2 establishes a connection to any of the XMPP servers, a Friend 2 actor is created, its presence is propagated to all the other servers, and XMPP server 1 will notify Friend 2's actor. Friend 2's actor inbox will now get the message.
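A minimal sketch of the per-user actor idea with Akka classic actors in Java (actor and message names are made up; the Openfire/XMPP integration is only hinted at in comments):

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class ActorChatSketch {

    // Immutable message dropped into the receiving actor's mailbox.
    static final class ChatMessage {
        final String from;
        final String text;
        ChatMessage(String from, String text) {
            this.from = from;
            this.text = text;
        }
    }

    // One lightweight actor per connected user; Akka queues incoming messages
    // in the actor's mailbox and processes them one at a time.
    public static class UserActor extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(ChatMessage.class, msg ->
                            // Placeholder: push the text down the user's open connection,
                            // or hand it to the server's store if the user just went offline.
                            System.out.println(getSelf().path().name()
                                    + " received from " + msg.from + ": " + msg.text))
                    .build();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ActorSystem system = ActorSystem.create("chat");
        ActorRef friend2 = system.actorOf(Props.create(UserActor.class), "friend2");

        // Friend 1's server-side handler simply tells Friend 2's actor; the message
        // lands in friend2's mailbox and is processed asynchronously.
        friend2.tell(new ChatMessage("friend1", "hello!"), ActorRef.noSender());

        Thread.sleep(500);   // crude wait so the demo message is processed before shutdown
        system.terminate();
    }
}
```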

Optional: there is also the option of delivery receipts. Once Friend 2 reads the message, a delivery receipt can be sent to Friend 1 to indicate the status of the message, i.e. read, unread, delivered, not delivered, etc.

Coastwise answered 3/2, 2017 at 12:27 Comment(4)
As you said, Each actor is like a lightweight thread in Java, and each actor has an inbox - I think having an inbox for each actor is quite similar to having a dedicated queue per user. Both are in-memory models. Like a queue, an inbox holding the messages will also consume memory. Do you see any major difference here? So I believe the model you suggested is more or less the same as the message queue model, with some implementation differences.Roulette
2. In the case of the queue model, when delivering the message from the sender, the system will look up the location of the receiver's queue (from some metadata created when the user comes online) and deliver it. I am sure there are off-the-shelf frameworks that provide this. I believe what you are saying is that Akka, running in a federated mode, provides a similar feature.Roulette
You are right in both regards. The mailbox is like a wrapper implemented on top of a ConcurrentLinkedQueue, and the mailbox can be per actor as well as shared between actors, which does sound similar to federation.Coastwise
@Raveesh, you are talking about keeping messages in the server using some data structure like ConcurrentLinkedQueue. Don't you think that is a bad idea, because if the server goes down before even trying to deliver the message once, the message is lost? Shouldn't it be a distributed queue outside the server?Isauraisbel
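As a toy illustration of the mailbox-as-a-wrapper point above (and of the durability concern raised in the last comment), assuming a purely in-process queue:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Toy mailbox: a thin wrapper around a ConcurrentLinkedQueue, roughly in the
// spirit of an actor inbox. Being purely in-memory, anything still queued is
// lost if the process dies - hence the suggestion to back it with a durable
// or distributed queue in a real deployment.
final class Mailbox<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();

    void post(T message) {
        queue.offer(message);            // producers enqueue without blocking
    }

    void drain(Consumer<T> handler) {
        T message;
        while ((message = queue.poll()) != null) {
            handler.accept(message);     // consumer processes messages in FIFO order
        }
    }
}
```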
