Realtime Scalable Chat App - which database should I choose?
Asked Answered
R

3

7

I am looking to build a scalable real-time chat app (I am just doing this for fun and out of interest so please don't ask why!) and I know that I am going to be handling the realtime messaging part through redis but I am not sure of what database to use for the following information:

  • User Relationships (friends)
  • Cold Chat History - this would only be queried in limited amounts (maybe like 50 messages) ordered by timestamp and queried in reverse (just as your messages would load in imessage or whatsapp when scrolling to view older messages)
  • Chat user relationship

I know for cold chat history an RDBMS or Cassandra is probably my best bet but handling friend relationships, as well as user-to-chat relationships in an RDBMS or cassandra, is ugly. I'm not sure if it's necessary, worth it, or even "right" to have a graph database in my tech stack just for this relationship mapping.

I was thinking of MongoDB or some other document-based storage could be a solution but querying the data seems like it would be really taxing. My thoughts were to have a chat document that has a list of users and then I would have several other documents with a list of message id's pointing to message documents. These documents would be mapped back to the chatID. I'm sure you can see though, the time and resources to query a set of messages would be quite high. Maybe I'm just underestimating the power of MongoDB as I haven't really used it. I would also be more easily able to handle the Chat User Relationship using documents as well as friendships by just storing user-ids in a list within the document.

I understand there is no perfect tool for the job but I would like someone's thoughts and inputs on how to design the data storage.

Thank you in advance!

Ransell answered 4/4, 2021 at 21:33 Comment(0)
O
2

If transaction volume is not high then you can go with Postgresql otherwise Cassandra is a good choice for all your mentioned requirements. In Cassandra you should have multiple tables in de-normalized for low latency and high availability.

  1. User - Create a master table having all information of any user.
  2. User_Friend_relation - Create another table having composite primary key as userid & freindid with clustering key is_active(0,1) desc. ((userid,freindid),is_active)
  3. Chat_user_friend - This is your main table having all chat. Create this table with timestamp as clustering key and store data in desc order so that you can save time by ordering in real time and you have latest data first.
  4. Cold Chat History - As Cassandra is highly scalable... no need of this table.

Data modeling is an area where a lot of discussions are required, anyways I tried to answer this as simple as possible.

Olindaolinde answered 13/4, 2021 at 12:10 Comment(0)
W
4

It's best to keep relationship in realtional database. I use PostgreSQL for such purposes in my chat applications.

For chat history and other events Cassandra is a good choice (I also use Cassandra). However it depends on your database size (records quantity). If you don't need to keep tens of thousands historical messages for thousands users then using Cassandra will be an overkill. In this case you can also use PostgreSQL or another relational database. In PostgreSQL you can optimize an access to history tables using partitioning.

Warfourd answered 5/4, 2021 at 7:59 Comment(0)
O
2

If transaction volume is not high then you can go with Postgresql otherwise Cassandra is a good choice for all your mentioned requirements. In Cassandra you should have multiple tables in de-normalized for low latency and high availability.

  1. User - Create a master table having all information of any user.
  2. User_Friend_relation - Create another table having composite primary key as userid & freindid with clustering key is_active(0,1) desc. ((userid,freindid),is_active)
  3. Chat_user_friend - This is your main table having all chat. Create this table with timestamp as clustering key and store data in desc order so that you can save time by ordering in real time and you have latest data first.
  4. Cold Chat History - As Cassandra is highly scalable... no need of this table.

Data modeling is an area where a lot of discussions are required, anyways I tried to answer this as simple as possible.

Olindaolinde answered 13/4, 2021 at 12:10 Comment(0)
M
2

It may be too late but could help others: I still think, efficient data modeling with RDB like postgreSQL is much better. postgres' capability to handle and parse semi-structured data like JSONs is impressive. So, for all the needs, unless you're handling tons of chat histories as Antek mentioned.

This is a very informative post in that regard.

Millicent answered 11/6, 2023 at 13:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.