Reddit is currently migrating its database from PostgreSQL to Apache Cassandra. Does anybody know what database schema Reddit uses in Cassandra?
I don't know the exact Reddit schema either, but for what you want to achieve you are on the right track: saving a hierarchy of comments in a document-based database instead of a relational database. I would recommend keeping one document for each root comment and then nesting all of its children (and the children's children) inside that root comment.
In CouchDB and MongoDB you can store JSON documents directly. In Cassandra, I would save the JSON as a string, so the data structure would simply be:
root-comments
{
root-comment-id
root-comment-json-string
}
and each root-comment-json-string would look like this:
{
  "comment": "hello world",
  "answers": [
    {
      "comment": "reply to hello world",
      "answers": [
        {
          "comment": "thanks for the good reply",
          "answers": []
        },
        {
          "comment": "yes that reply was indeed awesome",
          "answers": []
        }
      ]
    }
  ]
}
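A minimal in-memory sketch of this layout, assuming Python's standard `json` module: a dict stands in for the `root-comments` table, mapping each root-comment-id to the whole tree serialized as one JSON string. The names `root_comments`, `save_tree`, `load_tree` and `render`, and the CQL table in the comment, are hypothetical illustrations, not Reddit's actual schema.

```python
import json

# Hypothetical stand-in for the "root-comments" table: one entry per root
# comment, keyed by root-comment-id, holding the whole tree as a JSON string.
# On a real cluster this might be an (illustrative) CQL table such as:
#   CREATE TABLE root_comments (root_comment_id text PRIMARY KEY,
#                               root_comment_json text);
root_comments: dict[str, str] = {}


def save_tree(root_comment_id: str, tree: dict) -> None:
    """Serialize the whole tree and overwrite the row for this root comment."""
    root_comments[root_comment_id] = json.dumps(tree)


def load_tree(root_comment_id: str) -> dict:
    """A single key lookup returns the complete thread."""
    return json.loads(root_comments[root_comment_id])


def render(node: dict, depth: int = 0):
    """Yield every comment depth-first, indented by its nesting level."""
    yield "  " * depth + node["comment"]
    for child in node["answers"]:
        yield from render(child, depth + 1)


save_tree("c1", {
    "comment": "hello world",
    "answers": [{
        "comment": "reply to hello world",
        "answers": [
            {"comment": "thanks for the good reply", "answers": []},
            {"comment": "yes that reply was indeed awesome", "answers": []},
        ],
    }],
})
print("\n".join(render(load_tree("c1"))))
```

Because the whole thread lives in one value, displaying it needs no joins; the trade-off is that every reply rewrites the entire string for that root comment.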
Additionally, you might want to add fields such as UserName, UserID and Timestamp to the structure of each comment.
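One way to carry that extra metadata, sketched with Python dataclasses. The field names `user_name`, `user_id` and `timestamp` (assumed to be Unix epoch seconds) and the sample values are hypothetical; the answer only suggests adding something along those lines.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Comment:
    # Illustrative field names for the extra metadata suggested above.
    comment: str
    user_name: str
    user_id: int
    timestamp: int  # assumed: Unix epoch seconds, so it serializes as a plain number
    answers: list["Comment"] = field(default_factory=list)


root = Comment("hello world", "alice", 1, 1700000000)
root.answers.append(Comment("reply to hello world", "bob", 2, 1700000060))

# asdict() recursively converts nested dataclasses (including those inside
# lists) to plain dicts, so the whole tree becomes one JSON string.
root_comment_json_string = json.dumps(asdict(root))
print(root_comment_json_string)
```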
This 'denormalized' structure will make queries very fast compared to a normalized relational structure if you have a lot of data, because an entire thread is read with a single key lookup instead of being reassembled from many rows.
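To make the comparison concrete, here is a sketch assuming a hypothetical normalized layout of `(id, parent_id, text)` rows. In that design the thread has to be reassembled at read time (in SQL typically via recursive queries or repeated lookups); the denormalized row simply stores the finished tree that `build_tree` produces here.

```python
from collections import defaultdict

# Hypothetical normalized rows: (id, parent_id, text); parent_id None marks the root.
rows = [
    (1, None, "hello world"),
    (2, 1, "reply to hello world"),
    (3, 2, "thanks for the good reply"),
    (4, 2, "yes that reply was indeed awesome"),
]


def build_tree(rows, root_id):
    """Assemble the nested document that the denormalized design stores directly."""
    children = defaultdict(list)
    text = {}
    for cid, parent, body in rows:
        text[cid] = body
        if parent is not None:
            children[parent].append(cid)

    def node(cid):
        return {"comment": text[cid], "answers": [node(c) for c in children[cid]]}

    return node(root_id)


tree = build_tree(rows, 1)
print(tree)
```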
In any case, you will have to handle all the edge cases that can arise when you implement such a system at a large user scale, e.g. what happens if someone replies to comment A with comment B, but at the same time (or later) comment A is deleted?
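One possible way to handle the delete-vs-reply case, sketched in memory under several assumptions: deletion is a "soft delete" (the text is replaced with `[deleted]` but the node stays, so replies keep a parent), and each row carries a version counter that every write must match (optimistic compare-and-set). On a real Cassandra cluster, lightweight transactions (conditional `UPDATE ... IF ...` statements) are the feature one would likely reach for to make that check atomic; the code below only simulates it. The `id` field, `store`, `ConflictError` and all helper names are hypothetical.

```python
import json

# Hypothetical row: root-comment-id -> (version, json_string).
store = {"c1": (0, json.dumps({"id": "A", "comment": "hello world", "answers": []}))}


class ConflictError(Exception):
    pass


def read(root_id):
    version, raw = store[root_id]
    return version, json.loads(raw)


def find(node, comment_id):
    """Depth-first search for a comment by id; returns None if absent."""
    if node["id"] == comment_id:
        return node
    for child in node["answers"]:
        hit = find(child, comment_id)
        if hit is not None:
            return hit
    return None


def compare_and_set(root_id, expected_version, tree):
    """Simulated conditional write: only succeeds if nobody wrote in between."""
    version, _ = store[root_id]
    if version != expected_version:
        raise ConflictError("document changed since it was read; retry")
    store[root_id] = (version + 1, json.dumps(tree))


def soft_delete(root_id, comment_id):
    version, tree = read(root_id)
    node = find(tree, comment_id)
    if node is None:
        raise KeyError(comment_id)
    node["comment"] = "[deleted]"  # keep the node so its replies stay attached
    compare_and_set(root_id, version, tree)


def reply(root_id, version, tree, parent_id, new_id, text):
    parent = find(tree, parent_id)
    if parent is None:
        raise KeyError(parent_id)
    parent["answers"].append({"id": new_id, "comment": text, "answers": []})
    compare_and_set(root_id, version, tree)


# Race: a user reads the thread, comment A is deleted, then the user replies.
v, t = read("c1")
soft_delete("c1", "A")
try:
    reply("c1", v, t, "A", "B", "reply to A")
except ConflictError:
    v, t = read("c1")                          # re-read the current document
    reply("c1", v, t, "A", "B", "reply to A")  # B now hangs under "[deleted]" A
```

Whether a reply to a deleted comment should be accepted at all is a product decision; the version check only guarantees that neither write silently overwrites the other.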
If you search the internet for "cassandra hierarchical data", you will find some other approaches, but they either go back to normalisation or do not fully handle an 'infinite' hierarchy.
show schema output from cassandra-cli? – Guarani