What is the Cassandra database schema used in Reddit? [closed] - McMap

About

What is the Cassandra database schema used in Reddit? [closed]

Asked 30/4, 2012 at 8:19 Answered 2/1, 2014 at 6:3

cassandra database-schema reddit

S

1

9

Reddit is currently migrating its database from PosgreSQL to Apache Cassandra. Does anybody know what database schema does Reddit use in Cassandra?

Strangury answered 30/4, 2012 at 8:19 Comment(4)

I don't, and I'm not sure anyone outside of Reddit does, but should it really matter? I expect quite a few people here on SO could help determine the schema that's right for your application. – Guarani 3/5, 2012 at 18:59

Reddit published the code that powers the site on GitHub: github.com/reddit/reddit . I could search in the code and determine the schema from there. But I thought that is easier to ask here. – Strangury 5/5, 2012 at 17:24

I glanced through the code, and I see something like two dozen different columnfamilies that get created and used in different ways. Is there an area in particular you're wondering about, or are you looking more for something like show schema output from cassandra-cli? – Guarani 5/5, 2012 at 21:45

I want to design an application that uses trees of comments as in Reddit. So, since my application is similar, I am trying to get some inspiration from there. – Strangury 17/5, 2012 at 9:42

S

-2

I also don't know the exact Reddit schema, but for what you want to archive, you are on the right way, saving a hierarchy of comments in a document based database instead of a relational database. I would recommend to keep one document for each root-comment, and then add all the children (and children of the children) to that comment.

In CouchDB and MongoDB you can store JSON documents directly. In Cassandra I would save the JSON as a String. So the data structure would be only

root-comments
{
    root-comment-id
    root-comment-json-string
}

and each root-comment-json-string would look like this:

{
comment : "hello world"
answers : 
[
    {
    comment : "reply to hello world"
    answers : 
    [
        {
        comment : "thanks for the good reply"
        answers : []
        },
        {
        comment : "yes that reply was indeed awesome"
        answers : []
        }

    ]
    }

]
}

additionally you might want to add a UserName, UserID, Timestamp, .... etc. to the structure of each comment.

This 'denormalized' structure will make make the queries very fast compared to a normalized relational stucture IF you have A LOT of data.

In any case you will have to take care of all the exceptions, that can happen when you implement such a system for a large user scale, eg. What happens if someone replies to comment A with comment B, but at the same time (or later) comment A is deleted.

If you search the internet for "cassandra hierarchical data" you find some other approaches, but they all go back to normalisation or they are not complete for a 'infinite' hierarchy.

Sprite answered 2/1, 2014 at 6:3 Comment(4)

The problem with the approach you describe is that any time a new comment is added you have to update the JSON, that is, parse it, merge the comment in it and then write it to Cassandra. Imagine a tree with thousands of comments. So this approach is cheaper for retrieval, but expensive when updating. The relational approach is the other way around, expensive when retrieving a comment tree and cheap when updating. I think the best approach is a hybrid one, store the most import comments as you describe and the least important in a relational manner. – Strangury 2/1, 2014 at 11:45

@Calin-AndreiBurloiu Yes true. It is my understanding that such a comment system (like in reddit) has far more reads than updates. So my answer is exactly the correct solution. – Sprite 6/1, 2014 at 3:1

And the parsin of the result can easily done in JavaScript on the client, since it is JSON. – Sprite 6/1, 2014 at 3:3

Kind of a newbie to this and your post helped to explain the structure a lot. In this case, how would inserting a comment work? I imagine there's a method addComment(parentId,commentString) - how do we find that particular node to insert a new child? – Gilman 20/5, 2014 at 7:31

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.