mysql one-to-many table vs. key-value database list
Asked Answered
F

4

0

I am building a comment system, where a comment can have many replies.

If I were to implement this in mysql, I would build a comments table, and have the columns:

  • comment_id,
  • parent_comment_id.

Where parent comment id is 0 for a comment, and is the parent comment id for a reply. So if I was looking for replies for a certain comment, I would look for comments that have parent_comment_id to match the comment I am looking for.

This seems redundant to me as it will require me to go through the whole comments table just to find whether a comment has replies or not (especially for large data), Where if I have a key-store database, I would have a key for the comment id, and inside it would be a list of the replies ordered by date.

So which approach do you think is better for this problem?

Also, I would like to generalize the problem to any one-to-many relationship to be stored as a list in a key-store database. And if you recommend using a key-store database, which one would you recommend for large data? ( I do not want to use redis for that since it is in-memory, and I doubt that replies for comments need to be access frequently).

Thanks for replies.

Faris answered 3/2, 2013 at 9:37 Comment(3)
Do you have database performance problems? Relational databases are not great with tree structures, but if you limit it to one level if comments you might have better performance and you can get away with a self join to get all of the comments at once. No matter what you are going to have to set something that says that you have new comments, etc.Leicester
I don't have any database performance issues because I have not started yet, I am just trying to talk theoretically here. I do have one level of comments and that's it but I expect the table to be extremely large. May I ask what self-join means :) ?Faris
self join means the SQL query will reference its self more than once in the query. Like select main.*, child.* comments main join comments child on main.id = child.parent_id where main.id = 1;Leicester
M
4

A relational database should handle this "adjacency list" model just fine.

First of all, don't use 0 in the parent_comment_id of the "root" comment, use NULL. Then you can build a FOREIGN KEY from parent_comment_id to comment_id that would prevent you from mistakenly attaching a reply to a non-existent comment.

it will require me to go through the whole comments table just to find whether a comment has replies or not

Assuming you have indexed the parent_comment_id (which InnoDB did automatically if you created the FK above), finding the first level of replies to the given comment will require an index range scan. To understand index range scans and why they are efficient, you first need to understand the Anatomy of an SQL Index.

Finding the second level would require another range scan etc. Unfortunately, MySQL doesn't support the recursive query that would allow you to do all that in a single database round-trip, but it should still be fairly efficient.

If you have preformed measurements and concluded that's a problem, there are other strategies for representing hierarchies (with different trade-offs), such as "nested sets" and "closures". Take a look at this presentation by Bill Karwin.

Martica answered 3/2, 2013 at 11:24 Comment(1)
great answer regarding mysql's part of the question. I am only using one level of parenthood so I only need to create the foreign key for the index and i should be fine. I will use mysql for it, but I still believe that key-store databases are more suited for lists since you are looking for 1 key to find a list of primary keys, maybe because i do not understand the underlying algorithms behind mysql. Thanks.Faris
A
2

In fact most relational databases will not have to go trough all the comments to find out which are the replies of a given comment. After all these types of queries are quite frequent and very optimized. Also consider building an index over parent_comment_id. Again this only works if you have a single level of parenship. If you may have a comment that is commented in its turn, maybe another mean of storing data will serve you better.

Ape answered 3/2, 2013 at 9:42 Comment(4)
Okay, most of my relationships will have one level of parentship. And the comments example is just one of them. So do you think building an index on parent_comment_id would be sufficient for a large dataset. I expect the comments table to be extremely large. For me, having a key list makes more sense at solving this problem, since you only need to find one key to find the list, then look them up by id in the table. But i guess you are saying that mysql is more optimized to deal with such queries.Faris
I think building an index should optimize things well enough. I suggest you try some example queries after building an index and see the performance. Only go with the keystore it is not satisfactory. Using database you will have a lot of functionality implemented for you for instance agregate functions, while with keystore you will probably have to re-invent the wheel.Ape
I will go with indexing and see how it goes. Thanks mateFaris
I agree that a comment with no parent should have a NULL rather than a zero in the parent_comment_id field. I also agree that an index would greatly speed up searched for replies.Roberts
R
2

An upvote for Branko's reply. an index on the parent field is good. NULLS work better than zeroes in this case. Plus a referential integrity constraint will help you more than it hurts you.

A couple of extra points.

If you use the approach rather than your existing adjacency lists approach, you will be able to search for the entire subtree made of of replies and replies to replies, etc. rather than just immediate replies. This could be useful.

Second, there is a data structure known as a "forest". This is a table containing a set of trees, where each tree has, as its root, in this case a comment with no parent. A web search should give you some good articles about designing a forest of discussions, where each discussion begins with a comment, and each discussion is a tree of replies. Many people have designed exactly this case.

Roberts answered 3/2, 2013 at 13:2 Comment(1)
I will use NULL for it, but for my application, i will be using 1 level of parenthood, so the forest structure won't be neccessary, i will read more for enlightenment purposes.Faris
P
2

you can create to tables and make it more flexible.

comments => comment_id, the_comment, count_replays

comments_replay => parent_id, the_comment

when there is a replay for comment there is an update for the count_replays.

and now you can do if statement if there is a replays and only then request them.

Providenciaprovident answered 3/2, 2013 at 13:13 Comment(1)
thanks, i will use the replies_count to check whether i need to fetch replies or not. Also having two tables is neat and organized, but will add on the complexity of migration or possible future sharding.Faris

© 2022 - 2024 — McMap. All rights reserved.