Running a SELECT query on a closure table with a JOIN?
Asked Answered
A

1

0

I have an app set up which has nested comments attached to posts. I decided to use the closure table method (slide 40) for the comments due to how cheap hard disk space is versus how easy it seems to be to query and manage the tree structure. However, I'm running into a problem. I can't seem to figure out how to grab the tree path based on the post ID, not the ancestor ID (slide 49).

My database structure looks like this:

table: comment_paths
--------------------
parent_id (fk on comments.id)
child_id  (fk on comments.id)
depth

table: comments
---------------
id
parent_id (fk on comments.id)
post_id   (fk on posts.id)
text

table: posts
---------------
id
name

If I know the parent_id beforehand, as in the slideshow, it's simple to grab the tree:

SELECT c.*, p.*
FROM comments AS c
JOIN comment_paths AS p
ON c.id = p.child_id
WHERE p.parent_id = 1

However, I don't know the parent_id beforehand; I only know the post_id. The way the database is set up, there is more than one tree associated with a post:

                 [post]
-----------------------------------------
[comment]       [comment]       [comment]   depth: 0
    |                               |
 [reply]                         [reply]    depth: 1
  |   |
 [r] [r]                                    depth: 2

My initial solution was a query like this:

SELECT c.*, p.*
FROM comments AS c
JOIN comment_paths AS p
ON c.id = p.child_id
WHERE p.parent_id IN
    (SELECT id FROM comments WHERE parent_id IS NULL AND post_id = 6)

It returns the correct data, but I feel like a nested SELECT like that isn't correct. Is there a better way to do this?

Thanks!

Archi answered 17/4, 2013 at 16:8 Comment(2)
Try this query SELECT c.*, p.* FROM comments AS c JOIN comment_paths AS p ON c.id = p.parent_id WHERE c.parent_id IS NULL AND c.post_id = 6Heartland
@Heartland didn't work. :( That only returned the very top of the tree.Archi
V
2

I can't really think of a better way. The only change I would make is to use EXISTS instead of IN:

SELECT c.*, p.*
FROM comments AS c
JOIN comment_paths AS p
ON c.id = p.child_id
WHERE EXISTS
(SELECT * FROM comments c2 WHERE p.parent_id = c2.id AND c2.parent_id IS NULL AND c2.post_id = 6)

I'm interested to see if there actually is a better method though.

Update:

I'm not sure what RDMS you're using. But, if recursive queries are available, I'm thinking you could get away from the comments_path table entirely:

;with cte as(
select c.*, 0 as depth
from comments c
where c.post_id = 6 and c.parent_id is null
union all
select c.*, cte.depth + 1
from comments c
join cte on c.parent_id = cte.id)

select * from cte
Varmint answered 17/4, 2013 at 17:9 Comment(3)
Thanks for your answer. Is there any reason to use EXISTS versus IN? Substituting it returns a very different data set. It looks like it does a join on every record in the comment_paths set instead of on the comments set.Archi
The different result set was due a typo in the join (I had " = p.parent_id" instead of " = p.child_id"). In my experience, EXISTS performs better than IN. In certain instances, there may not be a performance boost, but it should always perform as well as, if not better than IN.Varmint
I just saw your recursive queries update. That's definitely a route I would like to take with an RDMS that provides it should it be faster than a closure table, but I'd like my project to be able to run on as many RDMS as possible and avoid RDMS-specific queries just to keep bugs to a minimum. Thanks so much for the update, though!Archi

© 2022 - 2024 — McMap. All rights reserved.