Moving in Closure Table with Multiple Parents
Asked Answered
I

4

6

I have the following DAG

A --> B

|     |
v     v

C --> D

Here is the closure table

| Ancestor | Descendant | Depth |
---------------------------------
| A        | A          | 0     |
| A        | B          | 1     |
| A        | C          | 1     |
| A        | D          | 2     |
| A        | D          | 2     |
| B        | B          | 0     |
| B        | D          | 1     |
| C        | C          | 0     |
| C        | D          | 1     |
| D        | D          | 0     |

How would I go about removing path B > D (thus removing A > B > D) without also removing A > C > D and C > D.

Right now I'm using the following query but it only works when every node only has 1 parent.

DELETE FROM `Closure`
WHERE `Descendant` IN (SELECT `Descendant` FROM `Closure` WHERE `Ancestor`=@Node)
AND `Ancestor` NOT IN (SELECT `Descendant` FROM `Closure` WHERE `Ancestor`=@Node);
Inigo answered 9/3, 2012 at 22:38 Comment(0)
S
3

Given your current model I'm not sure it's possible. I'd propose that you add a column to count the number of paths tracking how many different ways there are to get from any node X to node Y.

So rather than your table you end up with

| Ancestor | Descendant | Depth | Refs  |
-----------------------------------------
| A        | A          | 0     | 1     |
| A        | B          | 1     | 1     |
| A        | C          | 1     | 1     |
| A        | D          | 2     | 2     |
| B        | B          | 0     | 1     |
| B        | D          | 1     | 1     |
| C        | C          | 0     | 1     |
| C        | D          | 1     | 1     |
| D        | D          | 0     | 1     |

Removing a node would then entail an update statement followed by a delete statement. The update would, instead of deleting the entries it finds, decrement the number of references for that entry. Then you could delete the entries with 0 or fewer references afterwards.

Coming up with a SQL query which does the update is escaping me at the moment but in theory this should work without having to completely reconstruct the closure table...

Sphalerite answered 13/6, 2018 at 19:36 Comment(1)
Interesting approach. I am wondering if multiple refs from ancestor to descendant could have different depths, how would you update the depth if one of the refs is removed?Alston
M
2

First, I believe there is a duplicate entry in your table. (A,D) appears twice. Second, after removing the edge (B,D), the following paths should remain:

  1. Node self-maps: (A,A), (B,B), (C,C), (D,D)
  2. (A,B)
  3. (A,C)
  4. (A,D) ( through C )

Thus, to remove the edge (B,D) in this example, all that is required is to remove that one row:

Delete MyTable 
Where Ancestor = 'B'
    And Descendant = 'D'

A closure table is still only mapping relations between two nodes. What makes it special is that it is mapping every indirect relation effectively as a direct relation. The edge (B,D) is simply saying that you can get from B to D. That row alone says nothing about how you got to B nor does it say anything about how many nodes it took to get from B to D; it simply saying you can get from B to D. Thus, there is no edge listed for A > B > D per se. Rather, all that is captured is that you can get from A to B and from A to D which is still true even if the edge (B,D) is removed.

Moureaux answered 29/5, 2013 at 17:48 Comment(3)
The duplicate (A,D) is actually the root of this question - it exists to show that in a closure table you would have one (A,D) caused by ABD and one from ACD. How do I detect when another path intersects in this diamond and not delete (A,D)? Preferably only using SQL and only given the edge (B,D).Inigo
@Inigo - It make no sense to have (A,D) in there twice. A closure table doesn't describe how one node connects with another; only that one node connects with another. Thus, if (A,D) is in the table, it doesn't matter if the path to get from A to D was A > B > C > E >.....> X > Y > D or A > D. That's the point. The closure table saves you the trouble of having to calculate that path each time. If you are trying to capture the paths used to get from one node to another, then a closure table isn't the solution. Instead store the graph in an edges table and calculate the path.Moureaux
ah, yeah I'm having a hard time remember why I asked this question to be honest. I can't really remember what we decided to do as a solution. I think the original intention was to figure out a way to remove the edge from (B,D) and remove any entries in the closure table caused by it without accidentally removing edges that still exist. The only solution I can immediately think of is to rebuild the entire closure table from nodes you know are only 1 step away.Inigo
H
1

In natural language, that would be: "Delete ancestor-descendant relashionship to D, if there is no parent of D besides B that is also a descendant of A". Is that correct?

(Edit: no, that's not correct; not only relashionships to D must be removed, but also relashionships to every descendant of D. Thus, that criteria is not valid...)

My tentative SQL would then be:

DELETE a
FROM Closure AS a
    INNER JOIN Closure AS b ON a.Descendant = b.Descendant
WHERE
    a.Descendant IN (SELECT Descendant FROM Closure WHERE Ancestor = {Child}) AND
    b.Depth = 1 AND
    b.Ancestor != {Parent} AND
    a.Ancestor NOT IN (SELECT Ancestor FROM Closure WHERE Descendant = b.Ancestor)

(Sorry if I got the query wrong - or used non-standard features - I'm not actually experienced with that. But my natural language description should give an insight for what actually needs to go on the query)


Update: On second thought, I don't believe my query will work for all cases. Consider this:

A --> B --> D --> E --> F
  1. F is a descendant of D (True)
  2. E is a parent of F (True)
  3. E is not B (True)
  4. A is not an ancestor of E (False)

Thus, A >> F won't be removed, even though it should. Sorry I couldn't help, but that seems a problem too big to fit in a single query. I'd suggest looking for an algorithmic solution first, then seeing how that could be implemented in your case.

Hinrichs answered 9/3, 2012 at 23:31 Comment(0)
F
0

While tracking depth and allowing multiple parents at the same time is probably possible, I get a code smell from it, especially when the solution involves having duplicates pairs. Thomas' answer outlines that issue.

I'm going to simplify the question a bit to just focus on unlinking a node when multiple parents are allowed, because it's a tricky enough problem on its own. I'm discarding the depth column entirely and assuming there are no duplicate pairs.

You have to take into account children of D, which gets complicated fast. Say we have:

A --> B
       
|     |
v     v

C --> D --> E

We want to unlink D from B, which means we also have to remove links between E and B. But what if they're connected like this:

A --> B --> C
             
      |     |
      v     v

      D --> E < -- G
      
      |
      V

H --> F

In this case if we disconnect B > D, we don't want to unlink B and E anymore, because E is still linked to B through C. But we do want to disconnect F from B.

I'll go through my solution below using that second example. I know that D only has one parent in this example, but it still works perfectly if D has multiple parents; I can more easily demonstrate some of the edge cases this way so that's why I'm doing it like this.

Here's what the table would look like:

| Ancestor | Descendant |
-------------------------
| A        | A          |
| A        | B          |
| A        | C          |
| A        | D          |
| A        | E          |
| A        | F          |
| B        | B          |
| B        | C          |
| B        | D          |
| B        | E          |
| B        | F          |
| C        | C          |
| C        | E          |
| D        | D          |
| D        | E          |
| D        | F          |
| E        | E          |
| F        | F          |
| G        | G          |
| G        | E          |
| H        | H          |
| H        | F          |

Query 1: Get all descendants of D, including D

SELECT `Descendant`
FROM `Closure`
WHERE `Ancestor` = @Node

This will return: D, E, F

Query 2: Get all ancestors of B, including B

SELECT `Ancestor`
FROM `Closure`
WHERE `Descendant` = @ParentNode

This will return: A, B

Query 3a: Get all ancestors of the items in Query 1 that do not appear in Query 1 or 2

SELECT DISTINCT `Ancestor`
FROM `Closure`
WHERE `Descendant` IN (@Query1)
AND `Ancestor` NOT IN (@Query1)
AND `Ancestor` NOT IN (@Query2)

This will return: C, G, H

The goal here is to get all parents of E and F that may reconnect farther up the chain.

Query 3b: this is exactly the same as Query 3a, except it returns both ancestors and descendants

SELECT DISTINCT `Ancestor`, `Descendant`
[ ... ]

This will return: (C, E), (G, E), (H, F)

We'll need this later.

Query 4: Filter results of Query 3a down to nodes that reconnect farther up the chain

SELECT `Ancestor`,`Descendant`
FROM `Closure`
WHERE `Descendant` IN (@Query3a)
AND (`Ancestor` IN (@Query2) OR `Ancestor` = `Descendant`))

This will return: (A, C), (B, C), (C, C)

We now have references to all parents of C that should not be unlinked. Note that we have no links to parents of F. That's because F is not connected to B and A except through D (which we're unlinking).

Query 5: Construct the pairs that should be kept, using the results of Query 3b as a bridge between Queries 1 and 4

SELECT `Query4`.`ancestor`,`Query3b`.`descendant`
FROM (@Query3b) as `Query3b`
LEFT JOIN (@Query4) as `Query4`
WHERE `Query3b`.`descendant` IN (@Query1)

This will return: (A, E), (B, E)

Query 6: Run the regular query for orphaning a node and its children, except exclude all pairs returned by Query 5

DELETE FROM `Closure`
WHERE `Descendant` IN (@Query1)
AND `Ancestor` IN (@Query2)
AND (`Ancestor`, `Descendant`) NOT IN (@Query5)

After this operation, we will have removed the following links:

| Ancestor | Descendant |
-------------------------
| A        | D          |
| A        | F          |
| B        | D          |
| B        | F          |

Both D and F are correctly unlinked, and E correctly retains its connections to A and B, leaving:

A --> B --> C
             
            |
            v

      D --> E < -- G
      
      |
      V

H --> F

Let me know if I've missed anything! I just solved this problem myself today and I may run into more edge cases as time goes on. If I find any I'll update this answer.

Flowerpot answered 1/2, 2022 at 23:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.