How to maintain a transitive closure table efficiently?
Asked Answered
F

3

14

I have a DAG in my relational database (Firebird) with two tables edge and node (adjacency list model). I want to query them recursively, but found recursive queries very inefficient. So I tried to implement triggers to maintain the transitive closure following the Dong et.al. paper http://homepages.inf.ed.ac.uk/libkin/papers/tc-sql.pdf.

SELECTs are now very fast, but DELETEs are extremely slow, because almost the whole graph is copied for a single delete. Even worse, concurrent updates seem impossible.

Is there a better way to implement this?

Edit

I did some experiments and introduced a reference counter to the TC table. With that, deletes are fast. I wrote some simple test cases, but I'm not sure if I'm doing right. This is what i have so far:

CREATE GENERATOR graph_tc_seq;

CREATE TABLE EDGE (
    parent DECIMAL(10, 0) NOT NULL,
    child DECIMAL(10, 0) NOT NULL,
    PRIMARY KEY (parent, child)
);

CREATE TABLE GRAPH_TC (
    parent DECIMAL(10, 0) NOT NULL,
    child DECIMAL(10, 0) NOT NULL,
    refcount DECIMAL(9, 0),
    PRIMARY KEY (parent, child)
);

CREATE TABLE GRAPH_TC_TEMP (
    session_id DECIMAL(9, 0),
    parent DECIMAL(10, 0),
    child DECIMAL(10, 0)
);

CREATE PROCEDURE GRAPH_TC_CREATE (p_parent DECIMAL(10, 0), c_child DECIMAL(10, 0))
AS 
    declare variable tp_parent DECIMAL(10,0);
    declare variable tc_child DECIMAL(10,0);
    declare variable session_id DECIMAL(9,0);
    declare variable refs DECIMAL(9,0);
begin
    session_id = gen_id(graph_tc_seq,1);
    insert into graph_tc_temp (parent, child, session_id, refcount) values (:p_parent, :p_parent, :session_id, 1);
    insert into graph_tc_temp (parent, child, session_id, refcount) values (:c_child, :c_child, :session_id, 1);
    insert into graph_tc_temp (parent, child, session_id, refcount) values (:p_parent, :c_child, :session_id, 1);
    insert into graph_tc_temp (parent, child, session_id, refcount) select distinct :p_parent, child, :session_id, refcount from graph_tc where parent = :c_child and not parent = child;
    insert into graph_tc_temp (child, parent, session_id, refcount) select distinct :c_child, parent, :session_id, refcount from graph_tc where child = :p_parent and not parent = child;
    insert into graph_tc_temp (parent, child, session_id, refcount) select distinct a.parent, b.child, :session_id, a.refcount*b.refcount from graph_tc a, graph_tc b where a.child = :p_parent and b.parent = :c_child and not a.parent = a.child and not b.parent = b.child;
    for select parent, child, refcount from graph_tc_temp e where session_id= :session_id and exists (select * from graph_tc t where t.parent = e.parent and t.child = e.child ) into :tp_parent, :tc_child, :refs do begin
        update graph_tc set refcount=refcount+ :refs where parent = :tp_parent and child = :tc_child;
    end
    insert into graph_tc (parent, child, refcount) select parent, child, refcount from graph_tc_temp e where session_id = :session_id and not exists (select * from graph_tc t where t.parent = e.parent and t.child = e.child);
    delete from graph_tc_temp where session_id = :session_id;
end ^

CREATE PROCEDURE GRAPH_TC_DELETE (p_parent DECIMAL(10, 0), c_child DECIMAL(10, 0))
AS 
    declare variable tp_parent DECIMAL(10,0);
    declare variable tc_child DECIMAL(10,0);
    declare variable refs DECIMAL(9,0);
begin
    delete from graph_tc where parent = :p_parent and child = :p_parent and refcount <= 1;
    update graph_tc set refcount = refcount - 1 where parent = :p_parent and child = :p_parent and refcount > 1;
    delete from graph_tc where parent = :c_child and child = :c_child and refcount <= 1;
    update graph_tc set refcount = refcount - 1 where parent = :c_child and child = :c_child and refcount > 1;
    delete from graph_tc where parent = :p_parent and child = :c_child and refcount <= 1;
    update graph_tc set refcount = refcount - 1 where parent = :p_parent and child = :c_child and refcount > 1;
    for select distinct :p_parent,  b.child, refcount from graph_tc b where b.parent = :c_child and not b.parent = b.child into :tp_parent, :tc_child, :refs do begin
        delete from graph_tc where parent = :tp_parent and child = :tc_child and refcount <= :refs;
        update graph_tc set refcount = refcount - :refs where parent = :tp_parent and child = :tc_child and refcount > :refs;
    end
    for select distinct :c_child,  b.parent, refcount from graph_tc b where b.child = :p_parent and not b.parent = b.child into :tc_child, :tp_parent, :refs do begin
        delete from graph_tc where child = :tc_child and parent = :tp_parent and refcount <= :refs;
        update graph_tc set refcount = refcount - :refs where child = :tc_child and parent = :tp_parent and refcount > :refs;
    end
    for select distinct a.parent, b.child, a.refcount*b.refcount from graph_tc a, graph_tc b where not a.parent = a.child and not b.parent = b.child and a.child = :p_parent and b.parent = :c_child into :tp_parent, :tc_child, :refs do begin
        delete from graph_tc where parent = :tp_parent and child = :tc_child and refcount <= :refs;
        update graph_tc set refcount = refcount - :refs where parent = :tp_parent and child = :tc_child and refcount > :refs;
    end
end ^

CREATE TRIGGER GRAPH_TC_AFTER_INSERT FOR EDGE AFTER INSERT as
begin
    execute procedure graph_tc_create(new.parent,new.child);
end ^

CREATE TRIGGER GRAPH_TC_AFTER_UPDATE FOR EDGE AFTER UPDATE as
begin
    if ((new.parent <> old.parent) or (new.child <> old.child)) then begin
    execute procedure graph_tc_delete(old.parent,old.child);
    execute procedure graph_tc_create(new.parent,new.child);
    end
end ^

CREATE TRIGGER GRAPH_TC_AFTER_DELETE FOR EDGE AFTER DELETE as
begin
    execute procedure graph_tc_delete(old.parent,old.child);
end ^

This is my own idea, but I think others have implemented an TC already. Are they doing the same thing?

I have some test cases, but I'm not sure if I might get an inconsistency with bigger graphs.

How about concurrency, I think this approach will fail when two simultaneous transactions want to update the graph, right?

Edit

I found some bugs in my code, and I'd like to share the fixed version with you.

I found a great article: http://www.codeproject.com/Articles/22824/A-Model-to-Represent-Directed-Acyclic-Graphs-DAG-o. Are there more interesting articles or scientific papers, with different approaches?

Folio answered 24/9, 2013 at 22:26 Comment(6)
Can you include (relevant parts of) the DDL and trigger definitions?Ivar
Consider using a GTT (global temporary table) instead of a normal table for GRAPH_TC_TEMPIvar
Do you want to query the tables recursively, or query particular graphs & connections?Lenhard
@Lenhard I want to query all connections recursively. For instance: what are all (indirect) parents or children of node X? Which nodes are between nodes X and Y?Folio
@ChristophWalesch you mention you found some bugs and want to share the fixed version. Where is your fixed version?Springwood
@Springwood What you see the fixed version. An older version of the question had the bugs i mentioned.Folio
L
1

I just fixed up a slow delete operation by extending to the transitive reflexive closure table model described here: http://www.dba-oracle.com/t_sql_patterns_incremental_eval.htm. It took a little more work to fully maintain the paths count within it, but it payed off big when deletes went from a 6 second each individual remove operation to negligable (I can now delete every relationship in the graph, and then add them all back in 14 seconds total for 4,000 relationships).

Landa answered 8/8, 2014 at 20:35 Comment(1)
And for bonus, shortest path lengths can be maintained similarly to the total count of paths tjhsst.edu/~rlatimer/acm/DatabaseSystems/…Landa
M
1

SQL is not the right tool for dealing with graphs. Use one of these :

http://en.wikipedia.org/wiki/Graph_database

I like very much ArangoDB, wich have a syntaxe close to mongodb.

Malik answered 11/10, 2013 at 23:22 Comment(1)
I am aware that graph db would be the ideal solution; but for a two graphs with < 100k edges each I don't add a new database.Folio
L
1

I just fixed up a slow delete operation by extending to the transitive reflexive closure table model described here: http://www.dba-oracle.com/t_sql_patterns_incremental_eval.htm. It took a little more work to fully maintain the paths count within it, but it payed off big when deletes went from a 6 second each individual remove operation to negligable (I can now delete every relationship in the graph, and then add them all back in 14 seconds total for 4,000 relationships).

Landa answered 8/8, 2014 at 20:35 Comment(1)
And for bonus, shortest path lengths can be maintained similarly to the total count of paths tjhsst.edu/~rlatimer/acm/DatabaseSystems/…Landa
C
0

Try creating indexes for the relevant where clauses (e.g.: child, parent).

I'm not familiar with Firebird, but have a look how the "firebird describe" works on it and check if it's using a proper use of indexes in order to speed the selects you have within your procedures.

Even creating indexes you lost in performance for upddate/delete/insert, in your case it can improve the result.

Communalism answered 14/10, 2013 at 16:56 Comment(1)
The real implementation has indices; I just didn't copy the CREATE INDEX statement to the code above.Folio

© 2022 - 2024 — McMap. All rights reserved.