Are graph databases more performant than relational databases for highly connected acyclic graph data?
I need to significantly speed up my query results and hope that graph databases will be the answer. I saw a significant improvement in my relational database queries when I used Common Table Expressions, bringing a recursive search of my sample data from 16 hours down to 30 minutes. Still, 30 minutes is far too long for a web application, and working around that kind of response time with caching gets ridiculous pretty quickly.
My Gremlin query looks something like:
g.withSack(100D).                                  // start with a weight of 100 in the sack
  V(with vertex id).                               // begin at the root vertex
  repeat(out('edge_label').
         sack(div).by(constant(2D))).              // halve the sack value at each hop
  emit().                                          // emit vertices at every depth, not just the leaves
  group().by('node_property').by(sack().sum()).    // group by 'node_property', summing the weights in each group
  unfold().                                        // flatten the map into its entries
  order().by(values, decr).                        // order entries by summed weight, descending
  fold()
a Cypher equivalent (thank you, cyberSam) looks something like:
MATCH p=(f:Foo)-[:edge_label*]->(g)
WHERE f.id = 123
RETURN g, SUM(100*0.5^(LENGTH(p)-1)) AS weight
ORDER BY weight DESC
and my SQL looks roughly like:
WITH PctCTE (id, pt, tipe, ct)
AS
    -- recursive walk down from @id, halving the weight (pt) at each level
    (SELECT id, CONVERT(DECIMAL(28,25), 100.0) AS pt, kynd, 1
     FROM db.reckrd parent
     WHERE parent.id = @id
     UNION ALL
     SELECT child.id, CONVERT(DECIMAL(28,25), parent.pt / 2.0), child.kynd, parent.ct + 1
     FROM db.reckrd AS child
     INNER JOIN PctCTE AS parent
       ON (parent.tipe = 'M' AND child.emm = parent.id)
       OR (NOT parent.tipe = 'M' AND child.not_emm = parent.id)
    ),
mergeCTE (dups, h, p)
AS
    -- collapse duplicate ids, summing the weights for each id
    (SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY ct) AS dups, id, SUM(pt) OVER (PARTITION BY id)
     FROM PctCTE
    )
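-- (assumed final SELECT, omitted from the snippet above: keep one row per id
--  with its summed weight, ordered like the Gremlin and Cypher queries)
SELECT h AS id, p AS weight
FROM mergeCTE
WHERE dups = 1
ORDER BY p DESC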
which should return a result set with 500,000+ edges in my test instance.
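To make the weighting concrete (taking the Cypher formula as the reference): a record reached by a 1-hop path contributes 100 * 0.5^0 = 100 to its group's total, a 2-hop path contributes 50, a 3-hop path 25, and a record reachable along several paths accumulates the sum of those contributions (e.g. 100 + 25 = 125). So every path has to be walked before the totals, and therefore the final ordering, are known.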
Even if I filtered to reduce the size of the output, the query would still have to traverse all of those edges before I could get to the interesting data I want to analyse.
I can foresee some queries on real data having to traverse closer to 3,000,000+ edges ...
If graph databases aren't the answer, is a CTE as good as it gets?