How to calculate maximum degree of a directed graph using SPARQL?
Asked Answered
H

2

6

I have computed the indegree and outdegree of each node in the directed graph in two separate queries:

SELECT ?s (COUNT(*) AS ?outdegree) 
{ ?s ?p ?o }
GROUP BY ?s
ORDER BY DESC(?outdegree) 

SELECT ?o (COUNT(*) AS ?indegree) 
{ ?s ?p ?o }
GROUP BY ?o
ORDER BY DESC(?indegree)  

I need to compute the maximum degree of the graph. Since max degree of a directed graph is the maximum (indegree+outdegree) value of the graph , I want to know how to combine the results of the above two queries to compute it.

Also, if there is a more efficient way of doing this, please suggest them as well.

Harley answered 17/6, 2014 at 18:5 Comment(0)
C
5

You can use a pretty simply query to get the degree of each vertex ?x:

select ?x (count(*) as ?degree) { 
  { ?x ?p ?o } union
  { ?s ?p ?x }
}
group by ?x

E.g., on this data:

@prefix : <https://mcmap.net/q/1708966/-how-to-calculate-maximum-degree-of-a-directed-graph-using-sparql/1281433/> .

#     a
#     |
#     V
# b<--c-->d
#     |
#     V  
#     e

:a :p :c .
:c :p :b, :d, :e .

you'll get the results:

---------------
| x  | degree |
===============
| :a | 1      |
| :b | 1      |
| :c | 4      |
| :d | 1      |
| :e | 1      |
---------------

Now, if you want the maximum one, you can simply order and use a limit of 1, e.g.,

select ?x (count(*) as ?degree) { 
  { ?x ?p ?o } union
  { ?s ?p ?x }
}
group by ?x
order by desc(?degree)
limit 1
---------------
| x  | degree |
===============
| :c | 4      |
---------------

This will work if there's only one vertex with the highest degree. If there are multiple with the same highest degree, you'll just get one of them.

If you really want to combine the two queries that you've got, then something like Rob Hall's answer will work, except that since the subqueries don't return the nodes that have a 0 indegree or 0 outdegree, they're not present in the final results, since they're not available to join on. So only use that approach if you're guaranteed that every node has non-zero indegrees and outdegrees. His answer is also useful as an example of how to construct the graph and run these queries with Jena programmatically.

Confessional answered 17/6, 2014 at 19:2 Comment(0)
A
3

Using the following test data:

<urn:ex:cent0>  <urn:ex:p>  <urn:ex:cent1> , <urn:ex:o1> , <urn:ex:o0> .
<urn:ex:s1>  <urn:ex:p>  <urn:ex:cent0> .
<urn:ex:cent1>  <urn:ex:p>  <urn:ex:o3> , <urn:ex:o2> .
<urn:ex:s2>  <urn:ex:p>  <urn:ex:cent0> .
<urn:ex:s0>  <urn:ex:p>  <urn:ex:cent0> .

I executed your queries and the following query:

SELECT  ?cent (( ?indegree + ?outdegree ) AS ?degree)
WHERE
  { { SELECT  (?s AS ?cent) (count(*) AS ?outdegree)
      WHERE
        { ?s ?p ?o }
      GROUP BY ?s
      ORDER BY DESC(?outdegree)
    }
    { SELECT  (?o AS ?cent) (count(*) AS ?indegree)
      WHERE
        { ?s ?p ?o }
      GROUP BY ?o
      ORDER BY DESC(?indegree)
    }
  }

Which resulted in the following output:

-----------------------------
| o              | indegree |
=============================
| <urn:ex:cent0> | 3        |
| <urn:ex:cent1> | 1        |
| <urn:ex:o0>    | 1        |
| <urn:ex:o1>    | 1        |
| <urn:ex:o2>    | 1        |
| <urn:ex:o3>    | 1        |
-----------------------------
------------------------------
| s              | outdegree |
==============================
| <urn:ex:cent0> | 3         |
| <urn:ex:cent1> | 2         |
| <urn:ex:s0>    | 1         |
| <urn:ex:s1>    | 1         |
| <urn:ex:s2>    | 1         |
------------------------------
---------------------------
| cent           | degree |
===========================
| <urn:ex:cent0> | 6      |
| <urn:ex:cent1> | 3      |
---------------------------

This satisfies the goal of identifying the node with the maximum total degree. The following is the code that I used to construct this model and execute this test (in the event that you wish to reproduce it):

final Resource c0 = ResourceFactory.createResource("urn:ex:cent0");
final Resource c1 = ResourceFactory.createResource("urn:ex:cent1");
final Property p = ResourceFactory.createProperty("urn:ex:p");

final Model model = new ModelCom(Factory.createDefaultGraph()){{
    this.add(this.createResource("urn:ex:s0"), p, c0);
    this.add(this.createResource("urn:ex:s1"), p, c0);
    this.add(this.createResource("urn:ex:s2"), p, c0);

    this.add(c0, p, this.createResource("urn:ex:o0"));
    this.add(c0, p, this.createResource("urn:ex:o1"));
    this.add(c0, p, c1);

    this.add(c1, p, this.createResource("urn:ex:o2"));
    this.add(c1, p, this.createResource("urn:ex:o3"));
}};

final Query outdeg = QueryFactory.create(
    "SELECT ?s (COUNT(*) AS ?outdegree)\n"+ 
    "{ ?s ?p ?o }\n"+
    "GROUP BY ?s\n"+
    "ORDER BY DESC(?outdegree)"
);

final Query indeg = QueryFactory.create(
    "SELECT ?o (COUNT(*) AS ?indegree)\n"+ 
    "{ ?s ?p ?o }\n"+
    "GROUP BY ?o\n"+
    "ORDER BY DESC(?indegree)"
);

final Query alldeg = QueryFactory.create(
    "SELECT ?cent ((?indegree+?outdegree) AS ?degree) WHERE {\n"+
    "  {SELECT (?s AS ?cent) (COUNT(*) AS ?outdegree)\n"+ 
    "    { ?s ?p ?o }\n"+
    "    GROUP BY ?s\n"+
    "    ORDER BY DESC(?outdegree)\n"+
    "  }\n"+
    "  {SELECT (?o AS ?cent) (COUNT(*) AS ?indegree)\n"+ 
    "    { ?s ?p ?o }\n"+
    "    GROUP BY ?o\n"+
    "    ORDER BY DESC(?indegree)\n"+
    "  }\n"+
    "}"
);

@Test
public void test()
{
    model.write(System.out, "TTL");
    System.out.println();

    System.out.println(alldeg);

    final QueryExecution exec0 = QueryExecutionFactory.create(indeg, model);
    ResultSetFormatter.out(exec0.execSelect(), indeg);
    exec0.close();

    final QueryExecution exec1 = QueryExecutionFactory.create(outdeg, model);
    ResultSetFormatter.out(exec1.execSelect(), outdeg);
    exec1.close();

    final QueryExecution exec2 = QueryExecutionFactory.create(alldeg, model);
    ResultSetFormatter.out(exec2.execSelect(), alldeg);
    exec2.close();
}
Accelerator answered 17/6, 2014 at 18:52 Comment(2)
This will miss any sink or source nodes, though, since an entry with 0 indegree or 0 outdegree won't be found in the corresponding subquery, and so it won't be represented in the join of the two subqueries.Confessional
Excellent catch. This will result in an incorrect answer whenever a source or sink node has the highest in and/or out degree. You've posted a much more elegant solution as well, yet I can't bring myself to delete an answer that I put time into, haha.Accelerator

© 2022 - 2024 — McMap. All rights reserved.