My English is not good enough to describe my problem formally, so let me explain it with an example. The table below is the result of grouping by 'subject' and 'predicate'.
For each subject, define a set made up of the rows with that subject. I want to merge any two sets that contain exactly the same predicates, sum the 'count' values per predicate, and count the number of distinct subjects that share each set.
subject predicate count
-----------------------------
s1 p1 1
s1 p2 2
s2 p1 3
s3 p1 2
s3 p2 2
Therefore, what I want from this table is two sets:
{2, (p1, 3), (p2, 4)},
{1, (p1, 3)}
where, in the first set, 2 indicates that two subjects (s1 and s3) have this set of predicates, and (p1, 3) is the sum of the counts from (s1, p1, 1) and (s3, p1, 2).
So how can I retrieve these sets and store them in Java?
How can I do it using SPARQL?
Or, if I first load these triples into Java, how can I get these sets using Java?
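For the in-memory route, one possible approach is sketched below. It is only a sketch under assumptions: the class `PredicateSetGrouper`, its nested `Row` and `Group` types, and the method `group` are hypothetical names made up for illustration, and the rows are assumed to be already loaded into a list. It first collects the predicates and counts of each subject, then uses the predicate set itself as a map key so that subjects with exactly the same predicates land in the same group.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class; all names here are illustrative.
class PredicateSetGrouper {

    /** One row of the input table: (subject, predicate, count). */
    static final class Row {
        final String subject;
        final String predicate;
        final long count;

        Row(String subject, String predicate, long count) {
            this.subject = subject;
            this.predicate = predicate;
            this.count = count;
        }
    }

    /** One result set: how many subjects share it, plus the summed count per predicate. */
    static final class Group {
        int subjects = 0;
        final Map<String, Long> sums = new TreeMap<>();

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("{").append(subjects);
            sums.forEach((p, c) -> sb.append(", (").append(p).append(", ").append(c).append(")"));
            return sb.append("}").toString();
        }
    }

    static Map<Set<String>, Group> group(List<Row> rows) {
        // Step 1: collect predicate -> count for every subject.
        Map<String, Map<String, Long>> bySubject = new TreeMap<>();
        for (Row r : rows) {
            bySubject.computeIfAbsent(r.subject, k -> new TreeMap<>())
                     .merge(r.predicate, r.count, Long::sum);
        }

        // Step 2: use the predicate set as the key, so subjects with exactly
        // the same predicates are merged into one group.
        Map<Set<String>, Group> result = new LinkedHashMap<>();
        for (Map<String, Long> predicates : bySubject.values()) {
            Set<String> key = new TreeSet<>(predicates.keySet());
            Group g = result.computeIfAbsent(key, k -> new Group());
            g.subjects++;
            predicates.forEach((p, c) -> g.sums.merge(p, c, Long::sum));
        }
        return result;
    }
}
```

Calling `group` on the five rows of the example table should give two groups whose `toString()` forms are {2, (p1, 3), (p2, 4)} and {1, (p1, 3)}. Each subject's rows are visited once, so the cost is roughly linear in the number of rows, apart from the sorted-map overhead.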
One solution might be to concatenate the predicates and counts of each subject:
SELECT (COUNT(?s) AS ?distinct)
       ?propset
       (group_concat(?count; separator = "\t") AS ?counts)
{
  # one row per subject: its predicates and their counts, concatenated
  SELECT ?s
         (group_concat(?p; separator = " ") AS ?propset)
         (group_concat(?c; separator = " ") AS ?count)
  {
    ?s ?p ?c
  } GROUP BY ?s ORDER BY ?s
} GROUP BY ?propset ORDER BY ?propset
The counts can then be split apart and summed. This works fine on a small dataset, but it is very time-consuming on larger ones.
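The splitting and summing step could look roughly like the sketch below. The class `ConcatenatedCountsDecoder` and the method `sumCounts` are hypothetical names, and the code assumes that each result row arrives as two plain strings in the shape the query above produces: `propset` holds the predicates separated by a space, and `counts` holds one space-separated list of counts per subject, with the lists separated by a tab. It also assumes that the counts of every subject are in the same order as the predicates in `propset`, which SPARQL does not guarantee for `group_concat`, so this should be checked against the actual engine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; assumes the string layout described in the lead-in.
class ConcatenatedCountsDecoder {

    /** Sums the per-subject count lists position by position. */
    static Map<String, Long> sumCounts(String propset, String counts) {
        String[] predicates = propset.split(" ");
        long[] sums = new long[predicates.length];

        // Each tab-separated chunk is the count list of one subject.
        for (String perSubject : counts.split("\t")) {
            String[] parts = perSubject.split(" ");
            if (parts.length != predicates.length) {
                throw new IllegalArgumentException(
                        "Expected " + predicates.length + " counts but got: " + perSubject);
            }
            for (int i = 0; i < parts.length; i++) {
                sums[i] += Long.parseLong(parts[i]);
            }
        }

        // Pair each predicate with its summed count, keeping the propset order.
        Map<String, Long> result = new LinkedHashMap<>();
        for (int i = 0; i < predicates.length; i++) {
            result.put(predicates[i], sums[i]);
        }
        return result;
    }
}
```

For the example table, the row with `propset` "p1 p2" and `counts` "1 2", a tab, then "2 2" would give p1 = 3 and p2 = 4. The number of subjects is already available as `?distinct` in the same row.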
I think I will give up on this weird problem. Thank you very much for answering.