SPARQL: combine multiple predicate values into array of labels
Asked Answered
V

1

6

Right now I have a query that returns a list of various properties for a specific entity:

SELECT ?propLabel ?val WHERE {

  BIND(wd:Q122426 as ?entity)
  {
    BIND(?entity AS ?valUrl)
    BIND("n/a" AS ?propUrl)
    BIND("name"@en AS ?propLabel)
    ?entity rdfs:label ?val.
    FILTER((LANG(?val)) = "en")

# instance of
  } UNION {
    ?entity wdt:P31 ?valUrl .
    BIND("instance of"@en AS ?propLabel)

    # filter  isIRI(?valUrl) 
    ?valUrl rdfs:label ?valLabel 
    FILTER (LANG(?valLabel) = "en") 
    BIND(CONCAT(?valLabel) AS ?val)

# occupation
  } UNION {
    ?entity wdt:P106 ?val.
    BIND("occupation"@en AS ?propLabel)

# position held
  } UNION {
    ?entity wdt:P39 ?val.
    BIND("position"@en AS ?propLabel)

# ... and more ...

  }
}

This works great, but it returns the entity code (Qxxxxx) instead of the text label. So I can change it like this:

# occupation
  } UNION {
    ?entity wdt:P106 ?valUrl.
    BIND("occupation"@en AS ?propLabel)

    ?valUrl rdfs:label ?valLabel 
    FILTER (LANG(?valLabel) = "en") 
    BIND(CONCAT(?valLabel) AS ?val)

and that works fine.

But my question is

How can I 'collapse' the values for multiple predicates into a string? i.e. for the predicates P106, P119, Px, Py, etc. get:

| ?property   | ?valueLabel |
|-------------+-------------|
| tags  | politician, Giza East Field, something else, something else |
| name        | Henutsen    |

Is there a more efficient way to structure the query rather than UNIONing every predicate? Like, supply just a list of predicates P31, P106, P39 etc

Vi answered 17/5, 2019 at 19:4 Comment(10)
I don't understand the purpose of UNION in your query. You should use an OPTIONAL for each property which might be missing. And for the "array" of values, you have to use group_concat as aggregate function and group by the entity. Note, the result isn't a proper array but a string consisting of all joined values with the given separator as join string. There is no such array datastructure in SPARQLEmbolism
By the way, you can also get the property labels in the query, there is no need to bind them manually.Embolism
Thank you—can you write the proposed improved query as an answer? I'm still learning SPARQL.Vi
I thought the UNION was so that I get a table where the rows are each property, but I honestly don't remember where I originally got that :)Vi
The “array” that you describe is a JSON data structure. But the result of a SPARQL query is not a JSON data structure but a table of rows and columns, with each SELECT variable becoming a column, and each row containing one value (or no value—unbound) for each column. Can you update the question with a description of the table structure that you want to get?Hamford
@Vi why do you want to get a row per property? Why not a column per property?Embolism
here is (a sketch of) a more compact query of yours: SELECT distinct ?propLabel (COALESCE(str(?valLabel), ?val) as ?value) WHERE { VALUES (?prop ?propLabel) {(rdfs:label "name"@en) (wdt:P106 "occupation"@en) (wdt:P39 "position"@en)} wd:Q122426 ?prop ?val # if it's a literal and it has a language tag, take the English one FILTER(!isLiteral(?val) || lang(?val) = "" || langmatches(lang(?val), "en")) # get labels for IRIs OPTIONAL {?val rdfs:label ?valLabel FILTER(langmatches(lang(?valLabel), "en"))} }Embolism
with Wikidata magic label service: SELECT distinct ?property ?valLabel WHERE { VALUES (?prop ?property) {(rdfs:label "name"@en) (wdt:P106 "occupation"@en) (wdt:P39 "position"@en)} wd:Q122426 ?prop ?val FILTER(!isLiteral(?val) || lang(?val) = "" || langmatches(lang(?val), "en")) SERVICE wikibase:label {bd:serviceParam wikibase:language "en" .} }Embolism
@Hamford I know, I meant 'array' as in a general conceptual sence, so as AKSW said, a comma-separated string is fine.Vi
@AKSW can you post that as an answer and not as a comment? :)Vi
H
3

Here is a version that replaces the UNION with a compact VALUES clause, and use of the label service (courtesy of AKSW in the comments):

SELECT ?entity ?property ?valueLabel {
  VALUES ?entity { wd:Q122426 }
  VALUES (?p ?property) {
    (wdt:P31    "instance of"@en)
    (rdfs:label "name"@en)
    (wdt:P106   "occupation"@en)
    (wdt:P39    "position"@en)
  }
  ?entity ?p ?value
  FILTER (!isLiteral(?value) || lang(?value) = "" || langmatches(lang(?value), "en"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

Result:

| ?property   | ?valueLabel |
|-------------+-------------|
| instance of | human       |
| occupation  | politician  |
| name        | Henutsen    |

Collapsing the multiple rows into a single row, with comma-separated lists as values, is in theory also possible and would look something like this:

SELECT ?entity
  (group_concat(?property; separator=", ") AS ?properties)
  (group_concat(?valueLabel; separator=", ") AS ?values)
WHERE {
  ...
}
GROUP BY ?entity

with the original query inserted at .... The idea is to use GROUP BY to group the results by entity (not strictly necessary here as we only have a single entity anyway, but if you wanted to add more entities to the query), and then group_concat to combine all values per entity into one value.

Unfortunately, this doesn't work for some reason; ?properties looks fine but ?values is empty. It might have something to do with the “magic” label service, perhaps it is not compatible with aggregates.

Update: Here is a version that returns name and “tags” as separate rows, as in the updated question:

SELECT ?property ?valueLabel {
  {
    SELECT
      ("tags" AS ?property)
      (group_concat(?valueLabel; separator=", ") AS ?valueLabel)
    WHERE {
      VALUES ?p { wdt:P31 wdt:P106 wdt:P119 wdt:P39 }
      wd:Q122426 ?p ?value
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
  } UNION {
    BIND ("name"@en AS ?property)
    wd:Q122426 rdfs:label ?valueLabel
    FILTER langMatches(lang(?valueLabel), "en")
  }
}

Results:

| ?property | ?valueLabel                        |
|-----------+------------------------------------|
| tags      | human, politician, Giza East Field |
| name      | Henutsen                           |

One thing is a little bit weird about this query. Note that ?valueLabel is assigned a value by the label service, but then re-assigned a different value in the SELECT clause of the inner sub-query. This kind of re-assignment is usually not allowed in SPARQL, but here it is actually required to make the query work. This seems to be a quirk with the label service.

Hamford answered 21/5, 2019 at 16:1 Comment(4)
Amazing, thank you! I edited my question to state the grouping request more clearly. I see how group_concat works, just not clear on some of the details.Vi
I'm trying this out in the WD query service but getting an empty field for tags— w.wiki/4ScVi
Well, it worked when I wrote the answer eight days ago. It doesn't work now. As I've said in both parts of the response, something is weird in the way the label service interacts with aggregates. One could try rewriting the query to not use the label service but I'll leave that job to somebody else.Hamford
Thank you so much for all your help!Vi

© 2022 - 2024 — McMap. All rights reserved.