Add text search where clause to SPARQL query

I have been given what I thought would be a simple task: take an existing SPARQL query and adapt the WHERE clause to restrict results to entities where a specific text field contains a specific search term.

However, I am entirely new to the SPARQL language and nothing I've tried is working. It seems I need to use the text:query (rdfs:label 'word' 10) syntax, but I haven't managed to successfully integrate this into the query below.

What I need is to further filter the results of the below query where the rdfs:label triple has a value containing the search term. If any of you could provide guidance about how I need to change the query I'd be very grateful.

SELECT DISTINCT * WHERE 
{
  { SELECT  ?object ?label ?accessionNumber ?image  WHERE {
      ?object a my:Object .
      ?object my:accessionNumber ?accessionNumber .
      ?object  rdfs:label ?label .
      ?object my:maker <http://id.my.org.uk/agent/1234> .  
  }}  

  OPTIONAL  { 
    ?object my:preferredAsset ?asset .
    ?asset a my:Asset .
    ?asset dcterms:hasVersion ?image .
    ?image my:role 'thumbnail' .  
  }  
} 

Thanks in advance.

Newsworthy answered 6/8, 2014 at 15:33

Approximate Matching

String Matching

Joshua Taylor's comment points out an excellent and elegant solution to do exactly what you asked for:

filter contains( lcase(?label), "word").
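
The Java sketch below shows where that line goes in your query. It uses the standard library only and assembles the query from the question with the filter added inside the inner SELECT.

A few assumptions apply:
- The class and method names are hypothetical.
- The my:, rdfs: and dcterms: prefix declarations are left out, as in the question.
- The search term is lower-cased on the Java side, because lcase() is applied only to ?label.
- The term is escaped by hand for a double-quoted SPARQL literal. With Jena you could instead write FILTER contains(lcase(?label), ?term) and bind ?term using ParameterizedSparqlString.setLiteral.

```java
import java.util.Locale;

public class AddLabelFilter {

    // Escapes text for a double-quoted SPARQL string literal. The grammar requires
    // escaping ", \, LF and CR; tab is escaped too for readability.
    static String sparqlStringLiteral(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Returns the question's query with a case-insensitive "label contains term"
    // filter inside the subquery. lcase() lower-cases only ?label, so the term
    // is lower-cased here before it is embedded.
    static String buildQuery(String term) {
        String needle = sparqlStringLiteral(term.toLowerCase(Locale.ROOT));
        return String.join("\n",
            "SELECT DISTINCT * WHERE",
            "{",
            "  { SELECT ?object ?label ?accessionNumber ?image WHERE {",
            "      ?object a my:Object .",
            "      ?object my:accessionNumber ?accessionNumber .",
            "      ?object rdfs:label ?label .",
            "      ?object my:maker <http://id.my.org.uk/agent/1234> .",
            "      FILTER contains(lcase(?label), " + needle + ")",
            "  }}",
            "",
            "  OPTIONAL {",
            "    ?object my:preferredAsset ?asset .",
            "    ?asset a my:Asset .",
            "    ?asset dcterms:hasVersion ?image .",
            "    ?image my:role 'thumbnail' .",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("Word"));
    }
}
```

Putting the filter in the subquery restricts the objects before the OPTIONAL thumbnail lookup is joined. Placing it in the outer WHERE block should return the same rows.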

You can also use regular expressions via the REGEX Filter Function. You would simply add an additional filter to your query such as:

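FILTER regex(?label, "word", "i") .

This would allow you to retrieve all labels that contain word, ignoring case. No wildcards are needed: regex matches anywhere in the string unless you anchor the pattern with ^ or $. A pattern such as "*word*" is not a valid regular expression, because the leading * has nothing to repeat.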
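
The small Java simulation below runs both filters over a few made-up labels to show how they compare in practice. It uses java.util.regex as a stand-in for SPARQL's XPath-based regex. The two agree for a simple pattern like this one, but they are not identical in general. The simulation also shows that a pattern with a leading * fails to compile.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LabelMatching {

    // Simulates FILTER contains(lcase(?label), "word"): lower-case, then substring test.
    static boolean containsLcase(String label, String lowerTerm) {
        return label.toLowerCase(Locale.ROOT).contains(lowerTerm);
    }

    // Simulates FILTER regex(?label, pattern, "i"): case-insensitive and unanchored,
    // so find() succeeds if the pattern matches anywhere in the label.
    static boolean regexIgnoreCase(String label, String pattern) {
        return Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                      .matcher(label)
                      .find();
    }

    public static void main(String[] args) {
        // Hypothetical labels, not taken from the question's data.
        List<String> labels = List.of("Word", "WORDS", "Crossword puzzle", "Sword", "Letter");
        for (String label : labels) {
            System.out.printf("%-18s contains: %-5b regex: %b%n",
                    label, containsLcase(label, "word"), regexIgnoreCase(label, "word"));
        }

        // A leading '*' has nothing to repeat, so the pattern does not compile.
        try {
            Pattern.compile("*word*");
        } catch (PatternSyntaxException e) {
            System.out.println("\"*word*\" rejected: " + e.getDescription());
        }
    }
}
```

Both filters are plain substring tests, so "Sword" and "Crossword puzzle" match as well. If that is not what you want, exact matching or jena-text may fit better.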

Jena Text

The syntax text:query (rdfs:label 'word' 10) you mentioned is part of the jena-text project. Note that jena-text must be configured before it will work: the dataset needs a text index (Lucene, for example) built over the properties you want to search. The main reason to use it is approximate text matching, i.e., when it is acceptable to search for word and get back things like words or wordpress.

Exact Matching

Another alternative is exact matching. You can do this by specifying an initial binding, or by modifying your query directly.

Query Modification

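Modifying your query would produce one of several variations. Not all of these variations are considered equal (plain literals / language-tagged literals / typed literals), so when searching you need to be sure that the form you use matches how your data is stored. The first two lines below are the same literal written with different quote styles. Under RDF 1.1, a literal with no language tag or datatype is an xsd:string, so the fourth line is equivalent to them as well; under older RDF 1.0 semantics it is a distinct term. The language-tagged form only matches labels stored with that same tag.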

 ?object  rdfs:label "word" .
 ?object  rdfs:label '''word''' .
 ?object  rdfs:label "word"@en .
 ?object  rdfs:label "word"^^xsd:string .
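
A triple pattern with a literal only matches a stored literal that is the same RDF term, so the choice of form matters. The Java sketch below models that comparison with a simplified, hypothetical Literal class made of a lexical form, a language tag and a datatype IRI. It assumes the data holds a plain "word" and checks which of the forms above would match it, under both RDF 1.0 and RDF 1.1 rules. It ignores details such as language-tag case normalization.

```java
import java.util.Objects;

public class LiteralMatching {

    static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
    static final String RDF_LANG_STRING =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

    // Simplified RDF literal: lexical form, optional language tag, optional datatype IRI.
    static final class Literal {
        final String lexical, lang, datatype;
        Literal(String lexical, String lang, String datatype) {
            this.lexical = lexical; this.lang = lang; this.datatype = datatype;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Literal)) return false;
            Literal l = (Literal) o;
            return lexical.equals(l.lexical) && Objects.equals(lang, l.lang)
                && Objects.equals(datatype, l.datatype);
        }
        @Override public int hashCode() { return Objects.hash(lexical, lang, datatype); }
    }

    // "word" and '''word''' parse to the same term: no tag, no explicit datatype.
    // RDF 1.1 gives such a literal the datatype xsd:string; RDF 1.0 leaves it untyped.
    static Literal simple(String lex, boolean rdf11) {
        return new Literal(lex, null, rdf11 ? XSD_STRING : null);
    }

    // "word"@en: RDF 1.1 gives it the datatype rdf:langString; RDF 1.0 leaves it untyped.
    static Literal tagged(String lex, String tag, boolean rdf11) {
        return new Literal(lex, tag, rdf11 ? RDF_LANG_STRING : null);
    }

    // "word"^^xsd:string: explicitly typed under both versions.
    static Literal xsdString(String lex) {
        return new Literal(lex, null, XSD_STRING);
    }

    public static void main(String[] args) {
        for (boolean rdf11 : new boolean[] {false, true}) {
            Literal stored = simple("word", rdf11); // what the data is assumed to contain
            System.out.println(rdf11 ? "RDF 1.1:" : "RDF 1.0:");
            System.out.println("  \"word\"             matches: " + stored.equals(simple("word", rdf11)));
            System.out.println("  \"word\"@en          matches: " + stored.equals(tagged("word", "en", rdf11)));
            System.out.println("  \"word\"^^xsd:string matches: " + stored.equals(xsdString("word")));
        }
    }
}
```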

Binding Specification

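Constructing an initial binding with Jena usually looks something like this (a fragment: query, model and someString are assumed to exist already):

import org.apache.jena.query.*; // com.hp.hpl.jena.query.* in Jena 2.x

// Bind ?label before execution; the variable name is given without the '?'
final QuerySolutionMap initialBinding = new QuerySolutionMap();
initialBinding.add("label", model.createTypedLiteral(someString));
final QueryExecution e =
         QueryExecutionFactory.create(query, model, initialBinding);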

Note that the second argument to add has the same choices as the query modification. You can create a language literal (model.createLiteral(someString, "en")) or a plain literal (model.createLiteral(someString)) rather than a typed literal. Again, it needs to match your underlying data.

Wooton answered 6/8, 2014 at 16:20
For case-insensitive matching, you don't need all the power of regex, though. You can just do filter contains(lcase(?label), "word"). That may be a bit cheaper, since the matching is simpler. – Bagehot
Fantastic, comprehensive answer. Thank you (and @joshua-taylor). Right now a case-insensitive text match is sufficient, but I wasn't aware of the jena-text project and it will definitely be something we look at going forward. Grateful to you both. – Newsworthy

© 2022 - 2024 — McMap. All rights reserved.