Retrieving a DBpedia resource by its string name with SPARQL and without knowing its type

Like the asker of this similarly titled question, I would like to retrieve a DBpedia resource knowing only part of its name. I'm a beginner with SPARQL, but the example in that question helped me a lot: the author searched for "Romania", and the person answering gave them a SPARQL query that does the job. That's nice, but here's the thing.

In the example, they already "knew" that Romania is a country, hence the

    ?c a dbpedia-owl:Country ;

in the WHERE clause. The complete SPARQL query is:

    SELECT ?c
    WHERE {
    ?c a dbpedia-owl:Country ;
    foaf:name "Romania"@en .
    FILTER NOT EXISTS {?c dbpedia-owl:dissolutionYear ?y}
    } 

However, that question doesn't quite answer our need, which is to search for ANY resource by its name (the actual name of a resource, or part of it), regardless of its rdf:type. The goal is to search for "anything", knowing only the name or part of it.

I did some research before asking this question. I already know that the "part of the name" problem can be solved in two ways: with the bif:contains function (the bad way, since it is not SPARQL-compliant), or with the CONTAINS function. However, I couldn't find any example showing how to use it.
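For reference, here is a minimal sketch of the standard SPARQL 1.1 CONTAINS approach. It assumes a SPARQL 1.1 endpoint and assumes that rdfs:label holds the "name"; neither is stated in the question. It builds the query in Python. The helper names escape_sparql_string and contains_query are hypothetical. The match happens inside a FILTER, so the store may still have to scan many labels, which is the performance worry raised in point 2 of the question.

```python
def escape_sparql_string(text: str) -> str:
    """Escape text so it can sit inside a double-quoted SPARQL string literal."""
    replacements = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    return "".join(replacements.get(ch, ch) for ch in text)


def contains_query(user_input: str, lang: str = "en", limit: int = 100) -> str:
    """Standard SPARQL 1.1 query (hypothetical helper): labels in `lang` that
    contain user_input, compared case-insensitively via LCASE."""
    needle = escape_sparql_string(user_input.lower())
    lang_tag = escape_sparql_string(lang)
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?s ?label WHERE {{
  ?s rdfs:label ?label .
  FILTER (lang(?label) = "{lang_tag}")
  FILTER (CONTAINS(LCASE(STR(?label)), "{needle}"))
}}
LIMIT {int(limit)}"""


print(contains_query("Obama"))
```

This query is portable across SPARQL 1.1 stores. On a dataset as large as DBpedia it may be slow or hit the endpoint's timeout.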

Now suppose there is a "word" to search for among the DBpedia resources, supplied as input by some user. Let's call it "INPUT".

I imagine the query would look something like this:

    SELECT ?something WHERE
    {
        ?something a (dbpedia Resource).
        CONTAINS(?something,"INPUT")
    }

My question has two main parts:

  1. Is there anything that describes the type "DBpedia Resource"? I don't think it's in the ontology or anywhere else. Knowing that type, I would like to search among all the resources to find one matching...
  2. ...a specific name I provide, or some string. I considered the FILTER option, but that would mean retrieving ALL the resources and then filtering them by name afterwards, which I guess would not be very efficient.

So, does anyone know this "master query" that gets a resource from its name, or part of it? For example, providing "Obama" should return results not only for Barack but for Michelle as well.

Thank you in advance.

Asked by Dextroglucose on 26/12/2011 at 13:53

I'm assuming that in your first question you are interested only in instance resources. I don't know whether you can explicitly ask for instance resources alone in the general case, since in RDF everything is a resource. If you need this specifically for the DBpedia dataset, you can query for resources that have a dcterms:subject property (in DBpedia, only instance resources have a dcterms:subject). So you can write a query like this:

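    PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT DISTINCT ?s ?label WHERE {
        ?s rdfs:label ?label .
        FILTER (lang(?label) = 'en') .
        # bif:contains is Virtuoso's full-text search, not standard SPARQL
        ?label bif:contains "Obama" .
        # in DBpedia, only instance resources carry dcterms:subject
        ?s dcterms:subject ?sub
    }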
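When the word comes from user input, as in the question, the query above can be parameterized. The sketch below does this under several assumptions. The endpoint URL https://dbpedia.org/sparql is assumed. The helper names dbpedia_name_query and build_request are hypothetical. The request carries the query in the standard `query` GET parameter and asks for JSON results through the Accept header. The query only works on Virtuoso because of bif:contains, and it expects a single word; as the comments on this answer point out, multi-word input needs extra quoting.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: DBpedia's public Virtuoso SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def escape_sparql_string(text: str) -> str:
    """Escape text so it can sit inside a double-quoted SPARQL string literal."""
    replacements = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    return "".join(replacements.get(ch, ch) for ch in text)


def dbpedia_name_query(word: str) -> str:
    """The answer's query with the search word parameterized (Virtuoso-only)."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT DISTINCT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        "  FILTER (lang(?label) = 'en') .\n"
        f'  ?label bif:contains "{escape_sparql_string(word)}" .\n'
        "  ?s dcterms:subject ?sub\n"
        "}"
    )


def build_request(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> Request:
    """Build (but do not send) a GET request carrying the query string."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(dbpedia_name_query("Obama"))
```

Sending the request, for example with urllib.request.urlopen(req), needs network access and a live endpoint. Whatever the endpoint returns is not shown here.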

Similarly, for your second question: if you are using only the DBpedia dataset, you might want to use bif:contains even though it is not SPARQL-compliant. I don't think there is another efficient way to do this. As you said, a FILTER-based search will be sub-optimal, especially if your queries need to run quickly. I think keyword search and indexing are handled ad hoc by each triple store; there is not yet a standardized way to do full-text search.

To sum up: if you work with DBpedia only, just use the features of the store and the specifics of the dataset to solve your problem.

Answered by Exoskeleton on 26/12/2011 at 23:43. Comments:
Dextroglucose: It's actually a shame that we have to go through a "trick" (the dcterms:subject), but hey, my queries are only for DBpedia resources. So your answer is absolutely amazing and saved me LOADS of headaches. Thank you so much. As for the bif functions, I know they're not SPARQL-compliant; that's why I looked for CONTAINS, though I think I'll keep looking for examples that use it. Thanks again for your answer. My journey with DBpedia and RDF has just begun, but you gave me the start I needed.
Phox: There are some painful restrictions with this approach: it does not allow spaces. So if you are trying to fetch the canonical entity representation for "Barack Obama" (or, programmatically, for any name string you come across, which is my case), you cannot use bif:contains. Then I thought: maybe URL-encode the string? ?label bif:contains "barack%20obama" . No dice there. Maybe two separate patterns to capture the constituent parts? Nope. :( That gives: Virtuoso 37000 Error SP031: SPARQL compiler: More than one bif:contains() or similar predicate for '$label' variable in a single group. Any ideas?
Seedy: @Phox you can write ?label bif:contains '"barack obama"' (note the extra quotes).
Sheaves: @Seedy good point; we can also use an underscore to join the words, as in 'barack_obama'.
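The phrase-quoting trick from the comments can be applied to arbitrary user input. The following is a minimal sketch, assuming (as the commenters report) that Virtuoso accepts a double-quoted phrase inside a single-quoted SPARQL literal. It has not been checked against a live endpoint. The helpers bif_contains_phrase and label_search_pattern are hypothetical. The sketch strips quote and backslash characters so they cannot close the inner phrase or the outer literal, and it collapses runs of whitespace into single spaces.

```python
def bif_contains_phrase(user_input: str) -> str:
    """Turn free user input into a bif:contains argument matching it as a phrase,
    e.g. '"barack obama"' (double-quoted phrase inside a single-quoted literal)."""
    # Drop characters that would end the inner phrase or the outer SPARQL literal;
    # split() also removes newlines and other whitespace a literal cannot contain.
    words = ["".join(c for c in w if c not in "\"'\\") for w in user_input.split()]
    words = [w for w in words if w]
    if not words:
        raise ValueError("search input contains no usable words")
    return "'\"" + " ".join(words) + "\"'"


def label_search_pattern(user_input: str, var: str = "label") -> str:
    """Triple pattern for a Virtuoso full-text match on ?var (hypothetical helper)."""
    return f"?{var} bif:contains {bif_contains_phrase(user_input)} ."


print(label_search_pattern("Barack Obama"))
# prints: ?label bif:contains '"Barack Obama"' .
```

Because the whole input becomes one phrase, there is only one bif:contains per variable. This avoids the SP031 error quoted above.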
