I have a list of movie titles and want to look them up in DBpedia to get metadata such as the director. However, I'm having trouble identifying the correct movie with SPARQL, because the titles sometimes don't match exactly.
How can I get the best match for a movie title from DBpedia using SPARQL?
Some problematic examples:
- My List: "Die Hard: with a Vengeance" vs. DBpedia: "Die Hard with a Vengeance"
- My List: "Hachi" vs. DBpedia: "Hachi: A Dog's Tale"
My current approach is to query the DBpedia endpoint for all movies, filter them by checking for the individual tokens of the title (without punctuation), order by title and return the first result, e.g.:
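```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?resource ?title ?director WHERE {
  ?resource foaf:name ?title .
  ?resource rdf:type schema:Movie .
  ?resource dbo:director ?director .
  FILTER (
    contains(lcase(str(?title)), "die") &&
    contains(lcase(str(?title)), "hard")
  )
}
ORDER BY (?title)
LIMIT 1
```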
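The token-based filter above can be generated from an arbitrary title instead of being written by hand. The following is a minimal Python sketch, assuming the same query shape as above; `title_tokens`, `build_token_query` and `QUERY_TEMPLATE` are hypothetical names, and prefix declarations are omitted (add them if the endpoint does not predefine `foaf:`, `rdf:`, `schema:` and `dbo:`). It only automates the approach described here and does not fix its weaknesses: a short token such as "a" still matches almost every title.

```python
import re

# Same query shape as the hand-written query; {conditions} and {limit} are
# filled in by build_token_query. Literal braces are doubled for str.format.
QUERY_TEMPLATE = """SELECT ?resource ?title ?director WHERE {{
  ?resource foaf:name ?title .
  ?resource rdf:type schema:Movie .
  ?resource dbo:director ?director .
  FILTER (
    {conditions}
  )
}}
ORDER BY (?title)
LIMIT {limit}"""


def title_tokens(title: str) -> list[str]:
    """Lower-case the title and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", title.lower())


def build_token_query(title: str, limit: int = 1) -> str:
    """Build the token-based FILTER query for an arbitrary movie title.

    Tokens consist only of word characters, so they never contain a double
    quote and can be embedded in a SPARQL string literal as-is.
    """
    tokens = title_tokens(title)
    if not tokens:
        # An empty FILTER () would not be valid SPARQL.
        raise ValueError(f"no word tokens in title {title!r}")
    conditions = " &&\n    ".join(
        f'contains(lcase(str(?title)), "{token}")' for token in tokens
    )
    return QUERY_TEMPLATE.format(conditions=conditions, limit=limit)


if __name__ == "__main__":
    print(build_token_query("Die Hard: with a Vengeance"))
```

For "Die Hard: with a Vengeance" this produces five `contains(...)` conditions (die, hard, with, a, vengeance), with the colon stripped.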
This approach is very slow and sometimes returns the wrong movie, e.g.:
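```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?resource ?title ?director WHERE {
  ?resource foaf:name ?title .
  ?resource rdf:type schema:Movie .
  ?resource dbo:director ?director .
  FILTER (
    contains(lcase(str(?title)), "hachi")
  )
}
ORDER BY (?title)
LIMIT 10
```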
Here, the correct result is in second place:
| resource | title | director |
|---|---|---|
| http://dbpedia.org/resource/Chachi_420 | "Chachi 420"@en | http://dbpedia.org/resource/Kamal_Haasan |
| http://dbpedia.org/resource/Hachi:_A_Dog's_Tale | "Hachi: A Dog's Tale"@en | http://dbpedia.org/resource/Lasse_Hallström |
| http://dbpedia.org/resource/Hachiko_Monogatari | "Hachikō Monogatari"@en | http://dbpedia.org/resource/Seijirō_Kōyama |
| http://dbpedia.org/resource/Thachiledathu_Chundan | "Thachiledathu Chundan"@en | http://dbpedia.org/resource/Shajoon_Kariyal |
Any ideas on how to solve this problem? Or, even better: how can I query for the best match to a string with SPARQL in general?
Thanks!
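For results like the "hachi" table above, one option is to fetch a handful of candidate titles (for example with the same CONTAINS filter and a larger LIMIT) and re-rank them client-side instead of relying on `ORDER BY (?title)`. The Python sketch below assumes the candidate titles have already been fetched; the scoring scheme (whole-word coverage of the query first, then the `difflib.SequenceMatcher` ratio as a tie-breaker) is an illustrative choice, not a SPARQL or DBpedia feature, and the helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def title_tokens(title: str) -> list[str]:
    """Lower-case the title and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", title.lower())


def match_score(query: str, candidate: str) -> tuple[float, float]:
    """Score a candidate title against a title from the list.

    First key: fraction of the query's distinct words that appear as whole
    words in the candidate, so "Chachi 420" does not count as containing "hachi".
    Second key: character-level similarity of the normalized titles, which
    breaks ties between candidates with the same coverage.
    """
    query_words = set(title_tokens(query))
    if not query_words:
        return (0.0, 0.0)
    coverage = len(query_words & set(title_tokens(candidate))) / len(query_words)
    similarity = SequenceMatcher(
        None, " ".join(title_tokens(query)), " ".join(title_tokens(candidate))
    ).ratio()
    return (coverage, similarity)


def best_match(query: str, candidates: list[str]) -> Optional[str]:
    """Return the candidate with the highest (coverage, similarity) score."""
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: match_score(query, candidate))


if __name__ == "__main__":
    hachi_candidates = [
        "Chachi 420",
        "Hachi: A Dog's Tale",
        "Hachikō Monogatari",
        "Thachiledathu Chundan",
    ]
    print(best_match("Hachi", hachi_candidates))  # Hachi: A Dog's Tale
```

The whole-word coverage key is what puts "Hachi: A Dog's Tale" ahead of "Chachi 420" here; a pure character-similarity ratio would prefer "Chachi 420", because "hachi" is a substring of that shorter title.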
`bif:contains` is much faster on indexed literals than a regular REGEX filter. An example from the docs is `?s foaf:Name ?name . ?name bif:contains "'rich*'" .`, which would match all subjects whose `foaf:Name` contains a word starting with "rich", such as Richard, Richie, etc. – Lymn
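Following the `bif:contains` suggestion, the title's words can be turned into a single free-text expression instead of a chain of CONTAINS filters. `bif:contains` is Virtuoso-specific, and DBpedia's public endpoint runs on Virtuoso. The Python sketch below is an illustration under several assumptions: the titles are covered by a free-text index, words are combined with `AND` in the single-quoted form used in the `'rich*'` example, and words shorter than three characters are dropped because very short words rarely help and may be treated as noise words. That cut-off and the helper names are hypothetical, and the candidates returned may still need ranking.

```python
import re


def bif_contains_expression(title: str, min_length: int = 3) -> str:
    """Join the title's words into a Virtuoso free-text AND expression.

    Assumption: words shorter than min_length are dropped; whether the
    endpoint treats a given word as a noise word depends on its configuration.
    """
    words = [w for w in re.findall(r"\w+", title.lower()) if len(w) >= min_length]
    if not words:
        raise ValueError(f"no usable words in title {title!r}")
    # Each word is single-quoted, as in the 'rich*' example; \w+ words never
    # contain quotes, so no escaping is needed.
    return " AND ".join(f"'{word}'" for word in words)


def build_bif_query(title: str, limit: int = 20) -> str:
    """Build a movie lookup that uses bif:contains instead of CONTAINS filters."""
    expression = bif_contains_expression(title)
    return (
        "SELECT ?resource ?title ?director WHERE {\n"
        "  ?resource foaf:name ?title .\n"
        f'  ?title bif:contains "{expression}" .\n'
        "  ?resource rdf:type schema:Movie .\n"
        "  ?resource dbo:director ?director .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_bif_query("Die Hard: with a Vengeance"))
```

For "Die Hard: with a Vengeance" the expression becomes `'die' AND 'hard' AND 'with' AND 'vengeance'`; the one-letter "a" is dropped by the length cut-off.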