How to skip bad dates in DBpedia SPARQL request?
I need to get data about films from DBpedia.

I run the following SPARQL query on http://dbpedia-live.openlinksw.com/sparql:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  FILTER(xsd:date(?released) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20

I am trying to get films released on or after 2000-01-01, but the engine responds with:

Virtuoso 22007 Error DT006: Cannot convert 2009-06-31 to datetime : 
Too many days (31, the month has only 30)

SPARQL query:
define sql:big-data-const 0 
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri <http://dbpedia.org> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  FILTER(xsd:date(?released) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20

As far as I understand, DBpedia contains some erroneous data, and the engine cannot convert the string value to a date in order to compare it with the date I set, so it aborts the query.

So the question is: is there a way to tell the engine to skip the erroneous data and return everything that could be processed?

Confidence answered 28/9, 2011 at 10:57 Comment(1)
looks like it's a bug in dbpedia - Wholism
3

You can use the COALESCE function to define a default date for invalid ones:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?subject ?label ?released ?released_fixed WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  # Shown for inspection only: the cast result, or the default when the cast fails.
  BIND(COALESCE(xsd:dateTime(?released), '1000-01-01') AS ?released_fixed)
  # Values that cannot be cast fall back to '1000-01-01' and fail the comparison.
  FILTER(xsd:date(COALESCE(xsd:dateTime(?released), '1000-01-01')) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20

This query returns results on the DBpedia Live endpoint.

The bind construct only serves to display the fixed dates: invalid ones are set to '1000-01-01' and stored in the variable ?released_fixed. The bind is not necessary for the query and can be omitted together with ?released_fixed in the SELECT clause.
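To illustrate the idea behind COALESCE outside of SPARQL, here is a minimal Python sketch of its "first expression that evaluates without error" behaviour as defined in SPARQL 1.1. The helper names (`coalesce`, `released_fixed`) and the sample values are made up for illustration, and the sketch says nothing about whether a particular endpoint such as Virtuoso actually applies these semantics when a cast fails.

```python
from datetime import date

DEFAULT = date(1000, 1, 1)  # stand-in for the '1000-01-01' default in the query


def coalesce(*candidates):
    """Return the result of the first candidate that evaluates without error.

    Each candidate is a zero-argument callable, so a failure in one of them
    does not abort the whole expression.
    """
    for candidate in candidates:
        try:
            return candidate()
        except ValueError:
            continue
    raise ValueError("no candidate evaluated successfully")


def released_fixed(lexical):
    # date.fromisoformat rejects impossible dates such as June 31.
    return coalesce(lambda: date.fromisoformat(lexical), lambda: DEFAULT)


threshold = date(2000, 1, 1)
raw = ["2009-06-31", "2009-06-30", "1999-12-31"]  # hypothetical sample values
kept = [r for r in raw if released_fixed(r) >= threshold]
print(kept)
```

Here the invalid "2009-06-31" is replaced by the default, which lies before the threshold, so only "2009-06-30" is kept.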

Halfmast answered 3/10, 2013 at 21:19 Comment(2)
Is the bind() necessary here (if so why please)? I tried this both with and without the bind(), and still get the same error OP reported. - Kelcey
I have enhanced my answer in order to explain the bind(). - Halfmast
1

One way is to filter using the datatype, as you can see below:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  FILTER(datatype(?released) = <http://www.w3.org/2001/XMLSchema#dateTime>)
  FILTER(xsd:date(?released) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20


Gape answered 22/5, 2013 at 19:19 Comment(1)
Trying to run that query results in an error: Virtuoso 22023 Error SR066: Unsupported case in CONVERT (incomplete RDF box -> DATE). This seems similar to the problem in the question. - Jueta
0

Discarding a result with a date that is off by a day seems silly to me (like Windows doing a bugcheck whenever it feels something is wrong, e.g. your GPU video adapter hanging 5 times in a row).

Since you only care about the year, isn't it better to compare string-wise?

str(?released) >= "2000"

XSD says "at least 4 digits for the year", so years below 1000 are zero-padded and this works for any four-digit positive year (AD). BTW, this will also work if the DBpedia extraction framework found only a year in that field.
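The string comparison can be checked quickly outside of SPARQL. The following Python sketch mimics `str(?released) >= "2000"` on a few made-up lexical values, assuming Python's plain string ordering is a fair stand-in for the comparison the endpoint performs. The function name `released_since` is hypothetical.

```python
def released_since(lexical, threshold="2000"):
    """Compare the lexical form as plain text, never parsing it as a date."""
    return lexical >= threshold


# Hypothetical values: a valid date, an impossible date, and a bare year.
samples = ["1999-12-31", "2000-01-01", "2009-06-31", "2005"]
kept = [s for s in samples if released_since(s)]
print(kept)
```

Because nothing is parsed, the impossible date "2009-06-31" is kept rather than causing an error. The same property means a five-digit year such as "10000-01-01" would sort before "2000", so the trick is only safe for four-digit years.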

Twink answered 8/1, 2015 at 10:0 Comment(0)
