SPARQL query to get only results with the most recent date
Asked Answered
T

2

6

I am learning basics of SPARQL with recent RDF-database released by the Finnish Ministry of Justice. It contains Finnish law data.

There are statutes, which have versions, which have a date and topics. I want to get the most recent versions that have a "gun" topic. So, I wrote this:

PREFIX sfl: <http://data.finlex.fi/schema/sfl/>
PREFIX eli: <http://data.europa.eu/eli/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?stat ?vers ?dv 
WHERE { 
   ?stat rdf:type sfl:Statute .
   ?stat sfl:hasVersion ?vers .
   ?vers eli:version_date ?dv .
   ?vers eli:is_about ?top .
   ?top skos:prefLabel "Ase"@fi .

 } ORDER BY DESC(?dv)

This returns four lines, with three statutes, one statute twice. This is because that statute has two versions, older and current. The two other statutes have only one version.

How do I get rid of the older version so I get only statutes with the most recent version? I tried using something like (MAX(?dv) AS ?ndv) and grouping by ?stat and ?vers, but this doesn't work, as there are four distinct versions.

EDIT: Let me add a mock example of what happens.

The result of the original query looks like this:

stat | vers | dv
 a   | abc  |  x
 a   | cde  |  y(<x)
 b   | foo  |  z
 c   | fot  |  u

We see that statute "a" has two versions, "abc" and "cde" and the dv of version "abc" is later that dv of version "cde". The other two statutes "b" and "c" have only one version each, with dvs of "z" and "u".

The property of having topic "gun" is a property of vers. All the versions returned have that topic.

What I want to get is this:

stat | vers | dv
 a   | abc  |  x
 b   | foo  |  z
 c   | fot  |  u

In other words, I wish to get, for each statute, only the version with the highest or latest dv value.

PS. You are welcome to test this at http://yasgui.org/ Just type the query and you get the result.

Tiepolo answered 23/3, 2016 at 15:2 Comment(4)
When I go to the endpoint you mentioned, and run the query, I don't get any results. What endpoint does one need to select?Crouch
Answered my own question, seems to be ldf.fi/finlex/sparql .Crouch
Should be data.finlex.fi/sparqlTiepolo
Not sure, then. I looked for ldf.fi/sparql-services.html and found ldf.fi/finlex/sparql, and then used it on the yasgui site you linked to. The data.finlex URI doesn't seem to be prepopulated in the yasgui.Crouch
C
5

You can do this using a subselect, as scotthenninger's answer, but you could also just use a filter to make sure that each result doesn't have another possible result that would be more recent. In you query, that would just mean adding:

filter not exists {
  ?stat sfl:hasVersion/eli:version_date ?dv2
  filter (?dv2 > ?dv)
}

The idea is just to keep only those result rows which have a version such that there is not another version of the same statute with a more recent date. This approach is a bit more flexible in that it doesn't require a "single max-value" that you can retrieve via a subselect; it will let you keep results based on arbitrary criteria, as long as you can express them in SPARQL.

I used a property path in ?stat sfl:hasVersion/eli:version_date ?dv2 instead of the longer ?stat sfl:hasVersion ?vers2 . ?vers2 eli:version_date ?dv2 because it's a bit shorter and we don't really care about the value of ?vers2 here. Here's what the query as a whole now looks like:

PREFIX sfl: <http://data.finlex.fi/schema/sfl/>
PREFIX eli: <http://data.europa.eu/eli/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?stat ?vers ?dv 
WHERE { 
   ?stat rdf:type sfl:Statute .
   ?stat sfl:hasVersion ?vers .
   ?vers eli:version_date ?dv .
   ?vers eli:is_about ?top .
   ?top skos:prefLabel "Ase"@fi .
   filter not exists {
      ?stat sfl:hasVersion/eli:version_date ?dv2
      filter (?dv2 > ?dv)
   }
 } ORDER BY DESC(?dv)

Query and Results

Crouch answered 24/3, 2016 at 18:12 Comment(2)
@Tiepolo In the cases that there is a "single max-value", you should probably spend a bit of time benchmarking. The "filter not exists" approach is flexible, but it does mean that for each result in the initial set, you're doing a scan over some other set of solutions. That could be expensive, depending on what you're checking. In this case, it's probably not too bad, since you're just looking at the other versions of the same statute.Crouch
I see. But this has to be better than the one I came up originally. It has stuff like this FILTER(NOT EXISTS {?stat rdf:type sfl:Statute . ?stat sfl:hasVersion ?vers1 . ?stat sfl:hasVersion ?vers2 . ?vers1 eli:version_date ?dv1 . ?vers2 eli:version_date ?dv2 . FILTER(?vers1!=?vers2)} || EXISTS {?stat rdf:type sfl:Statute . ?stat sfl:hasVersion ?vers . ?vers eli:version_date ?dv . FILTER(?dv!=?mdv)})Tiepolo
P
3

You'd have to run a sub-select to figure out the max date aggregate:

PREFIX sfl: <http://data.finlex.fi/schema/sfl/>
PREFIX eli: <http://data.europa.eu/eli/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?stat ?vers ?dv ?max 
WHERE { 
   ?stat rdf:type sfl:Statute .
   ?stat sfl:hasVersion ?vers .
   {  SELECT  (max(?dv) AS ?max) 
      WHERE {
          ?stat rdf:type sfl:Statute .
          ?stat sfl:hasVersion ?vers .
          ?vers eli:version_date ?dv .
      }
   }
   ?vers eli:version_date ?max .
   ?vers eli:is_about ?top .
   ?top skos:prefLabel "Ase"@fi .

}

Pi answered 23/3, 2016 at 16:33 Comment(6)
I tried that, it didn't return anything. I gather this would work if the ?dv would be the same for all most recent versions. It isn't. All four versions have different ?dv values. The subselect, as far as I see, returns the highest ?dv out of all the versions of all the statutes. But the ?dv for any version with gun topic is lower than that. Adding the topic restriction within the subselect does return one value, the most recent version of the one statute that happens to be the newest.Tiepolo
Yes, it is the case that the match with "Ase"@fi is needed in the sub-select. Without a minimal data example, it's pretty hard to guess. I think a variant of this query, using sub-select to get the max and matching that to the one that has the max value, will get what you're looking for.Pi
I added a mock example and a link to a testing site.Tiepolo
You could use a subselect here. You could also use a filter not exists to make sure that there's no version of the same thing with a later date.Crouch
@JoshuaTaylor I managed to do this with a subselect and some rather nasty looking filters. But that only works for this case. What if the case is more general? I tried it with a different topic and that has a lot more complicated results with many different statutes having multiple versions, each with different dv values. There is no single max-value that will do it. If this was Java I would use ifs with for loops to check for extra versions.Tiepolo
@Tiepolo I've added an answer that shows the filter not exists part. I think it's relatively simple, and the approach definitely doesn't depend on having a "single max-value".Crouch

© 2022 - 2024 — McMap. All rights reserved.