How to get only the most recent value from a Wikidata property?
Asked Answered
C

1

6

Suppose I want to get a list of every country (Q6256) and its most recently recorded Human Development Index (P1081) value. The Human Development Index property for the country contains a list of data points taken at different points in time, but I only care about the most recent data. This query will not work because it gets multiple results for each country (one for each Human Development Index data point):

SELECT
?country 
?countryLabel 
?hdi_value
?hdi_date
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country p:P1081 ?hdi_statement. 
         ?hdi_statement ps:P1081 ?hdi_value.
         ?hdi_statement pq:P585 ?hdi_date.
       }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Link to Query Console

I'm aware of GROUP BY/GROUP CONCAT but that will still give me every result when I'd prefer to just have one. GROUP BY/SAMPLE will also not work since SAMPLE is not guaranteed to take the most recent result.

Any help or link to a relevant example query is appreciated!

P.S. Another thing I'm confused about is why population P1082 in this query returns only one population result per country

SELECT
?country 
?countryLabel 
?population
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1082 ?population. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

while the same query but for HDI returns multiple results per country:

SELECT
?country 
?countryLabel 
?hdi
WHERE {
 ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1081 ?hdi. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

What is different about population and HDI that causes the behavior to be different? When I view the population data for each country on Wikidata I see multiple population points listed, but only one gets returned by the query.

Clariceclarie answered 2/3, 2018 at 9:36 Comment(12)
1. Paste something like FILTER NOT EXISTS {?country p:P1081/pq:P585 ?hdi_date1 . FILTER (?hdi_date1 > ?hdi_date) } outside of OPTIONAL.Etka
2. Duplicate of https://mcmap.net/q/1737356/-some-cities-aren-39-t-instances-of-city-or-big-city-odd-behaviour-of-wikidataEtka
Try this for your first query : SELECT DISTINCT ?country ?countryLabel (MAX(?hdi_value) AS ?hd) (MAX(?hdi_date) AS ?hde) WHERE { ?country wdt:P31 wd:Q6256. OPTIONAL { ?country p:P1081 ?hdi_statement. ?hdi_statement ps:P1081 ?hdi_value. ?hdi_statement pq:P585 ?hdi_date. } SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } } GROUP BY ?country ?countryLabelCaen
@Shubhangi, are you sure that HDI are just increasing for all the countries? I. e. that MAX(hdi_value) corresponds to MAX(hdi_date)?Etka
@StanislavKralin I have checked for few countries and it was fine.Caen
@Shubhangi, check for VenezuelaEtka
@StanislavKralin yes I have checked for Venezuela and it gives me the right answer as he wants the most recent result.Caen
@Shubhangi, the right answer for Venezuela is 0.762, not 0.764.Etka
Thank you both! Stanislav's answer works perfectly here. Unfortunately, Shubhangi's answer does not work when the most recent HDI is lower than a previous HDI as was the case with Venezuela.Clariceclarie
@StanislavKralin oops yeah you are right I didn't see the very last digit of HDI for Venezuela.Caen
@Shubhangi, full list of such countries (their maximal HDI is in the past): Greece, Cyprus, Cuba, Belize, East Timor, Federated States of Micronesia, Venezuela, Saint Lucia, Jamaica, Honduras, Iraq, Oman, Syria, Central African Republic, South Sudan, Equatorial Guinea, The Gambia, Libya, Swaziland, State of PalestineEtka
@StanislavKralin Thank you for pointing out the mistakes :)Caen
E
4

Both your questions are duplicates, but I'll try to add interesting facts to existing answers.

Question 1 is a duplicate of SPARQL query to get only results with the most recent date.

This technique does the trick:

FILTER NOT EXISTS {
    ?country p:P1081/pq:P585 ?hdi_date_ .
    FILTER (?hdi_date_ > ?hdi_date)
}

However, you should add this clause outside of OPTIONAL, it is not working inside of OPTIONAL (and I'm not sure this is not a bug).


Question 2 is a duplicate of Some cities aren't instances of city or big city?

You can't use wdt-predicates, because missing statements are not truthy.
They are normal-rank statements, but there is a preferred-rank statement.

Truthy statements represent statements that have the best non-deprecated rank for given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements are considered truthy.

The reason why P1081 always has preferred statement is that this property is processed by PreferentialBot.

Etka answered 2/3, 2018 at 16:3 Comment(2)
I tried to do exactly what you described but on the cities - but doesn't look correct. SELECT ?city ?cityLabel ?cityPopulation ?hdi_date WHERE { # ?country wdt:P31 wd:Q6256. # OPTIONAL { ?country wdt:P1082 ?population. } ?city wdt:P31/wdt:P279* wd:Q515 . ?city p:P1082 ?populationStatement . ?populationStatement ps:P1082 ?cityPopulation. FILTER NOT EXISTS { ?city p:P1082/pq:P585 ?hdi_date_ . FILTER (?hdi_date_ > ?hdi_date) } SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } } order by ?countryLabelGateway
@user702846, "exactly" would be rather something like this: w.wiki/EKB :)Etka

© 2022 - 2024 — McMap. All rights reserved.