Getting started with DBpedia

I want to get started using DBpedia. At the moment all I know is that DBpedia is a structured form of Wikipedia data and that it can be queried using SPARQL. To me the basic idea of DBpedia (giving structure to Wikipedia data) seems truly amazing, so please go easy if my question is basic.

My goal

Get simple data extracts from DBpedia, for example the countries of the world with their capitals and populations, or 100 random famous people with their dates and places of birth and a short description. Eventually I want to query the metadata to find what types of 'entity' are in DBpedia (e.g. mountains, rivers, cities) and their 'properties'. But that is a separate question, and I can experiment once I get the basics working.

What I found so far

In Google I found http://wiki.dbpedia.org/develop/getting-started but I think it's about installing all of DBpedia, and I only want to query it.

Also I found https://mickael.kerjean.me/2016/05/20/walkthrough-dbpedia-and-triplestore/ but it assumes you already have SPARQL or SNORQL set up, and I can't see how to do this.

Also I found https://docs.data.world/tutorials/sparql/Your_First_Sparql_Query.html which is a guide to SPARQL, but again it assumes you are using their own DataWorld environment.

On Stack Overflow I found List countries from DBpedia and List countries from DBpedia, but again they assume you have set up the SPARQL environment.

Question(s)

  1. What software do I use to write simple queries on DBpedia data - do I need SPARQL or SNORQL? Do I install them locally or can I use web-based tools? I use Windows 10 and I'm happy with SQL queries.
  2. Once I have the software set up, what is the simplest query to get the list of countries of the world and capitals and populations?
  3. Also how can I write a basic query to return (say) 100 random people and their basic details?
Laurentium asked 14/2, 2018 at 15:36 Comment(5)
That's not a specific question; it sounds like "How do I query Semantic Web data in general?" - Portraitist
"do I need SPARQL or SNORQL" -> SPARQL is the query language for RDF; it's not software. There are lots of APIs and tools for querying RDF data via SPARQL. Any search engine is your friend. - Portraitist
"what is the simplest query to get the list of countries of the world and capitals and populations" -> write the SPARQL query? Figure out the relations that connect countries to their capital and population. You're asking how? Well, look at some sample DBpedia countries maybe? - Portraitist
"100 random people and their basic details" -> figure out the class for persons in DBpedia, and look at the SPARQL specs for the random function + ORDER BY + LIMIT. Figure out the properties that you think of as "basic details". That's it. - Portraitist
Thanks for the prompt comments. I have to admit I don't know enough about DBpedia to find the relations, or how to find a class or its properties. I also don't know the easiest way of querying DBpedia: is SPARQL the most widely used and best option? I was hoping for a "hello world" sort of example to get me started, e.g. use ABC application, type in XYZ, etc. - Laurentium

Regarding your first question: SPARQL is a query language, not software, and you can write and run your queries at https://dbpedia.org/sparql.
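
If you would rather run queries from a script than from the web form, here is a minimal Python sketch using only the standard library. It assumes the endpoint accepts the query as a `query` URL parameter on a GET request (as the SPARQL protocol describes) and returns JSON when asked through the `Accept` header. The names `build_request` and `run_query` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # public endpoint mentioned above


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that carries the SPARQL query as a URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the decoded JSON document (needs network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example use (not executed here, since it contacts the public endpoint):
#   result = run_query("SELECT ?s WHERE { ?s a <http://dbpedia.org/ontology/Country> } LIMIT 10")
#   for binding in result["results"]["bindings"]:
#       print(binding["s"]["value"])
```

Nothing needs to be installed beyond Python itself, so this works the same on Windows 10 as anywhere else.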

To obtain countries, their capitals, and their respective populations:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# One row per country name; MIN() collapses duplicate capital and population values.
SELECT (MIN(?country_name) AS ?Country_name)
       (MIN(?capital_name) AS ?Capital_name)
       (MIN(?population) AS ?Population)
WHERE {
  ?country a dbo:Country .
  ?country rdfs:label ?country_name .
  ?country dbo:capital ?capital .
  ?capital rdfs:label ?capital_name .
  # The population is stored under different properties depending on the country.
  ?country ?p ?population .
  FILTER(?p = dbo:populationTotal || ?p = dbp:populationCensus)
  # Skip countries that have a dissolution year, i.e. no longer exist.
  FILTER NOT EXISTS { ?country dbo:dissolutionYear ?year }
  FILTER langMatches(lang(?country_name), "en")
  FILTER langMatches(lang(?capital_name), "en")
}
GROUP BY ?country_name
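
Since the goal is a simple data extract, the JSON that a SPARQL endpoint returns can be flattened into plain rows. The sketch below assumes the standard SPARQL 1.1 JSON results layout (`head.vars` and `results.bindings`). The function names `bindings_to_rows` and `rows_to_csv` and the sample document are invented for illustration; the sample values are not real DBpedia output.

```python
import csv
import io


def bindings_to_rows(result):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL) are absent from the binding.
        rows.append(
            {v: binding[v]["value"] if v in binding else None for v in variables}
        )
    return rows


def rows_to_csv(rows, variables):
    """Render rows as CSV text with one column per variable."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=variables, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hand-written sample in the standard layout (made-up values, not DBpedia data).
SAMPLE = {
    "head": {"vars": ["Country_name", "Capital_name", "Population"]},
    "results": {"bindings": [
        {
            "Country_name": {"type": "literal", "xml:lang": "en", "value": "Exampleland"},
            "Capital_name": {"type": "literal", "xml:lang": "en", "value": "Sample City"},
            "Population": {"type": "literal", "value": "12345"},
        },
        {
            "Country_name": {"type": "literal", "xml:lang": "en", "value": "Testonia"},
            "Capital_name": {"type": "literal", "xml:lang": "en", "value": "Mockville"},
        },
    ]},
}

rows = bindings_to_rows(SAMPLE)
print(rows_to_csv(rows, SAMPLE["head"]["vars"]))
```

The CSV text can be saved to a file and opened in a spreadsheet, which may feel familiar if you are used to exporting SQL query results.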

For your third question, here is an example that returns up to 100 people whose names contain "abdul":

# No PREFIX lines here: the DBpedia endpoint lets you omit them.
SELECT DISTINCT ?link ?person_full_name ?birth_year
WHERE {
  ?link a foaf:Person .
  ?link ?p ?person_full_name .
  FILTER(?p IN (dbo:birthName, dbp:birthName, dbp:fullname, dbp:name))
  ?link rdfs:label ?person_name .
  # bif:contains is a Virtuoso full-text search extension, not standard SPARQL.
  ?person_name bif:contains "abdul" .
  OPTIONAL { ?link dbo:birthYear ?birth_year . }
  FILTER(langMatches(lang(?person_full_name), "en"))
}
LIMIT 100
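
The query above filters by name rather than sampling at random. One possible way to get a random sample, following the comment on the question (random function + ORDER BY + LIMIT), is sketched below as a small Python helper that builds the query text. `RAND()` is a SPARQL 1.1 function, but this sketch does not verify that the public endpoint can order a class as large as foaf:Person within its usage limits. The name `random_people_query` and the choice of `dbo:birthDate` and `dbo:birthPlace` as "basic details" are assumptions for illustration.

```python
def random_people_query(limit=100):
    """Return SPARQL text that selects `limit` people in random order."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Doubled braces are literal braces inside an f-string.
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?person ?name ?birth_date ?birth_place
WHERE {{
  ?person a foaf:Person ;
          rdfs:label ?name .
  OPTIONAL {{ ?person dbo:birthDate ?birth_date . }}
  OPTIONAL {{ ?person dbo:birthPlace ?birth_place . }}
  FILTER(langMatches(lang(?name), "en"))
}}
ORDER BY RAND()
LIMIT {limit}"""


print(random_people_query(100))
```

A person with several recorded birth places would appear on several rows, so the result may contain fewer than `limit` distinct people.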
Crosstree answered 15/2, 2018 at 14:10 Comment(7)
Many thanks for the detailed reply. I tried the countries query in dbpedia.org/snorql with an added "ORDER BY ?country_name" clause and it looks excellent. But I cannot see Australia in the list! Why should it be missing? - Laurentium
I have modified the query a little. You could not find Australia in the list because the population property I had used is not present on Australia's DBpedia page. Please see the updated query. - Crosstree
I ran the modified country query, and now most of the countries are doubled up with slightly different populations! Interestingly, Australia is not doubled up. - Laurentium
Also, I just tried the 'persons' query and it gets some results. Out of interest, why did you declare PREFIX for the country query, but no PREFIX for foaf: and bif: in the persons query? - Laurentium
Anyway, you have answered my 3 questions, your code runs well, and I can understand it and tweak it as my understanding of SPARQL improves. Many thanks for your answers; I will mark this as the answer. - Laurentium
At the SPARQL endpoint you can include or omit the prefixes. I added them to the first query just to show you the exact format. - Crosstree
I have fixed the issue of duplicated values. Please check again. - Crosstree
B
2

Check out About DBpedia and Using DBpedia.

Byzantium answered 14/2, 2018 at 15:57 Comment(3)
Thanks, I will take a look. I didn't know SPARQL was built into DBpedia already. Is there any advantage to installing SPARQL on a local machine (or is this even possible)? - Laurentium
I think you have much background to digest. SPARQL is a query language for RDF graph data, akin to SQL for relational table data. The public DBpedia endpoints are all provided via Virtuoso (from my employer) as described in the articles I linked. Usage limits on the public instance may mean that you'll benefit from setting up a local or cloud DBpedia mirror instance, which would typically include a SPARQL processor. This site is not appropriate for much more back-and-forth in this vein. See stackoverflow.com/help/on-topic and stackoverflow.com/help/how-to-ask - Byzantium
I do have a lot to digest: my knowledge of DBpedia is minimal, as I freely admitted at the start of my question and reiterated in reply to AKSW's comments. I hoped my question was simple enough that someone could answer it in a few minutes using a few lines of code, a bit like a "hello world" program. AKSW's suggestions seem excellent, but if I knew how to do the things he suggested I would not need to ask my question. Anyway, thanks for your pointers, and if you feel this question should be closed or deleted that's fine. - Laurentium
