Get all Wikipedia Infobox Templates and all Pages using them

Wikipedia pages such as Wikipedia: Stack Overflow often include an Infobox, usually on the right-hand side at the top of the page. Example screenshot:

Stackoverflow Infobox at Wikipedia

  1. DBPedia lists all these attributes as RDF triples; see the example at DBPedia: Stack Overflow. There, the property dbpprop:wikiPageUsesTemplate has the value dbpedia:Template:Infobox_website, which is interesting. I want to know which Wikipedia pages use this template. How can I list all pages that use the Infobox_website template? Preferably with a SPARQL query, but I am open to other easy solutions.

  2. The next thing is a list of all Infobox templates. Wikipedia: Category Infobox Templates shows the hierarchy of the relevant Wikipedia categories, which looks like what I am seeking, but I want all of them in a machine-readable format on one page. Maybe DBPedia is the right tool here too? DBPedia: Category Infobox Templates and DBPedia: INFOBOX contain very little information, although they look very promising. How can I use SPARQL to find all Infobox types so that I can repeat step 1 for each of them?

You can use this for testing the SPARQL queries: http://dbpedia.org/snorql/

Update 1

I seem to have solved problem number 1: SPARQL: list all pages with Infobox_website

Update 2

Also, this seems to be the query for problem number 2: SPARQL: list all Infoboxes

Finer answered 3/11, 2011 at 18:41 Comment(3)
Your "Update 1" query now returns no results. Do you still have a working solution? - Trophoplasm
Sorry, I am stuck too. Please let me know if you find a solution. - Finer
Yeah, found a solution, will add it as an answer. - Trophoplasm
T
2

The previous answers seem to have stopped working, but only a small change is required to get them working at the new DBpedia query endpoint, http://live.dbpedia.org/sparql.

To get a list of all pages and the templates they use, this query works:

SELECT * WHERE {  ?page  dbpprop:wikiPageUsesTemplate ?template . }

See results (limited to 100)

If you're looking for a specific template:

SELECT * WHERE {  
   ?page  
   dbpprop:wikiPageUsesTemplate 
   <http://dbpedia.org/resource/Template:Infobox_website> . 
}

See results
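The query above relies on the endpoint predefining the dbpprop prefix, which is not guaranteed on every endpoint. As a minimal sketch, assuming the usual DBpedia property namespace http://dbpedia.org/property/, the same query can be generated with an explicit PREFIX declaration. The function name and the choice of the short prefix dbp are illustrative, not part of the original answer.

```python
# Assumed namespace for DBpedia raw infobox properties (dbp / dbpprop).
DBP_NAMESPACE = "http://dbpedia.org/property/"
TEMPLATE_BASE = "http://dbpedia.org/resource/Template:"


def pages_using_template_query(template_name: str) -> str:
    """Build a SPARQL query listing pages that use the given template.

    The prefix is declared inside the query, so the query does not depend
    on which prefixes the endpoint happens to predefine.
    """
    # DBpedia resource URIs use underscores where Wikipedia titles use spaces.
    template_uri = TEMPLATE_BASE + template_name.replace(" ", "_")
    return (
        f"PREFIX dbp: <{DBP_NAMESPACE}>\n"
        "SELECT ?page WHERE {\n"
        f"  ?page dbp:wikiPageUsesTemplate <{template_uri}> .\n"
        "}"
    )


if __name__ == "__main__":
    print(pages_using_template_query("Infobox website"))
```

Declaring the prefix in the query text keeps it portable between the older and newer endpoints mentioned in this thread.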

And for my use case I'm interested in the Wikipedia URL rather than the DBPedia page, so I'm using this query:

SELECT ?wikipedia_url WHERE {  
   ?page  
   dbpprop:wikiPageUsesTemplate 
   <http://dbpedia.org/resource/Template:Infobox_website> . 
   ?page foaf:isPrimaryTopicOf ?wikipedia_url .
}

See results

I'm also using curl to pull the results into a script:

$ curl -s "http://live.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwikipedia_url+WHERE+%7B+%0D%0A%09+%3Fpage+%0D%0A%09+dbpprop%3AwikiPageUsesTemplate+%0D%0A%09+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FTemplate%3AInfobox_website%3E+.+%0D%0A+%3Fpage+foaf%3AisPrimaryTopicOf+%3Fwikipedia_url+.%0D%0A%0D%0A%09%7D&format=text%2Ftab-separated-values" \
| tr -d \" | grep -v "^wikipedia_url$" | head
http://en.wikipedia.org/wiki/U.S._News_&_World_Report
http://en.wikipedia.org/wiki/FriendFinder
http://en.wikipedia.org/wiki/Debkafile
http://en.wikipedia.org/wiki/GTPlanet
http://en.wikipedia.org/wiki/Lithuanian_Wikipedia
http://en.wikipedia.org/wiki/Connexions
http://en.wikipedia.org/wiki/Hypno5ive
http://en.wikipedia.org/wiki/Scoop_(website)
http://en.wikipedia.org/wiki/Bhoomi_(software)
http://en.wikipedia.org/wiki/Brainwashed_(website)
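For use inside a script, the curl pipeline above could be approximated in Python. This is a sketch under the assumption that the endpoint accepts the same three parameters shown in the curl URL (default-graph-uri, query, format) and returns a quoted header row followed by one quoted value per line. The function names are made up for illustration.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://live.dbpedia.org/sparql"  # endpoint named in this answer


def build_url(query: str) -> str:
    """URL-encode the query the same way the curl command line does."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": "text/tab-separated-values",
    }
    return ENDPOINT + "?" + urlencode(params)


def parse_tsv(text: str) -> list[str]:
    """Return the first column of a TSV result, without header or quotes.

    This replaces the `tr -d '"'` and `grep -v` steps of the shell pipeline.
    """
    rows = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    next(rows, None)  # skip the header row (the variable name)
    return [row[0] for row in rows if row]


def fetch(query: str) -> list[str]:
    """Run the query against the endpoint (network access required)."""
    with urlopen(build_url(query)) as response:
        return parse_tsv(response.read().decode("utf-8"))
```

Keeping URL construction and parsing separate from the network call makes both easy to check without contacting the endpoint.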

I'm not sure if this gives the full result set though, because it returns 1698 results whereas wmflabs.org seems to suggest there should be 4439.
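One possible cause of a short result set is a per-request row limit on the endpoint. That is an assumption, not something confirmed here, and differences between the datasets could explain the gap just as well. If truncation is the cause, paging with LIMIT and OFFSET is a common workaround. The sketch below takes a hypothetical run_query callable (for example a function that sends the query and returns a list of rows) so the paging logic stays independent of how the request is made. The base query should include an ORDER BY clause so that pages are stable.

```python
from typing import Callable, List


def paged(query: str, page_size: int, offset: int) -> str:
    """Append LIMIT/OFFSET to a query that has no such clause yet."""
    return f"{query}\nLIMIT {page_size} OFFSET {offset}"


def fetch_all(
    query: str,
    run_query: Callable[[str], List[str]],  # hypothetical: executes one query
    page_size: int = 1000,
) -> List[str]:
    """Collect all rows by requesting consecutive pages.

    Stops at the first page that is shorter than page_size, which signals
    that the result set is exhausted.
    """
    results: List[str] = []
    offset = 0
    while True:
        page = run_query(paged(query, page_size, offset))
        results.extend(page)
        if len(page) < page_size:
            return results
        offset += page_size
```

If the paged total still differs from the expected count, the discrepancy more likely comes from the data than from truncation.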


For the second part of your question, only a small change is needed from the previous query to get a list of all templates:

SELECT DISTINCT ?template WHERE { 
    ?page  
    dbpprop:wikiPageUsesTemplate  
    ?template . 
    FILTER (regex(?template, "Infobox")) . 
} ORDER BY ?template

See results
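To make the three parts of this query concrete (DISTINCT, the regex FILTER, and ORDER BY), here is a small Python sketch that applies the same steps to a list of template URIs that has already been fetched. The sample URIs in the usage block are illustrative inputs, not query results. Note that strict SPARQL engines may expect regex(str(?template), "Infobox"), since regex is defined on strings rather than IRIs. The query as written depends on the endpoint being lenient about that.

```python
import re
from typing import Iterable, List


def infobox_templates(template_uris: Iterable[str]) -> List[str]:
    """Mirror the query: keep URIs matching "Infobox", dedupe, then sort."""
    # re.search is case-sensitive here, like regex() without the "i" flag.
    matching = {uri for uri in template_uris if re.search("Infobox", uri)}
    return sorted(matching)


def template_name(uri: str) -> str:
    """Extract the template name after the 'Template:' part of the URI."""
    return uri.rsplit("Template:", 1)[-1]


if __name__ == "__main__":
    sample = [
        "http://dbpedia.org/resource/Template:Infobox_website",
        "http://dbpedia.org/resource/Template:Reflist",
        "http://dbpedia.org/resource/Template:Infobox_website",
    ]
    for uri in infobox_templates(sample):
        print(template_name(uri))
```

Each name produced this way can then be fed into the per-template query from the first part of the question.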

Trophoplasm answered 13/8, 2015 at 16:50 Comment(4)
Thanks for the update. If you add SPARQL links to the new endpoint with the solution to both problems in the question, I'll mark this as the accepted answer. - Finer
The new endpoint doesn't let you link directly to the query browser. I will add links to the results though. - Trophoplasm
Oh, let me also answer the second part. - Trophoplasm
The dbpprop prefix generates an "Undefined namespace prefix" error; replacing it with dbp seems to do the trick. - Railing
F
8

OK, since I seem to have found a solution (most probably not the best one), I want to share it.

1) This SPARQL query can be used to find all pages that include a specific Infobox type:

SELECT * WHERE {
   ?page dbpedia2:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_website> .
   ?page dbpedia2:name ?name .
}

Link at SNORQL


2) This SPARQL query can be used to find all Infobox types:

SELECT DISTINCT ?template WHERE {
   ?page dbpedia2:wikiPageUsesTemplate ?template .
   FILTER (regex(?template, "Infobox")) .
} ORDER BY ?template

Link at SNORQL

Finer answered 4/11, 2011 at 5:4 Comment(0)
F
1

You can also use the MediaWiki API's embeddedin query to return a list of all pages that include a given template. You'll want to use a library for accessing the API, though. Which language would you prefer? For Ruby, I'd suggest MediaWiki::Gateway.
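As a rough sketch of the embeddedin approach in Python, the code below builds the request parameters for list=embeddedin and follows the API's continuation tokens. The get_json argument is a hypothetical callable that takes a URL and a parameter dictionary and returns the decoded JSON response (for instance a thin wrapper around an HTTP client). It is injected so that the paging logic does not depend on any particular library.

```python
from typing import Callable, Dict, List

API_URL = "https://en.wikipedia.org/w/api.php"


def embeddedin_params(template: str, limit: int = 500) -> Dict[str, str]:
    """Parameters for listing pages that transclude the given template."""
    return {
        "action": "query",
        "list": "embeddedin",
        "eititle": f"Template:{template}",
        "eilimit": str(limit),
        "format": "json",
    }


def pages_embedding(
    template: str,
    get_json: Callable[[str, Dict[str, str]], dict],  # hypothetical HTTP helper
) -> List[str]:
    """Return titles of all pages embedding the template, following paging."""
    params = embeddedin_params(template)
    titles: List[str] = []
    while True:
        data = get_json(API_URL, params)
        titles.extend(page["title"] for page in data["query"]["embeddedin"])
        if "continue" not in data:
            return titles
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}
```

This answers the per-template part of the question only. Listing every Infobox template would need a separate query, for example over the template category.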

Furl answered 4/11, 2011 at 0:3 Comment(1)
These look very limited. How can I display all types of infoboxes at once? - Finer