Wikidata entity value from name
Is there a way to get Wikidata page information based on the name of an entity? For example, I want to get the page data for Google. I think it has to be done using "entity" with the corresponding entity value, but I am not sure whether there is an easy way to determine that value.

Serg answered 12/12, 2014 at 21:48 Comment(0)

If you want to do this using the API, first use wbsearchentities to find out which entity you want. For example:

https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Google&language=en

The problem with this is that there are multiple entities called "Google": the company (Google Inc.), the search engine (Google Web Search), the verb (to google) and even a Wikipedia disambiguation page.
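As a rough sketch of how the search step could be scripted with only the standard library: the helper names below (build_search_url, extract_ids, search_ids) and the User-Agent string are illustrative assumptions, not part of the API, and format=json is added so the reply can be decoded as JSON. The result is a plain list of entity IDs.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(name, language="en", limit=7):
    """Build a wbsearchentities URL that asks for a JSON response."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_ids(response):
    """Return the entity IDs from a decoded wbsearchentities response."""
    return [hit["id"] for hit in response.get("search", [])]


def search_ids(name):
    """Fetch the search results over the network and return the IDs."""
    # The User-Agent value is a placeholder; replace it with your own.
    request = Request(build_search_url(name),
                      headers={"User-Agent": "entity-lookup-example/0.1"})
    with urlopen(request) as reply:
        return extract_ids(json.load(reply))
```

Calling search_ids("Google") would perform the request. extract_ids can also be applied to any response that has already been decoded, which keeps the parsing separate from the network call.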

After you somehow decide which entity to access, use wbgetentities to actually get the information you want:

https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q95&languages=en

Or, if you can't decide which entity to use, you could get information for all of them at the same time:

https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q95|Q9366|Q961680|Q1156923&languages=en
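A minimal sketch of the second step, assuming the response is requested with format=json: the IDs are joined with "|" to form the ids parameter, and the label of each entity is read from the "entities" object of the reply. The function names build_get_url and labels_by_id are made up for this example.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_get_url(ids, language="en"):
    """Build a wbgetentities URL for one or more entity IDs."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),  # several IDs are separated by a pipe
        "languages": language,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def labels_by_id(response, language="en"):
    """Map each entity ID in a decoded response to its label (or None)."""
    labels = {}
    for entity_id, entity in response.get("entities", {}).items():
        label = entity.get("labels", {}).get(language, {})
        labels[entity_id] = label.get("value")
    return labels
```

urlencode percent-encodes the pipe as %7C, which the API accepts in the same way as a literal "|".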

Keitt answered 13/12, 2014 at 13:30 Comment(1)
Hi @Keitt, thanks very much! If you could provide one last bit of assistance: I'm having difficulty parsing the data in order to return, say, an array or a string of the entity numbers. Could you please advise? Serg

If you are familiar with Python, you can do it programmatically through the Wikidata API using Pywikibot. The following Python script obtains the Wikidata entities matching a name. If you want the data object for each individual entity, uncomment the last two lines.

from pywikibot.data import api
import pywikibot
import pprint

def getItems(site, itemtitle):
    params = {'action': 'wbsearchentities', 'format': 'json', 'language': 'en', 'type': 'item', 'search': itemtitle}
    request = api.Request(site=site, **params)
    return request.submit()

def getItem(site, wdItem, token):
    request = api.Request(site=site,
                          action='wbgetentities',
                          format='json',
                          ids=wdItem)
    return request.submit()

def prettyPrint(variable):
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(variable)

# Login to wikidata
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
token = repo.token(pywikibot.Page(repo, 'Main Page'), 'edit')
wikidataEntries = getItems(site, "Google")
# Print the different Wikidata entries to the screen
prettyPrint(wikidataEntries)

# Print each wikidata entry as an object
#for wdEntry in wikidataEntries["search"]:
#   prettyPrint(getItem(site, wdEntry["id"], token))

which results in

{   u'search': [   {   u'aliases': [u'Google Inc.'],
                       u'description': u'American multinational Internet and technology corporation',
                       u'id': u'Q95',
                       u'label': u'Google',
                       u'url': u'//www.wikidata.org/wiki/Q95'},
                   {   u'aliases': [u'Google Search', u'Google Web Search'],
                       u'description': u'Internet search engine developed by Google, Inc.',
                       u'id': u'Q9366',
                       u'label': u'Google',
                       u'url': u'//www.wikidata.org/wiki/Q9366'},
                   {   u'description': u'Wikipedia disambiguation page',
                       u'id': u'Q961680',
                       u'label': u'Google',
                       u'url': u'//www.wikidata.org/wiki/Q961680'},
                   {   u'aliases': [u'Google'],
                       u'description': u'verb',
                       u'id': u'Q1156923',
                       u'label': u'google',
                       u'url': u'//www.wikidata.org/wiki/Q1156923'},
                   {   u'id': u'Q10846831',
                       u'label': u'google',
                       u'url': u'//www.wikidata.org/wiki/Q10846831'},
                   {   u'aliases': [u'Google Android'],
                       u'description': u'operating system for mobile devices created by Google',
                       u'id': u'Q94',
                       u'label': u'Android',
                       u'url': u'//www.wikidata.org/wiki/Q94'},
                   {   u'description': u'web browser developed by Google',
                       u'id': u'Q777',
                       u'label': u'Google Chrome',
                       u'url': u'//www.wikidata.org/wiki/Q777'}],
    u'searchinfo': {   u'search': u'Google'},
    u'success': 1}
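Since several results share the label "Google", one possible way to narrow them down is to filter on the description field. The sketch below is only an illustration: pick_by_description is a hypothetical helper, and the sample data is a trimmed copy of the output above.

```python
def pick_by_description(results, keyword):
    """Return the IDs of search hits whose description contains keyword."""
    keyword = keyword.lower()
    return [
        hit["id"]
        for hit in results.get("search", [])
        # Some hits have no description at all, so default to "".
        if keyword in hit.get("description", "").lower()
    ]


# Trimmed copy of the search output shown above.
sample_results = {
    "search": [
        {"id": "Q95", "label": "Google",
         "description": "American multinational Internet and technology corporation"},
        {"id": "Q9366", "label": "Google",
         "description": "Internet search engine developed by Google, Inc."},
        {"id": "Q961680", "label": "Google",
         "description": "Wikipedia disambiguation page"},
        {"id": "Q10846831", "label": "google"},
    ]
}

company_ids = pick_by_description(sample_results, "corporation")
```

A keyword match on free-text descriptions is fragile, so this is better suited to a quick script than to anything that must be reliable.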
Craving answered 17/2, 2015 at 14:13 Comment(4)
I tried this, but I keep getting the error "CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort". Do you know what that is about? Maybe I can somehow specify which port to run it on or something? Bankroll
You can leave out token = repo.token(pywikibot.Page(repo, 'Main Page'), 'edit') and the positional token argument in def getItem(site, wdItem, token), because you're not editing anything at this point (sorry, I can't edit the answer yet; my reputation isn't sufficient). Citric
Note that by default the number of results is 7. If you want more results (limited to 50 for a user account), you need to add a limit parameter in getItems(), for example: params = {'action': 'wbsearchentities', 'format': 'json', 'language': 'en', 'type': 'item', 'search': search_string, 'limit': 50}. Endblown
MAKE SURE to follow Kaleidophon's advice: delete line 24. Starry

You could also use SPARQL to run a query:

SELECT ?item WHERE {
  ?item rdfs:label "Google"@en
}

You can run it from Python using pywikibot:

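import pywikibot
from pywikibot import pagegenerators

# Connect to the Wikidata repository that the generator will query
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

sparql = "SELECT ?item WHERE { ?item rdfs:label 'Google'@en }"
entities = pagegenerators.WikidataSPARQLPageGenerator(sparql, site=repo)
entities = list(entities)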
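The same query could also be sent without pywikibot. The sketch below assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON result layout; build_sparql_url and item_ids are hypothetical helper names, and the label is inserted into the query without escaping, so it must not contain double quotes.

```python
from urllib.parse import urlencode

SPARQL_URL = "https://query.wikidata.org/sparql"


def build_sparql_url(label, language="en"):
    """Build a query URL that looks up items by exact label."""
    # Assumption: label contains no double quotes (no escaping is done).
    query = 'SELECT ?item WHERE { ?item rdfs:label "%s"@%s }' % (label, language)
    return SPARQL_URL + "?" + urlencode({"query": query, "format": "json"})


def item_ids(response):
    """Extract the entity IDs from a decoded SPARQL JSON result."""
    bindings = response["results"]["bindings"]
    # Each value is an entity URI; the ID is its last path segment.
    return [row["item"]["value"].rsplit("/", 1)[-1] for row in bindings]
```

The label match is exact and case-sensitive, so this finds items labelled "Google" but not "google".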
Octan answered 13/5, 2016 at 5:4 Comment(0)
