Get a list of disambiguated homonyms from Wikipedia / Wikidata / linked data

If I search for "George Bush" manually on Wikipedia, I get this page, which lists homonyms with short descriptions.

I would like to feed my search to an API and get the following info:

  • George H. W. Bush
  • George W. Bush
  • George Bush (biblical scholar)
  • George Bush (footballer)
  • George Bush (racing driver)
  • George P. Bush
  • George Washington Bush

I don't mind getting more as long as I can unambiguously parse it.

My goal is to let a website's users tag a public person while restricting their choices to avoid ambiguity. The list could therefore differ slightly, and any other decent database with an API would do.

I haven't figured out how to do this with Wikipedia or Wikidata. I have only managed to query a specific id/page once I know it, which isn't the case here.

Soleure answered 23/9, 2018 at 9:7 Comment(3)
See opendata.stackexchange.com/a/12497/16193 - Curb
Thank you. I don't know SPARQL, so it's all Greek to me. I'll read up on it and hope it gets me closer to a solution, but so far I don't see how it relates to what I want or how I would run it. - Soleure
Basically, you need something like this. However, not all results are people, which is where the question I linked to comes in. - Curb

There are a couple of ways to do this, depending on what sort of data you want.

For example, https://en.wikipedia.org/w/api.php?action=query&titles=George%20Bush&prop=links returns the links found on the "George Bush" page. Because that page is a disambiguation page, its links include the homonyms for that name.

That will return:

    {
        "ns": 0,
        "title": "Bush family"
    },
    {
        "ns": 0,
        "title": "George Brush (disambiguation)"
    },
    {
        "ns": 0,
        "title": "George Bush (biblical scholar)"
    },
    {
        "ns": 0,
        "title": "George Bush (footballer)"
    },
    {
        "ns": 0,
        "title": "George Bush (racing driver)"
    },
    {
        "ns": 0,
        "title": "George H. W. Bush"
    },
    {
        "ns": 0,
        "title": "George P. Bush"
    },
    {
        "ns": 0,
        "title": "George W. Bush"
    },
    {
        "ns": 0,
        "title": "George Washington Bush"
    }
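
As a rough illustration of how the link list above could be consumed, here is a minimal Python sketch. It assumes `format=json` is added to the request so the reply is plain JSON, and it uses the `plnamespace` and `pllimit` parameters of `prop=links` to stay in the article namespace. The helper names and the `keep_homonyms` word filter are hypothetical, not part of the API, and the filter is only a heuristic.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_links_url(title):
    """Build the prop=links query for one page, asking for JSON output."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "links",
        "plnamespace": 0,   # article namespace only
        "pllimit": "max",   # as many links per request as the API allows
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Perform the HTTP request (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": "homonym-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def link_titles(data):
    """Collect link titles from a prop=links response."""
    pages = data.get("query", {}).get("pages", {})
    return [link["title"] for page in pages.values() for link in page.get("links", [])]


def keep_homonyms(titles, name):
    """Hypothetical filter: keep titles that contain every word of `name`."""
    wanted = set(re.findall(r"\w+", name.lower()))
    kept = []
    for title in titles:
        words = set(re.findall(r"\w+", title.lower()))
        # Drop pointers to other disambiguation pages and unrelated links.
        if wanted <= words and "disambiguation" not in words:
            kept.append(title)
    return kept


# Usage (requires network access):
# data = fetch_json(build_links_url("George Bush"))
# print(keep_homonyms(link_titles(data), "George Bush"))
```

On the excerpt above, this filter would drop "Bush family" and "George Brush (disambiguation)" and keep the seven names from the question. Long pages may need the API's continuation mechanism, which this sketch does not handle.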

You can get more data in a single call with list=search: https://en.wikipedia.org/w/api.php?action=query&utf8=&list=search&srsearch=George%20Bush

That will get you:

    "search": [
        {
            "ns": 0,
            "title": "George W. Bush",
            "pageid": 3414021,
            "size": 299185,
            "wordcount": 27007,
            "snippet": "<span class=\"searchmatch\">George</span> Walker <span class=\"searchmatch\">Bush</span> (born July 6, 1946) is an American politician who served as the 43rd President of the United States from 2001 to 2009. He had previously",
            "timestamp": "2018-09-26T21:48:08Z"
        },
        {
            "ns": 0,
            "title": "George H. W. Bush",
            "pageid": 11955,
            "size": 210189,
            "wordcount": 20867,
            "snippet": "<span class=\"searchmatch\">George</span> Herbert Walker <span class=\"searchmatch\">Bush</span> (born June 12, 1924) is an American politician who served as the 41st President of the United States from 1989 to 1993. Prior",
            "timestamp": "2018-10-01T06:41:50Z"
        },
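
The search results above already carry a short description in `snippet`, wrapped in highlight markup. A minimal sketch of turning them into a pick list follows; it assumes `format=json` is added to the request, and the function names are made up for illustration.

```python
import html
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, limit=20):
    """Build the list=search query, asking for JSON output."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Perform the HTTP request (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": "homonym-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def clean_snippet(snippet):
    """Strip the <span class="searchmatch"> markup and decode HTML entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", snippet)).strip()


def candidates(data):
    """Reduce a list=search response to page id, title and plain-text snippet."""
    hits = data.get("query", {}).get("search", [])
    return [
        {
            "pageid": hit["pageid"],
            "title": hit["title"],
            "description": clean_snippet(hit.get("snippet", "")),
        }
        for hit in hits
    ]


# Usage (requires network access):
# for row in candidates(fetch_json(build_search_url("George Bush"))):
#     print(row["pageid"], row["title"], "-", row["description"])
```

Storing the `pageid` alongside the title gives each tag a stable identifier even if the article is later renamed.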
Species answered 1/10, 2018 at 13:11 Comment(2)
Thanks, I think I could make it work, though it's not ideal. I tested with John Lennon, John Smith and Alexandre Dumas, and it seems I would have to combine both calls and keep only the titles that start with the first name and contain the family name. Even then I would still be stuck with items like "John Lennon Discography", and I'd be nowhere when starting from a surname only, like "Hitchens". - Soleure
BTW, the Wikibase extension lets you use some special keywords on Wikidata, e.g. haswbstatement:P31=Q5 - Curb
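
A possible way to apply the `haswbstatement:P31=Q5` keyword from the last comment, sketched in Python under a few assumptions: the keyword is appended to an ordinary `list=search` query against the Wikidata API (P31=Q5 restricts hits to items that are an instance of human), the search hits come back with item ids as titles, and labels and descriptions are then fetched with `wbgetentities`. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_person_search_url(name, limit=20):
    """Search Wikidata for `name`, restricted to humans (P31=Q5)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"{name} haswbstatement:P31=Q5",
        "srlimit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def build_entities_url(ids, language="en"):
    """Ask wbgetentities for labels and descriptions of the given item ids."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "labels|descriptions",
        "languages": language,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Perform the HTTP request (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": "homonym-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def item_ids(search_data):
    """Item ids are assumed to be the titles of the search hits."""
    return [hit["title"] for hit in search_data.get("query", {}).get("search", [])]


def describe(entities_data, language="en"):
    """Return (id, label, description) tuples from a wbgetentities response."""
    rows = []
    for qid, entity in entities_data.get("entities", {}).items():
        label = entity.get("labels", {}).get(language, {}).get("value", "")
        description = entity.get("descriptions", {}).get(language, {}).get("value", "")
        rows.append((qid, label, description))
    return rows


# Usage (requires network access):
# ids = item_ids(fetch_json(build_person_search_url("Hitchens")))
# for row in describe(fetch_json(build_entities_url(ids))):
#     print(row)
```

Because the filter works on the item type rather than on the title, this approach should also cope with surname-only queries and avoid non-person pages such as discographies.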
