Retrieve another language of a Wikipedia page

2

10

Task: Given an English Wikipedia page, retrieve the address of the same page in Russian.

I know the Semantic Web solution (a simple query to DBpedia), but I am curious whether there are traditional solutions. I asked the same question on semanticoverflow.com, where Toby Inkster suggested parsing the output of http://en.wikipedia.org/wiki/Colugo?action=raw (the links to other languages are at the bottom), but that approach is too inefficient. Are there any other ways, or is DBpedia the only real option?

Dougherty answered 10/11, 2010 at 10:56
11

Wikipedia has an extensive API that provides, among other things, language-link information. In this particular case, you're looking for api.php?action=query&prop=langlinks&titles=.... See here for an example.
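As a rough illustration of that query, here is a minimal Python sketch. The function names, the User-Agent string, and the `page_url` helper are my own and hypothetical. `lllang` and `redirects` are, as far as I know, standard parameters of the langlinks query, but check the API documentation before relying on them. Only the standard library is used, and only `fetch_langlink` touches the network.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def langlinks_params(title, lang):
    """Query parameters asking for the link from `title` to one language."""
    return {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": lang,   # assumption: restricts the result to one language
        "redirects": 1,   # assumption: follow redirects on the source title
        "format": "json",
    }


def extract_langlink(data, lang):
    """Return the linked title for `lang`, or None if there is no link."""
    pages = data.get("query", {}).get("pages", {})
    # The default JSON output keys pages by id; formatversion=2 returns a list.
    page_list = pages.values() if isinstance(pages, dict) else pages
    for page in page_list:
        for link in page.get("langlinks", []):
            if link.get("lang") == lang:
                # The title sits under "*" (default) or "title" (formatversion=2).
                return link.get("*", link.get("title"))
    return None


def fetch_langlink(title, lang="ru"):
    """Perform the HTTP request; needs network access."""
    url = API_URL + "?" + urllib.parse.urlencode(langlinks_params(title, lang))
    request = urllib.request.Request(
        url, headers={"User-Agent": "langlink-example/0.1"}  # hypothetical UA
    )
    with urllib.request.urlopen(request) as response:
        return extract_langlink(json.load(response), lang)


def page_url(lang, title):
    """Build the conventional article URL for a language edition."""
    path = urllib.parse.quote(title.replace(" ", "_"))
    return f"https://{lang}.wikipedia.org/wiki/{path}"
```

Calling `fetch_langlink("Colugo", "ru")` should return the Russian title if a link exists, and `page_url("ru", title)` turns it into an address. Keeping the parsing separate from the request makes it easy to test the parsing against a saved response.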

Shufu answered 30/6, 2012 at 17:55
1

You can also query the Wikidata API. For example, to find the Japanese (ja) title equivalent of the page https://en.wikipedia.org/wiki/Aframomum_corrorima:

import json
import requests

site = "enwiki"  # For English queries, set `&sites=enwiki`
page = "Aframomum_corrorima"
trg_lang = "ja"

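# props=labels limits the response to labels; languages= filters them to the target language
url = f"https://www.wikidata.org/w/api.php?action=wbgetentities&sites={site}&titles={page}&props=labels&languages={trg_lang}&format=json"

result = requests.get(url).json()

# One label dict per matched entity (empty if the page is missing or has no label)
translations = [entity.get('labels', {}) for entity in result['entities'].values()]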
print(translations)

[out]:

[{'ja': {'language': 'ja', 'value': 'コロリマ'}}]

The page https://ja.wikipedia.org/w/index.php?title=コロリマ hasn't been written yet, but the Wikidata API still finds the right translation of the entity's label.

To extract the links to the page on every wiki that has one, do something like:

# wbgetentities takes `props` (not `prop`); props=sitelinks asks for the per-wiki page titles
url = f"https://www.wikidata.org/w/api.php?action=wbgetentities&sites={site}&titles={page}&props=sitelinks&format=json"

result = requests.get(url).json()

links = [entity.get('sitelinks', {}) for entity in result['entities'].values()]

print(json.dumps(links, indent=4))

[out]:

[
    {
        "amwiki": {
            "site": "amwiki",
            "title": "\u12ae\u1228\u122a\u121b",
            "badges": []
        },
        "cebwiki": {
            "site": "cebwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "commonswiki": {
            "site": "commonswiki",
            "title": "Category:Aframomum corrorima",
            "badges": []
        },
        "elwiki": {
            "site": "elwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "enwiki": {
            "site": "enwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "eswiki": {
            "site": "eswiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "frwiki": {
            "site": "frwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "kowiki": {
            "site": "kowiki",
            "title": "\ucf54\ub7ec\ub9ac\ub9c8",
            "badges": []
        },
        "lawiki": {
            "site": "lawiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "specieswiki": {
            "site": "specieswiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "svwiki": {
            "site": "svwiki",
            "title": "Korarima",
            "badges": []
        },
        "ukwiki": {
            "site": "ukwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "viwiki": {
            "site": "viwiki",
            "title": "Aframomum corrorima",
            "badges": []
        },
        "warwiki": {
            "site": "warwiki",
            "title": "Aframomum corrorima",
            "badges": []
        }
    }
]
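
To turn a result like the one above into a page address, something along these lines may help. This is only a sketch: `sitelink_title` and `wikipedia_url` are hypothetical helper names, the entity key in the sample is a placeholder, and the URL rule assumes the site id belongs to a Wikipedia language edition (it would give the wrong host for ids such as commonswiki or specieswiki). I believe the API can also return ready-made URLs with props=sitelinks/urls, which avoids this guesswork.

```python
import urllib.parse


def sitelink_title(entities, site):
    """Return the page title on `site` (e.g. "svwiki"), or None if absent."""
    for entity in entities.values():
        link = entity.get("sitelinks", {}).get(site)
        if link is not None:
            return link["title"]
    return None


def wikipedia_url(site, title):
    """Build an article URL, assuming `site` is a Wikipedia id like "svwiki"."""
    if not site.endswith("wiki"):
        raise ValueError(f"not a Wikipedia-style site id: {site}")
    # Ids such as "zh_min_nanwiki" use underscores where the host uses hyphens.
    lang = site[: -len("wiki")].replace("_", "-")
    path = urllib.parse.quote(title.replace(" ", "_"))
    return f"https://{lang}.wikipedia.org/wiki/{path}"


# Trimmed copy of the output above; "Q_PLACEHOLDER" stands in for the real id.
entities = {
    "Q_PLACEHOLDER": {
        "sitelinks": {
            "enwiki": {"site": "enwiki", "title": "Aframomum corrorima", "badges": []},
            "svwiki": {"site": "svwiki", "title": "Korarima", "badges": []},
        }
    }
}

title = sitelink_title(entities, "svwiki")
if title is not None:
    print(wikipedia_url("svwiki", title))
```

A site that is missing from the sitelinks (jawiki in the example above) simply yields None, which matches the case where the target page has not been written yet.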
Runagate answered 2/1 at 21:48
