How to obtain data in a table from Wikipedia API?

I'm trying to get all the content from Wikipedia:Unusual_articles. I can get the list of sections (the table of contents) by calling this endpoint:

https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=sections&page=Wikipedia:Unusual_articles

and the data I get back looks something like this:

{
    title: "Wikipedia:Unusual articles",
    pageid: 154126,
    sections: [
        {
            toclevel: 1,
            level: "2",
            line: "Places and infrastructure",
            number: "1",
            index: "T-1",
            fromtitle: "Wikipedia:Unusual_articles/Places_and_infrastructure",
            byteoffset: null,
            anchor: "Places_and_infrastructure"
        },
        {
            toclevel: 2,
            level: "3",
            line: "Americas",
            number: "1.1",
            index: "T-2",
            fromtitle: "Wikipedia:Unusual_articles/Places_and_infrastructure",
            byteoffset: null,
            anchor: "Americas"
        },
...

But I'm not able to get the content of a particular section. For example, the Americas section contains a table in which each row has a link and a short description. Is there a way to obtain the link and short description from the API?


Cratch answered 24/10, 2016 at 4:6 Comment(4)
I'd suggest reading the API documentation and figuring out which API call will give you article content.Beatriz
Your best bet is probably to parse the table HTML. The API call is almost right, you are just using the wrong property.Agoraphobia
@Agoraphobia what props am I supposed to use to get the table html?Cratch
Try this query (the table is transcluded from a subpage). In general, ApiSandbox is the easy way to find out what parameters you need.Agoraphobia

You can get the content of any page section with the MediaWiki API and action=parse, in two steps. First, get the list of all sections on the page with:

https://en.wikipedia.org/w/api.php?action=parse&prop=sections&page=Wikipedia:Unusual_articles

In the response, the Americas section has index=T-2 (the T prefix means the section is transcluded from another page) and fromtitle=Wikipedia:Unusual_articles/Places_and_infrastructure. Second, pass fromtitle as page and the number from index (2, without the T- prefix) as section to get the content of that section:

https://en.wikipedia.org/w/api.php?action=parse&page=Wikipedia:Unusual_articles/Places_and_infrastructure&section=2&prop=...

where:

  • prop=wikitext - gives the original section wikitext that was parsed.
  • prop=text - gives the parsed (HTML) text of the section.
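
The two steps above can be sketched in Python as follows. This is a minimal sketch using only the standard library, not a definitive client: the helper names (`build_url`, `sections_url`, `section_content_url`, `fetch_json`) and the User-Agent string are made up for illustration, and it assumes every entry in the sections list carries a `fromtitle` and an `index` like the ones shown in the question.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_url(params):
    """Return an API URL for the given query parameters."""
    return API_URL + "?" + urllib.parse.urlencode(params)


def sections_url(page):
    """Step 1: URL that lists the sections of `page`."""
    return build_url({"action": "parse", "format": "json",
                      "prop": "sections", "page": page})


def section_content_url(section, prop="text"):
    """Step 2: URL for one entry of the step-1 `sections` list."""
    index = section["index"]
    # A "T-" prefix marks a section transcluded from `fromtitle`; the
    # number after it is used as the section index on that page.
    number = index[2:] if index.startswith("T-") else index
    return build_url({"action": "parse", "format": "json", "prop": prop,
                      "page": section["fromtitle"], "section": number})


def fetch_json(url):
    """Perform the request and decode the JSON body (needs network access)."""
    # The User-Agent value is a placeholder chosen for this example.
    request = urllib.request.Request(
        url, headers={"User-Agent": "section-fetch-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A possible usage: call `fetch_json(sections_url("Wikipedia:Unusual_articles"))`, pick the entry whose `line` is "Americas" from the returned sections list, and pass it to `section_content_url` with `prop="text"` or `prop="wikitext"`.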
Doornail answered 2/11, 2016 at 20:11 Comment(1)
I'm able to get the section details from the API above by passing the section index, but it returns HTML. I want only plain text. How can I get it?Azar
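
One possible way to turn the HTML returned by prop=text into plain text is to strip the tags on the client side. The sketch below uses Python's standard `html.parser` module; the names `TextExtractor` and `html_to_text` are invented for this example, and it assumes you have already pulled the HTML string out of the JSON response.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of `html` with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

For example, `html_to_text('<p>Hello <b>world</b></p>')` returns `'Hello world'`. This flattens table cells into one run of text; keeping the link and description of each row separate would need a parser that tracks the table row and cell tags as well.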
