How to use the Wikipedia API, if it exists? [closed]

I'm trying to find out if there's a Wikipedia API (I think it is related to MediaWiki?).

If so, I would like to know how I would tell Wikipedia to give me an article about, for example, the New York Yankees.

What would the REST URL be for this example?

All the docs on this subject seem fairly complicated.

Galvanometer answered 8/6, 2009 at 11:29 Comment(2)
The "if it exists" part is also covered here: https://mcmap.net/q/174491/-is-there-a-wikipedia-api. But I think the "how to use it" part is a legitimate question... sort of. – Nissie
There is now an R package that accesses the MediaWiki API (and so Wikipedia); more details and an example: https://mcmap.net/q/323349/-how-to-access-wikipedia-from-r – Reduction

You really, really need to spend some time reading the documentation; it took me only a moment to look and click on the link to fix it. :/ But out of sympathy I'll provide you with a link that maybe you can learn to use.

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=timestamp|user|comment|content

Those are the variables you will be looking to get. Your best bet is to know which page you are after and put the last part of its Wikipedia link into the title, i.e.:

http://en.wikipedia.org/wiki/New_York_Yankees [Take the part after the wiki/]

-->

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=timestamp|user|comment|content

(Place that part in the titles parameter of the GET request.)

The URL above can be tweaked to include or exclude the sections you do or do not want. So read the documentation :)
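The "take the part after /wiki/" step above can be automated. This is a minimal sketch, assuming a standard article link of the form http://en.wikipedia.org/wiki/Title; the helper name articleUrlToApiUrl is hypothetical. Note that URLSearchParams percent-encodes the pipes in rvprop as %7C, which the API decodes back to |.

```javascript
// Hypothetical helper: turn an article link into the revisions query URL
// from the answer above (same parameters, same host as the article link).
function articleUrlToApiUrl(articleUrl) {
  const url = new URL(articleUrl);
  const prefix = "/wiki/";
  if (!url.pathname.startsWith(prefix)) {
    throw new Error("Not a /wiki/ article URL: " + articleUrl);
  }
  // "Take the part after the wiki/", decoding escapes such as %27.
  const title = decodeURIComponent(url.pathname.slice(prefix.length));

  // The API endpoint lives at /w/api.php on the same host.
  const api = new URL("/w/api.php", url.origin);
  api.searchParams.set("action", "query");
  api.searchParams.set("prop", "revisions");
  api.searchParams.set("titles", title); // goes into the titles parameter
  api.searchParams.set("rvprop", "timestamp|user|comment|content");
  return api.toString();
}
```

For example, articleUrlToApiUrl("http://en.wikipedia.org/wiki/New_York_Yankees") rebuilds the query URL shown above, with the pipes percent-encoded.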

Sungod answered 8/6, 2009 at 12:7 Comment(3)
+1 for an actual example, instead of just dumping links (even though the example is also just a link... :) – Nissie
A FANTASTIC PLACE to start is the Wikipedia sandbox. It can help you format your requests/queries: en.wikipedia.org/wiki/Special:ApiSandbox – Acidosis
What if I don't know the specific page? Like if I want to search for the band Iron Maiden? The page could be "iron maiden", "Iron Maiden", "Iron Maiden band". How do I search for that? – Vauban
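One way to handle the "I don't know the exact title" question in the comment above is to search first and then fetch the best-matching title. A sketch, assuming the query module's list=search (with srsearch and srlimit) and the default JSON layout query.search[].title; the function names are hypothetical and fetch() requires a browser or Node 18+.

```javascript
// Build a full-text search URL for English Wikipedia.
function buildSearchUrl(term, limit = 5) {
  const api = new URL("https://en.wikipedia.org/w/api.php");
  api.searchParams.set("action", "query");
  api.searchParams.set("list", "search");
  api.searchParams.set("srsearch", term);
  api.searchParams.set("srlimit", String(limit));
  api.searchParams.set("format", "json");
  return api.toString();
}

// Assumed reply shape: { query: { search: [ { title, ... }, ... ] } }
function searchTitles(response) {
  const hits = (response.query && response.query.search) || [];
  return hits.map((hit) => hit.title);
}

// Network call kept separate so the helpers above can be used offline.
async function searchWikipedia(term) {
  const res = await fetch(buildSearchUrl(term));
  if (!res.ok) throw new Error("HTTP " + res.status);
  return searchTitles(await res.json());
}
```

Usage: searchWikipedia("Iron Maiden band").then(console.log) logs the candidate titles; you can then pass the one you want to any of the titles-based queries in this thread.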

The answers here helped me arrive at a solution, but I discovered more info in the process that may be useful to others who find this question. I figure most people simply want to use the API to quickly get content off the page. Here is how I'm doing that:

Using Revisions:

//working url:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Threadless&rvprop=content&format=json&rvsection=0&rvparse=1

//Explanation
//Base Url:
http://en.wikipedia.org/w/api.php?action=query

//tell it to get revisions:
&prop=revisions

//define page titles separated by pipes. In the example I used the T-shirt company Threadless
&titles=whatever|the|title|is

//specify that we want the page content
&rvprop=content

//I want my data in JSON, default is XML
&format=json

//lets you choose which section you want. 0 is the first one.
&rvsection=0

//tell Wikipedia to parse it into HTML for you
&rvparse=1
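The parameter-by-parameter breakdown above can be assembled programmatically, which avoids hand-escaping titles. A sketch, assuming the legacy JSON layout (formatversion=1) in which pages are keyed by page ID and each revision's content sits under the "*" key; the function names are hypothetical, and https is used because Wikipedia now redirects plain http there.

```javascript
// Rebuild the revisions "working url" from its individual parameters.
function buildRevisionsUrl(title, section = 0) {
  const api = new URL("https://en.wikipedia.org/w/api.php");
  const params = {
    action: "query",
    prop: "revisions",
    titles: title,
    rvprop: "content",
    format: "json",
    rvsection: String(section), // 0 = the first (lead) section
    rvparse: "1",               // ask for HTML instead of wikitext
  };
  for (const [key, value] of Object.entries(params)) {
    api.searchParams.set(key, value);
  }
  return api.toString();
}

// Assumed reply: { query: { pages: { "<pageid>": { title, revisions: [ { "*": "..." } ] } } } }
function revisionContents(response) {
  const pages = (response.query && response.query.pages) || {};
  return Object.values(pages).map((page) => ({
    title: page.title,
    // Pages that do not exist come back without a revisions array.
    content: page.revisions ? page.revisions[0]["*"] : null,
  }));
}
```

Because pages are keyed by an ID you don't know in advance, iterating with Object.values() is simpler than looking up a fixed key.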

Using Extracts (better/easier for what I'm doing)

//working url:
http://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Threadless&format=json&exintro=1

//only explaining new parameters
//instead of revisions, we'll set prop=extracts
&prop=extracts

//if we just want the intro, we can use exintro. Otherwise it shows all sections
&exintro=1
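The extracts variant can be built and read the same way. This sketch assumes prop=extracts is provided by the TextExtracts extension (enabled on Wikipedia) and that each page object in the reply carries an "extract" field. The optional plainText flag adds explaintext=1, a TextExtracts parameter that returns plain text instead of HTML; the function names are hypothetical.

```javascript
// Build the extracts URL; exintro limits the result to the intro section.
function buildExtractsUrl(title, { introOnly = true, plainText = false } = {}) {
  const api = new URL("https://en.wikipedia.org/w/api.php");
  api.searchParams.set("action", "query");
  api.searchParams.set("prop", "extracts");
  api.searchParams.set("titles", title);
  api.searchParams.set("format", "json");
  if (introOnly) api.searchParams.set("exintro", "1");
  if (plainText) api.searchParams.set("explaintext", "1"); // plain text, not HTML
  return api.toString();
}

// Map each returned page title to its extract (null if none came back).
function extractsByTitle(response) {
  const result = {};
  const pages = (response.query && response.query.pages) || {};
  for (const page of Object.values(pages)) {
    result[page.title] = page.extract ?? null;
  }
  return result;
}
```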

Getting everything still requires reading through the API documentation, as was mentioned, but I hope these examples help most of the people who come here for a quick fix.

Ostensorium answered 12/4, 2012 at 0:30 Comment(2)
The first working URL provided also allows you to retrieve the infobox for the wiki page! Thanks – Jablonski
Hi, is there a way to get the plain text of the main description? It's very difficult to parse wikitext or HTML responses :(. Any help would be highly appreciated. – Tracee

See http://www.mediawiki.org/wiki/API

Specifically, for the English Wikipedia, the API is located at http://en.wikipedia.org/w/api.php

Solarize answered 8/6, 2009 at 11:32 Comment(3)
Yeah, I can't figure out how to do my example after reading that. Any ideas? – Galvanometer
No, I seriously can't figure that document out. I don't know how to get specific page data using that API. – Galvanometer
You actually can't. To get raw article source you should access the articles this way: mediawiki.org/w/index.php?title=API&action=raw – Solarize

Have a look at the ApiSandbox at https://en.wikipedia.org/wiki/Special:ApiSandbox That is a web frontend to easily query the API. A few clicks will craft you the URL and show you the API result.

That is an extension for MediaWiki, enabled on all Wikipedia languages. https://www.mediawiki.org/wiki/Extension:ApiSandbox

Cyanocobalamin answered 13/2, 2013 at 16:2 Comment(0)

If you want to extract structured data from Wikipedia, you may consider using DBpedia: http://dbpedia.org/

It lets you query data by given criteria using SPARQL and returns data parsed from Wikipedia infobox templates.

There are SPARQL libraries available for multiple platforms that make querying easier.
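As a rough illustration of the DBpedia route, the sketch below asks for the English abstract of a resource. Several things are assumptions to verify against DBpedia's own documentation: that the public endpoint is https://dbpedia.org/sparql, that it accepts the query and result format as query-string parameters, that the dbr: and dbo: prefixes are predefined there, and that dbo:abstract holds the abstract. The reply is parsed using the W3C SPARQL JSON results layout. Resource names containing characters such as parentheses would need escaping, which this sketch does not do.

```javascript
// Build a SPARQL query for a resource's abstract in one language.
// Assumes simple resource names like "New_York_Yankees" (no escaping done).
function buildDbpediaAbstractQuery(resourceName, lang = "en") {
  return [
    "SELECT ?abstract WHERE {",
    `  dbr:${resourceName} dbo:abstract ?abstract .`,
    `  FILTER (lang(?abstract) = "${lang}")`,
    "}",
  ].join("\n");
}

// Assumed endpoint and parameter names; check DBpedia's docs before relying on them.
function buildDbpediaUrl(sparql) {
  const url = new URL("https://dbpedia.org/sparql");
  url.searchParams.set("query", sparql);
  url.searchParams.set("format", "application/sparql-results+json");
  return url.toString();
}

// SPARQL JSON results: { head: {...}, results: { bindings: [ { abstract: { value } } ] } }
function abstractsFrom(response) {
  return response.results.bindings.map((binding) => binding.abstract.value);
}
```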

Vanquish answered 18/12, 2012 at 22:0 Comment(0)

If you want to extract structured data from Wikipedia, you may also try http://www.wikidata.org/wiki/Wikidata:Main_Page
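For Wikidata, one way to connect a Wikipedia article to its structured item is the Wikidata API's wbgetentities module, looked up by site and title. This is a sketch under assumptions: that sites=enwiki plus titles=... resolves the item, and that a title with no item comes back as an entity without an id (marked "missing"), which the helper filters out. The function names are hypothetical.

```javascript
// Look up the Wikidata item linked to an English Wikipedia article title.
function buildWikidataUrl(enwikiTitle) {
  const url = new URL("https://www.wikidata.org/w/api.php");
  url.searchParams.set("action", "wbgetentities");
  url.searchParams.set("sites", "enwiki");
  url.searchParams.set("titles", enwikiTitle);
  url.searchParams.set("props", "labels|descriptions|claims");
  url.searchParams.set("languages", "en");
  url.searchParams.set("format", "json");
  return url.toString();
}

// Assumed reply: { entities: { "<id>": { id, labels, descriptions, claims } } }
// Titles with no item are assumed to appear without an id and with "missing".
function entityIds(response) {
  return Object.values(response.entities || {})
    .filter((entity) => entity.id && !("missing" in entity))
    .map((entity) => entity.id);
}
```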

Farl answered 25/3, 2014 at 17:38 Comment(0)

Below is a working example that prints the first sentence from Wikipedia's New York Yankees page to your web browser's console:

<!DOCTYPE html>
<html>
    <head>
        <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
    </head>
    <body>
        <script>
            var wikiUrl = "http://en.wikipedia.org/w/api.php?action=opensearch&search=New_York_Yankees&format=json&callback=wikiCallbackFunction";

            $.ajax(wikiUrl, {
                dataType: "jsonp",
                success: function( wikiResponse ) {
                    console.log( wikiResponse[2][0] );
                }
            });
        </script>   
    </body>
</html>

http://en.wikipedia.org/w/api.php is the endpoint for your URL. You can see how to structure your URL by visiting: http://www.mediawiki.org/wiki/API:Main_page

I used jsonp as the dataType to allow cross-site requests. More can be found here: http://www.mediawiki.org/wiki/API:Cross-site_requests

Last but not least, make sure to consult the jQuery.ajax() API documentation: http://api.jquery.com/jquery.ajax/
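If you would rather avoid jQuery and JSONP, the cross-site requests page linked above also describes an origin parameter; origin=* requests anonymous CORS, which lets plain fetch() read the reply. A sketch under that assumption, using the opensearch reply layout [searchTerm, titles, descriptions, urls] that the jQuery example indexes into with wikiResponse[2][0]. The descriptions array may contain empty strings depending on the wiki's configuration, so the sketch also returns the title and URL. Function names are hypothetical.

```javascript
// Same opensearch request as the jQuery example, without JSONP.
function buildOpenSearchUrl(term) {
  const url = new URL("https://en.wikipedia.org/w/api.php");
  url.searchParams.set("action", "opensearch");
  url.searchParams.set("search", term);
  url.searchParams.set("format", "json");
  url.searchParams.set("origin", "*"); // anonymous CORS instead of a JSONP callback
  return url.toString();
}

// opensearch replies with an array: [searchTerm, titles, descriptions, urls].
function firstResult(response) {
  const [, titles = [], descriptions = [], urls = []] = response;
  if (titles.length === 0) return null;
  return { title: titles[0], description: descriptions[0], url: urls[0] };
}

async function openSearch(term) {
  const res = await fetch(buildOpenSearchUrl(term));
  if (!res.ok) throw new Error("HTTP " + res.status);
  return firstResult(await res.json());
}
```

Usage in a page script: openSearch("New York Yankees").then((hit) => console.log(hit)).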

Importunity answered 13/4, 2015 at 10:13 Comment(0)

Wiki Parser converts Wikipedia dumps into XML. It is also quite fast. You can then use any XML processing tool to handle the data from the parsed Wikipedia articles.

Anet answered 29/1, 2015 at 16:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.