Accessing main picture of wikipedia page by API

Asked 2/12, 2011 at 22:32 Answered 29/4, 2020 at 5:46

Is there any way I can access the thumbnail picture of any Wikipedia page by using an API? I mean the image in the box on the top right side. Are there any APIs for that?

Invade answered 2/12, 2011 at 22:32 Comment(1)

All answers here are unreliable hacks that often give the wrong image. The answer at #36813852 gives an image more often AND it is never the wrong image. I suggest merging the two questions. – Distaff 25/6, 2020 at 10:23

http://en.wikipedia.org/w/api.php

Look at prop=images.

It returns an array of image filenames that are used in the parsed page. You then have the option of making another API call to find out the full image URL, e.g.: action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url

or to calculate the URL via the filename's hash.
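For the hash route, here is a minimal Python sketch. It assumes the usual MediaWiki upload layout, where the two directory levels come from the MD5 hash of the file name with spaces replaced by underscores. The function name and the `repo` parameter are made up for illustration, and you still have to know whether the file lives on Commons or on the language-specific wiki.

```python
import hashlib
from urllib.parse import quote


def guess_upload_url(filename, repo="commons"):
    """Guess the upload.wikimedia.org URL for a file name (hypothetical helper).

    Assumes the hashed directory layout /<h[0]>/<h[0:2]>/<name>, where h is the
    MD5 hex digest of the name with spaces turned into underscores.
    `repo` is "commons" or a language code such as "en"; the caller must know
    which one applies.
    """
    # Drop a leading namespace prefix if the title came straight from the API.
    for prefix in ("File:", "Image:"):
        if filename.startswith(prefix):
            filename = filename[len(prefix):]
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "https://upload.wikimedia.org/wikipedia/{}/{}/{}/{}".format(
        repo, digest[0], digest[:2], quote(name)
    )


print(guess_upload_url("File:Flag of India.svg", repo="en"))
```

Asking the API with prop=imageinfo is the safer option, since it tells you which repository actually holds the file.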

Unfortunately, while the array of images returned by prop=images is in the order they are found on the page, the first can not be guaranteed to be the image in the info box because sometimes a page will include an image before the infobox (most of the time icons for metadata about the page: e.g. "this article is locked").

Searching the array of images for the first image that includes the page title is probably the best guess for the infobox image.
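A rough Python sketch of that title-matching guess, assuming you already have the list of image titles returned by prop=images. The function name and the sample data are invented for illustration, and the heuristic can still miss or pick the wrong image.

```python
def guess_infobox_image(page_title, image_titles):
    """Return the first image title that contains the page title, else None.

    Matching ignores case and treats spaces and underscores as equal.
    """
    def normalize(text):
        return text.replace("_", " ").casefold()

    needle = normalize(page_title)
    for title in image_titles:
        if needle in normalize(title):
            return title
    return None


# Made-up sample in the shape of the titles returned by prop=images.
images = [
    "File:Commons-logo.svg",
    "File:Padlock-silver.svg",
    "File:Albert Einstein Head.jpg",
]
print(guess_infobox_image("Albert Einstein", images))
```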

Metathesize answered 2/12, 2011 at 22:38 Comment(5)

I can access images using prop, but it's giving me multiple pictures en.wikipedia.org/w/… . I don't know which one is the main picture. – Invade 2/12, 2011 at 23:15

@Aby, I think this is just a list of pictures; you cannot get the real image URL from it, so maybe you can refer to my answer. If you want to check the API documentation, see http://www.mediawiki.org/wiki/API:FAQ – Panga 2/12, 2011 at 23:27

Thanks for replying i checked out that image. – Invade 4/12, 2011 at 19:39

It's better to use the API call instead of calculating your own, because you still don't know if it's in Commons e.g. /commons/a/ae/Filename.jpg or language specific /en/a/ae/Filename.jpg – Vaucluse 26/10, 2012 at 22:26

There is now a new property called pageimages, that filters out the default images. – Duncandunce 18/6, 2014 at 11:53

You can get the thumbnail of any wikipedia page using prop=pageimages. For example:

http://en.wikipedia.org/w/api.php?action=query&titles=Al-Farabi&prop=pageimages&format=json&pithumbsize=100

The response contains the full URL of the thumbnail.
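As a sketch of how this could look from code (Python, standard library only): one helper builds the query for either a title or a numeric page ID, and another pulls the thumbnail URL out of the response without needing to know the page ID key. The helper names are mine, and the sample response is hand-written in the shape the API returns, not real data.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def thumbnail_query_url(title=None, pageid=None, size=100):
    """Build a prop=pageimages query for a title or a numeric page ID."""
    params = {
        "action": "query",
        "prop": "pageimages",
        "format": "json",
        "pithumbsize": size,
    }
    if pageid is not None:
        params["pageids"] = pageid
    else:
        params["titles"] = title
    return API_URL + "?" + urlencode(params)


def thumbnail_sources(response_text):
    """Map each page title to its thumbnail URL; pages without one are skipped."""
    pages = json.loads(response_text).get("query", {}).get("pages", {})
    # The default JSON format keys pages by page ID, so iterate over the
    # values instead of hard-coding an ID.
    return {
        page["title"]: page["thumbnail"]["source"]
        for page in pages.values()
        if "thumbnail" in page
    }


# Hand-written sample response (all values are made up).
sample = json.dumps({"query": {"pages": {"123": {
    "pageid": 123, "ns": 0, "title": "Example",
    "thumbnail": {"source": "https://upload.example/100px-Example.jpg",
                  "width": 100, "height": 75},
}}}})
print(thumbnail_query_url(title="Al-Farabi"))
print(thumbnail_sources(sample))
```

Fetching the URL (for example with urllib.request.urlopen) is left out so the sketch runs without network access.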

Disgust answered 1/12, 2013 at 11:27 Comment(6)

This is a great solution, but (for the record) it is based on a new-ish API extension that is marked 'experimental'. mediawiki.org/wiki/Extension:PageImages – Rina 3/6, 2014 at 17:40

Maybe it's experimental, but it works! A second example, showing how to get thumbnails from multiple pages, in one single query: en.wikipedia.org/w/… – Nsf 1/9, 2014 at 12:24

Is there a way to get this same information if you have the wikipedia (numeric) ID and not the wikipedia TITLE? – Wichita 1/6, 2015 at 12:43

As I understand it, I need to know the number 14533 to access the image, but it is different for every page, so how can I read it? {"batchcomplete":"","query":{"pages":{"14533":{"pageid":14533,"ns":0,"title":"India","thumbnail":{"original":"upload.wikimedia.org/wikipedia/en/4/41/Flag_of_India.svg"}}}}} – Pledgee 29/12, 2015 at 10:40

@LaurynasG - You want to use formatversion=2 in your api call. https://en.wikipedia.org/w/api.php?action=query&formatversion=2&prop=pageimages%7Cpageterms&titles=Albert%20Einstein – Popsicle 27/3, 2017 at 6:43

Note that you also need to add &pilicense=any if you also want to get non-free thumbnail images (i.e., for things like video games, movies, etc.) – Canadian 8/1, 2019 at 0:12

This is a good way to get the main image of a page on Wikipedia:

http://en.wikipedia.org/w/api.php?action=query&prop=pageimages&format=json&piprop=original&titles=India

Seepage answered 27/5, 2015 at 10:2 Comment(4)

This answer is brief, but works! pageimages is probably new property, which is why it wasn't covered earlier, but gets the main image for the page. – Funicle 15/9, 2016 at 23:19

Elegant! The pageimages property was it. The result is probably the first picture in the infobox template. And if you add images to the prop=, then you get all the other images on the page as well. (en.wikipedia.org/w/…) – Argosy 16/10, 2017 at 5:5

This is the best answer – Kingdom 28/3, 2018 at 5:41

Note that you also need to add &pilicense=any if you also want to get non-free thumbnail images (i.e., for things like video games, movies, etc.) – Canadian 8/1, 2019 at 0:12

Check out the MediaWiki API example for getting the main picture of a wikipedia page: https://www.mediawiki.org/wiki/API:Page_info_in_search_results.

As others have mentioned, you would use prop=pageimages in your API query.

If you also want the image description, you would use prop=pageimages|pageterms instead in your API query.

You can get the original image using piprop=original. Or you can get a thumbnail image with a specified width/height. For a thumbnail with width/height=600, use piprop=thumbnail&pithumbsize=600. If you omit both, the image returned by the API will default to a thumbnail with a width/height of 50px.

If you are requesting results in JSON format, you should always use formatversion=2 in your API query (i.e., format=json&formatversion=2) because it makes retrieving the image from the query easier.

Original Size Image:

https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=original&titles=Albert%20Einstein

Thumbnail Size (600px width/height) Image:

https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=thumbnail&pithumbsize=600&titles=Albert%20Einstein
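A small Python sketch of the two variants above. The helper names and the `any_license` switch are my own. The switch simply adds the pilicense=any parameter mentioned in the comments on this page. With formatversion=2 I am assuming that `query.pages` is a list and that the image sits under an `original` or `thumbnail` key with a `source` field, so treat the sample response as illustrative only.

```python
from urllib.parse import urlencode


def pageimage_params(title, original=True, thumb_size=600, any_license=False):
    """Build query parameters for prop=pageimages|pageterms."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "pageimages|pageterms",
        "titles": title,
    }
    if original:
        params["piprop"] = "original"
    else:
        params["piprop"] = "thumbnail"
        params["pithumbsize"] = str(thumb_size)
    if any_license:
        # Also ask for non-free images (see the pilicense comment).
        params["pilicense"] = "any"
    return params


def first_image_source(response, key="original"):
    """Return the first image URL found under `key`, or None if no page has one."""
    for page in response.get("query", {}).get("pages", []):
        image = page.get(key)
        if image and "source" in image:
            return image["source"]
    return None


# Hand-written sample in the assumed formatversion=2 shape (values made up).
sample = {"query": {"pages": [
    {"pageid": 1, "title": "X", "original": {"source": "https://upload.example/X.jpg"}},
]}}
print(urlencode(pageimage_params("Albert Einstein")))
print(first_image_source(sample))
```

Because pages come back as a list here, there is no page ID key to guess, which is the main convenience of formatversion=2.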

Popsicle answered 27/3, 2017 at 7:8 Comment(2)

It's important to note that pageimages will not return the image url if wikipedia uses a non-free image. For instance, attempting to retrieve the image for Family Guy will not return an image:

https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=original&titles=Family%20Guy

– Popsicle 28/3, 2017 at 9:0

Note that you also need to add &pilicense=any if you also want to get non-free thumbnail images (i.e., for things like video games, movies, etc.) – Canadian 8/1, 2019 at 0:12

Way 1: You can try some query like this:

http://en.wikipedia.org/w/api.php?action=opensearch&limit=5&format=xml&search=italy&namespace=0

In the response, you can see the Image tag:

<Item>
<Text xml:space="preserve">Italy national rugby union team</Text>
<Description xml:space="preserve">
The Italy national rugby union team represent the nation of Italy in the sport of rugby union.
</Description>
<Url xml:space="preserve">
http://en.wikipedia.org/wiki/Italy_national_rugby_union_team
</Url>
<Image source="http://upload.wikimedia.org/wikipedia/en/thumb/4/46/Italy_rugby.png/43px-Italy_rugby.png" width="43" height="50"/>
</Item>

Way 2: Use the query http://en.wikipedia.org/w/index.php?action=render&title=italy

This gives you the raw HTML, from which you can extract the image with something like PHP Simple HTML DOM Parser http://simplehtmldom.sourceforge.net

I have no time to write it for you; this is just some advice. Thanks.
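For what it's worth, Way 2 could be sketched along these lines in Python using only the standard library's HTML parser. The class name, the function name, and the `min_width` cutoff for skipping small icons are my own assumptions, and the sample HTML is invented rather than real Wikipedia output.

```python
from html.parser import HTMLParser


class FirstImageParser(HTMLParser):
    """Remembers the src of the first <img> at least min_width pixels wide."""

    def __init__(self, min_width=50):
        super().__init__()
        self.min_width = min_width
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag != "img" or self.src is not None:
            return
        attributes = dict(attrs)
        width = attributes.get("width") or "0"
        # Small images are usually icons, so skip anything below the cutoff.
        if width.isdigit() and int(width) >= self.min_width and attributes.get("src"):
            self.src = attributes["src"]


def first_content_image(html, min_width=50):
    """Return the src of the first reasonably large image in the HTML, or None."""
    parser = FirstImageParser(min_width)
    parser.feed(html)
    return parser.src


# Invented sample standing in for the HTML returned by action=render.
sample_html = (
    '<div><img src="//upload.example/20px-Icon.png" width="20" height="20">'
    '<table class="infobox"><tr><td>'
    '<img src="//upload.example/220px-Main.jpg" width="220" height="300" />'
    "</td></tr></table></div>"
)
print(first_content_image(sample_html))
```

Like any scraping approach, this depends on the page markup, so it is a guess at the main image rather than a guarantee.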

Panga answered 2/12, 2011 at 22:40 Comment(2)

This is a open search and this might give multiple pages on search.. – Invade 4/12, 2011 at 19:40

@Aby, I have also studied the wiki API documentation for a long time, and these 2 ways were what I thought could get the image. Personally I prefer way 2, because if there are images on the page, you can easily pull them out by DOM parsing. All the wiki pages are generated by code, so you can easily find the common patterns in them: the images always sit in some table or div with class aaa or class bbb. That is all of my suggestion. – Panga 4/12, 2011 at 21:6

I'm sorry for not answering specifically your question about the main image. But here's some code to get a list of all images:

function makeCall($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    return curl_exec($curl);
}

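function wikipediaImageUrls($url) {
    $imageUrls = array();
    // The page title is the last path component of the article URL (already URL-encoded).
    $pathComponents = explode('/', parse_url($url, PHP_URL_PATH));
    $pageTitle = array_pop($pathComponents);
    $imagesQuery = "http://en.wikipedia.org/w/api.php?action=query&titles={$pageTitle}&prop=images&format=json";
    $jsonResponse = makeCall($imagesQuery);
    $response = json_decode($jsonResponse, true);
    // Bail out if the request failed or the response has no pages.
    if (!isset($response['query']['pages'])) {
        return $imageUrls;
    }
    // Pages are keyed by page ID, which we don't know in advance, so take the first key.
    $imagesKey = key($response['query']['pages']);
    // A page without images has no 'images' entry.
    if (empty($response['query']['pages'][$imagesKey]['images'])) {
        return $imageUrls;
    }
    foreach($response['query']['pages'][$imagesKey]['images'] as $imageArray) {
        // Skip two common generic icons.
        if($imageArray['title'] != 'File:Commons-logo.svg' && $imageArray['title'] != 'File:P vip.svg') {
            $title = str_replace('File:', '', $imageArray['title']);
            $title = str_replace(' ', '_', $title);
            // Encode the file name so characters such as & or ? don't break the query.
            $imageUrlQuery = "http://en.wikipedia.org/w/api.php?action=query&titles=Image:" . rawurlencode($title) . "&prop=imageinfo&iiprop=url&format=json";
            $jsonUrlQuery = makeCall($imageUrlQuery);
            $urlResponse = json_decode($jsonUrlQuery, true);
            if (!isset($urlResponse['query']['pages'])) {
                continue;
            }
            $imageKey = key($urlResponse['query']['pages']);
            // Only keep entries that actually resolved to a URL.
            if (isset($urlResponse['query']['pages'][$imageKey]['imageinfo'][0]['url'])) {
                $imageUrls[] = $urlResponse['query']['pages'][$imageKey]['imageinfo'][0]['url'];
            }
        }
    }
    return $imageUrls;
}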
print_r(wikipediaImageUrls('http://en.wikipedia.org/wiki/Saturn_%28mythology%29'));
print_r(wikipediaImageUrls('http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel'));

I got this for http://en.wikipedia.org/wiki/Saturn_%28mythology%29:

Array
(
    [0] => http://upload.wikimedia.org/wikipedia/commons/1/10/Arch_of_SeptimiusSeverus.jpg
    [1] => http://upload.wikimedia.org/wikipedia/commons/8/81/Ivan_Akimov_Saturn_.jpg
    [2] => http://upload.wikimedia.org/wikipedia/commons/d/d7/Lucius_Appuleius_Saturninus.jpg
    [3] => http://upload.wikimedia.org/wikipedia/commons/2/2c/Polidoro_da_Caravaggio_-_Saturnus-thumb.jpg
    [4] => http://upload.wikimedia.org/wikipedia/commons/b/bd/Porta_Maggiore_Alatri.jpg
    [5] => http://upload.wikimedia.org/wikipedia/commons/6/6a/She-wolf_suckles_Romulus_and_Remus.jpg
    [6] => http://upload.wikimedia.org/wikipedia/commons/4/45/Throne_of_Saturn_Louvre_Ma1662.jpg
)

And for the second URL (http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel):

Array
(
    [0] => http://upload.wikimedia.org/wikipedia/commons/e/e9/BmRKEL.jpg
    [1] => http://upload.wikimedia.org/wikipedia/commons/3/3f/BmRKELS.jpg
    [2] => http://upload.wikimedia.org/wikipedia/commons/2/2c/Bundesarchiv_Bild_101I-655-5976-04%2C_Russland%2C_Sturzkampfbomber_Junkers_Ju_87_G.jpg
    [3] => http://upload.wikimedia.org/wikipedia/commons/6/62/Bundeswehr_Kreuz_Black.svg
    [4] => http://upload.wikimedia.org/wikipedia/commons/9/99/Flag_of_German_Reich_%281935%E2%80%931945%29.svg
    [5] => http://upload.wikimedia.org/wikipedia/en/6/64/HansUlrichRudel.jpeg
    [6] => http://upload.wikimedia.org/wikipedia/commons/8/82/Heinkel_He_111_during_the_Battle_of_Britain.jpg
    [7] => http://upload.wikimedia.org/wikipedia/commons/6/66/Regulation_WW_II_Underwing_Balkenkreuz.png
)

Note that the URL changed a bit on the 6th element of the second array. It's what @JosephJaber was warning about in his comment above.

Hope this helps someone.

Proconsulate answered 6/12, 2013 at 19:21 Comment(1)

If anyone needs to actually see which object keys to reference to retrieve image from fetch response, THIS is the answer!! Thank you! – Dorso 7/5, 2020 at 20:54

I have written some code that gets main image (full URL) by Wikipedia article title. It's not perfect, but overall I'm very pleased with the results.

The challenge was that when queried for a specific title, Wikipedia returns multiple image filenames (without path). Furthermore, the secondary search (I used the code varatis posted in this thread - thanks!) returns URLs of all images found based on the image filename that was searched, regardless of the original article title. After all this, we may end up with a generic image irrelevant to the search, so we filter those out. The code iterates over filenames and URLs until it finds (hopefully the best) match... a bit complicated, but it works :)

Note on the generic filter: I've been compiling a list of generic image strings for the isGeneric() function, but the list just keeps growing. I am considering maintaining it as a public list - if there is any interest let me know.

Pre:

protected static $baseurl = "http://en.wikipedia.org/w/api.php";

Main function - get image URL from title:

public static function getImageURL($title)
{
    $images = self::getImageFilenameObj($title); // returns JSON object
    if (!$images) return '';

    foreach ($images as $image)
    {
        // get object of image URL for given filename
        $imgjson = self::getFileURLObj($image->title);

        // return first image match
        foreach ($imgjson as $img)
        {
            // get URL for image
            $url = $img->imageinfo[0]->url;

            // no image found               
            if (!$url) continue;

            // filter generic images
            if (self::isGeneric($url)) continue;

            // match found
            return $url;
        }
    }
    // match not found
    return '';          
}

== The following functions are called by the main function above ==

Get JSON object (filenames) by title:

public static function getImageFilenameObj($title)
{
    try     // see if page has images
    {
        // get image file name
        $json = json_decode(
            self::retrieveInfo(
                self::$baseurl . '?action=query&titles=' .
                urlencode($title) . '&prop=images&format=json'
            ))->query->pages;

        /** The foreach is only to get around
         *  the fact that we don't have the id.
         */
        foreach ($json as $id) { return $id->images; }
    }
    catch(exception $e) // no images
    {
        return NULL;
    }
}

Get JSON object (URLs) by filename:

public static function getFileURLObj($filename)
{
    try                     // resolve URL from filename
    {
        return json_decode(
            self::retrieveInfo(
                self::$baseurl . '?action=query&titles=' .
                urlencode($filename) . '&prop=imageinfo&iiprop=url&format=json'
            ))->query->pages;
    }
    catch(exception $e)     // no URLs
    {
        return NULL;
    }
}

Filter out generic images:

public static function isGeneric($url)
{
    $generic_strings = array(
        '_gray.svg',
        'icon',
        'Commons-logo.svg',
        'Ambox',
        'Text_document_with_red_question_mark.svg',
        'Question_book-new.svg',
        'Canadese_kano',
        'Wiki_letter_',
        'Edit-clear.svg',
        'WPanthroponymy',
        'Compass_rose_pale',
        'Us-actor.svg',
        'voting_box',
        'Crystal_',
        'transportation_inv',
        'arrow.svg',
        'Quill_and_ink-US.svg',
        'Decrease2.svg',
        'Rating-',
        'template',
        'Nuvola_apps_',
        'Mergefrom.svg',
        'Portal-',
        'Translation_to_',
        '/School.svg',
        'arrow',
        'Symbol_',
        'stub',
        'Unbalanced_scales.svg',
        '-logo.',
        'P_vip.svg',
        'Books-aj.svg_aj_ashton_01.svg',
        'Film',
        '/Gnome-',
        'cap.svg',
        'Missing',
        'silhouette',
        'Star_empty.svg',
        'Music_film_clapperboard.svg',
        'IPA_Unicode',
        'symbol',
        '_highlighting_',
        'pictogram',
        'Red_pog.svg',
        '_medal_with_cup',
        '_balloon',
        'Feature',
        'Aiga_'
    );

    foreach ($generic_strings as $str)
    {
        if (stripos($url, $str) !== false) return true;
    }

    return false;
}

Comments welcome.

Hoy answered 23/12, 2013 at 18:43 Comment(0)

Let's take the page http://en.wikipedia.org/wiki/index.html?curid=57570 as an example and get its main picture.

Check out

prop=pageprops

action=query&pageids=57570&prop=pageprops&format=json

Example result:

{ "query" : { "pages" : { "57570":{
                    "pageid":57570,
                    "ns":0,
                    "title":"Sachin Tendulkar",
                    "pageprops" : {
                         "defaultsort":"Tendulkar,Sachin",
                         "page_image":"Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg",
                         "wikibase_item":"Q9488"
                    }
            }
          }
 }}

The main picture's file name is in this result:

(wikiId).pageprops.page_image = Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg

Now that we have the image file name, we have to make another API call to get the full image URL from the file name:

action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url

E.g.

action=query&titles=Image:Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg&prop=imageinfo&iiprop=url

This returns an array of image data containing the URL: http://upload.wikimedia.org/wikipedia/commons/3/35/Sachin_at_Castrol_Golden_Spanner_Awards_%28crop%29.jpg
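The two steps could be sketched like this in Python. The helper names are mine. The code assumes the response has the usual `query` wrapper around `pages`, and it also checks a `page_image_free` key that I believe newer PageImages versions use, so treat that key as an assumption. It uses the File: prefix, which should work the same as Image: in the query above.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def page_image_name(response):
    """Return the main image file name from a prop=pageprops response, or None."""
    for page in response.get("query", {}).get("pages", {}).values():
        props = page.get("pageprops", {})
        # 'page_image_free' is an assumed newer key; 'page_image' is the one
        # shown in the example result.
        name = props.get("page_image_free") or props.get("page_image")
        if name:
            return name
    return None


def imageinfo_url(file_name):
    """Build the second query, which resolves a file name to its full URL."""
    params = {
        "action": "query",
        "titles": "File:" + file_name,
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


# Sample in the shape of the example result, trimmed to the relevant fields.
sample = {"query": {"pages": {"57570": {
    "pageid": 57570,
    "title": "Sachin Tendulkar",
    "pageprops": {"page_image": "Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg"},
}}}}
name = page_image_name(sample)
print(name)
print(imageinfo_url(name))
```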

Optative answered 29/11, 2014 at 18:14 Comment(0)

There is a way to reliably get the main image for a Wikipedia page: the extension called PageImages.

The PageImages extension collects information about images used on a page.

Its aim is to return the single most appropriate thumbnail associated with an article, attempting to return only meaningful images, e.g. not those from maintenance templates, stubs or flag icons. Currently it uses the first non-meaningless image used in the page.

https://www.mediawiki.org/wiki/Extension:PageImages

Just add the prop pageimages to your API Query:

/w/api.php?action=query&prop=pageimages&titles=Somepage&format=xml

This reliably filters out annoying default images and prevents you from having to filter them yourself! The extension is installed on all the main wikipedia pages...

Duncandunce answered 18/6, 2014 at 11:51 Comment(0)

Like Anuraj mentioned, the pageimages parameter is it. Look at the following url that'll bring about some nifty stuff:

https://en.wikipedia.org/w/api.php?action=query&prop=info|extracts|pageimages|images&inprop=url&exsentences=1&titles=india

Here are some interesting parameters:

The two parameters extracts and exsentences give you a short description you can use (exsentences is the number of sentences you want to include in the excerpt).
The info and inprop=url parameters give you the URL of the page.
The prop property takes multiple values separated by a bar symbol.
And if you add format=json in there, it is even better.
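A short Python sketch of building that combined query with the bar-separated prop values properly percent-encoded. The function name and the `sentences` argument are made up for illustration, and the parameter names are simply the ones listed above.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def summary_query_url(title, sentences=1):
    """Build one query that asks for page info, an extract and images."""
    params = {
        "action": "query",
        "format": "json",
        # Several prop values are joined with a bar; urlencode turns it into %7C.
        "prop": "|".join(["info", "extracts", "pageimages", "images"]),
        "inprop": "url",
        "exsentences": sentences,
        "titles": title,
    }
    return API_URL + "?" + urlencode(params)


print(summary_query_url("India"))
```

Letting urlencode handle the encoding avoids pasting a raw bar into a URL by hand.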

Argosy answered 16/10, 2017 at 5:21 Comment(1)

As of 2023, this results in "Unrecognized parameters: exsentences, inprop" – Alexander 27/4, 2023 at 13:23

See this related question on an API for Wikipedia. However, I would not know if it is possible to retrieve the thumbnail picture through an API.

You can also consider just parsing the web page to find the image URL, and retrieve the image that way.

Sorghum answered 2/12, 2011 at 22:42 Comment(1)

thanks for replying, but fetching image on a wikipedia page is kind of impossible – Invade 2/12, 2011 at 23:22

Here is my list of XPaths that I have found to work for 95 percent of articles. The main ones are 1, 2, 3 and 4. A lot of articles are not formatted consistently, and the remaining paths cover those edge cases:

You can use a DOM parsing lib to fetch image using the XPath.

static NSString   *kWikipediaImageXPath2    =   @"//*[@id=\"mw-content-text\"]/div[1]/div/table/tr[2]/td/a/img";
static NSString   *kWikipediaImageXPath3    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/a/img";
static NSString   *kWikipediaImageXPath1    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/a/img";
static NSString   *kWikipediaImageXPath4    =   @"//*[@id=\"mw-content-text\"]/div[2]/table/tr[2]/td/a/img";
static NSString   *kWikipediaImageXPath5    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/p/a/img";
static NSString   *kWikipediaImageXPath6    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/div/div/a/img";
static NSString   *kWikipediaImageXPath7    =   @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/div/div/a/img";

I used an ObjC wrapper called Hpple around libxml2.2 to pull out the image URL. Hope this helps.

Magistery answered 15/5, 2016 at 2:19 Comment(0)

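You can also use the CocoaPod called SDWebImage.

Code sample (remember to also add import SDWebImage; the code also relies on Alamofire for AF.request and SwiftyJSON for JSON, and assumes wikipediaURL holds the API endpoint):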

func requestInfo(flowerName: String) {

        let parameters : [String:String] = [
            "format" : "json",
            "action" : "query",
            "prop" : "extracts|pageimages",//pageimages allows fetch imagePath
            "exintro" : "",
            "explaintext" : "",
            "titles" : flowerName,
            "indexpageids" : "",
            "redirects" : "1",
            "pithumbsize" : "500"//specify image size in px
        ]


        AF.request(wikipediaURL, method: .get, parameters: parameters).responseJSON { (response) in
            switch response.result {
            case .success(let value):
                print("Got the wikipedia info.")
                print(response)

                let flowerJSON : JSON = JSON(value) // use the unwrapped success value
                let pageid = flowerJSON["query"]["pageids"][0].stringValue

                let flowerDescription = flowerJSON["query"]["pages"][pageid]["extract"].stringValue

                let flowerImageURL = flowerJSON["query"]["pages"][pageid]["thumbnail"]["source"].stringValue //fetching Image URL

                self.wikiInfoLabel.text = flowerDescription
                self.imageView.sd_setImage(with: URL(string : flowerImageURL))//imageView updated with Wiki Image

            case .failure(let error):
                print(error)
            }
        }
    }

Rothberg answered 29/4, 2020 at 5:46 Comment(0)

I think not, but you can get the image by running an HTML parser over the page.

Erminia answered 2/12, 2011 at 22:38 Comment(1)

is there any way to access pictures using APIs? – Invade 2/12, 2011 at 23:15
