Retrieving image license and author information in wiki commons
E

6

20

I am trying to use the MediaWiki API for Wikimedia Commons at:

http://commons.wikimedia.org/w/api.php

The Commons API seems very immature, and the part of its documentation that mentions retrieving license and author information is empty.

Is there any way I can retrieve the paragraph that contains the licensing information using the API? (For example, the paragraph under the title "Licensing" at this page.) Of course I could download the whole page and try to parse it, but then what are APIs for?

Exceedingly answered 17/9, 2011 at 8:55 Comment(0)
S
27

Late answer but you can request the "extmetadata" data with the following query:

http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=extmetadata&titles=File%3aBrad_Pitt_at_Incirlik2.jpg&format=json

Look under imageinfo.extmetadata.UsageTerms, Artist, Credit, etc.
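
As a rough sketch of how this query might be used from Python: the function names below are hypothetical, `LicenseShortName` is assumed to be returned alongside the fields named above, and the live call assumes the third-party `requests` library is installed. The parsing step is kept separate from the network call so it can be checked against a saved response.

```python
# Endpoint from the question; the answer's en.wikipedia.org URL works the same way.
API_URL = "https://commons.wikimedia.org/w/api.php"


def build_extmetadata_params(file_name):
    """Query parameters equivalent to the URL above, for a single file."""
    return {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": "File:" + file_name,
        "format": "json",
    }


def pick_license_fields(response,
                        fields=("UsageTerms", "Artist", "Credit", "LicenseShortName")):
    """Collect the requested extmetadata values from a decoded JSON response."""
    found = {}
    # Pages are keyed by page ID, so iterate over the values.
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            for name in fields:
                if name in meta:
                    found[name] = meta[name].get("value")
    return found


def fetch_license_fields(file_name, api_url=API_URL):
    """Live lookup; needs network access and the requests package."""
    import requests  # imported here so the parsing helpers work without it

    reply = requests.get(api_url, params=build_extmetadata_params(file_name))
    reply.raise_for_status()
    return pick_license_fields(reply.json())
```

Calling `fetch_license_fields("Brad_Pitt_at_Incirlik2.jpg")` would then return a dictionary containing whichever of those fields the API reports for that file.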

Steiger answered 9/2, 2015 at 20:5 Comment(3)
This is the right answer. Use format jsonfm for easy verification: en.wikipedia.org/w/…Detrain
Right answer: from wikipedia official php api and easily parsable JSON response!Iodate
I love this answer. Is there any dump that is given in this format? I downloaded some dumps but I would love a giant parseable dump in this format.Combatant
L
4

You could try using Magnus Manske's Commons API tool on the Wikimedia Toolserver. It's not an official service, and the documentation seems to be rather sparse (that is to say, almost nonexistent), but the XML output seems pretty self-explanatory.

I can't seem to find the source for Magnus's script anywhere, but I assume it extracts the licensing information from the categories the file belongs to. If you wanted, you could do that yourself: just fetch the list of categories and, if necessary, walk up the category tree until you find a license category you recognize. Alas, the tree-walking part requires either multiple API requests or a database of Commons categories (either live access on the Toolserver, or a reconstructed copy from the database dumps).
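
A minimal sketch of that do-it-yourself approach, not Magnus's actual method: it uses the standard `prop=categories` query and walks up the tree one level per round. The function names, the `fetch` callable, and the idea of matching license categories by name prefix (for example `CC-BY`) are assumptions made for illustration.

```python
def build_category_params(title):
    """Parameters for listing the categories a page (or category) belongs to."""
    return {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",
        "format": "json",
    }


def category_titles(response):
    """Extract category titles such as 'Category:Foo' from a decoded response."""
    titles = []
    for page in response.get("query", {}).get("pages", {}).values():
        for cat in page.get("categories", []):
            titles.append(cat["title"])
    return titles


def find_license_category(title, fetch, known_prefixes, max_depth=3):
    """Walk up the category tree breadth-first until a recognized license
    category is found, or give up after max_depth levels.

    fetch is any callable that takes the parameter dict and returns the
    decoded JSON reply (one API request per visited page).
    """
    seen = set()
    frontier = [title]
    for _ in range(max_depth):
        next_frontier = []
        for current in frontier:
            for cat in category_titles(fetch(build_category_params(current))):
                if cat in seen:
                    continue
                seen.add(cat)
                # Drop the 'Category:' namespace before comparing prefixes.
                name = cat.split(":", 1)[-1]
                if name.startswith(tuple(known_prefixes)):
                    return cat
                next_frontier.append(cat)
        frontier = next_frontier
    return None
```

With the `requests` library, `fetch` could be something like `lambda p: requests.get("https://commons.wikimedia.org/w/api.php", params=p).json()`. Each level of the walk costs at least one request per category visited, which is the overhead mentioned above.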

Yes, I realize that this answer may seem unsatisfactory. The fact is that Magnus's script seems to be the closest currently existing thing to what you want, and even it's marked as experimental and incomplete. Basically, this is a problem waiting for someone to implement a (better) solution.

Laryngeal answered 7/2, 2012 at 20:47 Comment(4)
It seems that the tool is down, can anybody confirm that?Hafnium
@user5950: Yeah, seems to be down for me too.Laryngeal
So, if anybody finds out about a good alternative, please let us know!Hafnium
2021 update: It seems to be working fine, and the source code is bitbucket.org/magnusmanske/magnustools/src/master/public_html/…Subjunction
A
4

Have a look at the MediaWiki API and try this function:

import requests

def extract_image_license(image_name):
    # Ask the Commons API for the file's imageinfo, including extmetadata
    # (extmetadata is where the license and author fields are reported)
    params = {
        'action': 'query',
        'titles': 'File:' + image_name,
        'prop': 'imageinfo',
        'iiprop': 'user|userid|canonicaltitle|url|extmetadata',
        'format': 'json',
    }
    # Passing params lets requests URL-encode the file name
    result = requests.get('https://commons.wikimedia.org/w/api.php', params=params)
    result = result.json()
    # Pages are keyed by page ID; a single-title query returns one page
    page_id = next(iter(result['query']['pages']))
    image_info = result['query']['pages'][page_id]['imageinfo']

    return image_info

Then call the function and pass in the name of the image you want to query, for example:

extract_image_license('Albert_Einstein_Head.jpg')
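
The list returned by this function still holds the raw extmetadata, and values such as Artist often contain HTML markup. A small sketch for reducing it to plain text follows; the helper names are hypothetical, and the keys `LicenseShortName`, `Artist`, and `UsageTerms` are assumed to be the ones of interest.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text content, discarding tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(markup):
    """Strip HTML tags and collapse runs of whitespace."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def summarize_license(image_info):
    """Reduce the imageinfo list to a few plain-text fields.

    Fields missing from the response come back as None.
    """
    meta = image_info[0].get("extmetadata", {}) if image_info else {}

    def value(key):
        raw = meta.get(key, {}).get("value")
        return plain_text(raw) if isinstance(raw, str) else raw

    return {
        "license": value("LicenseShortName"),
        "artist": value("Artist"),
        "usage_terms": value("UsageTerms"),
    }
```

For example, `summarize_license(extract_image_license('Albert_Einstein_Head.jpg'))` would give a small dictionary with the license name, the artist, and the usage terms as plain strings.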
America answered 26/2, 2019 at 10:46 Comment(0)
I
2

I've used Magnus' Commons API tool. It's not designed to be just dropped into a project, but if you copy the source of the wiki page it calls and cache it locally, then move the logic into a class you can make it more easily callable. Here's the source for Magnus' version. If you want the class I created from it let me know and I'll dig it out.

Impignorate answered 11/7, 2012 at 12:48 Comment(0)
M
1

From http://www.mediawiki.org/wiki/API_talk:Main_page#Image_license_information:

"Is there a way to get the license of an image through the api?"

"By category is probably easiest, assuming the site categorizes by license. There is no built in module though for license information." Splarka 08:45, 22 January 2010 (UTC)

However, I find that using categories doesn't return anything for many images even though they have a license specified. Maybe the best way is to parse the rendered HTML of the image page.

Morpho answered 24/1, 2014 at 21:23 Comment(0)
H
-3

see page: http://www.mediawiki.org/wiki/API:Meta

For each image, you can add the parameters 'meta=siteinfo' and 'siprop=rightsinfo' to the query (siprop selects which siteinfo properties are returned). Then you will see the rightsinfo of the picture.

In your case of Brad Pitt it would be like:

http://en.wikipedia.org/w/api.php?format=jsonfm&action=query&titles=File:Brad_Pitt_at_Incirlik2.jpg&prop=imageinfo&iiprop=url&meta=siteinfo&siprop=rightsinfo

Hellish answered 27/8, 2013 at 14:30 Comment(1)
This isn't correct. 'siteinfo' gives information about the site, not the image. For example, if you look at the page for File:Flag_of_the_United_Kingdom.svg, you'll see it's licensed as Public Domain. However, using this file in the query you provide shows that the page (not the image) is licensed as Creative Commons. It gives no information about the image.Analyst
