Wikipedia API full-text search to return articles with title, snippet and image

I've been looking for a way to query the Wikipedia API with a search string and get back a list of articles with the following properties:

  • Title
  • Snippet/Description
  • One or more images related to the article.

I also have to make the query using JSONP.

I've tried using the list=search parameter

http://en.wikipedia.org/w/api.php?action=query&list=search&prop=images&format=json&srsearch=test&srnamespace=0&srprop=snippet&srlimit=10&imlimit=1

But it seems to ignore the prop=images. I've also tried variations using prop=imageinfo and prop=pageimages, but they all give me the same result as using list=search alone.

I've also tried action=opensearch

http://en.wikipedia.org/w/api.php?action=opensearch&search=test&limit=10&format=xml

This gives me exactly what I want when I set format=xml, but it returns a simple array of page titles when using format=json, so it fails the JSONP requirement.

Is there another approach to doing this? I'd really like to solve this in a single request rather than making a first search request and then a second request for the images using titles=x|y|z.

Jerrold answered 17/9, 2014 at 12:50 Comment(2)
You can't use a list and a prop query together. – Smear
So there is no way of doing this in one request? Seems like a pretty standard query to make. – Jerrold

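As Bergi suggested, using generators is the way to go here. Specifically, I would use generator=search to run the search, prop=pageimages to get an image for each result, and prop=extracts to get a short plain-text snippet.

The whole query could look like this: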

http://en.wikipedia.org/w/api.php?format=json&action=query&generator=search&gsrnamespace=0&gsrsearch=test&gsrlimit=10&prop=pageimages|extracts&pilimit=max&exintro&explaintext&exsentences=1&exlimit=max
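For illustration, here is a minimal Python sketch of how that query could be built and how its response could be flattened into title/snippet/image records. The function names, the https host, and the sample response in the usage note are assumptions made for this example. The response layout (`query.pages` keyed by page ID, with `extract`, `thumbnail.source` and a search `index`) is what the API is expected to return for this kind of query, so check it against a live response. The `callback` parameter is what turns the JSON output into JSONP.

```python
from urllib.parse import urlencode

# Assumption: https endpoint of the same host used in the query above.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(term, limit=10, callback=None):
    """Build the generator=search query; pass callback for a JSONP response."""
    params = {
        "format": "json",
        "action": "query",
        "generator": "search",
        "gsrnamespace": 0,
        "gsrsearch": term,
        "gsrlimit": limit,
        "prop": "pageimages|extracts",
        "pilimit": "max",
        "exintro": 1,        # flag parameters: presence switches them on
        "explaintext": 1,
        "exsentences": 1,    # controls the snippet length
        "exlimit": "max",
    }
    if callback is not None:
        params["callback"] = callback
    return API_URL + "?" + urlencode(params)


def flatten_pages(response):
    """Turn the decoded JSON response into a list ordered by search rank."""
    pages = response.get("query", {}).get("pages", {})
    ordered = sorted(pages.values(), key=lambda page: page.get("index", 0))
    return [
        {
            "title": page.get("title"),
            "snippet": page.get("extract"),
            # Pages without a page image have no "thumbnail" key.
            "image": page.get("thumbnail", {}).get("source"),
        }
        for page in ordered
    ]
```

Calling `build_search_url("test", callback="handleResults")` gives a URL that can be used as a script source for JSONP, where `handleResults` is a hypothetical callback name. `flatten_pages` works on the decoded JSON object, however it was fetched.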

Failing answered 18/9, 2014 at 11:43 Comment(1)
Thank you, this also solved a new requirement I had, by giving me a property for controlling the length of the snippet/extract. – Jerrold

I've tried using the list=search parameter, but it seems to ignore the prop=images

If you want to retrieve any properties, you need to specify a list of pages for which you want to get these; e.g. by using the titles=, pageids=, or revids= parameters. You didn't send any, so you did not get a result for the prop=images.

If you had used api.php?action=query&list=search&srsearch=test&prop=images&titles=test, you would have gotten the search results for test and the images of the Test page.

You can, however, also feed the collection that the list query generates into your property query by using the list module as a generator. The query would look like api.php?action=query&generator=search&gsrsearch=test&gsrnamespace=0&gsrprop=snippet&prop=images. Unfortunately, it does not yield the attributes that the list contained (such as the snippet); it only uses the page IDs for a basic property query.

Using two queries is probably the way to go. By the way, I'd recommend using the pageimages property; it will likely give you the best results.
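A rough Python sketch of the two-query approach, in case it helps. The function names and the https host are assumptions made for this example, and the response shapes (`query.search` as a list of hits, `query.pages` keyed by page ID) should be verified against real output. The first URL runs list=search with snippets, the second asks for pageimages for the titles that came back, and the last step joins the two by title.

```python
from urllib.parse import urlencode

# Assumption: https endpoint of the host used in the question.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_list_search_url(term, limit=10):
    """First request: plain list=search, which returns titles and snippets."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": term,
        "srnamespace": 0,
        "srprop": "snippet",
        "srlimit": limit,
    }
    return API_URL + "?" + urlencode(params)


def build_image_url(search_response):
    """Second request: prop=pageimages for the titles found by the search."""
    titles = [hit["title"] for hit in search_response["query"]["search"]]
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageimages",
        "pilimit": "max",
        "titles": "|".join(titles),  # titles=x|y|z
    }
    return API_URL + "?" + urlencode(params)


def merge_results(search_response, image_response):
    """Attach each page's thumbnail URL (if any) to its search hit."""
    thumbnails = {
        page["title"]: page.get("thumbnail", {}).get("source")
        for page in image_response.get("query", {}).get("pages", {}).values()
    }
    return [
        {
            "title": hit["title"],
            "snippet": hit.get("snippet"),
            "image": thumbnails.get(hit["title"]),
        }
        for hit in search_response["query"]["search"]
    ]
```

The merge keeps the order of the search hits, so the ranking from the first request is preserved even though the second response is keyed by page ID.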

Smear answered 18/9, 2014 at 10:7 Comment(0)
