Rails + MediaWiki API for Wikipedia data extraction

I am trying to use Rails to extract data from Wikipedia, based on a search term.

For example,

1) If I have the string "American Idol", I want to pass it to Wikipedia and get back a list of related articles. My goal is to take the first 3 hyperlinks and display them on the website.

2) One step further, I would like to extract small pieces of data from Wikipedia, say the infobox or the first few words of the Wikipedia article.

Any tips?

Thanks!

Foolscap answered 20/10, 2011 at 4:32 Comment(1)
Check this link, you may like it: wikipedia.coffee scraping wikipedia (Selwin)

You don't need to resort to screen-scraping: MediaWiki has a very comprehensive API for precisely this kind of thing. See https://github.com/jpatokal/mediawiki-gateway for a handy Ruby wrapper around it.
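As a rough illustration of the first part of the question, here is a minimal sketch that talks to the API with only the Ruby standard library, not with the wrapper gem. It assumes the English Wikipedia endpoint and the `list=search` module of the Action API; the method names `wikipedia_search_uri` and `article_links` are made up for this example.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: English Wikipedia; other wikis expose the same API at their own host.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

# Build the request URI for a full-text search (hypothetical helper).
def wikipedia_search_uri(term, limit: 3)
  uri = URI(API_ENDPOINT)
  uri.query = URI.encode_www_form(
    action: "query", list: "search", srsearch: term, srlimit: limit, format: "json"
  )
  uri
end

# Turn the JSON body of a search response into [title, url] pairs (hypothetical helper).
def article_links(json_body)
  results = JSON.parse(json_body).dig("query", "search") || []
  results.map do |hit|
    title = hit["title"]
    # Wikipedia page URLs use underscores in place of spaces.
    slug = URI.encode_www_form_component(title.tr(" ", "_"))
    [title, "https://en.wikipedia.org/wiki/#{slug}"]
  end
end

# Usage (performs a network request):
#   body = Net::HTTP.get(wikipedia_search_uri("American Idol"))
#   article_links(body).each { |title, url| puts "#{title}: #{url}" }
```

In a Rails app the pairs could be handed to a view and rendered with `link_to`; keeping the parsing separate from the HTTP call also makes it easy to test against a saved response.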

Alternatively, if you're only interested in data like infoboxes, see DBpedia for the database version of Wikipedia.
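For the "first few words of the article" part of the question, the MediaWiki API on Wikipedia can also return a plain-text introduction through `prop=extracts` (provided by the TextExtracts extension, which Wikipedia has installed; other MediaWiki sites may not). The following is a sketch under that assumption, again using only the standard library; `intro_extract_uri` and `first_words` are hypothetical helper names.

```ruby
require "json"
require "net/http"
require "uri"

# Build a request for the plain-text intro section of one article (hypothetical helper).
def intro_extract_uri(title)
  uri = URI("https://en.wikipedia.org/w/api.php")
  uri.query = URI.encode_www_form(
    action: "query", prop: "extracts", exintro: 1, explaintext: 1,
    redirects: 1, titles: title, format: "json"
  )
  uri
end

# Pull the extract out of the response and keep only the first `count` words.
# In the default JSON format, pages are keyed by page id, so take the first value.
def first_words(json_body, count = 30)
  pages = JSON.parse(json_body).dig("query", "pages") || {}
  extract = pages.values.first&.fetch("extract", "") || ""
  extract.split.first(count).join(" ")
end

# Usage (performs a network request):
#   body = Net::HTTP.get(intro_extract_uri("American Idol"))
#   puts first_words(body, 20)
```

Infoboxes are harder to get this way because they live in the article's wikitext as templates, which is where a structured source such as DBpedia tends to be the more convenient option.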

Hospitaler answered 27/10, 2011 at 11:25 Comment(2)
Unfortunately, "This gem is no longer in active development." Maybe you know another project which is actively maintained? (Paregoric)
I should probably tweak the wording there: that's supposed to mean that I don't spend my own time on it any more, but I'm more than happy to take pull requests. AFAIK it works fine on all current MediaWiki versions. (Hospitaler)

There is another gem that you can use: https://github.com/kenpratt/wikipedia-client

This gem seems to return only the first result of your search, but you can consult the documentation to be sure.

As for the content, once you have the page, the gem lets you access the article's content, links, images and so on.

Nuzzle answered 22/5, 2014 at 9:31 Comment(0)

You can use Mechanize and Nokogiri to do that. This is a great cheat sheet for them:

http://www.e-tobi.net/blog/files/ruby-mechanize-cheat-sheet.pdf

Mechanize is a toolbox for simulating website requests, and Nokogiri is an HTML/XML parser. With the two of them, this should be simple to implement.

Huddle answered 20/10, 2011 at 5:56 Comment(0)
