I wrote a Python library that aims to make this very easy. Check it out on GitHub.
To install it, run
$ pip install wikipedia
Then to get the first paragraph of an article, just use the wikipedia.summary
function.
>>> import wikipedia
>>> print(wikipedia.summary("Albert Einstein", sentences=2))
prints
Albert Einstein (/ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn] (
listen); 14 March 1879 – 18 April 1955) was a German-born
theoretical physicist who developed the general theory of relativity,
one of the two pillars of modern physics (alongside quantum
mechanics). While best known for his mass–energy equivalence formula E
= mc2 (which has been dubbed "the world's most famous equation"), he received the 1921 Nobel Prize in Physics "for his services to
theoretical physics, and especially for his discovery of the law of
the photoelectric effect".
As for how it works, wikipedia makes a request to the Mobile Frontend Extension of the MediaWiki API, which returns mobile-friendly versions of Wikipedia articles. To be specific, when you pass the parameters prop=extracts&exsectionformat=plain, the MediaWiki servers parse the Wikitext and return a plain text summary of the article you are requesting, up to and including the entire page text. The API also accepts the parameters exchars and exsentences, which, not surprisingly, limit the number of characters and sentences returned.
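If you would rather call the API directly, here is a rough sketch of the request described above, using only the standard library (Python 3). It is not the library's actual code. The endpoint URL, the extra explaintext flag, the User-Agent string, and the helper names build_extract_params, extract_from_response and fetch_summary are my own assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia endpoint; other wikis use their own host.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_params(title, sentences=None, chars=None):
    """Return the query parameters for a plain-text extract request."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "exsectionformat": "plain",
        "explaintext": 1,  # assumption: ask for plain text instead of HTML
        "titles": title,
    }
    # Send only one limit: sentences wins if both are given.
    if sentences is not None:
        params["exsentences"] = sentences
    elif chars is not None:
        params["exchars"] = chars
    return params


def extract_from_response(data):
    """Pull the extract text out of a decoded JSON response."""
    pages = data["query"]["pages"]
    # Pages are keyed by page id; a single title gives a single entry.
    page = next(iter(pages.values()))
    return page.get("extract", "")


def fetch_summary(title, sentences=None, chars=None):
    """Request the extract over HTTP and return it as a string."""
    query = urlencode(build_extract_params(title, sentences, chars))
    request = Request(
        API_URL + "?" + query,
        headers={"User-Agent": "extract-sketch/0.1"},  # hypothetical value
    )
    with urlopen(request) as response:
        return extract_from_response(json.load(response))
```

Calling fetch_summary("Albert Einstein", sentences=2) should then return roughly the same text as the wikipedia.summary call above, though the exact wording depends on the current state of the article.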
urllib for getting the page and BeautifulSoup for parsing HTML. There are other ways of doing it, though; search for them on Stack Overflow itself. This has been discussed lots of times. – Kind