Although MediaWiki has an API (api.php
), it might be easiest for your purposes to just use the action=raw
parameter to index.php
if you just want to retrieve the source code of one revision (not wrapped in XML, JSON, etc., as opposed to the API).
For example, this is the raw word of the day page for November 14:
http://en.wiktionary.org/w/index.php?title=Wiktionary:Word_of_the_day/November_14&action=raw
What's unfortunate is that the format of wiki pages focuses on presentation (for the human reader) rather than on semantics (for the machine), so you should not be surprised that there is no "get word definition" API command. Instead, your script will have to make sense of the numerous text formatting templates that Wiktionary editors have created and used, as well as complex presentational formatting syntax, including headings, unordered lists, and others. For example, here is the source code for the page "overflow":
http://en.wiktionary.org/w/index.php?title=overflow&action=raw
There is a "generate XML parse tree" option in the API, but it doesn't break much of the presentational formatting into XML. Just see for yourself:
http://en.wiktionary.org/w/api.php?action=query&titles=overflow&prop=revisions&rvprop=content&rvgeneratexml=&format=jsonfm
In case you are wondering whether there exists a parser for MediaWiki-format pages other than MediaWiki, no, there isn't. At least not anything written in JavaScript that's currently maintained (see list of alternative parsers, and check the web sites of the two listed ones). And even then, supporting most/all of the common templates will be a big challenge. Good luck.