Parse Wiktionary

Is there a .NET library for parsing pages I've retrieved through the MediaWiki API? A standard MediaWiki parser that just returns the titles and content as plain data would be fine, but I would prefer one suited specifically to Wiktionary, one that can tell me a word's part of speech and all of its definitions.

I would prefer not to write my own parser for this. Any suggestions?

Maurizio answered 5/12, 2011 at 23:38 Comment(5)
Which output format are you consuming from the API? There are currently 9 from which to choose... - Around
@Alex There are tonnes of examples; start here: mediawiki.org/wiki/API:Parsing_wikitext - Chimaera
I'm not aware of any API or client library that would provide Wiktionary data in a structured format (as opposed to HTML or raw wikitext). Then again, I haven't really looked much, either. - Janeljanela
I spoke too soon -- just after posting the comment above, I found this answer which mentions JWKTL. It's in Java, though, not C#. - Janeljanela
Possible duplicate of "Has anyone parsed Wiktionary?" - Audraaudras

The dbnary project provides parsed information from Wiktionary in RDF form.

If you want something processed even further, I provide SQLite and TEI files generated from the dbnary data as part of my WikDict project at download.wikdict.com.

This does not really answer the question about .NET libraries, but you should easily find libraries for reading XML (TEI), SQLite or RDF.

Mandler answered 26/9, 2016 at 12:57 Comment(0)

If you get the output in JSON, there are many options you could use, both built in to .NET and external to the framework itself (Json.NET, for example).

If you get the output in XML, there are likewise powerful XML manipulation classes within the .NET framework itself (System.Xml, LINQ to XML) and outside of it.

You're going to have to be more specific -- provide the format and some example output.
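
As a rough illustration of the JSON route, here is a minimal sketch in Python (chosen only for brevity; the same steps map onto any .NET JSON library). It assumes the reply has the shape produced by `action=parse` with `prop=wikitext` and the default format version, where the page text sits under `parse` / `wikitext` / `*`. The sample string and the helper name `extract_wikitext` are made up for illustration and are not real API output.

```python
import json

# Hypothetical sample, shaped like an action=parse&prop=wikitext&format=json reply.
SAMPLE_RESPONSE = (
    '{"parse": {"title": "cat", "pageid": 1, '
    '"wikitext": {"*": "==English==\\n===Noun===\\n# A domesticated feline."}}}'
)


def extract_wikitext(response_text):
    """Return (title, wikitext) from a parse-style JSON reply."""
    data = json.loads(response_text)
    # The API reports failures in an "error" object instead of "parse".
    if "error" in data:
        raise ValueError(data["error"].get("info", "API error"))
    page = data["parse"]
    return page["title"], page["wikitext"]["*"]


title, wikitext = extract_wikitext(SAMPLE_RESPONSE)
print(title)
print(wikitext)
```

This only unwraps the envelope. What comes out is still raw wikitext, so the Wiktionary-specific structure has to be recovered in a separate step.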

Vermiform answered 5/12, 2011 at 23:43 Comment(1)
I use this: en.wiktionary.org/w/… It returns wikitext, the same markup you would type into MediaWiki to create the page. - Maurizio
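
For raw wikitext like this, a very rough sketch of pulling out the part of speech and definition lines might look as follows. It is in Python for brevity, but it is plain line-by-line regex matching and should port to `System.Text.RegularExpressions` without much trouble. It assumes the usual English Wiktionary layout: a level-2 heading per language, a deeper heading per part of speech, and definitions on lines starting with `#`. The sample entry, the `PARTS_OF_SPEECH` set and the function name are all hypothetical, and templates such as `{{lb|...}}` are left unexpanded.

```python
import re

# Matches wikitext headings such as "==English==" or "===Noun===".
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")

# Assumed, incomplete list of part-of-speech headings.
PARTS_OF_SPEECH = {"Noun", "Verb", "Adjective", "Adverb", "Pronoun",
                   "Preposition", "Conjunction", "Interjection"}


def parse_definitions(wikitext):
    """Map (language, part of speech) to a list of raw definition lines."""
    entries = {}
    language = None
    pos = None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            if level == 2:
                language, pos = title, None
            else:
                pos = title if title in PARTS_OF_SPEECH else None
            continue
        # "#:" and "#*" are examples and quotations, "##" are subsenses; skip them.
        if pos and line.startswith("#") and not line.startswith(("#:", "#*", "##")):
            entries.setdefault((language, pos), []).append(line[1:].strip())
    return entries


# Made-up entry for illustration, not copied from Wiktionary.
SAMPLE = """==English==
===Etymology===
From an older form.
===Noun===
{{en-noun}}
# A domesticated [[feline]].
#: ''The cat sat on the mat.''
# {{lb|en|slang}} A person.
===Verb===
# To hoist an anchor.
"""

result = parse_definitions(SAMPLE)
for key, definitions in result.items():
    print(key, definitions)
```

A sketch like this breaks down quickly on real entries (nested senses, templates, non-English layouts), which is why a purpose-built parser such as JWKTL or pre-parsed data such as dbnary is usually the better choice.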
