Convert MediaWiki wikitext format to HTML using the command line

I tend to write a good amount of documentation, so the MediaWiki format is easy for me to understand, and it saves me a lot of time compared with writing traditional HTML. However, I also write a blog, and I find that constantly switching from keyboard to mouse to insert the correct HTML tags adds a lot of time. I'd like to be able to write my articles in MediaWiki syntax and then convert them to HTML for use on my blog.

I've tried Googling, but I must need better nomenclature, because surprisingly I haven't been able to find anything.

I use Linux and would prefer to do this from the command line.

Anyone have any thoughts or ideas?

Triple asked 18/2, 2012 at 18:29
See also "lexers / parsers for (un) structured text documents" for alternative formats. – Oligochaete

I looked into this a bit, and I think a good route here is to learn a general-purpose markup language like reStructuredText or Markdown and convert from there. I discovered a program called pandoc that can convert either of these to HTML and to MediaWiki; it can also read MediaWiki markup directly. Appreciate the help.

Example:

pandoc -f mediawiki -s myfile.mediawiki -o myfile.html
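For the blog workflow in the question, the same command can be run over a whole folder of drafts. The Python sketch below is hypothetical, not part of pandoc: it assumes drafts are stored as `*.mediawiki` files (the extension from the example above) in one directory, and that `pandoc` is on the PATH. It passes `-t html` explicitly rather than relying on pandoc inferring the output format from the `.html` extension.

```python
import subprocess
from pathlib import Path

def pandoc_command(src, dst):
    """Build the pandoc argument list for one MediaWiki file -> standalone HTML page."""
    # -f mediawiki: read MediaWiki markup; -t html: write HTML;
    # -s: emit a complete document (<html>, <head>, ...) rather than a fragment.
    return ["pandoc", "-f", "mediawiki", "-t", "html", "-s",
            str(src), "-o", str(dst)]

def convert_tree(src_dir, out_dir, run=subprocess.run):
    """Convert every *.mediawiki file in src_dir to an .html file in out_dir.

    `run` defaults to subprocess.run; it is a parameter only so the
    pandoc call can be swapped out in tests.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.mediawiki")):
        dst = out_dir / (src.stem + ".html")
        run(pandoc_command(src, dst), check=True)  # raises if pandoc exits non-zero
        written.append(dst)
    return written
```

If the HTML is going to be pasted into a blog editor rather than published as its own page, dropping `"-s"` from the list makes pandoc emit just the body fragment.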
Triple answered 25/2, 2012 at 9:55
Please don't. Alternative parsers for wikitext are always very fragile, due to how wikitext has (not) been designed. – Slemmer
I just tried pandoc as a result of this answer for converting MediaWiki to TeX and HTML, and I am very pleased with the results. I can't speak to its fragility, but if you're just using the basics like headings and lists, the output looks perfectly fine. It plays nicely with other Unix commands since it supports stdin/stdout I/O, which is great for pipes. – Varistor
Pandoc does not recognize the full wiki markup, so there will be a lot of articles that cannot be parsed properly. I tried this myself. – Blume
@Blume, if you remember, what kind of MediaWiki syntax does it fail on? Readers considering it may not need the unsupported features, which cater mainly to Wikipedia-like use cases. – Varistor
Pandoc fails to translate templates in wikitext. – Felicity

The best option is to use the MediaWiki parser itself. The good news is that MediaWiki 1.19 will provide a command-line tool just for that!

Disclaimer: I wrote that tool.

The script is maintenance/parse.php. Here are some usage examples, straight from the source code:

Entering text yourself, ending it with Control + D:

$ php maintenance/parse.php --title foo
''[[foo]]''^D
<p><i><strong class="selflink">foo</strong></i>
</p>
$

The usual file input method:

$ echo "'''bold'''" > /tmp/foo.txt
$ php maintenance/parse.php /tmp/foo.txt
<p><b>bold</b>
</p>$

And of course piping to stdin:

$ cat /tmp/foo.txt | php maintenance/parse.php
<p><b>bold</b>
</p>$
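Because parse.php reads wikitext from stdin, it is easy to drive from a script. The Python sketch below is a hypothetical wrapper, not part of MediaWiki: it assumes `php` is on the PATH and that `mw_root` is the root directory of a working MediaWiki install (parse.php needs one, as the comments below point out). It uses `--title` as in the first example; the default title "Sandbox" is an arbitrary choice.

```python
import subprocess

def parse_wikitext(text, mw_root, title="Sandbox", run=subprocess.run):
    """Render wikitext to HTML by piping it into MediaWiki's maintenance/parse.php.

    Hypothetical helper: mw_root must be the root of a working MediaWiki
    install, and "Sandbox" is an arbitrary default page title.
    `run` is a parameter only so the subprocess call can be replaced in tests.
    """
    # Same invocation as the examples above, run from the MediaWiki root.
    cmd = ["php", "maintenance/parse.php", "--title", title]
    # Wikitext goes in on stdin; the rendered HTML comes back on stdout.
    # check=True raises CalledProcessError if the script exits non-zero.
    result = run(cmd, input=text, capture_output=True, text=True,
                 check=True, cwd=mw_root)
    return result.stdout
```

For example, `parse_wikitext("'''bold'''", "/path/to/mediawiki")` should return HTML along the lines of the `<p><b>bold</b>` output shown above, though the exact markup depends on the MediaWiki version.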

As of today, you can get the script from http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/maintenance/parse.php and place it in your maintenance directory. It should work with MediaWiki 1.18.

The script will be made available with MediaWiki 1.19.0.

Metaphysics answered 4/3, 2012 at 23:16
Actually, this is pretty useful and just what I need. Appreciate the info, Antoine. – Macroclimate
I get the error "PHP Fatal error: Call to undefined function mysql_error() in /scratch4/dhruv/mediawiki-1.20.2/includes/db/DatabaseMysql.php on line 326" when I try to run the above. Any idea how I can fix it? – Riane
Also, why does this tool take --dbuser and --dbpass? – Riane
@Riane The eval.php script is an old script that has not been migrated to recognize --dbuser and --dbpass :( I have filed bug bugzilla.wikimedia.org/45254 to track this, though it is not a particularly high-priority item =) – Loess
I had some permission problems with CDB files, so the lazy workaround was to use sudo. Then it worked. – Varistor
I wish it didn't generate "edit" hyperlinks that point nowhere. I guess some regex manipulation can take care of that. I'm happy it generates a table of contents. – Varistor
The best approach would be running a MediaWiki instance, so that it also has access to templates. – Felicity
That is exactly what maintenance/parse.php does! It requires a MediaWiki instance and has full access to templates; the interface is just a command-line utility rather than editing a page in a web browser. And I think this question is what prompted me to write the command-line tool :] – Loess
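Varistor's dangling "edit" links can be removed after the fact. A regex is awkward here because the edit-link markup contains nested spans, so this Python sketch uses the standard library's `html.parser` instead. It assumes the links are wrapped in an element whose class is `editsection` (older MediaWiki) or `mw-editsection` (newer releases); check the class names in your own parse.php output and adjust `EDIT_CLASSES` if they differ. It also assumes reasonably well-formed HTML, with every non-void element inside the edit link properly closed.

```python
from html.parser import HTMLParser

# Assumed class names for section edit links; adjust to match your output.
EDIT_CLASSES = {"editsection", "mw-editsection"}
# HTML void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class EditLinkStripper(HTMLParser):
    """Re-emit HTML unchanged except for elements carrying an edit-link class."""

    def __init__(self):
        # Keep entities as written (&amp; stays &amp;) instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside an edit-link element

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1  # nested element inside the edit link
            return
        classes = (dict(attrs).get("class") or "").split()
        if EDIT_CLASSES.intersection(classes):
            if tag not in VOID_TAGS:
                self.skip_depth = 1  # drop this element and everything inside it
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never affect nesting.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

def strip_edit_links(html_text):
    """Return html_text with MediaWiki section edit links removed."""
    parser = EditLinkStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Since it reads a string and returns a string, it can sit at the end of a pipeline, for example applied to the stdout of parse.php before the HTML is pasted into a blog post.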

This page lists tons of MediaWiki parsers that you could try.

Engleman answered 18/2, 2012 at 18:31
