Convert Markdown links to HTML with Pandoc

In my new project I have multiple Markdown files which are linked to each other. These links refer to the original .md files.

Example:

File README.md

...
1. [Development documentation](Development.md)
1. [User documentation](Usage.md)
...

If I convert these files with Pandoc (e.g., to HTML), all links still point to the original .md files. I'm looking for a way to convert the links as well, so that the output files refer to the output file type (HTML, PDF, TeX, etc.). Is there a way to convert internal links like this with Pandoc?

I use this to convert the files:

pandoc -f markdown -t html5 input.md -o output.html
Hollah asked 6/12, 2016 at 10:48

You can write a filter that checks every link element and, if its URL ends with .md, replaces that extension with .html.

Example with Python, using the panflute package:

import panflute as pf

def action(elem, doc):
    if isinstance(elem, pf.Link) and elem.url.endswith('.md'):
        elem.url = elem.url[:-3] + '.html'
        return elem

if __name__ == '__main__':
    pf.run_filter(action)
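
If you would rather not depend on panflute, the same rule can be sketched as a plain JSON filter that uses only the Python standard library. This assumes pandoc's JSON AST layout, in which a Link node has the form {"t": "Link", "c": [attr, inlines, [url, title]]}, and relies on pandoc passing the target format as the first command-line argument. The file name md_links.py is just an example.

```python
#!/usr/bin/env python3
# md_links.py: a dependency-free pandoc JSON filter (a sketch).
# Assumes Link nodes look like {"t": "Link", "c": [attr, inlines, [url, title]]}.
import json
import sys


def rewrite_md_links(node):
    """Walk the AST in place, changing link targets that end in .md to .html."""
    if isinstance(node, dict):
        if node.get("t") == "Link":
            target = node["c"][2]  # [url, title]
            if target[0].endswith(".md"):
                target[0] = target[0][:-3] + ".html"
        for value in node.values():
            rewrite_md_links(value)
    elif isinstance(node, list):
        for item in node:
            rewrite_md_links(item)
    return node


# pandoc runs JSON filters with the output format as argv[1], so stdin is
# only read when the script is actually invoked as a filter.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(rewrite_md_links(json.load(sys.stdin)), sys.stdout)
```

After chmod +x md_links.py, it would be used as: pandoc -f markdown -t html5 input.md -o output.html --filter ./md_links.py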
Deanadeanda answered 6/12, 2016 at 21:56
Thanks, that's great! I'm just wondering: is there no simpler way using Pandoc options? – Hollah
@Hollah No, there is no out-of-the-box way to do that. Markdown assumes that your links already point to the rendered HTML documents, so it does not alter your URLs; in fact, altering them would be a bug. Of course, a custom-built plugin (like the one here) is possible with various Markdown parsers, but it has to be custom, because only you know your specific needs and no single solution could meet the needs of most (let alone all) users. – Nichellenichol

Example using Pandoc's built-in Lua filter support:

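-- links-to-html.lua
-- "%." matches a literal dot, so every ".md" in the link target
-- (including "file.md#anchor") is replaced with ".html".
function Link(el)
  el.target = string.gsub(el.target, "%.md", ".html")
  return el
end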

Then:

pandoc -f markdown -t html5 input.md -o output.html --lua-filter=links-to-html.lua
Turpentine answered 21/3, 2018 at 0:1
Note that # is not a valid Lua comment. Use -- if you want to keep the first line. – Undervest
I would expect this to preserve links and only change the href, but the given command also converts the "a" tags into "embed" tags, which is somehow worse. I wonder what I could be doing wrong? – Basketwork

A slight modification to Sergio Correia's answer also catches links to anchors in other documents (links of the form file.md#section). Take care: in some rare cases this might garble links.

import panflute as pf

def action(elem, doc):
    if isinstance(elem, pf.Link):
        if elem.url.endswith('.md'):
            elem.url = elem.url[:-3] + '.html'
            return elem
        elif '.md#' in elem.url:  # str.find() returns -1 (truthy) when absent
            elem.url = elem.url.replace('.md#', '.html#')
            return elem

if __name__ == '__main__':
    pf.run_filter(action)
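
One way to make garbling less likely is to split each URL into its parts and touch only the path. The sketch below uses urllib.parse from the standard library; the rules it applies (leave URLs with a scheme or host alone, rewrite only a path that ends in .md, keep any query and #fragment) are assumptions about what counts as an internal link, not something pandoc defines. The helper name md_to_html is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def md_to_html(url):
    """Rewrite a relative link to a .md file so that it points to .html.

    Assumption: links with a scheme (https:, mailto:, ...) or a host are
    external and are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme or parts.netloc or not parts.path.endswith(".md"):
        return url
    # Replace only the path's extension; any ?query and #fragment survive.
    return urlunsplit(parts._replace(path=parts.path[:-3] + ".html"))
```

Inside the filter, the whole if/elif could then be replaced by elem.url = md_to_html(elem.url) within the isinstance(elem, pf.Link) check. Unlike a plain substring replace, this leaves names such as notes.mdx or external README.md links untouched.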
Faulk answered 11/8, 2021 at 15:52

Assuming you are going to serve your HTML pages via a web server, it is relatively simple to resolve all *.md URLs as *.html ones instead of rewriting them with Pandoc. For example, with nginx:

location ~ \.md$ {
  if (!-f $request_filename) {
    rewrite ^(.*)\.md$ $1 permanent;
  }
}

location / {
  try_files $uri $uri.html =404;
}
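
For local previewing without nginx, a similar fallback can be sketched with Python's built-in http.server (Python 3.7 or later for the directory argument). The handler overrides translate_path so that a request for a .md file that does not exist on disk is served from the .html file next to it. Unlike the nginx rules it does not redirect, and the names md_fallback and MdFallbackHandler, the script name serve_md.py, and port 8000 are illustrative choices.

```python
import os
import sys
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def md_fallback(fs_path):
    """Return fs_path, or its .html sibling if fs_path is a missing .md file."""
    if fs_path.endswith(".md") and not os.path.isfile(fs_path):
        candidate = fs_path[:-3] + ".html"
        if os.path.isfile(candidate):
            return candidate
    return fs_path


class MdFallbackHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves X.html when X.md is requested but absent."""

    def translate_path(self, path):
        # Let the base class map the URL to a file, then apply the fallback.
        return md_fallback(super().translate_path(path))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python serve_md.py SITE_DIR  (serves SITE_DIR on 127.0.0.1:8000)
    handler = partial(MdFallbackHandler, directory=sys.argv[1])
    HTTPServer(("127.0.0.1", 8000), handler).serve_forever()
```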

Alternatively, you can replace all .md links with .html ones using sed (taken from here):

Change all internal file URLs so that they point to the local *.html files instead of the *.md files:

  1. Run this sed command on each generated HTML file (programmatically substituting each file name for FILENAME); it edits the file in place and keeps a .bak copy:

    sed -i.bak '/href="\./s/\.md/.html/g' FILENAME.html

  2. Alternatively, run the following command instead, which writes to a temporary file and then moves it into place:

    sed -e '/href="\./s/\.md/.html/g' FILENAME.html > FILENAME.html.tmp && mv FILENAME.html.tmp FILENAME.html
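
Running the substitution over every HTML file in a directory tree can be sketched in Python as below. It mirrors the sed rule (only lines containing href="./ are touched, and every .md on such a line becomes .html), so, like sed, it leaves links written as Usage.md rather than ./Usage.md alone, because pandoc keeps URLs exactly as they appear in the Markdown. The names fix_line, fix_tree and fix_links.py are hypothetical.

```python
import sys
from pathlib import Path


def fix_line(line):
    # Same rule as the sed command: only lines containing href="./ are
    # touched, and on those lines every ".md" becomes ".html".
    if 'href="./' in line:
        return line.replace(".md", ".html")
    return line


def fix_tree(root):
    """Apply fix_line to every *.html file under root (recursively), in place."""
    for path in Path(root).rglob("*.html"):
        text = path.read_text(encoding="utf-8")
        fixed = "".join(fix_line(line) for line in text.splitlines(keepends=True))
        if fixed != text:
            path.write_text(fixed, encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    fix_tree(sys.argv[1])  # e.g. python fix_links.py output_dir
```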
    
Benefactor answered 12/2, 2017 at 19:51
I would not recommend this approach, because there could be links in code sections that you would not want to convert, and other complications you might not think of at the moment, all of which you avoid automatically if you use a proper parser such as pandoc. – Kaleena

I had a similar problem, so I made md_htmldoc.

It finds all of the .md files in a directory and then creates a separate directory in which all the Markdown files have been converted to HTML.

It fixes hyperlinks (thanks to Sergio Correia's answer).

It also gathers up any local file references so that links to images and such still work.
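
The general approach (this is not md_htmldoc's actual code) can be sketched in a few lines: mirror the source tree into an output directory and run pandoc on each .md file with a link-rewriting filter. The function names, the output directory, and the use of links-to-html.lua from the Lua answer on this page are assumptions, and copying images and other referenced files is left out.

```python
import subprocess
from pathlib import Path


def plan(src_dir, out_dir):
    """Pair every .md file under src_dir with an .html path under out_dir."""
    src, out = Path(src_dir), Path(out_dir)
    return [(md, out / md.relative_to(src).with_suffix(".html"))
            for md in sorted(src.rglob("*.md"))]


def convert_all(src_dir, out_dir, lua_filter="links-to-html.lua"):
    # Requires pandoc on PATH; check=True stops at the first failed conversion.
    for md, html in plan(src_dir, out_dir):
        html.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["pandoc", "-f", "markdown", "-t", "html5", str(md),
             "-o", str(html), "--lua-filter", lua_filter],
            check=True,
        )
```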

Perjury answered 19/7, 2017 at 6:39
I saw in your code (get_references.py) that you use a regex to find links in Markdown. I would not recommend this approach, because there might be links in code sections that you would not want to convert, and other complications you might not think of at the moment. You could avoid this automatically by always using a proper parser such as pandoc (as you already do via panflute). – Kaleena
@hoijui: get_references.py isn't used to convert anything; it's only used to find references. Each reference found is checked to see whether it refers to a local file. If so, it is added to doc_relevant, which is then used either to (1) compile a Markdown file to HTML or (2) copy that file to HTML_DIR. – Faulk
OK, thanks :-) Still, you might miss references this way, for the reasons mentioned above. Only building an AST does it properly (which is what pandoc does, for example). With filters and AST (raw?) output from pandoc, it is quite easy to do. – Kaleena

For anyone using a Makefile to drive conversion, here is a Makefile fragment that provides a rule transforming a .md file into an .html file with its links adjusted:

SHELL=/bin/bash

%.html: %.md
	( set -eu -o pipefail ; \
	pandoc $< -t html | \
	sed -E 's/<a href="([^"]*)\.md/<a href="\1.html/g' > $@.tmp && mv -vf $@.tmp $@ ; )

If test.md exists in the current directory, make test.html will build test.html from it.

The rule also takes care of not clobbering an existing HTML file (whatever the reason) until the conversion actually succeeds.
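
The two ideas in that rule, rewriting href values in the generated HTML with a regular expression and only replacing the target once the conversion has succeeded, can be sketched in Python as follows. The dot before md is escaped so it matches only a literal dot, and the names adjust_links and write_atomically are hypothetical. Like the sed version, this works on the HTML text rather than on pandoc's AST.

```python
import os
import re
import tempfile

# In '<a href="X.md...', swap the .md for .html (greedy, like the sed expression).
HREF_MD = re.compile(r'<a href="([^"]*)\.md')


def adjust_links(html):
    return HREF_MD.sub(r'<a href="\1.html', html)


def write_atomically(dest, text):
    """Write text to a temporary file next to dest, then rename it over dest.

    If anything fails before os.replace, an existing dest is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.replace(tmp, dest)  # same directory, so this is a rename
    except BaseException:
        os.unlink(tmp)
        raise
```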

Suiting answered 15/3, 2021 at 22:4
Notice that the rule does not try to be smart about twisted links such as [foo](bar.md.somethingelse). – Windup
Elegant, simple and very effective. sed is almost magical for all it can do for you! – Probability
