How can I parse Markdown into an AST, manipulate it, and write it back to Markdown?

I want to modify Markdown files programmatically.

I have been looking into Markdown parsers and tried a few of them, namely Marked, markdown-it, and CommonMark. They give access to an AST, which lets me modify the content easily.

The problem is that they render to HTML only. I couldn't find any info on rendering back to Markdown.

I see two options right now: either write a custom renderer for one of these libraries (which would be quite time-consuming) or use a separate tool that transforms the HTML back to Markdown.

Is there an easier alternative? And why would a Markdown parser only render to HTML?

Curtain asked 1/6, 2021 at 23:55 · Comments (3):
You sound confused. Markdown is plain text. You should have plenty of options for editing that. Markdown parsers render HTML because that's exactly what they do: they take plain text and translate it to HTML. – Ski
@Ski There are real reasons someone might want to operate on an AST instead of the source file. That said, the OP should include more information about their use case, because converting to HTML and back does sound insane. – Demoss
@Ski He did not sound confused at all. I edited the question to improve the English. – Guardado

The best alternative is what you wanted to do in the first place!

There are many Markdown parsers that produce ASTs, and a good number of those can render it back to Markdown!

And why would a Markdown parser only render to HTML?

Many of them do because the number one use of Markdown is as source for HTML; Markdown was designed for exactly that. So the most common use of a Markdown parser, even when people want to manipulate the AST first, is to output HTML.

That said, the really good libraries include the option to render to other formats, including back to Markdown.

Here are the libraries that I already know can do this:

Pandoc

Probably the number one Markdown toolkit in the world. Pandoc's native language is Haskell, but there are JavaScript wrappers (just search npm). If you're going to do a lot of Markdown work down the road, it probably makes sense to become knowledgeable in Pandoc anyway.

Its support for "filters" is all about AST manipulation. It has special support for Lua filters, which might be the easiest to write, but you can also write filters in other languages: Python, PHP, Perl, JavaScript/TypeScript, Groovy, Ruby.

It can render to Markdown, among a huge number of other formats.

Its parser and renderer have many other options that might make your job even easier, or may already do exactly what you want. Many community-written filters are also available, and one of them may already cover your use case.
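
Pandoc's non-Lua filters work on its JSON AST: the document arrives as JSON on stdin and the modified document goes back as JSON on stdout. Below is a minimal TypeScript sketch of such a transform that demotes every heading by one level. The node shapes are simplified from Pandoc's JSON format (check them against `pandoc -t json` output for your version), the names `demoteHeaders` and `filterDocument` are hypothetical, and the stdin/stdout wiring is left out so the sketch has no dependencies.

```typescript
// Minimal shape of a Pandoc JSON AST element: a "t" (type) tag plus
// optional "c" (contents). Only what this sketch needs is modelled.
interface PandocNode {
  t: string;
  c?: unknown;
}

// Top-level document object that `pandoc -t json` emits.
interface PandocDoc {
  "pandoc-api-version": number[];
  meta: Record<string, unknown>;
  blocks: PandocNode[];
}

// Recursively copy any JSON value, raising the level of every Header block.
// Header contents are [level, attr, inlines]. Because we descend into every
// array and object, headers nested in block quotes, lists or divs are found too.
function demoteHeaders(value: unknown, by: number): unknown {
  if (Array.isArray(value)) {
    return value.map((v) => demoteHeaders(v, by));
  }
  if (value !== null && typeof value === "object") {
    const node = value as PandocNode;
    if (node.t === "Header" && Array.isArray(node.c)) {
      const [level, attr, inlines] = node.c as [number, unknown, unknown];
      // ATX headings only go up to level 6, so clamp there.
      const newLevel = Math.min(level + by, 6);
      return { t: "Header", c: [newLevel, attr, demoteHeaders(inlines, by)] };
    }
    const copy: Record<string, unknown> = {};
    for (const key of Object.keys(value)) {
      copy[key] = demoteHeaders((value as Record<string, unknown>)[key], by);
    }
    return copy;
  }
  return value; // strings, numbers, booleans, null
}

// Filter entry point: returns a new document and leaves "meta"
// and the API version untouched. The input is not mutated.
function filterDocument(doc: PandocDoc, by = 1): PandocDoc {
  return { ...doc, blocks: demoteHeaders(doc.blocks, by) as PandocNode[] };
}
```

Wired to stdin/stdout, a script like this could sit in a pipeline such as `pandoc -t json in.md | node filter.js | pandoc -f json -t markdown -o out.md`. Returning fresh objects instead of mutating in place keeps the transform easy to test.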

CMark

Though this reference implementation of CommonMark is written in C, there are many Node wrappers, and even a JavaScript port built with Emscripten. It also ports the GitHub extensions, so tables and other GFM features can be manipulated in the AST as well.

It can output CommonMark, as well as HTML and LaTeX, or even an XML representation of the AST.

remark

A JavaScript framework designed specifically around AST manipulation. I've never used it, but I'd guess it has tools that make AST manipulation easier.
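
In remark (built on unified), AST manipulation is usually packaged as a plugin: a function that returns a "transformer", which receives the mdast tree after parsing and may mutate it before the tree is stringified back to Markdown. The sketch below is a hypothetical plugin, `remarkHttpsLinks`, that upgrades `http://` URLs. To keep it self-contained it declares its own minimal node type and a hand-rolled `walk` helper instead of importing `@types/mdast` and `unist-util-visit`.

```typescript
// Hand-written subset of mdast fields used here; the full definitions
// live in the @types/mdast package.
interface MdNode {
  type: string;
  children?: MdNode[];
  url?: string;
  value?: string;
}

// Depth-first, pre-order walk in the spirit of unist-util-visit (hand-rolled here).
function walk(node: MdNode, fn: (n: MdNode) => void): void {
  fn(node);
  for (const child of node.children ?? []) {
    walk(child, fn);
  }
}

// mdast node types that carry a `url` field.
const URL_TYPES = ["link", "image", "definition"];

// A unified plugin ("attacher") returns a transformer; the transformer gets
// the mdast root and mutates it in place.
function remarkHttpsLinks() {
  return (tree: MdNode): void => {
    walk(tree, (node) => {
      const isUrlNode = URL_TYPES.indexOf(node.type) !== -1;
      if (isUrlNode && node.url !== undefined && node.url.slice(0, 7) === "http://") {
        node.url = "https://" + node.url.slice(7); // 7 === "http://".length
      }
    });
  };
}
```

With the real packages, a plugin like this would typically be used as `remark().use(remarkHttpsLinks).process(markdown)` (or `unified().use(remarkParse).use(remarkHttpsLinks).use(remarkStringify)`). The stringify step is what turns the edited tree back into Markdown. Including `definition` nodes means reference-style links are updated too.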

Guardado answered 23/6, 2021 at 13:18

I just found mdast-util-from-markdown, which seems to do the trick: it parses Markdown into an mdast tree, and you can then convert the tree back to a string with mdast-util-to-markdown. mdast is a specification for a Markdown syntax tree.
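
To show what "converting the tree back to a string" involves, here is a toy serializer for a handful of mdast node types. It is a hand-rolled illustration, not the mdast-util-to-markdown implementation: it does no escaping and ignores every node type not listed. The tree literal is an assumption of roughly what `fromMarkdown` returns for a small input, minus the `position` data (and other fields such as a link's `title`) the real output carries.

```typescript
// Hand-written subset of the mdast node types; the real definitions
// (and many more node types) are in the @types/mdast package.
type MdastNode =
  | { type: "root"; children: MdastNode[] }
  | { type: "heading"; depth: number; children: MdastNode[] }
  | { type: "paragraph"; children: MdastNode[] }
  | { type: "emphasis"; children: MdastNode[] }
  | { type: "strong"; children: MdastNode[] }
  | { type: "link"; url: string; children: MdastNode[] }
  | { type: "text"; value: string };

// Serialize a node back to Markdown text (toy version: no escaping).
function toMarkdownToy(node: MdastNode): string {
  const inner = (children: MdastNode[]) => children.map(toMarkdownToy).join("");
  switch (node.type) {
    case "root":
      // Separate blocks with a blank line and end the file with a newline.
      return node.children.map(toMarkdownToy).join("\n\n") + "\n";
    case "heading":
      // ATX heading: `depth` hash marks, then the heading text.
      return new Array(node.depth + 1).join("#") + " " + inner(node.children);
    case "paragraph":
      return inner(node.children);
    case "emphasis":
      return "*" + inner(node.children) + "*";
    case "strong":
      return "**" + inner(node.children) + "**";
    case "link":
      return "[" + inner(node.children) + "](" + node.url + ")";
    case "text":
      return node.value;
  }
}

// Roughly the tree for "# Notes\n\nSee [the docs](https://example.com) for *details*."
const example: Extract<MdastNode, { type: "root" }> = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Notes" }] },
    {
      type: "paragraph",
      children: [
        { type: "text", value: "See " },
        { type: "link", url: "https://example.com", children: [{ type: "text", value: "the docs" }] },
        { type: "text", value: " for " },
        { type: "emphasis", children: [{ type: "text", value: "details" }] },
        { type: "text", value: "." },
      ],
    },
  ],
};

// Manipulate the AST (demote the heading), then serialize it again.
const first = example.children[0];
if (first.type === "heading") first.depth = 2;
console.log(toMarkdownToy(example));
```

With the actual packages, the same round trip would be `const tree = fromMarkdown(source)`, edit `tree`, then `toMarkdown(tree)`, using the named exports of mdast-util-from-markdown and mdast-util-to-markdown. The real serializer also takes care of escaping and of the node types this toy skips.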

Inanimate answered 13/10, 2022 at 23:45

© 2022 - 2024 — McMap. All rights reserved.