Convert rich Markdown to plain text

How can I convert rich Markdown into plain text, so that it can be used, e.g., for a Facebook Open Graph description?

I'm using MarkdownSharp, and it doesn't seem to have this functionality. Before reinventing the wheel, I thought I'd ask here first.

Any hints about an implementation strategy are greatly appreciated!

Example

The Monorailcat
---------------
![Picture of a Lolcat](https://media1.giphy.com/media/c7goDcMPKjw6A/200_s.gif)
One of the earliest pictures of **monorail cat** found is from the website [catmas.com’s blog][1] section, dated from November 2, 2006. 
[1]: http://catmas.com/blog

Should be converted to:

The Monorailcat
One of the earliest pictures of monorail cat found is from the website catmas.com’s blog section, dated from November 2, 2006.
Dedradedric asked 26/12, 2015 at 20:7 Comment(6)
Do you need to implement this yourself? There must be several converters available, worst case markdown-to-html and html-to-text. - Gobbet
I'd rather not @MiserableVariable :) I also thought about the two-step approach, but it sounds like a lot of overhead, especially because I want to generate the result per pageview and not cache it (yet) in the database. - Dedradedric
Did you look for any direct converters? I am sure they exist, though I haven't checked myself. - Gobbet
I tried looking for them, but haven't found any for C#. - Dedradedric
Shouldn't the alt text be in the plain text output? - Maccabees
Hi Jon, preferably not in this case. You might be able to argue about that, but I think that discussion might distract from the question :) - Dedradedric

You have a few possibilities.

  1. As stated in a comment, you can convert to HTML, then convert the HTML to plain text. This is probably the most reliable and consistent solution cross-platform.

  2. Switch to a library that can convert between multiple formats, including the formats you desire. Pandoc would be an example of such a tool.

  3. Use a Markdown parser which outputs an AST. While such parsers usually provide an HTML renderer (accepts AST as input and outputs HTML), you can create your own renderer which outputs whatever format you want.
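
A rough sketch of the second half of option 1 (HTML to plain text) follows. It is written in Python with only the standard library so that it runs as-is; it is not C#, and the same idea would have to be ported. The class name `PlainTextExtractor`, the helper `html_to_text`, and the hand-written `sample_html` (standing in for whatever a Markdown-to-HTML converter would emit for the question's example) are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects text nodes and ignores all tags, so <img> alt text is dropped."""

    # Tags that should start a new line in the output (an illustrative subset).
    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &rsquo; for us.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        lines = "".join(self.parts).splitlines()
        # Collapse runs of whitespace and drop empty lines.
        cleaned = (" ".join(line.split()) for line in lines)
        return "\n".join(line for line in cleaned if line)


def html_to_text(html):
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


# Hand-written HTML, assumed to resemble a converter's output for the example.
sample_html = (
    "<h2>The Monorailcat</h2>\n"
    '<p><img src="https://media1.giphy.com/media/c7goDcMPKjw6A/200_s.gif" '
    'alt="Picture of a Lolcat" />\n'
    "One of the earliest pictures of <strong>monorail cat</strong> found is from "
    'the website <a href="http://catmas.com/blog">catmas.com&rsquo;s blog</a> '
    "section, dated from November 2, 2006.</p>"
)

print(html_to_text(sample_html))
```

Because the extractor only ever looks at text nodes, image alt text never reaches the output, which matches the preference stated in the question's comments.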

Actually, it turns out that Pandoc is also an example of #3. It just happens to already have an existing plain text renderer. Of course, if you are looking for a C# lib, then Pandoc may not meet your needs. And I'm not aware of any C# libs which meet that need (the reference implementation uses regex string substitution and many (most?) parsers have followed that example). That said, I'm not familiar with any of the Markdown libs in C# and this is not an appropriate place to make recommendations. However, there is a lengthy, albeit incomplete, list of parsers here. You may find something of use there.
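
To illustrate what a custom renderer for option 3 might look like, here is a minimal Python sketch. The `Node` type, its `kind` names, and the hand-built tree are hypothetical; a real parser defines its own AST, and the tree would come from the parser rather than being written by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node; real parsers define their own node types."""
    kind: str            # e.g. "document", "heading", "paragraph", "text", "link"
    text: str = ""       # literal text, used by "text" nodes only
    children: List["Node"] = field(default_factory=list)


BLOCK_KINDS = {"heading", "paragraph"}


def render_inline(node):
    """Flatten inline content to text, dropping images entirely."""
    if node.kind == "image":
        return ""        # skip alt text, as the question prefers
    if node.kind == "text":
        return node.text
    return "".join(render_inline(child) for child in node.children)


def render_plain(document):
    """Render each block-level child of the document on its own line."""
    lines = []
    for block in document.children:
        if block.kind in BLOCK_KINDS:
            line = " ".join(render_inline(block).split())
            if line:
                lines.append(line)
    return "\n".join(lines)


# Hand-built tree for the question's example (assumed shape, for illustration).
doc = Node("document", children=[
    Node("heading", children=[Node("text", "The Monorailcat")]),
    Node("paragraph", children=[
        Node("image", children=[Node("text", "Picture of a Lolcat")]),
        Node("text", "\nOne of the earliest pictures of "),
        Node("strong", children=[Node("text", "monorail cat")]),
        Node("text", " found is from the website "),
        Node("link", children=[Node("text", "catmas.com\u2019s blog")]),
        Node("text", " section, dated from November 2, 2006."),
    ]),
])

print(render_plain(doc))
```

The renderer is just a tree walk: each node kind decides whether it contributes text, so policy choices such as dropping images or keeping link labels live in one place.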

Waterish answered 28/12, 2015 at 1:53 Comment(1)
I hoped for a more ready-made solution, but I think at this point this is the best solution! Thanks for your answer :)Dedradedric

Some libraries exist that help you to remove markdown syntax, such as removemarkdown or strip-markdown.
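
For a sense of what stripping Markdown syntax involves, here is a small regex-based sketch in Python. It is not the implementation of either library named above; the function name `strip_markdown` and the set of rules are assumptions, and it only covers a small subset of Markdown (no lists, block quotes, code blocks, or underscore emphasis).

```python
import re


def strip_markdown(md):
    """Regex-based stripper for a small Markdown subset; not a full parser."""
    text = md
    # Drop reference definitions such as: [1]: http://example.com
    text = re.sub(r"^[ \t]*\[[^\]]+\]:[ \t]+\S+.*$", "", text, flags=re.MULTILINE)
    # Drop images entirely, including alt text: ![alt](url) or ![alt][ref]
    text = re.sub(r"!\[[^\]]*\](\([^)]*\)|\[[^\]]*\])", "", text)
    # Keep only the label of links: [label](url) or [label][ref]
    text = re.sub(r"\[([^\]]+)\](\([^)]*\)|\[[^\]]*\])", r"\1", text)
    # Drop setext heading underlines (lines consisting only of = or -)
    text = re.sub(r"^[ \t]*(=+|-+)[ \t]*$", "", text, flags=re.MULTILINE)
    # Drop ATX heading markers: "## Title" -> "Title"
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # Unwrap asterisk-based bold and italic, then inline code
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    text = re.sub(r"\*(.+?)\*", r"\1", text)
    text = re.sub(r"`([^`]+)`", r"\1", text)
    # Normalise whitespace and drop empty lines
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


sample = """The Monorailcat
---------------
![Picture of a Lolcat](https://media1.giphy.com/media/c7goDcMPKjw6A/200_s.gif)
One of the earliest pictures of **monorail cat** found is from the website [catmas.com\u2019s blog][1] section, dated from November 2, 2006.
[1]: http://catmas.com/blog
"""

print(strip_markdown(sample))
```

The order of the substitutions matters: images are removed before links so that the link rule does not leave a stray `!` behind, and reference definitions go first so their URLs are never treated as text. Input that the rules do not anticipate (nested brackets, escaped characters, literal asterisks) will come out wrong, which is the usual trade-off of regex substitution versus a real parser.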

Gyronny answered 7/8, 2019 at 14:50 Comment(1)
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review - Formularize
