Pandoc Markdown to Plain Text Formatting

There appears to be something amiss with the most recently installed version of Pandoc (pandoc 1.13.2.1) on my machines. With the previously installed version, converting from Markdown to plain text would generate Setext-style headers ('=' underlines for H1 and '-' underlines for H2) in the plain text output. In addition, I have noticed two more iffy issues:

  • Pandoc now automatically generates uppercase letters for the title
  • Pandoc now precedes the title with what seem to be two newlines (\n)

I have spent the last few minutes playing around with different pandoc options with little luck.

How do I convert Illustration #1 to Illustration #3?

Environment: pandoc 1.13.2.1, Kubuntu 15.10

Illustration #1: Input markdown file

# Title

## Section
* This is the section.

### Subsection
* This happens to be the subsection

Illustration #2: Output plain text after running pandoc -f markdown -t plain pandoc_markdown_issue.md

TITLE


Section

-   This is the section.

Subsection

-   This happens to be the subsection

Illustration #3: Desired Output

Title
=====

Section
-------
-   This is the section.

Subsection
----------
-   This happens to be the subsection
Causeway asked 7/12, 2015 at 11:31

The plain text writer was changed to use the general format of Project Gutenberg plain text books. Of course, no choice will please everyone. For the sample you give, using the markdown writer would work well.
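A minimal sketch of that suggestion, assuming pandoc 2.11.2 or later, where the Markdown writer defaults to ATX (#-prefixed) headings and setext underlines have to be requested with --markdown-headings=setext. On 1.13, setext was already the default for levels 1 and 2, so that option should be dropped there. The heredoc only recreates Illustration #1 so the commands run on their own; the output file name is arbitrary.

```shell
#!/bin/sh
# Recreate the question's sample input (Illustration #1).
cat > pandoc_markdown_issue.md <<'EOF'
# Title

## Section
* This is the section.

### Subsection
* This happens to be the subsection
EOF

# Use the Markdown writer instead of the plain writer.
# --markdown-headings=setext assumes pandoc 2.11.2 or later; omit it on
# 1.13, where setext headings were already the default.
pandoc -f markdown -t markdown --markdown-headings=setext \
  -o pandoc_markdown_issue.txt pandoc_markdown_issue.md
cat pandoc_markdown_issue.txt
```

Setext underlines only exist for levels 1 and 2, so the level-3 heading stays as ### Subsection.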

Felicle answered 7/12, 2015 at 17:28
Thank you so much for this. I see from the changelog that this change was made in 1.13. - Causeway

I'm able to achieve your desired output by leaving out the -f and -t flags altogether and letting Pandoc infer the conversion format from the output filename extension:

pandoc file.md -o file.txt

Alternatively, using -t plain also seems to work:

pandoc -f markdown -t plain file.md -o file.txt

I'm not really sure why the first example works. My guess is that it's due to one of the Markdown readers, since there are several.

Mig answered 11/9, 2019 at 22:35
This doesn't handle Markdown links very well. - Progress

It's a bit odd, but you can get close to the desired output by exporting to reStructuredText (rst), since it uses setext-style headings. You may run into other issues, but I'm mentioning it in case it's useful.

$ pandoc pandoc_markdown_issue.md -t rst
Title
=====

Section
-------

-  This is the section.

Subsection
~~~~~~~~~~

-  This happens to be the subsection
Count answered 5/5, 2020 at 20:44
This doesn't handle Markdown links very well. - Progress
I think this is close to what I want. I'm not looking for something that is machine-parseable; I just want something that looks nice for human consumption. Not even Markdown scores 10/10 for that. - Amphibian
For a full demo, see: raw.githubusercontent.com/docutils/docutils/master/docutils/… - Amphibian

Pandoc now automatically generates uppercase letters for title

I had this issue with -t plain turning bold text from a .docx file into UPPERCASE, and worked around it with a little Lua filter. First I ran

$ pandoc -t native foo.docx

and saw that the text being turned into UPPERCASE was wrapped in Strong, e.g. [Para [Strong [Str "some", Space, Str "text"]]], while non-bold text looked like [Para [Str "moar", Space, Str "text"]]. So the filter becomes:

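-- weaken.lua: replace every Strong (bold) element with its contents,
-- so the plain writer no longer renders bold text in UPPERCASE.
function Strong(element)
   return element.content
end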

I put that in a file called weaken.lua and then ran

$ pandoc --lua-filter=weaken.lua -f docx -t plain foo.docx -o foo.txt
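The same Lua-filter idea can be aimed at headings, which the plain writer renders without underlines (and uppercases at level 1). The following is a hypothetical sketch, not taken from any answer here: setext.lua and headings.txt are made-up file names, it assumes a pandoc release with Lua filter support and pandoc.utils (the 2.x series or later; 1.13 has neither), and it flattens any formatting inside headings to plain text. Because the headings become ordinary paragraphs, the result is close to Illustration #3 but keeps a blank line between each underline and the following list.

```shell
#!/bin/sh
# setext.lua (hypothetical name): replace each Header with a paragraph made
# of the heading text, a line break and an underline of '=' (level 1) or
# '-' (level 2 and deeper), roughly matching Illustration #3.
cat > setext.lua <<'EOF'
function Header(el)
  -- Flatten the heading to plain text; inline formatting is dropped.
  local text = pandoc.utils.stringify(el.content)
  local mark = (el.level == 1) and "=" or "-"
  -- Count characters rather than bytes; fall back to bytes on invalid UTF-8.
  local underline = string.rep(mark, utf8.len(text) or #text)
  -- The plain writer emits a LineBreak as a bare newline.
  return pandoc.Para({ pandoc.Str(text), pandoc.LineBreak(), pandoc.Str(underline) })
end
EOF

# A Para is not a Header, so the plain writer no longer uppercases it.
pandoc --lua-filter=setext.lua -f markdown -t plain -o headings.txt <<'EOF'
# Title

## Section
* This is the section.

### Subsection
* This happens to be the subsection
EOF
cat headings.txt
```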
Touchwood answered 16/12, 2021 at 13:42
