How to convert RTF to Markdown on the UNIX/OSX command line similar to pandoc

3

24

How do I convert RTF (say, from stdin) to Markdown with a command-line tool under UNIX/OSX?

I am looking for something like pandoc. However, pandoc itself does not accept RTF as an input format. :-( So I'd be happy with either a tool similar to pandoc or a pointer to an external RTF reader for pandoc.

Ochre answered 26/5, 2015 at 1:44 Comment(0)
22

On Mac OSX I can use the pre-installed textutil command for the RTF-to-HTML conversion and then convert the HTML to Markdown with pandoc. So a command line which takes RTF from stdin and writes Markdown to stdout looks like this:

textutil -stdin -convert html -stdout | pandoc --from=html --to=markdown
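
If the Markdown that comes out carries a lot of stray classes and attributes, switching off some pandoc Markdown extensions can reduce the clutter. The following is only a sketch: the extension names are real pandoc extensions, but the variable `to_format` and the function `rtf2md_clean` are names made up for this example, and how much cleaner the result gets depends on the input.

```shell
# Build an output format string such as "markdown-native_divs-...".
# Each "-name" suffix switches one pandoc Markdown extension off.
to_format=markdown
for ext in native_divs native_spans fenced_divs header_attributes \
           auto_identifiers inline_code_attributes link_attributes raw_attribute
do
  to_format="${to_format}-${ext}"
done

# Hypothetical helper: RTF on stdin, Markdown on stdout (macOS only, needs textutil).
rtf2md_clean() {
  textutil -stdin -convert html -stdout | pandoc --from=html --to="$to_format"
}

# Usage (not run here):  rtf2md_clean < file.rtf > file.md
```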
Ochre answered 27/5, 2015 at 3:49 Comment(7)
This works terribly in my experience. textutil preserves none of my formatting and links, and the HTML is littered with useless classes.Blather
@zool You can avoid (or at least significantly minimise) the "class litter" by switching off some Pandoc extensions. I switch off native_divs, native_spans, fenced_divs, header_attributes, auto_identifiers, inline_code_attributes, link_attributes and raw_attribute. HTH, LeoOchre
I tried this script. The links in the clipboard are all stripped off.Agate
@Agate The current version of pandoc seems to support RTF as an input format. Maybe try that. It should be better at preserving links. If it works, post it as an answer here please.Ochre
I think the problem is not with pandoc but with textutil. I've found a script that works (with minor changes): if encoded=`osascript -e 'the clipboard as «class HTML»'` 2>/dev/null; \ then echo $encoded \ | perl -ne 'print chr foreach unpack("C*",pack("H*",substr($_,11,-3)))' \ | pandoc --wrap=none -f HTML -t markdown; else pbpaste; fi. To be honest, I don't understand the code. Maybe «class HTML» makes a difference. When I changed it to RTF, the links were stripped off.Agate
@Agate Sorry, I wasn't clear: I meant you could try using pandoc instead of textutil to read the RTF file. Anyway, I'm glad you found another solution.Ochre
@halloleo. Thanks. I did try pandoc before. The problem is I need to convert the text from the clipboard. pandoc will simply strip off the links.Agate
6

Using Ted and pandoc together, you should be able to do this:

Ted --saveTo text.rtf text.html
pandoc --from=html --to=markdown --out=text.md < text.html
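
Since the question asks for stdin to stdout, the two commands can be wrapped with temporary files. This is a rough sketch that assumes Ted's --saveTo works as shown above in your environment; the function name `ted_rtf2md` and the temporary file names are made up for illustration.

```shell
# Hypothetical wrapper: RTF on stdin, Markdown on stdout, via Ted and pandoc.
ted_rtf2md() {
  tmpdir=$(mktemp -d) || return 1
  cat > "$tmpdir/in.rtf"                       # save stdin so Ted can read a file
  Ted --saveTo "$tmpdir/in.rtf" "$tmpdir/out.html" &&
    pandoc --from=html --to=markdown < "$tmpdir/out.html"
  status=$?
  rm -rf "$tmpdir"                             # clean up whether or not it worked
  return $status
}

# Usage (not run here):  ted_rtf2md < text.rtf > text.md
```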
Libelous answered 26/5, 2015 at 3:13 Comment(9)
Converting RTF to HTML can also easily be done with Apple's textutil command (see man textutil). Also have a look at (#1044268).Stipendiary
@HeinrichGiesen Oops, didn't see your comment! Yes, that's what I found out as well: on OSX, textutil is the way to go!Ochre
That sounds like the best answer for OS X; your question said you were looking for a cross platform solution so I didn't consider it. Glad you figured something out.Libelous
The Ted 2.23 deb package is not installable on Debian 8.11, not even with the dpkg command.Coir
@Coir use the sourceLibelous
Also no need to downvote a perfectly good answer just because you can't figure out how to get software installed.Libelous
Even if I compile from source on my Debian machine, the dependencies are not satisfied ... I should have been more explicit in my previous comment. :-(Coir
@Libelous I noticed that my Linux apt sources.list had some issues; after fixing that I will try Ted again. I should not have downvoted your answer while being too lazy to double-check. :-)Coir
After I fixed the apt sources.list, I still get the following errors, so the dependency problem persists: E: Package 'libjpeg8' has no installation candidate E: Package 'libtiff4' has no installation candidateCoir
6

Pandoc now supports RTF as an input format, so you can use:

cat file.rtf | pandoc --from=rtf --to=markdown
Cursive answered 8/2, 2022 at 22:46 Comment(1)
Thanks! That works like a charm. Just for completeness' sake: to save the output to a file, use cat file.rtf | pandoc --from=rtf --to=markdown --out=file.md.Sculpt
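
Older pandoc installations may still lack the RTF reader, so a script that has to run on several machines could check for it first and fall back to textutil. A minimal sketch, assuming a pandoc recent enough to understand --list-input-formats; the function name `rtf2md` is made up for this example.

```shell
# Hypothetical helper: RTF on stdin, Markdown on stdout.
# Extra arguments are passed through to pandoc.
rtf2md() {
  if pandoc --list-input-formats 2>/dev/null | grep -qx rtf; then
    # This pandoc can read RTF itself.
    pandoc --from=rtf --to=markdown "$@"
  elif command -v textutil >/dev/null 2>&1; then
    # macOS fallback: RTF to HTML with textutil, then HTML to Markdown.
    textutil -stdin -convert html -stdout | pandoc --from=html --to=markdown "$@"
  else
    echo "rtf2md: need pandoc with RTF input support, or textutil" >&2
    return 1
  fi
}

# Usage (not run here):  rtf2md < file.rtf > file.md
```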
