Converting HTML markup to an RTF document

I have an XML document containing embedded HTML content that I am trying to convert to an RTF output file. The XML elements are decorated with <li>, <p>, <b> and other HTML markup, which I would like carried over into the generated RTF.

Here is what works as of now:

  1. Fetch XML tag content as string (containing HTML tags for line breaks, paragraph breaks, and lists)
  2. Write the XML tag content to an RTF file.
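
For reference, step 1 might look roughly like the sketch below. It assumes the HTML is embedded as child elements of the XML tag (not as escaped text), and the sample document, the `section` tag name, and the `inner_markup` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one XML element whose children are HTML tags.
SAMPLE_XML = (
    '<doc><section id="intro">'
    "<p>Hello <b>world</b></p><ol><li>one</li><li>two</li></ol>"
    "</section></doc>"
)


def inner_markup(element):
    """Return the element's leading text plus its serialized children."""
    parts = [element.text or ""]
    for child in element:
        # tostring() also includes the child's tail text, so nothing is lost.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(SAMPLE_XML)
html_fragment = inner_markup(root.find("section"))
print(html_fragment)
```

At this point `html_fragment` is a plain string that still contains the HTML tags, which is exactly what ends up verbatim in the RTF if it is written out unchanged.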

I am using Python scripts for the conversion, together with ElementTree (to parse the input XML) and PyRTF-NG (to convert from HTML to RTF), a library that handles tables and other special formatting. So far I have everything I need except the rendering of the HTML markup (i.e. translating HTML format tags into actual RTF formatting). To clarify: if my RTF converter encounters an <ol><li> tag, it should create an ordered list in the RTF, instead of just writing the literal <ol><li> tags into the output.
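
To make the goal concrete, here is a minimal sketch of the kind of translation I mean. It does not use PyRTF-NG at all: it uses the standard library's `html.parser` and writes raw RTF control words (`\b`, `\i`, `\par`, `\line`, `\tab`, `\bullet`) directly. The class and function names are hypothetical, only a handful of tags are handled, and list items are rendered as numbered or bulleted paragraphs, not as true RTF list tables.

```python
import struct
from html.parser import HTMLParser


def rtf_escape(text):
    """Escape RTF special characters and encode non-ASCII text as \\uN? units."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF \uN expects signed 16-bit UTF-16 code units; "?" is the fallback.
            raw = ch.encode("utf-16-le")
            units = struct.unpack("<%dh" % (len(raw) // 2), raw)
            out.extend("\\u%d?" % unit for unit in units)
    return "".join(out)


class HtmlToRtf(HTMLParser):
    """Translate a small subset of HTML tags into RTF control words."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.list_stack = []  # one [kind, item_count] entry per open list

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.out.append("{\\b ")
        elif tag == "i":
            self.out.append("{\\i ")
        elif tag == "br":
            self.out.append("\\line ")
        elif tag in ("ol", "ul"):
            self.list_stack.append([tag, 0])
        elif tag == "li" and self.list_stack:
            current = self.list_stack[-1]
            current[1] += 1
            marker = "%d." % current[1] if current[0] == "ol" else "\\bullet"
            self.out.append(marker + "\\tab ")

    def handle_endtag(self, tag):
        if tag in ("b", "i"):
            self.out.append("}")
        elif tag in ("p", "li"):
            self.out.append("\\par\n")
        elif tag in ("ol", "ul") and self.list_stack:
            self.list_stack.pop()

    def handle_data(self, data):
        self.out.append(rtf_escape(data))


def html_to_rtf(html_fragment):
    """Wrap the translated fragment in a minimal RTF document."""
    parser = HtmlToRtf()
    parser.feed(html_fragment)
    parser.close()
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
    return header + "".join(parser.out) + "}"


rtf = html_to_rtf("<p>Hello <b>world</b></p><ol><li>one</li><li>two</li></ol>")
print(rtf)
```

Unbalanced or badly nested tags would produce unbalanced RTF groups here, so real input would need validating or cleaning first.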

Does anyone know whether Python has any native calls that would let me do this, or of any other Python libraries that could complete the full conversion to RTF?

Thanks!

Trochee answered 3/3, 2014 at 15:29

The best free converter is LibreOffice, and it can be used directly from the command line in a terminal; see

libreoffice --convert-to

The same converter can also be called indirectly from Python through the UNO bridge.
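
A simpler alternative to the UNO bridge is to launch the command-line converter from Python with `subprocess`. The sketch below is only that: it assumes a `libreoffice` binary is on the PATH (on some systems it is called `soffice`), the function names are made up, and depending on the LibreOffice version an explicit export filter may have to be added after `rtf` for HTML input.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(html_path, out_dir, binary="libreoffice"):
    """Build the argument list for a headless LibreOffice conversion to RTF."""
    return [
        binary,
        "--headless",          # run without opening a window
        "--convert-to", "rtf",
        "--outdir", str(out_dir),
        str(html_path),
    ]


def convert_html_to_rtf(html_path, out_dir, binary="libreoffice"):
    """Run the conversion and return the path where the RTF is expected."""
    if shutil.which(binary) is None:
        raise FileNotFoundError("%s was not found on PATH" % binary)
    subprocess.run(build_convert_command(html_path, out_dir, binary), check=True)
    # LibreOffice names the output after the input file, with the new extension.
    return Path(out_dir) / (Path(html_path).stem + ".rtf")
```

Usage would be something like `convert_html_to_rtf("fragment.html", "out")`, after writing the extracted HTML to `fragment.html`.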

Obediah answered 29/10, 2014 at 12:23
