HTML to .doc converter in Python?

I am using pisa, which is an HTML to PDF conversion library for Python.

Does there exist the same thing for a Word document: an HTML to .doc conversion library for Python?

Shockheaded answered 19/11, 2010 at 14:48 Comment(4)
Why would you want this? MS Word can read HTML.Pyxidium
I have the same problem: I have an HTML document that I convert to PDF with pisa, and I want to do the same thing for Word. It's a big document, ~20 pages, so using the same piece of code to generate the HTML and then export through pisa or something else would be great.Glace
@Eric: Recently, I had the same problem. Just wondering, did you find a solution to convert HTML to Word .docx? Thanks.Sherborn
@tao.hong: Did you manage to solve your problem? I am looking for a suitable open source solution too. Thanks.Leta
R
13

You could use win32com from the pywin32 Python extensions for Windows to let MS Word do the conversion for you. This requires Word to be installed on the machine. A simple example:

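import os
import win32com.client

# Start MS Word through COM (Windows only, Word must be installed).
word = win32com.client.Dispatch('Word.Application')

# Use absolute paths: Word may resolve relative paths against its own working directory.
doc = word.Documents.Add(os.path.abspath('example.html'))
doc.SaveAs(os.path.abspath('example.doc'), FileFormat=0)  # 0 = wdFormatDocument (.doc)
doc.Close()

word.Quit()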
Room answered 19/11, 2010 at 16:26 Comment(0)
L
5

I am not aware of a module that does this conversion directly, but you can do it in two steps:

  1. Convert the HTML to plain text using the html2text module.
  2. Use the python-docx module to write that text to a .docx file (python-docx does not produce the older .doc format).
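
The two steps above could be sketched roughly as follows. This is only an illustration, not code from the answer: the function names html_to_docx and split_paragraphs are made up here, and it assumes the html2text and python-docx packages are installed. Because the content passes through plain text, most formatting (tables, images, styles) is lost.

```python
import re


def split_paragraphs(text):
    """Split plain text on blank lines and re-join hard-wrapped lines."""
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]


def html_to_docx(html, out_path):
    """Hypothetical helper: HTML string -> plain text -> .docx file."""
    # Third-party imports are kept local so split_paragraphs() works without them.
    import html2text           # pip install html2text
    from docx import Document  # pip install python-docx

    # Step 1: HTML -> Markdown-flavoured plain text.
    text = html2text.html2text(html)

    # Step 2: one Word paragraph per text paragraph.
    document = Document()
    for paragraph in split_paragraphs(text):
        document.add_paragraph(paragraph)
    document.save(out_path)
```

Calling html_to_docx("<h1>Title</h1><p>Some text.</p>", "example.docx") would write a .docx containing the extracted text, with the heading showing up as Markdown-style text rather than a real Word heading.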
Leomaleon answered 19/11, 2010 at 15:12 Comment(0)
R
2

In case anybody else lands here attempting to convert the other way around, the above code works, but you need to modify the FileFormat value.

http://msdn.microsoft.com/en-us/library/ff839952.aspx

Example: filtered HTML (wdFormatFilteredHTML) is 10, instead of 0.
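
A minimal sketch of that reverse direction, assuming Windows with MS Word and pywin32 installed. The function name doc_to_filtered_html and the constant names are chosen here for illustration; the numeric values are the WdSaveFormat values mentioned above. It uses Documents.Open rather than Documents.Add, since the source is an existing Word file.

```python
import os

# WdSaveFormat values from the Word object model.
WD_FORMAT_DOCUMENT = 0        # Word 97-2003 .doc
WD_FORMAT_FILTERED_HTML = 10  # filtered HTML


def doc_to_filtered_html(src, dst):
    """Hypothetical helper: save a Word document as filtered HTML via COM."""
    # Imported here because pywin32 only exists on Windows.
    import win32com.client

    word = win32com.client.Dispatch('Word.Application')
    try:
        # Absolute paths avoid surprises with Word's own working directory.
        doc = word.Documents.Open(os.path.abspath(src))
        try:
            doc.SaveAs(os.path.abspath(dst), FileFormat=WD_FORMAT_FILTERED_HTML)
        finally:
            doc.Close()
    finally:
        # Always shut Word down, even if the conversion failed.
        word.Quit()
```

The try/finally blocks are a design choice of this sketch: without them, a failed conversion can leave a hidden Word process running.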

Rodl answered 25/5, 2012 at 14:8 Comment(0)
H
-1

Update with a Python 3.x solution, using the htmldocx package:

from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.parse_html_file("html_filename", "docx_filename")
# File extensions are not needed, but are tolerated
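
If the HTML is already in a string rather than a file, htmldocx also provides add_html_to_document, which appends the converted HTML to a python-docx Document. A rough sketch, assuming both htmldocx and python-docx are installed (the wrapper name html_string_to_docx is made up for this example):

```python
def html_string_to_docx(html, out_path):
    """Hypothetical wrapper: convert an HTML string to a .docx file."""
    # Third-party imports kept local to the function.
    from docx import Document        # pip install python-docx
    from htmldocx import HtmlToDocx  # pip install htmldocx

    document = Document()
    # Append the converted HTML to the (initially empty) document.
    HtmlToDocx().add_html_to_document(html, document)
    document.save(out_path)
```

For example, html_string_to_docx("<h1>Hello world</h1>", "hello.docx") would write the document without saving an intermediate HTML file.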
Huppah answered 11/1, 2021 at 23:32 Comment(1)
any thoughts on how you'd use this if you're converting an HTML string rather than having to save an HTML file first?Rope
