Reading/Writing MS Word files in Python
C

4

20

Is it possible to read and write Word (2003 and 2007) files in Python without using a COM object?
I know that I can:

# Raw string, so the "\f" in the path is not treated as a form-feed escape
with open(r'c:\file.doc', "w") as f:
    f.write(text)

but Word will read it as an HTML file, not as a native .doc file.

Coatee answered 9/10, 2008 at 18:6 Comment(0)
T
7

I'd look into IronPython, which inherently has access to Windows/Office APIs because it runs on the .NET runtime.

Tocantins answered 9/10, 2008 at 18:39 Comment(0)
M
38

See python-docx; its official documentation is available online.

This has worked very well for me.

Note that this does not work for .doc files, only .docx files.
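As a rough illustration of what that looks like, here is a minimal sketch that writes a .docx file and reads it back. The helper names `write_docx` and `read_docx` and the file name `example.docx` are made up for this example; `Document`, `add_paragraph`, `save` and `paragraphs` come from python-docx, which has to be installed separately (`pip install python-docx`).

```python
def write_docx(path, paragraphs):
    """Create a .docx file containing one paragraph per string."""
    # Imported inside the function so this module still loads on machines
    # where the third-party python-docx package is not installed.
    from docx import Document

    document = Document()
    for text in paragraphs:
        document.add_paragraph(text)
    document.save(path)


def read_docx(path):
    """Return the text of every body paragraph in a .docx file."""
    from docx import Document

    document = Document(path)
    return [paragraph.text for paragraph in document.paragraphs]


if __name__ == "__main__":
    # "example.docx" is a hypothetical file name used only for this demo.
    write_docx("example.docx", ["First paragraph.", "Second paragraph."])
    print(read_docx("example.docx"))
```

Text inside tables is not part of `document.paragraphs`, so a real reader would probably need to walk `document.tables` as well.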

Mice answered 21/10, 2011 at 10:45 Comment(4)
Does it support the .doc format? I tried, but it throws a ValueError: file '<open file 'file.doc', mode 'r' at 0x7f29a1b5a6f0>' is not a Word file, content type is 'application/vnd.openxmlformats-officedocument.themeManager+xml' - Wollongong
It is called python-docx, not python-doc, so no. :) - Mice
@Mice but the question is also about .doc files, so you should note that your answer only applies to .docx files. - Tude
Any idea for opening .doc files then? - Middlebrow
F
11

If you only want to read, it is simplest to use the Linux soffice command to convert the file to text, and then load the text into Python.
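Something along these lines should do it. This is only a sketch: it assumes LibreOffice is installed and that `soffice` is on the PATH, the function names are my own, and the encoding of the generated text file may depend on your setup (UTF-8 is assumed here).

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts a document to plain text."""
    return [
        soffice,
        "--headless",                # run without opening a window
        "--convert-to", "txt:Text",  # plain-text export filter
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def word_to_text(doc_path, out_dir, soffice="soffice"):
    """Convert a .doc/.docx file with soffice and return the resulting text."""
    subprocess.run(build_convert_command(doc_path, out_dir, soffice), check=True)
    # soffice writes <original file stem>.txt into out_dir.
    txt_path = Path(out_dir) / (Path(doc_path).stem + ".txt")
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

For example, `word_to_text("file.doc", "/tmp")` would leave `/tmp/file.txt` behind and return its contents as a string.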

Fetishist answered 8/5, 2015 at 11:11 Comment(1)
+1 I don't know why this is getting downvoted. This is sometimes the only solution and sometimes it's enough. - Lustral
S
3

doc (Word 2003 in this case) and docx (Word 2007) are different formats; the latter is essentially a ZIP archive of XML and image files. I would imagine that it is entirely possible to write docx files by manipulating the contents of those XML files. However, I don't see how you could read and write a doc file without some type of COM component interface.
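To make the docx half of that concrete, here is a minimal standard-library sketch that opens the archive and pulls the paragraph text out of `word/document.xml`. The function name `docx_paragraphs` is made up for this example, and it only handles reading; writing would mean editing the same XML and re-zipping it.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path):
    """Return the text of each paragraph in the main document part of a .docx."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across <w:t> elements; join the pieces.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs
```

This does not help with the binary .doc format, which is not a ZIP archive.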

Sikang answered 9/10, 2008 at 18:36 Comment(0)
