Is it possible to read and write Word (2003 and 2007) files in Python without using a COM object?
I know that I can:
with open(r'c:\file.doc', "w") as f:  # raw string, so "\f" is not read as a form feed
    f.write(text)
but Word will read it as an HTML file not a native .doc file.
I'd look into IronPython, which intrinsically has access to the Windows/Office APIs because it runs on the .NET runtime.
See python-docx; its official documentation is available online.
This has worked very well for me.
Note that this does not work for .doc files, only .docx files.
If you only want to read, it is simplest to use the Linux soffice command to convert the file to text, and then load the text into Python.
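One way this could look, sketched under the assumption that LibreOffice is installed and soffice is on the PATH. The function names are hypothetical, and the choice of UTF-8 for the converted text is an assumption that may need adjusting for your locale.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(doc_path, out_dir):
    """Build the soffice command line that converts a document to plain text."""
    return [
        "soffice",
        "--headless",                 # run without opening a window
        "--convert-to", "txt:Text",   # plain-text export filter
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def read_word_text(doc_path):
    """Convert doc_path to text with soffice and return the text as a string."""
    doc_path = Path(doc_path)
    with tempfile.TemporaryDirectory() as out_dir:
        subprocess.run(build_convert_command(doc_path, out_dir), check=True)
        # soffice names the output after the input file, with a .txt extension.
        txt_path = Path(out_dir) / (doc_path.stem + ".txt")
        # Assumption: the output is UTF-8; undecodable bytes are replaced.
        return txt_path.read_text(encoding="utf-8", errors="replace")
```

Because the conversion is done by LibreOffice, this approach reads both .doc and .docx files, but it is read-only and loses all formatting.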
doc (Word 2003 in this case) and docx (Word 2007) are different formats: the former is a binary format, while the latter is essentially a ZIP archive of XML and image files. I would imagine that it is quite possible to write docx files by manipulating the contents of those XML files. However, I don't see how you could read and write a doc file without some type of COM component interface.
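To illustrate the "archive of XML files" point, here is a minimal sketch using only the standard library. It writes the three parts a bare-bones docx package contains and reads the paragraph text back. The function names are hypothetical, and whether every Word version accepts such a stripped-down file has not been verified here, so treat it as a demonstration of the format rather than a replacement for a real library.

```python
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def write_minimal_docx(path, paragraphs):
    """Write a docx package containing one unformatted paragraph per string."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(text)
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)


def read_docx_text(path):
    """Return the text of each <w:p> paragraph found in word/document.xml."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter("{%s}p" % W_NS):
        # A paragraph's text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in p.iter("{%s}t" % W_NS)))
    return paragraphs
```

The reading half also works on files saved by Word itself, since the main text always lives in word/document.xml. None of this applies to .doc, which is not a ZIP archive.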
ValueError: file '<open file 'file.doc', mode 'r' at 0x7f29a1b5a6f0>' is not a Word file, content type is 'application/vnd.openxmlformats-officedocument.themeManager+xml'
– Wollongong