Converting RTF to PDF using Python

I am new to Python and have been given the task of converting RTF files to PDF using Python. I googled and found some code (not exactly RTF to PDF), tried working with it, and changed it to fit my requirements, but I am not able to get it working.

I used the code below:

import sys
import os
import comtypes.client
#import win32com.client
rtfFormatPDF = 17

in_file = os.path.abspath(sys.argv[1])
out_file = os.path.abspath(sys.argv[2])

rtf= comtypes.client.CreateObject('Rtf.Application')

rtf.Visible = True
doc = rtf.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=rtfFormatPDF)
doc.Close()
rtf.Quit()

But it throws the error below:

Traceback (most recent call last):
  File "C:/Python34/Lib/idlelib/rtf_to_pdf.py", line 12, in <module>
    word = comtypes.client.CreateObject('Rtf.Application')
  File "C:\Python34\lib\site-packages\comtypes\client\__init__.py", line 227, in CreateObject
    clsid = comtypes.GUID.from_progid(progid)
  File "C:\Python34\lib\site-packages\comtypes\GUID.py", line 78, in from_progid
    _CLSIDFromProgID(str(progid), byref(inst))
  File "_ctypes/callproc.c", line 920, in GetResult
OSError: [WinError -2147221005] Invalid class string

Can anyone help me with this? I would really appreciate it if someone could suggest a better and faster way of doing it. I have around 200,000 files to convert.

Anisha

Garrotte answered 14/4, 2015 at 21:21 Comment(10)
Where did you get the information that "Rtf.Application" is a valid COM object? I would guess you found some code for converting a Word document to PDF and just replaced "Word.Application" with "Rtf.Application". - Retral
Yes, that is true! I tried finding a replacement for it, but no luck. - Garrotte
Do you require a Python solution, or just a solution for your 200,000 files? If Python is not a requirement, try LibreOffice: libreoffice --headless --convert-to pdf filename.rtf - Testimonial
@Retral That makes a point: what if the ProgID were set back to "Word.Application"? Do you think it would work? - Stramonium
Well, Python is not mandatory, so I can try LibreOffice. Does this mean there is no solution in Python? - Garrotte
@MarkRansom Yep, just tried it. It works like a charm if you change the COM object back to "Word.Application" and let Word handle the conversion; it can open RTFs without problems. Also, the OP refers to the same variable once as rtfFormatPDF and once as wdFormatPDF (not sure why), so that would have to be changed as well. - Retral
Thanks @Retral! That was a typo, sorry. I will try working with "Word.Application" and see how it goes with RTFs. - Garrotte
It worked, thanks! I will add the working code. - Garrotte
Please don't edit your working code into the question. We like to keep questions and answers separate. Please edit it into the answer you provided below instead. - Ruffo
OK, changed it. Thanks! - Garrotte

I followed Mark's advice, changed the ProgID back to Word.Application, and pointed my source at the RTF files. It works perfectly. The process was slow, but still faster than the Java application my team was using. The final code is below.

Final code, which does the conversion through the Word application:

import os
import comtypes.client

wdFormatPDF = 17  # Word's WdSaveFormat value for PDF

input_dir = 'input directory'
output_dir = 'output directory'

for subdir, dirs, files in os.walk(input_dir):
    for file in files:
        in_file = os.path.join(subdir, file)
        output_file = file.split('.')[0]
        # Join with os.path.join so the path separator is always present
        out_file = os.path.join(output_dir, output_file + '.pdf')
        word = comtypes.client.CreateObject('Word.Application')

        doc = word.Documents.Open(in_file)
        doc.SaveAs(out_file, FileFormat=wdFormatPDF)
        doc.Close()
        word.Quit()
Garrotte answered 10/11, 2016 at 16:32 Comment(5)
os.path.splitext may be a better choice for getting the file name without its extension. - Waldheim
Does this require MS Word to be installed? - Heft
@RohitGupta Yes, it does. - Knockabout
For processing many files, I found it quicker to initialize Word outside the for loops, then .Close() each doc and open the next one with the same Word process, and finally .Quit() after the loops. - Knockabout
Make sure to remove spaces from the file names. - Mcclimans
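The suggestions in the comments above (os.path.splitext for the file name, and a single Word process reused for all files) could be combined roughly as follows. This is only a sketch, not the answerer's code: the names pdf_path_for and convert_tree are made up for illustration, the .rtf filter is an added assumption, and it still needs Windows with MS Word and comtypes installed.

```python
import os

WD_FORMAT_PDF = 17  # Word's WdSaveFormat value for PDF, as in the answer above


def pdf_path_for(in_file, output_dir):
    """Return the PDF path for in_file, keeping any dots inside the base name."""
    base_name = os.path.splitext(os.path.basename(in_file))[0]
    return os.path.join(output_dir, base_name + ".pdf")


def convert_tree(input_dir, output_dir):
    """Convert every .rtf file under input_dir using one Word instance."""
    # Imported here so the path helper above works without comtypes installed.
    import comtypes.client

    word = comtypes.client.CreateObject("Word.Application")
    try:
        for subdir, _dirs, files in os.walk(input_dir):
            for name in files:
                if not name.lower().endswith(".rtf"):
                    continue  # assumption: only RTF files should be converted
                in_file = os.path.abspath(os.path.join(subdir, name))
                out_file = os.path.abspath(pdf_path_for(in_file, output_dir))
                doc = word.Documents.Open(in_file)
                try:
                    doc.SaveAs(out_file, FileFormat=WD_FORMAT_PDF)
                finally:
                    doc.Close()  # close each document, keep Word running
    finally:
        word.Quit()  # quit Word once, after all files are processed

# Example call (hypothetical directories):
# convert_tree(r"C:\rtf_in", r"C:\pdf_out")
```

Starting Word once avoids paying the application start-up cost for every file, which matters for a batch of this size.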

If you have LibreOffice installed on your system, you have the best solution.

import os
os.system('soffice --headless --convert-to pdf filename.rtf')
# os.system('libreoffice --headless --convert-to pdf filename.rtf')
# os.system('libreoffice6.3 --headless --convert-to pdf filename.rtf')

The command may vary between versions and platforms, but this is the best solution I have found.
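As a hedged alternative to os.system, the same command can be run through subprocess.run with an argument list, which avoids shell quoting problems for file names that contain spaces. The helper names build_command and convert are made up for illustration, and the binary name (soffice here) is an assumption that depends on your installation; --outdir is LibreOffice's option for choosing the output directory.

```python
import subprocess


def build_command(rtf_path, out_dir, binary="soffice"):
    """Build the LibreOffice command line as a list (no shell quoting needed)."""
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", out_dir, rtf_path]


def convert(rtf_path, out_dir, binary="soffice"):
    """Run the conversion and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(rtf_path, out_dir, binary), check=True)

# Example call (hypothetical paths):
# convert("my report.rtf", "pdf_out")
```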

Deanadeanda answered 25/11, 2020 at 15:51 Comment(0)

Probably the easiest way to convert an RTF file to PDF is a simple command line (if you have LibreOffice installed), which you can also easily call from a Python script using, for example, "os.system(...)" or "subprocess.getoutput(...)". The command line on Linux is:

libreoffice --headless --convert-to pdf filename.rtf
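For a large number of files, one possible approach is to pass several input files to each LibreOffice invocation instead of starting it once per file. The sketch below assumes the libreoffice binary is on the PATH; the helper names chunked and convert_in_batches and the batch size of 100 are arbitrary choices for illustration, not tuned values.

```python
import subprocess


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def convert_in_batches(rtf_paths, out_dir, batch_size=100, binary="libreoffice"):
    """Convert the files in groups, one LibreOffice process per group."""
    for batch in chunked(list(rtf_paths), batch_size):
        command = [binary, "--headless", "--convert-to", "pdf",
                   "--outdir", out_dir]
        # The input files are appended after the options.
        subprocess.run(command + batch, check=True)

# Example call (hypothetical paths):
# convert_in_batches(["a.rtf", "b.rtf"], "pdf_out")
```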

Cutlery answered 5/1 at 19:22 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. - Ocd
