How can I read an .rtf file in Python 3 and store its text as strings in a list?

4

9

I have an .rtf file and want to read it and store its strings in a list using Python 3. Any package is fine, but it must work on both Windows and Linux.

I have tried striprtf, but read_rtf is not working.

from striprtf.striprtf import rtf_to_text
from striprtf.striprtf import read_rtf
rtf = read_rtf("file.rtf")
text = rtf_to_text(rtf)
print(text)

But this code fails with the error: cannot import name 'read_rtf'

Can anyone suggest a way to get the strings from an .rtf file in Python 3?

Dipterocarpaceous answered 28/3, 2020 at 4:49 Comment(4)
Why don't you use file handling to read the text from the RTF file? – Haigh
I don't follow. Can you please give some rough code? – Dipterocarpaceous
Like this: with open("file.rtf") as f: print(f.read()) – Haigh
Thank you @Haigh. It worked like a charm. – Dipterocarpaceous
8

Have you tried this?

with open('yourfile.rtf', 'r') as file:
    text = file.read()
print(text)

For a very large file, read it line by line instead of loading it all into memory:

with open("yourfile.rtf") as infile:
    for line in infile:
        do_something_with(line)  # placeholder: replace with your own processing
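
Since the question asks for a list, the line-by-line loop can collect what it reads into one. This is only a minimal sketch: `read_lines` is a hypothetical helper name, and the UTF-8 encoding is an assumption about the file. Note that the result is still raw RTF markup (control words such as \par included), not plain text.

```python
def read_lines(path, encoding="utf-8"):
    """Return the non-empty lines of a text file as a list of strings."""
    lines = []
    # An explicit encoding keeps the behaviour the same on Windows and Linux,
    # where the platform default encodings can differ.
    with open(path, encoding=encoding, errors="replace") as infile:
        for line in infile:
            stripped = line.rstrip("\r\n")  # drop the line terminator only
            if stripped:
                lines.append(stripped)
    return lines
```

Calling `read_lines("yourfile.rtf")` would then give one list entry per non-empty line of the file.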
Hydrofoil answered 28/3, 2020 at 6:47 Comment(0)
8

Using rtf_to_text is enough to convert RTF into a string in Python. Read the content of the RTF file and then pass it to rtf_to_text:

from striprtf.striprtf import rtf_to_text

with open("yourfile.rtf") as infile:
    content = infile.read()
    text = rtf_to_text(content)
print(text)
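
To end up with a list rather than a single string, the converted text can be split into lines. A rough sketch under stated assumptions: `rtf_file_to_list` and its `converter` parameter are hypothetical names introduced here, the default converter is `rtf_to_text` from striprtf (assumed to be installed), and the UTF-8 encoding is a guess about the file.

```python
def rtf_file_to_list(path, converter=None, encoding="utf-8"):
    """Convert an RTF file to plain text and return its non-empty lines."""
    if converter is None:
        # Imported lazily so a different converter can be passed in
        # when striprtf is not available.
        from striprtf.striprtf import rtf_to_text as converter
    with open(path, encoding=encoding, errors="replace") as infile:
        content = infile.read()
    text = converter(content)
    # splitlines() handles both \n and \r\n; blank lines are dropped.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With striprtf installed, `strings = rtf_file_to_list("yourfile.rtf")` should give one string per non-empty line of the converted text.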
Phosphatize answered 20/12, 2021 at 15:51 Comment(0)
3

Try using this:

from striprtf.striprtf import rtf_to_text

sample_text = "any RTF text as a string"  # e.g. the contents of an .rtf file
text = rtf_to_text(sample_text)
Eldwen answered 4/2, 2021 at 11:44 Comment(0)
1

Reading an RTF file and manipulating its data is tricky, and what works depends on the file you have. I tried all of the approaches above and none of them worked for me; the following code finally did. It drives Microsoft Word through COM via win32com, so it needs Windows with Word installed. I hope it helps others looking for a solution.

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
 # word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects\10.1\power.rtf' 
doc = word.Documents.Open(FileName=path, Encoding='gbk')
 
for para in doc.paragraphs:
    print(para.Range.Text)
 
doc.Close()
word.Quit()

If you want to store the text in a single variable, use the following code instead.

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
 # word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects\10.1\output_5.rtf' # Use an absolute path; a relative path will fail
doc = word.Documents.Open(FileName=path, Encoding='gbk')

#for para in doc.paragraphs:
#    print(para.Range.Text)


content = '\n'.join([para.Range.Text for para in doc.paragraphs])

print(content)

doc.Close()
word.Quit()
Thrum answered 1/4, 2021 at 8:28 Comment(0)
