How do you convert an RTF string to plain text in Java? The obvious answer is to use Swing's RTFEditorKit, and that seems to be the common answer around the Internet. However, the write method that claims to output plain text isn't actually implemented: in Java 6, RTFEditorKit.write(Writer, Document, int, int) is hard-coded to throw an IOException.
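You can confirm this by calling the Writer overload directly. The sketch below uses hypothetical class and method names. It builds a one-word document and reports whether RTFEditorKit.write(Writer, Document, int, int) throws an IOException instead of producing text.

```java
import java.io.IOException;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfWriteDemo {
    // Returns true if RTFEditorKit refuses to write a document to a character Writer
    public static boolean writerOverloadThrows() throws BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        doc.insertString(0, "hello", null);
        try {
            kit.write(new StringWriter(), doc, 0, doc.getLength());
            return false; // the Writer overload produced output
        } catch (IOException e) {
            // Expected: the JDK implementation rejects Writers ("RTF is an 8-bit format")
            return true;
        }
    }

    public static void main(String[] args) throws BadLocationException {
        System.out.println("write(Writer, ...) throws: " + writerOverloadThrows());
    }
}
```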
RTF to Plain Text in Java
I use Swing's RTFEditorKit in Java 6 like this:
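import java.io.ByteArrayInputStream;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;
RTFEditorKit rtfParser = new RTFEditorKit();
Document document = rtfParser.createDefaultDocument();
// read() throws IOException and BadLocationException; rtfBytes holds the raw RTF
rtfParser.read(new ByteArrayInputStream(rtfBytes), document, 0);
String text = document.getText(0, document.getLength());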
and that's working.
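Wrapped up as a self-contained helper that takes an RTF String (as the question asks) rather than a byte array, the same approach might look like this. The class and method names are hypothetical. Encoding the string as ISO-8859-1 is an assumption: it turns every char in the 0-255 range into the same byte value, which suits RTF's ASCII-plus-escapes markup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfText {
    // Extracts plain text from an RTF string with Swing's RTFEditorKit
    public static String toPlainText(String rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        // RTF markup is ASCII plus escapes; ISO-8859-1 maps chars 0-255 to the same byte values
        byte[] rtfBytes = rtf.getBytes(StandardCharsets.ISO_8859_1);
        kit.read(new ByteArrayInputStream(rtfBytes), doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws IOException, BadLocationException {
        String rtf = "{\\rtf1\\ansi{\\fonttbl\\f0\\fswiss Helvetica;}\\f0\\pard This is some {\\b bold} text.\\par}";
        System.out.print(toPlainText(rtf));
    }
}
```

Formatting such as {\b ...} is dropped and only the character content is returned; \par shows up as a newline.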
Got it working using this solution, though it didn't work at first either. It turns out that my input data was invalid, and the conversion was failing silently and returning an empty string. –
Coarsegrained
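One way to keep invalid input from failing silently is to check the result yourself. The sketch below is hypothetical, not part of RTFEditorKit. It rejects input that doesn't start with the {\rtf header and treats an empty result as an error. Note that a genuinely empty RTF document would also be rejected.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class CheckedRtfText {
    public static String toPlainTextChecked(String rtf) throws IOException, BadLocationException {
        // Heuristic: an RTF document begins with the "{\rtf" header
        if (!rtf.trim().startsWith("{\\rtf")) {
            throw new IllegalArgumentException("Input does not start with {\\rtf");
        }
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(new ByteArrayInputStream(rtf.getBytes(StandardCharsets.ISO_8859_1)), doc, 0);
        String text = doc.getText(0, doc.getLength());
        // An empty result can mean the parser gave up; surface it instead of returning ""
        if (text.isEmpty()) {
            throw new IOException("RTF parsed to empty text; the input may be malformed");
        }
        return text;
    }

    public static void main(String[] args) throws IOException, BadLocationException {
        System.out.print(toPlainTextChecked("{\\rtf1\\ansi Hello\\par}"));
    }
}
```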
It works for me, but for some reason the text comes out with dropped characters. –
Frill
It works fine on Windows, but on *nix platforms it depends on an X11 window server. –
Kryska
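If the X11 dependency comes from AWT trying to open a display, the commonly suggested workaround is to run in headless mode. The sketch below assumes that is the cause. It sets java.awt.headless in a static initializer, which only helps if nothing in the JVM has initialized AWT yet. Passing -Djava.awt.headless=true on the command line is the safer option. The class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class HeadlessRtf {
    static {
        // Assumption: headless mode stops AWT from looking for an X11 display.
        // It only takes effect if nothing has initialized AWT yet in this JVM.
        System.setProperty("java.awt.headless", "true");
    }

    public static String toPlainText(String rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(new ByteArrayInputStream(rtf.getBytes(StandardCharsets.ISO_8859_1)), doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws IOException, BadLocationException {
        System.out.print(toPlainText("{\\rtf1\\ansi Hello\\par}"));
    }
}
```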
It doesn't handle the "\line" control word. –
Visit
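A possible workaround, assuming the \line control word is simply being ignored, is to rewrite it as \par before handing the RTF to RTFEditorKit. This turns a line break into a paragraph break, which yields a newline in the extracted text. The sketch below is hypothetical and deliberately rough: it does not special-case an escaped backslash followed by literal "line" text.

```java
import java.util.regex.Pattern;

public class RtfLineFix {
    // Matches the \line control word, but not longer words such as \linex
    private static final Pattern LINE_WORD = Pattern.compile("\\\\line(?![A-Za-z])");

    // Hypothetical pre-processing step: rewrite \line (line break) as \par (paragraph break)
    // so RTFEditorKit emits a newline. Escaped backslashes before "line" are not handled.
    public static String replaceLineWithPar(String rtf) {
        return LINE_WORD.matcher(rtf).replaceAll("\\\\par");
    }

    public static void main(String[] args) {
        System.out.println(replaceLineWithPar("{\\rtf1\\ansi first\\line second\\par}"));
    }
}
```

Pass the rewritten string to RTFEditorKit.read as usual.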
Try Apache Tika: http://tika.apache.org/0.9/formats.html#Rich_Text_Format
Tika uses RTFEditorKit on the backend. –
Weirick
Tika is just for plain text and metadata, am I right? –
Consentaneous
You might consider RTF Parser Kit as a lightweight alternative to the Swing RTFEditorKit. The line below shows plain text extraction from an RTF file. The RTF file is read from the input stream, the extracted text is written to the output stream.
new StreamTextConverter().convert(new RtfStreamSource(inputStream), outputStream, "UTF-8");
(full disclosure: I'm the author of RTF Parser Kit)
Nice job! But I get "Kotlin: Unresolved reference: MyRtfListener". –
Visit
Does that mean I have to implement IRtfListener myself? –
Visit
@CatherineIvanova the one-line example above will extract plain text for you, so there's no need to implement the listener. I think your reference to MyRtfListener comes from the RTF Parser Kit readme, which illustrates the case where you'd provide your own listener. –
Harder
@JonIles thanks for the parser project! Could you also take a look at question #74112241? –
Kudva
Here is the full code to parse an RTF file and write its content out as plain text:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Reads an RTF file and writes its plain-text content to a .txt file
public class RtfToText {
    public static void main(String[] args) throws IOException, BadLocationException {
        RTFEditorKit rtf = new RTFEditorKit();
        Document doc = rtf.createDefaultDocument();
        // Pass the raw bytes: RTF is an 8-bit format, so decoding it as UTF-8 first can mangle characters
        try (InputStream in = new FileInputStream("C:\\SampleINCData.rtf")) {
            rtf.read(in, doc, 0);
        }
        String text = doc.getText(0, doc.getLength());
        // Write with an explicit encoding; write errors propagate instead of printing "Success..." anyway
        try (Writer out = new OutputStreamWriter(new FileOutputStream("B:\\Sample INC Data.txt"), StandardCharsets.UTF_8)) {
            out.write(text);
        }
        System.out.println("Success...");
    }
}